HTTP Operations in C#

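The System.Net.Http namespace provides a programming interface for modern HTTP applications and can be used when developing desktop applications. Collecting and analyzing information from the web also depends on HTTP operations. The most common operations are sending GET and POST requests and uploading content with POST. The examples in this article use the older HttpWebRequest and HttpWebResponse classes from the System.Net namespace.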

Sending a GET Request

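GET is the most common way to access a resource, and it is sent directly through the URL. Any parameters are appended to the end of the URL in the form ?key1=value1&key2=value2&key3=value3.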
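
The query-string format described above can be sketched as a small helper. This is only a minimal sketch and not part of the original code: the name BuildQueryUrl and the second parameter (page) are hypothetical, and it assumes keys and values should be percent-encoded with Uri.EscapeDataString so that reserved characters such as # or & are not misread as URL syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QueryStringDemo
{
    // Hypothetical helper: appends parameters to a URL as ?key1=value1&key2=value2.
    public static string BuildQueryUrl(string url, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var builder = new StringBuilder(url);
        char separator = '?';
        foreach (var pair in parameters)
        {
            // Percent-encode both sides so reserved characters stay inside the value.
            builder.Append(separator)
                   .Append(Uri.EscapeDataString(pair.Key))
                   .Append('=')
                   .Append(Uri.EscapeDataString(pair.Value));
            separator = '&';
        }
        return builder.ToString();
    }

    static void Main()
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("q", "C#"),
            new KeyValuePair<string, string>("page", "2"),  // illustrative parameter only
        };
        // Prints http://cn.bing.com/search?q=C%23&page=2
        Console.WriteLine(BuildQueryUrl("http://cn.bing.com/search", parameters));
    }
}
```

Without the encoding step, the # in "C#" would start a URL fragment, and the server would receive only q=C.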

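First import the namespace with using System.Net; (the code below also needs System.Text and System.Collections.Generic). Next, design a function CreateGetHttpResponse() that obtains the HTTP response to a GET request. Because the parameters of a GET request are written in the URL, any parameters are appended to the URL before the request is created. The function returns null if the request fails and an HttpWebResponse if it succeeds.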

using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

class HttpFunctions
{
    // Sends a GET request and returns the response, or null if the request fails.
    public static HttpWebResponse CreateGetHttpResponse(
        string url,
        IDictionary<string, string> parameters = null,
        string token = null)
    {
        StringBuilder urlWithGetParam = new StringBuilder(url);
        if (parameters != null && parameters.Count > 0)
        {
            bool first = true;
            foreach (KeyValuePair<string, string> pair in parameters)
            {
                // Percent-encode keys and values so characters such as '#' or '&' survive.
                urlWithGetParam.Append(first ? '?' : '&');
                urlWithGetParam.Append(Uri.EscapeDataString(pair.Key));
                urlWithGetParam.Append('=');
                urlWithGetParam.Append(Uri.EscapeDataString(pair.Value));
                first = false;
            }
        }

        HttpWebRequest request = WebRequest.Create(urlWithGetParam.ToString()) as HttpWebRequest;
        request.Method = "GET";

        // Optionally set the User-Agent and the timeout:
        // request.UserAgent = userAgent;
        // request.Timeout = timeout;

        if (token != null)
        {
            request.Headers.Add(HttpRequestHeader.Authorization, "Bearer " + token);
        }
        try
        {
            return request.GetResponse() as HttpWebResponse;
        }
        catch (WebException)
        {
            // Connection failures and HTTP error statuses both end up here.
            return null;
        }
    }
}

This gives us an HttpWebResponse. For example, in the caller we can pass the parameter q=search term to Bing and search for the keyword "C#":

class Program
{
    static void Main(string[] args)
    {
        Dictionary<string, string> myParams = new Dictionary<string, string>();
        myParams["q"] = "C#";
        var h1 = HttpFunctions.CreateGetHttpResponse("http://cn.bing.com/search",myParams);
        Console.WriteLine(h1.StatusCode);
    }
}

The program prints OK, the name of status code 200, which shows that the HTTP response is normal.

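We now have the HttpWebResponse but not yet the HTTP content. To get the content, it is best to read it as a stream. In the HttpFunctions class, add a function that reads the text from the stream: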

public static string responseText(HttpWebResponse h)
{
    if (h == null)
    {
        return "";
    }
    // Decode the body as UTF-8 and read it to the end.
    // The using blocks close the reader and the stream afterwards.
    using (System.IO.Stream receiveStream = h.GetResponseStream())
    using (System.IO.StreamReader readStream =
        new System.IO.StreamReader(receiveStream, System.Text.Encoding.UTF8))
    {
        return readStream.ReadToEnd();
    }
}

We requested an HTML page, which usually contains a lot of content, so the test in the main function looks only at the first 100 characters:

class Program
{
    static void Main(string[] args)
    {
        Dictionary<string, string> myParams = new Dictionary<string, string>();
        myParams["q"] = "C#";
        var h1 = HttpFunctions.CreateGetHttpResponse("http://cn.bing.com/search",myParams);
        Console.WriteLine(h1.StatusCode);
        Console.WriteLine(HttpFunctions.responseText(h1).Substring(0,100));
        h1.Dispose();
    }
}

One line of HTML is printed, which shows that the HTML content was read successfully:

<!DOCTYPE html><html lang="zh" xml:lang="zh" xmlns="http://www.w3.org/1999/xhtml" xmlns:Web="http://

Sending a POST Request to Log In

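URLs are limited in length, so GET is not suitable for passing long data. GET parameters are also part of the URL, which can be cached, logged, and kept in browser history, so they are more exposed. Operations on sensitive data, such as logging in to a system, should not use GET. A POST request separates the data from the URL and is the more common choice for transmitting data.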

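When data is submitted to a server with POST, several encodings are available. The default is application/x-www-form-urlencoded, in which every character other than letters, digits, and a few unreserved symbols must be percent-encoded, that is, written as % followed by the hexadecimal value of each byte. This encoding is very inefficient when the form contains a lot of non-alphanumeric data, as with binary file uploads. In that case multipart/form-data should be used as the encoding type. It does not encode the input; instead it sends the input as multiple parts in MIME format, the same standard used for email.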
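
To make the application/x-www-form-urlencoded encoding concrete, here is a minimal sketch that uses FormUrlEncodedContent from System.Net.Http to produce an encoded body without sending anything. The helper name EncodeForm and the note field are assumptions made for illustration; they do not appear in the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

static class FormEncodingDemo
{
    // Hypothetical helper: returns the form body as it would be sent on the wire.
    public static string EncodeForm(IEnumerable<KeyValuePair<string, string>> fields)
    {
        using (var content = new FormUrlEncodedContent(fields))
        {
            // Blocking on the task is acceptable in a small console demo.
            return content.ReadAsStringAsync().GetAwaiter().GetResult();
        }
    }

    static void Main()
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("fname", "Henry"),
            new KeyValuePair<string, string>("note", "a b&c"),  // illustrative value with reserved characters
        };
        // Prints fname=Henry&note=a+b%26c
        Console.WriteLine(EncodeForm(fields));
    }
}
```

The space becomes + and the & inside the value becomes %26, so the server can still tell where one field ends and the next begins.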

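For example, the URL https://www.runoob.com/try/ajax/demo_post2.php accepts two POST parameters, fname and lname, and returns a greeting. Creating the HttpWebResponse for a POST is similar to the GET case, except that the parameters are written as bytes into the request stream.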

public static HttpWebResponse CreatePostHttpResponse(
    string url,
    IDictionary<string, string> parameters = null)
{
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;

    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";

    if (parameters != null && parameters.Count > 0)
    {
        var pairs = new List<string>();
        foreach (var it in parameters)
        {
            // Percent-encode each key and value for the form body.
            pairs.Add(Uri.EscapeDataString(it.Key) + "=" + Uri.EscapeDataString(it.Value));
        }
        // The encoded string is plain ASCII, so UTF-8 gives the same bytes on every machine.
        byte[] byteArray = Encoding.UTF8.GetBytes(string.Join("&", pairs));
        using (System.IO.Stream stream1 = request.GetRequestStream())
        {
            stream1.Write(byteArray, 0, byteArray.Length); // write the parameters
        }
    }

    try
    {
        return request.GetResponse() as HttpWebResponse;
    }
    catch (WebException)
    {
        return null;
    }
}

Call it from the main function:

Dictionary<string, string> myParams = new Dictionary<string, string>();
string url = "https://www.runoob.com/try/ajax/demo_post2.php";
myParams["fname"] = "Henry";
myParams["lname"] = "Lord";
var h1 = HttpFunctions.CreatePostHttpResponse(url, myParams);
var htContent = HttpFunctions.responseText(h1);
Console.WriteLine(htContent);
h1.Dispose();

The result is:

<p style='color:red;'>你好,Henry Lord,今天过得怎么样?</p>

Simulating a Login with Cookies

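Next, to demonstrate POST requests further, we use the test site provided by the book 《用Python编写网络爬虫》 (Web Scraping with Python), http://example.webscraping.com/places/default/user/login, as an example: register an account yourself, then simulate logging in to the system.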

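Open the site we want to log in to and inspect the login form. Besides the visible fields, there is a group of inputs styled with display:none;. The site uses them to validate the user, so a simulated login from C# that sends only the email and password will not succeed. We need to extract every input entry in the form. In C#, we can first request the page and then use a regular expression to extract the <input /> fields: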

private static List<string> showMatch(string text, string expr)
{
    System.Text.RegularExpressions.MatchCollection mc = System.Text.RegularExpressions.Regex.Matches(text, expr);
    List<string> ret = new List<string>();
    foreach (System.Text.RegularExpressions.Match m in mc)
    {
        ret.Add(m.ToString());
    }
    return ret;
}

Matching the login page against the pattern <input .*?/> returns the following fields:

<input class="string" id="auth_user_email" name="email" type="text" value="" />
<input class="password" id="auth_user_password" name="password" type="password" value="" />
<input class="boolean" id="auth_user_remember_me" name="remember_me" type="checkbox" value="on" />
<input type="submit" value="Log In" />
<input name="_next" type="hidden" value="/places/default/index" />
<input name="_formkey" type="hidden" value="fe10396d-3a5f-4d8c-b03e-2960a2820cac" />
<input name="_formname" type="hidden" value="login" />

Apart from the submit button with type=submit, the other six name/value pairs are the data we need to send by POST.

private static Dictionary<string,string> getinputParameters(string message)
{
    Dictionary<string, string> ret = new Dictionary<string, string>();
    var r = showMatch(message, @"<input .*?/>");
    foreach (var it in r)
    {
        System.Text.RegularExpressions.MatchCollection lineMatchKey =
            System.Text.RegularExpressions.Regex.Matches(it, "name=\\\".*?\\\"");
        if (lineMatchKey.Count > 0)
        {
            System.Text.RegularExpressions.MatchCollection lineMatchValue =
                System.Text.RegularExpressions.Regex.Matches(it, "value=\\\".*?\\\"");
            if (lineMatchValue.Count > 0)
            {
                ret[lineMatchKey[0].ToString().Substring(6, lineMatchKey[0].Length - 7)]
                    = lineMatchValue[0].ToString().Substring(7, lineMatchValue[0].Length - 8);
            }
        }
    }
    return ret;
}
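
As an alternative to the Substring offsets above, the same extraction can be sketched with named capture groups, which avoids counting quote characters by hand. This is a hedged sketch: ParseInputs is a hypothetical name, and like the original it assumes double-quoted attributes and self-closing <input ... /> tags.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class InputParserDemo
{
    // Hypothetical alternative: capture name and value with named groups.
    public static Dictionary<string, string> ParseInputs(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (Match tag in Regex.Matches(html, @"<input\s.*?/>"))
        {
            Match name = Regex.Match(tag.Value, @"\bname=""(?<name>.*?)""");
            Match value = Regex.Match(tag.Value, @"\bvalue=""(?<value>.*?)""");
            // Inputs without a name (such as the submit button) are skipped.
            if (name.Success && value.Success)
            {
                result[name.Groups["name"].Value] = value.Groups["value"].Value;
            }
        }
        return result;
    }

    static void Main()
    {
        string html =
            "<input name=\"_formname\" type=\"hidden\" value=\"login\" />" +
            "<input type=\"submit\" value=\"Log In\" />";
        foreach (var pair in ParseInputs(html))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);  // _formname = login
        }
    }
}
```

Regular expressions are fragile on arbitrary HTML, so this is only suitable for a page whose markup is known in advance, as in this example.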

With all the parameters collected, we can send the POST request:

class Program
{
    static void Main(string[] args)
    {
        Dictionary<string, string> myParams = new Dictionary<string, string>();
        string url = "http://example.webscraping.com/places/default/user/login";
        var h1 = HttpFunctions.CreateGetHttpResponse(url);
        var htContent = HttpFunctions.responseText(h1);
        h1.Dispose();
        myParams = getinputParameters(htContent);
        myParams["email"] = "com0@malic.xyz";
        myParams["password"] = "88888888";
        var h2 = HttpFunctions.CreatePostHttpResponse(url, myParams);
        htContent = HttpFunctions.responseText(h2);
        Console.WriteLine(htContent);
        h2.Dispose();
    }

}

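The parameters are all correct, but the returned HTML contains no login information (the login area of the HTML still shows Log In instead of the user name). This is because the session information is stored in cookies, and the current program does not add cookies to the request headers, so the login state cannot be kept.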

The site we are logging in to sets two cookies, so we write a function to read them:

public static string getCookies(HttpWebResponse h1)
{
    if(h1!=null)
    {
        string ret = "";
        ret += h1.Headers.GetValues("Set-Cookie")[0].Split(';')[0];
        ret += "; ";
        ret += h1.Headers.GetValues("Set-Cookie")[1].Split(';')[0];
        return ret;
    }
    else
    {
        return "";
    }
}

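Then add a parameter string cookies=null to the POST function above, and add the following inside the function, after request.ContentType is set and before the request stream is opened: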

if (cookies != null)
{
	request.Headers.Add(HttpRequestHeader.Cookie, cookies);
}

The cookies are then sent along with the POST.

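The getCookies(HttpWebResponse h1) method depends on the server and is not general. For another site, you have to analyze again which cookies it sets and extract the fields accordingly. It is shown here only to demonstrate how to add cookies.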
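
A more general option, sketched here under stated assumptions, is to let System.Net.CookieContainer track the cookies instead of parsing Set-Cookie by hand. The cookie names and values below are invented stand-ins for whatever the server actually sends, and BuildContainer is a hypothetical helper used only to keep the demo offline.

```csharp
using System;
using System.Net;

static class CookieContainerDemo
{
    // Hypothetical helper: fills a container as if two Set-Cookie headers had arrived.
    public static CookieContainer BuildContainer(Uri site)
    {
        var cookies = new CookieContainer();
        // Invented cookie names and values; Path=/ makes them apply to the whole site.
        cookies.SetCookies(site, "session_id=abc123; Path=/");
        cookies.SetCookies(site, "session_data=xyz789; Path=/");
        return cookies;
    }

    static void Main()
    {
        var site = new Uri("http://example.webscraping.com/places/default/user/login");
        CookieContainer cookies = BuildContainer(site);

        // Assigning the same container to every request lets the framework
        // store cookies from responses and send them back automatically.
        var request = (HttpWebRequest)WebRequest.Create(site);
        request.CookieContainer = cookies;

        // Shows the Cookie header value that would accompany a request to this URL.
        Console.WriteLine(cookies.GetCookieHeader(site));
    }
}
```

In real use the container would start empty and be shared between the GET that loads the login form and the POST that submits it, so no site-specific header parsing is needed.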

Reading Image Content

In C#, the body of an HTTP response is exposed as a stream. If the URL points to an image on the server, we only need to call Image.FromStream():

public static void downloadPicture(string url)
{
  var h1 = CreateGetHttpResponse(url);
  if (h1 != null)
  {
    System.Drawing.Image image = System.Drawing.Image.FromStream(h1.GetResponseStream());
    image.Save("download.png",System.Drawing.Imaging.ImageFormat.Png);
  }
}

If the URL is correct, download.png is saved in the program's working directory.

A C# console project does not include System.Drawing by default, so a reference to System.Drawing.Common.dll has to be added manually.
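
If the goal is only to save the downloaded bytes, System.Drawing can be avoided by copying the response stream straight to a file. The sketch below assumes that no format conversion is needed; SaveStream is a hypothetical helper, and a MemoryStream stands in for h1.GetResponseStream() so the example runs without a network connection.

```csharp
using System;
using System.IO;

static class SaveStreamDemo
{
    // Hypothetical helper: copies a stream to disk unchanged.
    public static void SaveStream(Stream source, string path)
    {
        using (FileStream target = File.Create(path))
        {
            source.CopyTo(target);
        }
    }

    static void Main()
    {
        // Four placeholder bytes stand in for the body of an image response.
        byte[] fakeImage = { 0x89, 0x50, 0x4E, 0x47 };
        string path = Path.Combine(Path.GetTempPath(), "download-demo.png");
        using (var source = new MemoryStream(fakeImage))
        {
            SaveStream(source, path);
        }
        Console.WriteLine(new FileInfo(path).Length);  // number of bytes written
    }
}
```

Unlike Image.Save with ImageFormat.Png, this writes whatever format the server returned, so the file extension should match the actual content type.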

Uploading an Image

Files are usually sent to a server with a POST request whose Content-Type is "multipart/form-data", and the POST parameters are written according to the multipart/form-data specification.

public static string Sys_uploadStudentPhoto(
	string url,
	string imageName,
	IDictionary<string, string> stringDict)
{
    return HttpPostData(url,
        "file",
        imageName,
        stringDict);
}

// Requires "using System.IO;" and "using System.Linq;" at the top of the file.
private static string HttpPostData(
	string url,  
	string fileKeyName,
	string filePath, 
	IDictionary<string, string> stringDict)
{
    string responseContent;
    var memStream = new MemoryStream();
    var request = (HttpWebRequest)WebRequest.Create(url);
    var boundary = "---------------" + DateTime.Now.Ticks.ToString("x");
    var beginBoundary = Encoding.ASCII.GetBytes("--" + boundary + "\r\n");
    var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
    var endBoundary = Encoding.ASCII.GetBytes("--" + boundary + "--\r\n");

    request.Method = "POST";
    request.ContentType = "multipart/form-data; boundary=" + boundary;

    const string filePartHeader =
        "Content-Disposition: form-data; name=\"{0}\"; filename=\"{1}\"\r\n" +
         "Content-Type: application/octet-stream\r\n\r\n";
    var header = string.Format(filePartHeader, fileKeyName, filePath);
    var headerbytes = Encoding.UTF8.GetBytes(header);

    memStream.Write(beginBoundary, 0, beginBoundary.Length);
    memStream.Write(headerbytes, 0, headerbytes.Length);

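    var buffer = new byte[1024];
    int bytesRead;

    while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) != 0)
    {
        memStream.Write(buffer, 0, bytesRead);
    }

    // Terminate the file part so the next boundary starts on its own line,
    // even when stringDict is empty.
    var lineBreak = Encoding.ASCII.GetBytes("\r\n");
    memStream.Write(lineBreak, 0, lineBreak.Length);

    var stringKeyHeader = "--" + boundary +
                           "\r\nContent-Disposition: form-data; name=\"{0}\"" +
                           "\r\n\r\n{1}\r\n";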

    foreach (byte[] formitembytes in from string key in stringDict.Keys
                                     select string.Format(stringKeyHeader, key, stringDict[key])
                                         into formitem
                                     select Encoding.UTF8.GetBytes(formitem))
    {
        memStream.Write(formitembytes, 0, formitembytes.Length);
    }

    memStream.Write(endBoundary, 0, endBoundary.Length);

    request.ContentLength = memStream.Length;

    // Copy the assembled body into the request stream.
    using (var requestStream = request.GetRequestStream())
    {
        memStream.Position = 0;
        memStream.CopyTo(requestStream);
    }
    memStream.Close();
    fileStream.Close();

    // GetResponse() returns a WebResponse, so cast it before reading the text.
    using (var httpWebResponse = (HttpWebResponse)request.GetResponse())
    {
        responseContent = responseText(httpWebResponse);
    }

    return responseContent;
}
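
For comparison, System.Net.Http offers MultipartFormDataContent, which assembles the boundaries and part headers that the code above writes by hand. The following is a minimal sketch and not the original author's method: BuildUpload, the field names, the boundary string, and the file bytes are all placeholders, and nothing is sent over the network.

```csharp
using System;
using System.Net.Http;
using System.Text;

static class MultipartDemo
{
    // Hypothetical helper: builds a multipart body with one text field and one file part.
    public static MultipartFormDataContent BuildUpload(byte[] fileBytes, string fileName, string boundary)
    {
        var content = new MultipartFormDataContent(boundary);
        content.Add(new StringContent("Henry"), "fname");                 // ordinary form field
        content.Add(new ByteArrayContent(fileBytes), "file", fileName);   // file part
        return content;
    }

    static void Main()
    {
        byte[] fileBytes = Encoding.ASCII.GetBytes("fake image bytes");  // placeholder file content
        using (var content = BuildUpload(fileBytes, "photo.png", "demo-boundary"))
        {
            // The Content-Type header carries the boundary the server needs for parsing.
            Console.WriteLine(content.Headers.ContentType);
            // Print the body to inspect the parts and the closing boundary.
            Console.WriteLine(content.ReadAsStringAsync().GetAwaiter().GetResult());
        }
    }
}
```

To send the body, the content object would be passed to HttpClient.PostAsync together with the upload URL.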
