HTTP Operations in C#
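The System.Net.Http namespace provides a programming interface for modern HTTP applications and can be used when developing desktop applications. HTTP operations are also essential for collecting and analyzing information from the web. The most common operations are sending GET and POST requests and uploading content with POST. The examples in this article use the HttpWebRequest and HttpWebResponse classes from the System.Net namespace.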
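Sending a GET request
GET is the most common way of accessing a resource; the request is sent directly through the URL. Any parameters are appended to the end of the URL in the form ?key1=value1&key2=value2&key3=value3.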
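The query-string format described above can be sketched as a small helper. The names QueryStringSketch and BuildQuery and the second parameter key2 are illustrative, not part of the original code. The sketch assumes that keys and values should be percent-encoded with Uri.EscapeDataString, which matters for a value such as C#, because an unescaped # would start a URL fragment.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QueryStringSketch
{
    // Hypothetical helper: joins key/value pairs into "?k1=v1&k2=v2".
    public static string BuildQuery(IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var sb = new StringBuilder();
        foreach (var pair in parameters)
        {
            // The first pair is introduced by '?', every later pair by '&'.
            sb.Append(sb.Length == 0 ? '?' : '&');
            sb.Append(Uri.EscapeDataString(pair.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(pair.Value));
        }
        return sb.ToString();
    }

    static void Main()
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("q", "C#"),
            new KeyValuePair<string, string>("key2", "value 2"), // made-up pair
        };
        Console.WriteLine("http://cn.bing.com/search" + BuildQuery(parameters));
        // prints http://cn.bing.com/search?q=C%23&key2=value%202
    }
}
```

An empty parameter list yields an empty string, so the URL is left unchanged in that case.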
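First import the namespace with using System.Net; (the code below also uses System, System.Collections.Generic and System.Text). Next, write a function CreateGetHttpResponse() that returns the HTTP response to a GET request. Because GET parameters are part of the URL, any parameters are appended to the URL before the request is created. The function returns null if the request fails and an HttpWebResponse if it succeeds.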
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

class HttpFunctions
{
    public static HttpWebResponse CreateGetHttpResponse(
        string url,
        IDictionary<string, string> parameters = null,
        string token = null)
    {
        // Append the parameters to the URL as a percent-encoded query string.
        StringBuilder urlWithGetParam = new StringBuilder(url);
        if (parameters != null && parameters.Count > 0)
        {
            int i = 0;
            foreach (string key in parameters.Keys)
            {
                // The first pair is introduced by '?', every later pair by '&'.
                urlWithGetParam.Append(i == 0 ? "?" : "&");
                urlWithGetParam.AppendFormat("{0}={1}",
                    Uri.EscapeDataString(key),
                    Uri.EscapeDataString(parameters[key]));
                i++;
            }
        }
        HttpWebRequest request = WebRequest.Create(urlWithGetParam.ToString()) as HttpWebRequest;
        request.Method = "GET";
        // Optionally set the User-Agent and the timeout:
        //request.UserAgent = userAgent;
        //request.Timeout = timeout;
        if (token != null)
        {
            request.Headers.Add(HttpRequestHeader.Authorization, "Bearer " + token);
        }
        try
        {
            return request.GetResponse() as HttpWebResponse;
        }
        catch (WebException)
        {
            // Connection failure or an HTTP error status: report it as null.
            return null;
        }
    }
}

This gives us the HttpWebResponse. For example, in the caller we can use Bing search, passing the parameter q=<search term>, to search for the keyword "C#":
class Program
{
    static void Main(string[] args)
    {
        Dictionary<string, string> myParams = new Dictionary<string, string>();
        myParams["q"] = "C#";
        var h1 = HttpFunctions.CreateGetHttpResponse("http://cn.bing.com/search", myParams);
        Console.WriteLine(h1.StatusCode);
    }
}

The program prints OK, the name of status code 200, which shows that the HTTP response is normal.
We now have the HttpWebResponse but not yet the HTTP content. The content is best read as a stream. Add a function to the HttpFunctions class that reads the text from the response stream:
public static string responseText(HttpWebResponse h)
{
    if (h == null)
    {
        return "";
    }
    // Assumes the body is UTF-8 text; disposing the reader also closes the stream.
    using (System.IO.Stream receiveStream = h.GetResponseStream())
    using (System.IO.StreamReader readStream = new System.IO.StreamReader(receiveStream, Encoding.UTF8))
    {
        StringBuilder ret = new StringBuilder();
        char[] read = new char[256];
        // Read up to 256 characters at a time until the stream is exhausted.
        int count = readStream.Read(read, 0, read.Length);
        while (count > 0)
        {
            ret.Append(read, 0, count);
            count = readStream.Read(read, 0, read.Length);
        }
        return ret.ToString();
    }
}

The page we requested is HTML, which tends to be long, so in the main function we only look at the first 100 characters as a test:
class Program
{
    static void Main(string[] args)
    {
        Dictionary<string, string> myParams = new Dictionary<string, string>();
        myParams["q"] = "C#";
        var h1 = HttpFunctions.CreateGetHttpResponse("http://cn.bing.com/search", myParams);
        Console.WriteLine(h1.StatusCode);
        Console.WriteLine(HttpFunctions.responseText(h1).Substring(0, 100));
        h1.Dispose();
    }
}

The output is one line of HTML, which shows that the HTML content was read successfully:
<!DOCTYPE html><html lang="zh" xml:lang="zh" xmlns="http://www.w3.org/1999/xhtml" xmlns:Web="http://
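As a rough sketch of the same stream-reading idea without a network call, the example below decodes a whole stream in one step with StreamReader.ReadToEnd. The names StreamTextSketch and ReadAllText are illustrative, a MemoryStream stands in for the stream returned by GetResponseStream(), and UTF-8 is assumed as the encoding.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamTextSketch
{
    // Hypothetical helper: decodes an entire stream as UTF-8 text.
    public static string ReadAllText(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // A MemoryStream stands in for HttpWebResponse.GetResponseStream().
        byte[] bytes = Encoding.UTF8.GetBytes("<!DOCTYPE html><html lang=\"zh\">");
        using (var stream = new MemoryStream(bytes))
        {
            string text = ReadAllText(stream);
            // Math.Min keeps Substring from throwing when the text is short.
            Console.WriteLine(text.Substring(0, Math.Min(15, text.Length)));
            // prints <!DOCTYPE html>
        }
    }
}
```

The Math.Min guard is worth copying into real code: Substring(0, 100) throws an ArgumentOutOfRangeException when the response is shorter than 100 characters.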
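Sending a POST request to log in
URLs are limited in length, so passing parameters with GET is unsuitable when the data is long. GET parameters are also carried in the URL itself, where they are visible and can be cached, which makes GET less secure. Operations that handle sensitive data, such as logging in to a system, should not use GET. A POST request separates the data from the URL and is the more common choice for transmitting data.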
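When data is submitted to the server with POST, several encodings are available. The default is application/x-www-form-urlencoded, in which characters other than letters, digits and a few unreserved symbols are percent-encoded, that is, each byte is written as % followed by two hexadecimal digits. This encoding is very inefficient when the form contains a lot of non-alphanumeric data, for example when uploading a binary file. In that case use multipart/form-data as the encoding type: the input is not encoded but is sent as several parts in the MIME format, the same standard that is used for email.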
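To make the form encoding concrete, here is a minimal sketch that encodes single key/value pairs with WebUtility.UrlEncode from System.Net. The names FormEncodingSketch and EncodePair and the field note are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Net;

static class FormEncodingSketch
{
    // Hypothetical helper: encodes one key/value pair for a form body.
    public static string EncodePair(string key, string value)
    {
        return WebUtility.UrlEncode(key) + "=" + WebUtility.UrlEncode(value);
    }

    static void Main()
    {
        // A space becomes '+', reserved characters become %HH.
        Console.WriteLine(EncodePair("fname", "Henry Lord")); // fname=Henry+Lord
        Console.WriteLine(EncodePair("note", "a&b=c"));       // note=a%26b%3Dc
    }
}
```

Without this step, a value containing & or = would be read by the server as the start of another parameter.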
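For example, the URL https://www.runoob.com/try/ajax/demo_post2.php accepts two POST parameters, fname and lname, and returns a greeting. Creating the HttpWebResponse for a POST request is similar to the GET case, except that the parameters are written to the request Stream as bytes.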
public static HttpWebResponse CreatePostHttpResponse(
    string url,
    IDictionary<string, string> parameters = null)
{
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    if (parameters != null && parameters.Count > 0)
    {
        StringBuilder paraString = new StringBuilder();
        foreach (var it in parameters)
        {
            if (paraString.Length > 0)
            {
                paraString.Append("&");
            }
            // Percent-encode keys and values so that '&', '=' and non-ASCII text survive.
            paraString.AppendFormat("{0}={1}",
                WebUtility.UrlEncode(it.Key), WebUtility.UrlEncode(it.Value));
        }
        byte[] byteArray = Encoding.UTF8.GetBytes(paraString.ToString());
        using (System.IO.Stream stream1 = request.GetRequestStream())
        {
            stream1.Write(byteArray, 0, byteArray.Length); // write the parameters
        }
    }
    try
    {
        return request.GetResponse() as HttpWebResponse;
    }
    catch (WebException)
    {
        // Connection failure or an HTTP error status: report it as null.
        return null;
    }
}

It is called from the main function:
Dictionary<string, string> myParams = new Dictionary<string, string>();
string url = "https://www.runoob.com/try/ajax/demo_post2.php";
myParams["fname"] = "Henry";
myParams["lname"] = "Lord";
var h1 = HttpFunctions.CreatePostHttpResponse(url, myParams);
var htContent = HttpFunctions.responseText(h1);
Console.WriteLine(htContent);
h1.Dispose();
The result is:
<p style='color:red;'>你好,Henry Lord,今天过得怎么样?</p>
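Simulating a login with cookies
To demonstrate POST requests further, we use the test site provided by the book 《用Python编写网络爬虫》 (Web Scraping with Python), http://example.webscraping.com/places/default/user/login, as an example: after registering an account ourselves, we simulate logging in to the system.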

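Open the site we want to log in to and inspect the login form. Besides the fields visible in the screenshot, there is a group of fields styled display:none; that the site uses to validate the user. If the C# program sends only the email address and password, the login fails, so we need to extract every input element in the form. In C#, we can first request the page and then use a regular expression to extract the <input /> elements: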
private static List<string> showMatch(string text, string expr)
{
    System.Text.RegularExpressions.MatchCollection mc = System.Text.RegularExpressions.Regex.Matches(text, expr);
    List<string> ret = new List<string>();
    foreach (System.Text.RegularExpressions.Match m in mc)
    {
        ret.Add(m.ToString());
    }
    return ret;
}

Applied to the login page with the pattern <input .*?/>, it returns:
<input class="string" id="auth_user_email" name="email" type="text" value="" />
<input class="password" id="auth_user_password" name="password" type="password" value="" />
<input class="boolean" id="auth_user_remember_me" name="remember_me" type="checkbox" value="on" />
<input type="submit" value="Log In" />
<input name="_next" type="hidden" value="/places/default/index" />
<input name="_formkey" type="hidden" value="fe10396d-3a5f-4d8c-b03e-2960a2820cac" />
<input name="_formname" type="hidden" value="login" />
Apart from the submit button (type=submit), the other six name/value pairs are the data we need to send with POST.
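How one tag yields one name/value pair can be sketched with named capture groups. This is an illustrative variant, not the article's own method: the names InputTagSketch and Parse are made up, and the pattern assumes that name="..." appears before value="..." inside each tag, as it does in the tags listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class InputTagSketch
{
    // Assumes name="..." comes before value="..." within a tag.
    static readonly Regex InputTag = new Regex(
        "<input\\b[^>]*?\\bname=\"(?<name>[^\"]*)\"[^>]*?\\bvalue=\"(?<value>[^\"]*)\"[^>]*?/>");

    // Hypothetical helper: maps each input's name to its value.
    public static Dictionary<string, string> Parse(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            result[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return result;
    }

    static void Main()
    {
        string html =
            "<input type=\"submit\" value=\"Log In\" />" +
            "<input name=\"_formname\" type=\"hidden\" value=\"login\" />";
        foreach (var pair in Parse(html))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
        // prints _formname = login
    }
}
```

The submit button has no name attribute, so it is skipped, which matches the rule that only named inputs are posted.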
private static Dictionary<string, string> getinputParameters(string message)
{
    Dictionary<string, string> ret = new Dictionary<string, string>();
    var r = showMatch(message, @"<input .*?/>");
    foreach (var it in r)
    {
        // Look for name="..." and value="..." inside the tag.
        System.Text.RegularExpressions.MatchCollection lineMatchKey =
            System.Text.RegularExpressions.Regex.Matches(it, "name=\".*?\"");
        if (lineMatchKey.Count > 0)
        {
            System.Text.RegularExpressions.MatchCollection lineMatchValue =
                System.Text.RegularExpressions.Regex.Matches(it, "value=\".*?\"");
            if (lineMatchValue.Count > 0)
            {
                // Strip the leading name=" (6 characters) or value=" (7 characters) and the closing quote.
                ret[lineMatchKey[0].ToString().Substring(6, lineMatchKey[0].Length - 7)]
                    = lineMatchValue[0].ToString().Substring(7, lineMatchValue[0].Length - 8);
            }
        }
    }
    return ret;
}

With all the parameters collected, we can send the POST request:
class Program
{
    static void Main(string[] args)
    {
        Dictionary<string, string> myParams = new Dictionary<string, string>();
        string url = "http://example.webscraping.com/places/default/user/login";
        var h1 = HttpFunctions.CreateGetHttpResponse(url);
        var htContent = HttpFunctions.responseText(h1);
        h1.Dispose();
        myParams = getinputParameters(htContent);
        myParams["email"] = "[email protected]";
        myParams["password"] = "88888888";
        var h2 = HttpFunctions.CreatePostHttpResponse(url, myParams);
        htContent = HttpFunctions.responseText(h2);
        Console.WriteLine(htContent);
        h2.Dispose();
    }
}
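The parameters sent are all correct, yet the returned HTML contains no login information (the login area of the page still shows Log In instead of the user name). The reason is that the session information is stored in cookies, and the program so far does not add any cookies to the request headers, so the login state cannot be maintained.
For the site we are logging in to, we can see that it sets two cookies. So we write a function that reads them: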
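public static string getCookies(HttpWebResponse h1)
{
    if (h1 == null)
    {
        return "";
    }
    string[] setCookies = h1.Headers.GetValues("Set-Cookie");
    if (setCookies == null || setCookies.Length < 2)
    {
        // The site is expected to set two cookies; anything else is treated as no cookies.
        return "";
    }
    // Keep only the name=value part of each cookie and drop attributes such as Path.
    return setCookies[0].Split(';')[0] + "; " + setCookies[1].Split(';')[0];
}

Then add a parameter string cookies = null to the POST function above, and add the following inside the function: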
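if (cookies != null)
{
    request.Headers.Add(HttpRequestHeader.Cookie, cookies);
}

This sends the cookies along with the POST request. The block belongs before the request body is written, for example directly after request.ContentType is set.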
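The getCookies(HttpWebResponse h1) method depends on the server and is not general: for another site, the responses have to be analyzed again and the fields extracted according to their structure. It is shown here only to demonstrate how to add cookies.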
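A less site-specific option, sketched here as an assumption rather than as the article's method, is the CookieContainer class from System.Net, which parses Set-Cookie values and produces the matching Cookie header. The names CookieContainerSketch and BuildCookieHeader and the cookie values are made up for illustration, and no request is sent.

```csharp
using System;
using System.Net;

static class CookieContainerSketch
{
    // Hypothetical helper: stores Set-Cookie values and returns the Cookie header for uri.
    public static string BuildCookieHeader(Uri uri, params string[] setCookieValues)
    {
        var container = new CookieContainer();
        foreach (string value in setCookieValues)
        {
            // Parses one Set-Cookie value and associates the cookie with uri.
            container.SetCookies(uri, value);
        }
        return container.GetCookieHeader(uri);
    }

    static void Main()
    {
        var uri = new Uri("http://example.webscraping.com/places/default/user/login");
        // Made-up cookie values standing in for a real response.
        Console.WriteLine(BuildCookieHeader(uri,
            "session_id=abc123; Path=/",
            "theme=dark; Path=/"));
    }
}
```

With real requests, assigning one CookieContainer instance to the CookieContainer property of each HttpWebRequest should let the framework collect the cookies from a response and send them with later requests, without manual header parsing.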
Reading image content
HTTP results in C# are always exposed as a stream. If, for example, the URL points to an image on the server, we only need Image.FromStream():
public static void downloadPicture(string url)
{
    var h1 = CreateGetHttpResponse(url);
    if (h1 != null)
    {
        // Decode the image from the response stream and save it as PNG.
        using (h1)
        using (System.Drawing.Image image = System.Drawing.Image.FromStream(h1.GetResponseStream()))
        {
            image.Save("download.png", System.Drawing.Imaging.ImageFormat.Png);
        }
    }
}

If the URL is correct, download.png is saved in the program's working directory.
A C# console project does not reference System.Drawing by default, so a reference to System.Drawing.Common.dll has to be added manually.
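When the image does not need to be decoded or converted, the stream can also be copied straight to a file, which avoids the System.Drawing dependency. The sketch below is an assumption-labeled alternative: SaveStreamSketch, SaveToFile and the file name are illustrative, and a MemoryStream holding the PNG signature stands in for a response stream.

```csharp
using System;
using System.IO;

static class SaveStreamSketch
{
    // Hypothetical helper: copies a stream to a file without decoding it.
    public static long SaveToFile(Stream source, string path)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            source.CopyTo(file);
        }
        // Report the size of the file that was written.
        return new FileInfo(path).Length;
    }

    static void Main()
    {
        // The eight-byte PNG signature stands in for a response stream.
        byte[] pngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
        string path = Path.Combine(Path.GetTempPath(), "download-sketch.png");
        using (var source = new MemoryStream(pngSignature))
        {
            Console.WriteLine(SaveToFile(source, path) + " bytes written");
            // prints 8 bytes written
        }
    }
}
```

Unlike Image.Save, this keeps the bytes exactly as the server sent them, so the file extension should match the actual image format.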
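Uploading an image
Files are usually sent to the server with a POST request whose Content-Type is multipart/form-data; the POST parameters are written according to the multipart/form-data specification.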
// Requires: using System.IO;
public static string Sys_uploadStudentPhoto(
    string url,
    string imageName,
    IDictionary<string, string> stringDict)
{
    return HttpPostData(url, "file", imageName, stringDict);
}

private static string HttpPostData(
    string url,
    string fileKeyName,
    string filePath,
    IDictionary<string, string> stringDict)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    var boundary = "---------------" + DateTime.Now.Ticks.ToString("x");
    var beginBoundary = Encoding.ASCII.GetBytes("--" + boundary + "\r\n");
    // Every delimiter after the first one must be preceded by CRLF.
    var endBoundary = Encoding.ASCII.GetBytes("\r\n--" + boundary + "--\r\n");
    request.Method = "POST";
    request.ContentType = "multipart/form-data; boundary=" + boundary;

    const string filePartHeader =
        "Content-Disposition: form-data; name=\"{0}\"; filename=\"{1}\"\r\n" +
        "Content-Type: application/octet-stream\r\n\r\n";
    // Send only the file name, not the full local path.
    var header = string.Format(filePartHeader, fileKeyName, Path.GetFileName(filePath));
    var headerbytes = Encoding.UTF8.GetBytes(header);

    using (var memStream = new MemoryStream())
    {
        // File part: boundary, part headers, then the raw file bytes.
        memStream.Write(beginBoundary, 0, beginBoundary.Length);
        memStream.Write(headerbytes, 0, headerbytes.Length);
        using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            fileStream.CopyTo(memStream);
        }

        // One part per string field.
        var stringKeyHeader = "\r\n--" + boundary +
            "\r\nContent-Disposition: form-data; name=\"{0}\"" +
            "\r\n\r\n{1}";
        foreach (string key in stringDict.Keys)
        {
            var formitembytes = Encoding.UTF8.GetBytes(
                string.Format(stringKeyHeader, key, stringDict[key]));
            memStream.Write(formitembytes, 0, formitembytes.Length);
        }
        memStream.Write(endBoundary, 0, endBoundary.Length);

        // Send the assembled body.
        request.ContentLength = memStream.Length;
        using (var requestStream = request.GetRequestStream())
        {
            memStream.Position = 0;
            memStream.CopyTo(requestStream);
        }
    }

    // Read the response body; the response is disposed afterwards.
    using (var httpWebResponse = (HttpWebResponse)request.GetResponse())
    {
        return responseText(httpWebResponse);
    }
}
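The body layout that a multipart/form-data request carries can be sketched by building a small body as text. This is only an illustration: MultipartSketch, BuildBody, the boundary string and the field names are made up, and a short text stands in for the file bytes so that the result is printable.

```csharp
using System;
using System.Text;

static class MultipartSketch
{
    // Hypothetical helper: builds a body with one text field and one small file part.
    public static string BuildBody(string boundary, string fieldName, string fieldValue,
                                   string fileKey, string fileName, string fileText)
    {
        var sb = new StringBuilder();
        // Text field part.
        sb.Append("--" + boundary + "\r\n");
        sb.Append("Content-Disposition: form-data; name=\"" + fieldName + "\"\r\n\r\n");
        sb.Append(fieldValue + "\r\n");
        // File part, with its own content type.
        sb.Append("--" + boundary + "\r\n");
        sb.Append("Content-Disposition: form-data; name=\"" + fileKey +
                  "\"; filename=\"" + fileName + "\"\r\n");
        sb.Append("Content-Type: application/octet-stream\r\n\r\n");
        sb.Append(fileText + "\r\n");
        // Closing delimiter: the boundary followed by two extra dashes.
        sb.Append("--" + boundary + "--\r\n");
        return sb.ToString();
    }

    static void Main()
    {
        // All argument values here are made up for the demonstration.
        Console.Write(BuildBody("----sketch123", "studentId", "42",
                                "file", "photo.txt", "hello"));
    }
}
```

Each part starts with two dashes plus the boundary, a blank line separates the part headers from the content, and the final delimiter carries two trailing dashes. The boundary must not occur inside any part's content, which is why a time-based value is commonly used.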