[Tutorial] Emulating a Website Login with C# WinForms (includes two versions of complete, runnable code)

Article source: https://blog.csdn.net/weixin_30895003/article/details/113005644 (news/2024/5/10 2:07:47)

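Earlier posts in this series covered some networking basics and showed how to fetch simple web page content with C#.

This post continues from there and uses logging in to the Baidu home page as the example for showing how to emulate a website login in C#.

First, the prerequisites. This post assumes you have read those earlier posts, so that you understand the basic networking concepts and know how to use tools such as the IE9 F12 developer tools to analyze what happens when a web page executes.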

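1. Before emulating a login, work out the site's internal login logic

Before you can emulate logging in to the Baidu home page from a program (here, C# code), you first need to understand for yourself what happens internally when you log in to that site.

How to use the tools to work out the internal logic of the Baidu home page login is covered in the earlier analysis post.

2. Only then implement the login logic in the chosen language (C#)

Once you understand the internal login logic of the Baidu home page as analyzed with F12, implementing it in C# is comparatively straightforward.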

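Notes:

(1) If you are not familiar with using cookies in C#, read up on that first.

(2) If you are not familiar with regular expressions, consult a reference on them.

(3) If you are not familiar with Regex, the regular expression class in C#, consult its documentation.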

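For easy comparison with the code, here is the flow worked out in that analysis again (the URLs, and step 2, are as used in the code below):

Step | URL | Method | Data sent | Value to obtain/extract from the response
1 | http://www.baidu.com/ | GET | none | BAIDUID in the returned cookies
2 | https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true | GET | none | the token value (login_token) in the returned HTML
3 | https://passport.baidu.com/v2/api/?login | POST | a set of post data, in which token is the value extracted earlier | verify that the returned cookies include BDUSS, PTOKEN, STOKEN, SAVEUSERID

With that, the C# code demonstrating the emulated login to the Baidu home page can be written.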

[Version 1: complete C# code for emulating the Baidu home page login, minimal version]

In the UI, clicking "Get cookie BAIDUID" calls the following code:

private void btnGetBaiduid_Click(object sender, EventArgs e)

{

//http://www.baidu.com/

string baiduMainUrl = txbBaiduMainUrl.Text;

//generate http request

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

//add follow code to handle cookies

req.CookieContainer = new CookieContainer();

req.CookieContainer.Add(curCookies);

req.Method = "GET";

//use request to get response

HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

txbGotBaiduid.Text = "";

foreach (Cookie ck in resp.Cookies)

{

txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;

if (ck.Name == "BAIDUID")

{

gotCookieBaiduid = true;

}

}

if (gotCookieBaiduid)

{

//store cookies

curCookies = resp.Cookies;

}

else

{

MessageBox.Show("Error: cookie BAIDUID not found!");

}

}

This obtains the value of the BAIDUID cookie.

Next, clicking "Get token value" calls the following code:

private void btnGetToken_Click(object sender, EventArgs e)

{

if (gotCookieBaiduid)

{

string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

//add previously got cookies

req.CookieContainer = new CookieContainer();

req.CookieContainer.Add(curCookies);

req.Method = "GET";

HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

StreamReader sr = new StreamReader(resp.GetResponseStream());

string respHtml = sr.ReadToEnd();

//bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';

string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";

Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);

if (foundTokenVal.Success)

{

//extracted the token value

txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;

extractTokenValueOK = true;

}

else

{

txbExtractedTokenVal.Text = "Error: token value not found!";

}

}

else

{

MessageBox.Show("Error: cookie BAIDUID was not obtained correctly beforehand!");

}

}
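The token step depends on a named capture group in the regular expression. As a minimal standalone sketch of that extraction (the class name TokenExtractor and the method ExtractToken are hypothetical names introduced only for illustration, and the sample string is the one quoted in the code comment above):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // The named group "tokenVal" captures the run of word characters
    // between the single quotes after login_token=.
    private static readonly Regex TokenPattern =
        new Regex(@"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';");

    // Returns the token, or null when the pattern is not present.
    public static string ExtractToken(string respHtml)
    {
        Match m = TokenPattern.Match(respHtml ?? "");
        return m.Success ? m.Groups["tokenVal"].Value : null;
    }
}
```

Returning null on failure lets the caller decide how to report the error, instead of mixing an error message into the same text box that holds the token.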

This yields the corresponding token value.

Then fill in your Baidu username and password and click "Emulate login to Baidu home page", which calls the following code:

private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)

{

if (gotCookieBaiduid && extractTokenValueOK)

{

string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

//init post dict info

Dictionary<string, string> postDict = new Dictionary<string, string>();

//postDict.Add("ppui_logintime", "");

postDict.Add("charset", "utf-8");

//postDict.Add("codestring", "");

postDict.Add("token", txbExtractedTokenVal.Text);

postDict.Add("isPhone", "false");

postDict.Add("index", "0");

//postDict.Add("u", "");

//postDict.Add("safeflg", "0");

postDict.Add("staticpage", staticpage);

postDict.Add("loginType", "1");

postDict.Add("tpl", "mn");

postDict.Add("callback", "parent.bdPass.api.login._postCallback");

postDict.Add("username", txbBaiduUsername.Text);

postDict.Add("password", txbBaiduPassword.Text);

//postDict.Add("verifycode", "");

postDict.Add("mem_pass", "on");

string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);

//add cookie

req.CookieContainer = new CookieContainer();

req.CookieContainer.Add(curCookies);

//set to POST

req.Method = "POST";

req.ContentType = "application/x-www-form-urlencoded";

//prepare post data

string postDataStr = quoteParas(postDict);

byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);

req.ContentLength = postBytes.Length;

//send post data

Stream postDataStream = req.GetRequestStream();

postDataStream.Write(postBytes, 0, postBytes.Length);

postDataStream.Close();

//got response

HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

//got returned html

StreamReader sr = new StreamReader(resp.GetResponseStream());

string loginBaiduRespHtml = sr.ReadToEnd();

//check whether got all expected cookies

Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();

string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};

foreach (String cookieToCheck in cookiesNameList)

{

cookieCheckDict.Add(cookieToCheck, false);

}

foreach (Cookie singleCookie in resp.Cookies)

{

if (cookieCheckDict.ContainsKey(singleCookie.Name))

{

cookieCheckDict[singleCookie.Name] = true;

}

}

bool allCookiesFound = true;

foreach (bool foundCurCookie in cookieCheckDict.Values)

{

allCookiesFound = allCookiesFound && foundCurCookie;

}

loginBaiduOk = allCookiesFound;

if (loginBaiduOk)

{

txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu home page!";

}

else

{

txbEmulateLoginResult.Text = "Failed to emulate login to the Baidu home page!";

txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";

txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();

txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;

txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";

txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;

}

}

else

{

MessageBox.Show("Error: cookie BAIDUID was not obtained correctly and/or the token value was not extracted correctly!");

}

}
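The success check in the handler above boils down to "are all the expected cookie names present in the response?". A more compact way to express that check might look like the following sketch; CookieCheck and MissingCookies are hypothetical names, not part of the original project:

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class CookieCheck
{
    // Returns the required cookie names that are absent from the collection.
    // An empty result means every expected cookie was found.
    public static List<string> MissingCookies(CookieCollection cookies,
                                              IEnumerable<string> required)
    {
        var present = new HashSet<string>();
        foreach (Cookie ck in cookies)
        {
            present.Add(ck.Name);
        }

        var missing = new List<string>();
        foreach (string name in required)
        {
            if (!present.Contains(name))
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

Returning the missing names rather than a single bool makes a failed login easier to diagnose, since the result shows which of BDUSS, PTOKEN, STOKEN and SAVEUSERID did not come back.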

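If the username and password are both correct, the login succeeds.

If you deliberately enter a wrong username or password, a login failure is shown, together with the returned headers and HTML.

The complete C# code for emulating the Baidu home page login is as follows:

using System;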

using System.Collections.Generic;

using System.ComponentModel;

using System.Data;

using System.Drawing;

using System.Text;

using System.Windows.Forms;

using System.Net;

using System.IO;

using System.Text.RegularExpressions;

using System.Web;

namespace emulateLoginBaidu

{

public partial class frmEmulateLoginBaidu : Form

{

CookieCollection curCookies = null;

bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

public frmEmulateLoginBaidu()

{

InitializeComponent();

}

private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)

{

//init

curCookies = new CookieCollection();

gotCookieBaiduid = false;

extractTokenValueOK = false;

loginBaiduOk = false;

}

/******************************************************************************

functions in crifanLib.cs

*******************************************************************************/

//quote the input dict values

//note: the return result for first para no '&'

public string quoteParas(Dictionary<string, string> paras)

{

string quotedParas = "";

bool isFirst = true;

string val = "";

foreach (string para in paras.Keys)

{

if (paras.TryGetValue(para, out val))

{

if (isFirst)

{

isFirst = false;

quotedParas += para + "=" + HttpUtility.UrlEncode(val); // UrlEncode (not UrlPathEncode) so that '&', '=' and '+' in values are escaped

}

else

{

quotedParas += "&" + para + "=" + HttpUtility.UrlEncode(val);

}

}

else

{

break;

}

}

return quotedParas;

}

/******************************************************************************

Demo emulate login baidu related functions

*******************************************************************************/

private void btnGetBaiduid_Click(object sender, EventArgs e)

{

//http://www.baidu.com/

string baiduMainUrl = txbBaiduMainUrl.Text;

//generate http request

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

//add follow code to handle cookies

req.CookieContainer = new CookieContainer();

req.CookieContainer.Add(curCookies);

req.Method = "GET";

//use request to get response

HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

txbGotBaiduid.Text = "";

foreach (Cookie ck in resp.Cookies)

{

txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;

if (ck.Name == "BAIDUID")

{

gotCookieBaiduid = true;

}

}

if (gotCookieBaiduid)

{

//store cookies

curCookies = resp.Cookies;

}

else

{

MessageBox.Show("Error: cookie BAIDUID not found!");

}

}

private void btnGetToken_Click(object sender, EventArgs e)

{

if (gotCookieBaiduid)

{

string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

//add previously got cookies

req.CookieContainer = new CookieContainer();

req.CookieContainer.Add(curCookies);

req.Method = "GET";

HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

StreamReader sr = new StreamReader(resp.GetResponseStream());

string respHtml = sr.ReadToEnd();

//bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';

string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";

Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);

if (foundTokenVal.Success)

{

//extracted the token value

txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;

extractTokenValueOK = true;

}

else

{

txbExtractedTokenVal.Text = "Error: token value not found!";

}

}

else

{

MessageBox.Show("Error: cookie BAIDUID was not obtained correctly beforehand!");

}

}

private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)

{

if (gotCookieBaiduid && extractTokenValueOK)

{

string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

//init post dict info

Dictionary<string, string> postDict = new Dictionary<string, string>();

//postDict.Add("ppui_logintime", "");

postDict.Add("charset", "utf-8");

//postDict.Add("codestring", "");

postDict.Add("token", txbExtractedTokenVal.Text);

postDict.Add("isPhone", "false");

postDict.Add("index", "0");

//postDict.Add("u", "");

//postDict.Add("safeflg", "0");

postDict.Add("staticpage", staticpage);

postDict.Add("loginType", "1");

postDict.Add("tpl", "mn");

postDict.Add("callback", "parent.bdPass.api.login._postCallback");

postDict.Add("username", txbBaiduUsername.Text);

postDict.Add("password", txbBaiduPassword.Text);

//postDict.Add("verifycode", "");

postDict.Add("mem_pass", "on");

string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);

//add cookie

req.CookieContainer = new CookieContainer();

req.CookieContainer.Add(curCookies);

//set to POST

req.Method = "POST";

req.ContentType = "application/x-www-form-urlencoded";

//prepare post data

string postDataStr = quoteParas(postDict);

byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);

req.ContentLength = postBytes.Length;

//send post data

Stream postDataStream = req.GetRequestStream();

postDataStream.Write(postBytes, 0, postBytes.Length);

postDataStream.Close();

//got response

HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

//got returned html

StreamReader sr = new StreamReader(resp.GetResponseStream());

string loginBaiduRespHtml = sr.ReadToEnd();

//check whether got all expected cookies

Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();

string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};

foreach (String cookieToCheck in cookiesNameList)

{

cookieCheckDict.Add(cookieToCheck, false);

}

foreach (Cookie singleCookie in resp.Cookies)

{

if (cookieCheckDict.ContainsKey(singleCookie.Name))

{

cookieCheckDict[singleCookie.Name] = true;

}

}

bool allCookiesFound = true;

foreach (bool foundCurCookie in cookieCheckDict.Values)

{

allCookiesFound = allCookiesFound && foundCurCookie;

}

loginBaiduOk = allCookiesFound;

if (loginBaiduOk)

{

txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu home page!";

}

else

{

txbEmulateLoginResult.Text = "Failed to emulate login to the Baidu home page!";

txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";

txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();

txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;

txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";

txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;

}

}

else

{

MessageBox.Show("Error: cookie BAIDUID was not obtained correctly and/or the token value was not extracted correctly!");

}

}

private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)

{

string emulateLoginTutorialUrl = "https://www.crifan.com/emulate_login_website_using_csharp";

System.Diagnostics.Process.Start(emulateLoginTutorialUrl);

}

private void btnClearAll_Click(object sender, EventArgs e)

{

curCookies = new CookieCollection();

gotCookieBaiduid = false;

extractTokenValueOK = false;

loginBaiduOk = false;

txbGotBaiduid.Text = "";

txbExtractedTokenVal.Text = "";

txbBaiduUsername.Text = "";

txbBaiduPassword.Text = "";

txbEmulateLoginResult.Text = "";

}

}

}
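A note on the POST body: in an application/x-www-form-urlencoded body, every key and value has to be escaped so that characters such as & or = inside a password cannot be mistaken for field separators. The following is a minimal sketch of one way to build such a body with Uri.EscapeDataString; FormEncoder and Encode are hypothetical names and are not part of crifanLib. Note that this call writes a space as %20 rather than +.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormEncoder
{
    // Joins the fields as key=value pairs separated by '&',
    // percent-encoding both keys and values.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (sb.Length > 0)
            {
                sb.Append('&');
            }
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value ?? ""));
        }
        return sb.ToString();
    }
}
```

Because Dictionary<string, string> implements IEnumerable<KeyValuePair<string, string>>, a dictionary such as postDict could be passed to this method directly.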

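The corresponding complete VS2010 C# project can be downloaded separately.

[Version 2: complete C# code for emulating the Baidu home page login, crifanLib.cs version]

Later, the code above was changed to use my C# library crifanLib.cs, so that the networking helper functions can be reused in the future.

Below is the complete C# code for the version that uses crifanLib.cs:

using System;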

using System.Collections.Generic;

using System.ComponentModel;

using System.Data;

using System.Drawing;

using System.Text;

using System.Windows.Forms;

using System.Net;

using System.IO;

using System.Text.RegularExpressions;

using System.Web;

namespace emulateLoginBaidu

{

public partial class frmEmulateLoginBaidu : Form

{

CookieCollection curCookies = null;

bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

public frmEmulateLoginBaidu()

{

InitializeComponent();

}

private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)

{

this.AcceptButton = this.btnEmulateLoginBaidu;

//init for crifanLib.cs

curCookies = new CookieCollection();

//init for demo login

gotCookieBaiduid = false;

extractTokenValueOK = false;

loginBaiduOk = false;

}

/******************************************************************************

functions in crifanLib.cs

Online browser: http://code.google.com/p/crifanlib/source/browse/trunk/csharp/crifanLib.cs

Download: http://code.google.com/p/crifanlib/

*******************************************************************************/

//quote the input dict values

//note: the return result for first para no '&'

public string quoteParas(Dictionary<string, string> paras)

{

string quotedParas = "";

bool isFirst = true;

string val = "";

foreach (string para in paras.Keys)

{

if (paras.TryGetValue(para, out val))

{

if (isFirst)

{

isFirst = false;

quotedParas += para + "=" + HttpUtility.UrlEncode(val); // UrlEncode (not UrlPathEncode) so that '&', '=' and '+' in values are escaped

}

else

{

quotedParas += "&" + para + "=" + HttpUtility.UrlEncode(val);

}

}

else

{

break;

}

}

return quotedParas;

}

/*********************************************************************/

/* cookie */

/*********************************************************************/

//add a single cookie to cookies, if already exist, update its value

public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies, bool overwriteDomain)

{

bool found = false;

if (cookies.Count > 0)

{

foreach (Cookie originalCookie in cookies)

{

if (originalCookie.Name == toAdd.Name)

{

// Cookies with the same name but different domains are different cookies,

// so do not overwrite the value when the domains differ,

// unless overwriting the domain is explicitly requested

if ((originalCookie.Domain == toAdd.Domain) ||

((originalCookie.Domain != toAdd.Domain) && overwriteDomain))

{

//here can not force convert CookieCollection to HttpCookieCollection,

//then use .remove to remove this cookie then add

// so no good way to copy all field value

originalCookie.Value = toAdd.Value;

originalCookie.Domain = toAdd.Domain;

originalCookie.Expires = toAdd.Expires;

originalCookie.Version = toAdd.Version;

originalCookie.Path = toAdd.Path;

//following fields seems should not change

//originalCookie.HttpOnly = toAdd.HttpOnly;

//originalCookie.Secure = toAdd.Secure;

found = true;

break;

}

}

}

}

if (!found)

{

if (toAdd.Domain != "")

{

// adding a cookie with an empty domain would make the later req.CookieContainer.Add(cookies) call fail

cookies.Add(toAdd);

}

}

}//addCookieToCookies

//add a single cookie to cookies; by default do not overwrite the domain

public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies)

{

addCookieToCookies(toAdd, ref cookies, false);

}

//check whether the cookies contains the ckToCheck cookie

//support:

//ckTocheck is Cookie/string

//cookies is Cookie/string/CookieCollection/string[]

public bool isContainCookie(object ckToCheck, object cookies)

{

bool isContain = false;

if ((ckToCheck != null) && (cookies != null))

{

string ckName = "";

Type type = ckToCheck.GetType();

//string typeStr = ckType.ToString();

//if (ckType.FullName == "System.string")

if (type.Name.ToLower() == "string")

{

ckName = (string)ckToCheck;

}

else if (type.Name == "Cookie")

{

ckName = ((Cookie)ckToCheck).Name;

}

if (ckName != "")

{

type = cookies.GetType();

// is single Cookie

if (type.Name == "Cookie")

{

if (ckName == ((Cookie)cookies).Name)

{

isContain = true;

}

}

// is CookieCollection

else if (type.Name == "CookieCollection")

{

foreach (Cookie ck in (CookieCollection)cookies)

{

if (ckName == ck.Name)

{

isContain = true;

break;

}

}

}

// is single cookie name string

else if (type.Name.ToLower() == "string")

{

if (ckName == (string)cookies)

{

isContain = true;

}

}

// is cookie name string[]

else if (type.Name.ToLower() == "string[]")

{

foreach (string name in ((string[])cookies))

{

if (ckName == name)

{

isContain = true;

break;

}

}

}

}

}

return isContain;

}//isContainCookie

// update cookiesToUpdate to localCookies

// if omitUpdateCookies designated, then omit cookies of omitUpdateCookies in cookiesToUpdate

public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies, object omitUpdateCookies)

{

if (cookiesToUpdate.Count > 0)

{

if (localCookies == null)

{

localCookies = cookiesToUpdate;

}

else

{

foreach (Cookie newCookie in cookiesToUpdate)

{

if (isContainCookie(newCookie, omitUpdateCookies))

{

// need omit process this

}

else

{

addCookieToCookies(newCookie, ref localCookies);

}

}

}

}

}//updateLocalCookies

//update cookiesToUpdate to localCookies

public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies)

{

updateLocalCookies(cookiesToUpdate, ref localCookies, null);

}

/*********************************************************************/

/* HTTP */

/*********************************************************************/

/* get url's response */

public HttpWebResponse getUrlResponse(string url,

Dictionary<string, string> headerDict,

Dictionary<string, string> postDict,

int timeout,

string postDataStr)

{

//CookieCollection parsedCookies;

HttpWebResponse resp = null;

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

req.AllowAutoRedirect = true;

req.Accept = "*/*";

//const string gAcceptLanguage = "en-US"; // zh-CN/en-US

//req.Headers["Accept-Language"] = gAcceptLanguage;

req.KeepAlive = true;

//IE8

//const string gUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E";

//IE9

//const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64

const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86

//Chrome

//const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4";

//Mozilla Firefox

//const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";

req.UserAgent = gUserAgent;

req.Headers["Accept-Encoding"] = "gzip, deflate";

req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate; // match the "gzip, deflate" Accept-Encoding above

req.Proxy = null;

if (timeout > 0)

{

req.Timeout = timeout;

}

if (curCookies != null)

{

req.CookieContainer = new CookieContainer();

req.CookieContainer.PerDomainCapacity = 40; // following will exceed max default 20 cookie per domain

req.CookieContainer.Add(curCookies);

}

if (headerDict != null)

{

foreach (string header in headerDict.Keys)

{

string headerValue = "";

if (headerDict.TryGetValue(header, out headerValue))

{

// the following let the caller override the default header settings

if (header.ToLower() == "referer")

{

req.Referer = headerValue;

}

else if (header.ToLower() == "allowautoredirect")

{

bool isAllow = false;

if (bool.TryParse(headerValue, out isAllow))

{

req.AllowAutoRedirect = isAllow;

}

}

else if (header.ToLower() == "accept")

{

req.Accept = headerValue;

}

else if (header.ToLower() == "keepalive")

{

bool isKeepAlive = false;

if (bool.TryParse(headerValue, out isKeepAlive))

{

req.KeepAlive = isKeepAlive;

}

}

else if (header.ToLower() == "accept-language")

{

req.Headers["Accept-Language"] = headerValue;

}

else if (header.ToLower() == "useragent")

{

req.UserAgent = headerValue;

}

else

{

req.Headers[header] = headerValue;

}

}

else

{

break;

}

}

}

if (postDict != null || !string.IsNullOrEmpty(postDataStr))

{

req.Method = "POST";

req.ContentType = "application/x-www-form-urlencoded";

if (postDict != null)

{

postDataStr = quoteParas(postDict);

}

//byte[] postBytes = Encoding.GetEncoding("utf-8").GetBytes(postData);

byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);

req.ContentLength = postBytes.Length;

Stream postDataStream = req.GetRequestStream();

postDataStream.Write(postBytes, 0, postBytes.Length);

postDataStream.Close();

}

else

{

req.Method = "GET";

}

//may timeout, has fixed in:

//https://www.crifan.com/fixed_problem_sometime_httpwebrequest_getresponse_timeout/

resp = (HttpWebResponse)req.GetResponse();

updateLocalCookies(resp.Cookies, ref curCookies);

return resp;

}

public HttpWebResponse getUrlResponse(string url,

Dictionary<string, string> headerDict,

Dictionary<string, string> postDict)

{

return getUrlResponse(url, headerDict, postDict, 0, "");

}

public HttpWebResponse getUrlResponse(string url)

{

return getUrlResponse(url, null, null, 0, "");

}

// valid charset: "GB18030"/"UTF-8", invalid: "UTF8"

public string getUrlRespHtml(string url,

Dictionary<string, string> headerDict,

string charset,

Dictionary<string, string> postDict,

int timeout,

string postDataStr)

{

string respHtml = "";

//HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout);

HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout, postDataStr);

//long realRespLen = resp.ContentLength;

StreamReader sr;

if ((charset != null) && (charset != ""))

{

Encoding htmlEncoding = Encoding.GetEncoding(charset);

sr = new StreamReader(resp.GetResponseStream(), htmlEncoding);

}

else

{

sr = new StreamReader(resp.GetResponseStream());

}

respHtml = sr.ReadToEnd();

return respHtml;

}

public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, string charset, Dictionary<string, string> postDict, string postDataStr)

{

return getUrlRespHtml(url, headerDict, charset, postDict, 0, postDataStr);

}

public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, Dictionary<string, string> postDict)

{

return getUrlRespHtml(url, headerDict, "", postDict, "");

}

public string getUrlRespHtml(string url, Dictionary<string, string> headerDict)

{

return getUrlRespHtml(url, headerDict, null);

}

public string getUrlRespHtml(string url, string charset, int timeout)

{

return getUrlRespHtml(url, null, charset, null, timeout, "");

}

public string getUrlRespHtml(string url, string charset)

{

return getUrlRespHtml(url, charset, 0);

}

public string getUrlRespHtml(string url)

{

return getUrlRespHtml(url, "");

}

/******************************************************************************

Demo emulate login baidu related functions

*******************************************************************************/

private void btnGetBaiduid_Click(object sender, EventArgs e)

{

//http://www.baidu.com/

string baiduMainUrl = txbBaiduMainUrl.Text;

HttpWebResponse resp = getUrlResponse(baiduMainUrl);

txbGotBaiduid.Text = "";

foreach (Cookie ck in resp.Cookies)

{

txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;

if (ck.Name == "BAIDUID")

{

gotCookieBaiduid = true;

}

}

if (gotCookieBaiduid)

{

//store cookies

curCookies = resp.Cookies;

}

else

{

MessageBox.Show("Error: cookie BAIDUID not found!");

}

}

private void btnGetToken_Click(object sender, EventArgs e)

{

if (gotCookieBaiduid)

{

string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";

string respHtml = getUrlRespHtml(getapiUrl);

//bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';

string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";

Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);

if (foundTokenVal.Success)

{

//extracted the token value

txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;

extractTokenValueOK = true;

}

else

{

txbExtractedTokenVal.Text = "Error: token value not found!";

}

}

else

{

MessageBox.Show("Error: cookie BAIDUID was not obtained correctly beforehand!");

}

}

private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)

{

if (gotCookieBaiduid && extractTokenValueOK)

{

string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

//init post dict info

Dictionary<string, string> postDict = new Dictionary<string, string>();

//postDict.Add("ppui_logintime", "");

postDict.Add("charset", "utf-8");

//postDict.Add("codestring", "");

postDict.Add("token", txbExtractedTokenVal.Text);

postDict.Add("isPhone", "false");

postDict.Add("index", "0");

//postDict.Add("u", "");

//postDict.Add("safeflg", "0");

postDict.Add("staticpage", staticpage);

postDict.Add("loginType", "1");

postDict.Add("tpl", "mn");

postDict.Add("callback", "parent.bdPass.api.login._postCallback");

postDict.Add("username", txbBaiduUsername.Text);

postDict.Add("password", txbBaiduPassword.Text);

//postDict.Add("verifycode", "");

postDict.Add("mem_pass", "on");

string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";

string loginBaiduRespHtml = getUrlRespHtml(baiduMainLoginUrl, null, postDict);

//check whether got all expected cookies

Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();

string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};

foreach (String cookieToCheck in cookiesNameList)

{

cookieCheckDict.Add(cookieToCheck, false);

}

foreach (Cookie singleCookie in curCookies)

{

if (cookieCheckDict.ContainsKey(singleCookie.Name))

{

cookieCheckDict[singleCookie.Name] = true;

}

}

bool allCookiesFound = true;

foreach (bool foundCurCookie in cookieCheckDict.Values)

{

allCookiesFound = allCookiesFound && foundCurCookie;

}

loginBaiduOk = allCookiesFound;

if (loginBaiduOk)

{

txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu home page!";

}

else

{

txbEmulateLoginResult.Text = "Failed to emulate login to the Baidu home page!";

txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";

txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;

}

}

else

{

MessageBox.Show("Error: cookie BAIDUID was not obtained correctly and/or the token value was not extracted correctly!");

}

}

private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)

{

string emulateLoginTutorialUrl = "https://www.crifan.com/emulate_login_website_using_csharp";

System.Diagnostics.Process.Start(emulateLoginTutorialUrl);

}

private void btnClearAll_Click(object sender, EventArgs e)

{

curCookies = new CookieCollection();

gotCookieBaiduid = false;

extractTokenValueOK = false;

loginBaiduOk = false;

txbGotBaiduid.Text = "";

txbExtractedTokenVal.Text = "";

txbBaiduUsername.Text = "";

txbBaiduPassword.Text = "";

txbEmulateLoginResult.Text = "";

}

}

}

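The complete VS2010 project can be downloaded separately.

About crifanLib.cs: see the project links in the code comment above (http://code.google.com/p/crifanlib/).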

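[Summary]

As can be seen, although the login flow for the Baidu home page worked out earlier is not that complicated, implementing it in C# is considerably more involved than implementing it in Python.

The main reason is that Python wraps up many commonly used, convenient library functions, whereas in C# many details have to be handled yourself: every parameter of a GET or POST has to be considered, and anything involving cookies is especially tedious.

So for fetching and analyzing web pages, and for emulating website logins, Python is the more convenient choice.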
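One way to cut down the cookie bookkeeping in C# is to share a single CookieContainer across all requests, instead of creating a new container per request and copying a CookieCollection into it. The sketch below only illustrates that idea and is not how the code above works; SharedCookieSession and CreateRequest are hypothetical names. It still relies on .NET's own Set-Cookie handling, so it does not by itself avoid any parsing problems in the framework.

```csharp
using System;
using System.Net;

public static class SharedCookieSession
{
    // One container for the whole session; every request created below uses it.
    public static readonly CookieContainer Cookies = new CookieContainer();

    // Creates a request bound to the shared container. Cookies received in a
    // response to this request are stored in the container, and matching
    // cookies already in the container are sent with the request.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.CookieContainer = Cookies;
        return req;
    }
}
```

With this arrangement the three steps (get BAIDUID, get the token, post the login) would each call CreateRequest, and the cookies would carry over without any explicit copying.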

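[Postscript, 2013-09-11]

1. After some investigation, it is indeed the case that the earlier code works correctly on .NET 3.5 and earlier, but does not work on .NET 4.0.

2. The cause has now been found and fixed.

The cause: for a cookie that does not specify an expires field, .NET 4.0 sets the cookie's expires value to the default (year 0001, 00:00:00). As a result the cookie is treated as expired, so Baidu's H_PS_PSSID cookie becomes invalid and every subsequent operation fails.

On .NET 3.5 and earlier, the cookie's expires value is also the default (year 0001, 00:00:00), but the cookie remains usable in practice, so the later steps work and the problem does not occur.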
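The fixed code itself is not reproduced in this post. Purely as an assumption about what a workaround for the expiry problem could look like, the sketch below copies a CookieCollection and gives every cookie whose Expires is still DateTime.MinValue an explicit future expiry. CookieExpiryFix and WithExplicitExpiry are hypothetical names, and this should not be read as the author's actual fix.

```csharp
using System;
using System.Net;

public static class CookieExpiryFix
{
    // Returns a copy of the collection in which cookies that carry no expiry
    // (Expires == DateTime.MinValue) get an explicit expiry of now + lifetime.
    // Cookies that already have an expiry keep it unchanged.
    public static CookieCollection WithExplicitExpiry(CookieCollection cookies,
                                                      TimeSpan lifetime)
    {
        var result = new CookieCollection();
        foreach (Cookie ck in cookies)
        {
            var copy = new Cookie(ck.Name, ck.Value, ck.Path, ck.Domain);
            copy.Expires = (ck.Expires == DateTime.MinValue)
                ? DateTime.Now.Add(lifetime)
                : ck.Expires;
            result.Add(copy);
        }
        return result;
    }
}
```

The copy carries only name, value, path, domain and expiry; other fields such as Secure and HttpOnly would need to be copied as well in real use.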

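3. The fixed code, for download:

(1) Emulate Baidu login, standalone complete code version, .NET 4.0

(2) Emulate Baidu login, version using (my own) crifanLib, .NET 4.0

(These two files will be uploaded later, because the upload currently fails with "xxx.7z: unknown Bytes complete FAILED! Upload canceled: VIRUS DETECTED! (Heuristics.Broken.Executable FOUND)". I will retry at another time, and look into it if the same error persists.)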

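[Summary]

Whether in 3.5 and earlier or in the latest 4.0, .NET's parsing of the Set-Cookie header of an HTTP response into a CookieCollection has always been terrible and full of bugs.

So from now on, use resp.Cookies as little as possible; otherwise C# will trip you up without you ever knowing why.

It is better to parse Set-Cookie with your own parsing function and get a correct CookieCollection that way.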
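The author's own parsing function is not included in this post. As a rough sketch of the idea only, the code below parses a single Set-Cookie header value into a Cookie. SetCookieParser and Parse are hypothetical names, the defaultDomain parameter is an assumption for headers that carry no domain attribute, and splitting several cookies that were folded into one comma-separated header is deliberately not handled.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class SetCookieParser
{
    // Parses ONE Set-Cookie header value, for example
    // "BAIDUID=ABC123; path=/; domain=.baidu.com".
    public static Cookie Parse(string headerValue, string defaultDomain)
    {
        string[] parts = headerValue.Split(';');

        // The first segment is name=value; split on the first '=' only.
        string[] nameValue = parts[0].Split(new[] { '=' }, 2);
        var cookie = new Cookie(nameValue[0].Trim(),
                                nameValue.Length > 1 ? nameValue[1].Trim() : "",
                                "/", defaultDomain);

        // The remaining segments are attributes such as path, domain, expires.
        for (int i = 1; i < parts.Length; i++)
        {
            string[] attr = parts[i].Split(new[] { '=' }, 2);
            string key = attr[0].Trim().ToLowerInvariant();
            string val = attr.Length > 1 ? attr[1].Trim() : "";

            if (key == "domain" && val != "") cookie.Domain = val;
            else if (key == "path" && val != "") cookie.Path = val;
            else if (key == "secure") cookie.Secure = true;
            else if (key == "httponly") cookie.HttpOnly = true;
            else if (key == "expires")
            {
                // Leave Expires untouched when the date cannot be parsed.
                DateTime when;
                if (DateTime.TryParse(val, CultureInfo.InvariantCulture,
                        DateTimeStyles.AdjustToUniversal, out when))
                {
                    cookie.Expires = when;
                }
            }
        }
        return cookie;
    }
}
```

Cookies parsed this way can be added to a CookieCollection one by one, which keeps the decision about missing or odd expires values in your own code.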
