The head of Anonymous's China office has been quietly running a porn site called "大嘴巴巴" (Dazui Baba).
http://dazui88.com/
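The site is complete garbage, so feel free to poke at it when you have nothing better to do. Today we'll try crawling its content and hand the results over to the ladies at the internet police afterwards.
1. First, look at the URLs in "轻松一刻" (qsyk), Dazui Baba's biggest porn section.
2. We find the following pattern: the number at the end of the URL drops by one for each page, e.g. http://dazui88.com/qsyk/20180102442869.html, http://dazui88.com/qsyk/20180102442868.html, http://dazui88.com/qsyk/20180102442867.html, and so on.
3. Then write the following code: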
# -*- coding: utf-8 -*-
import urllib2


def load_page(url):
    '''Send a request to url and return the HTML page.'''
    user_agent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
    headers = {"User-Agent": user_agent}
    req = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(req)
    html = response.read()
    return html


def write_to_file(file_name, txt):
    '''Write txt into file_name.'''
    print "writing file " + file_name
    f = open(file_name, 'w')
    f.write(txt)
    f.close()


def tiaba_spider(url, begin_page, end_page):
    '''Download pages begin_page..end_page and save each one as <i>.html.'''
    for i in range(begin_page, end_page + 1):
        # Page IDs count down from 442870:
        #   http://dazui88.com/qsyk/20180102442869.html
        #   http://dazui88.com/qsyk/20180102442868.html
        #   http://dazui88.com/qsyk/20180102442867.html
        #   ........
        #   i = 1, pn = 442870 - 1 = 442869
        pn = 442870 - i
        dazui88_url = url + str(pn) + '.html'
        #print "dazui88'url:"
        #print dazui88_url
        html = load_page(dazui88_url)
        #print "================%d==================" % (i)
        #print html
        #print "===================================="
        file_name = str(i) + ".html"
        write_to_file(file_name, html)


# main
if __name__ == "__main__":
    # url is the prefix before the page ID, e.g. http://dazui88.com/qsyk/20180102
    url = raw_input("please input dazui88'URL:")
    #print url
    begin_page = int(raw_input("please input begin_page:"))
    end_page = int(raw_input("please input end_page:"))
    #print begin_page
    #print end_page
    tiaba_spider(url, begin_page, end_page)
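4. Then run python fuck-dazui88.py to test it.
5. Success: that's one solid kick up Director Xia's backside, and now we can happily go report him to the ladies at the internet police :)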
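The URL pattern from step 2 can be checked without touching the network. Below is a minimal sketch that only builds the page URLs. The helper name build_page_urls and its newest_id parameter are made up for illustration; the 442870 starting value and the http://dazui88.com/qsyk/20180102 prefix are taken from the script above.

```python
def build_page_urls(base_url, begin_page, end_page, newest_id=442870):
    """Return (page_index, url) pairs following the countdown pattern.

    newest_id is a hypothetical parameter; 442870 is the constant
    hard-coded in the spider above.
    """
    urls = []
    for i in range(begin_page, end_page + 1):
        pn = newest_id - i  # page IDs decrease by one per page
        urls.append((i, base_url + str(pn) + ".html"))
    return urls


if __name__ == "__main__":
    # Print the first three URLs the spider would request.
    for i, page_url in build_page_urls("http://dazui88.com/qsyk/20180102", 1, 3):
        print("%d %s" % (i, page_url))
```

Printing the list first is a cheap way to confirm the prefix you type at the prompt is right before the spider starts writing files.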
Reposted from: https://blog.51cto.com/hackerwang/2057398