Scraping images from a website with Python

Source: https://blog.csdn.net/dcxhun3/article/details/52485498

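I've recently been working on the bot animal-recognition task. Judging by the test data the organizers released, my model is completely lost on animals drawn as simple line drawings, and recognition accuracy on them is very poor. So I need to scrape some line-drawing images myself.
As a learning exercise, I hand-wrote an image scraper for one particular site: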
Here is the code:

# -*- coding: utf-8 -*-
import os
import re
import time
import urllib.request


# Show download progress (passed to urlretrieve as its reporthook)
def schedule(a, b, c):
    '''
    a: number of data blocks downloaded so far
    b: size of one data block
    c: size of the remote file
    '''
    if c <= 0:  # remote size unknown, so there is no percentage to report
        return
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print('%.2f%%' % per)


def getHtml(url):
    page = urllib.request.urlopen(url)
    # read() returns bytes in Python 3; decode so the str regex below can be applied
    html = page.read().decode('utf-8', errors='ignore')
    return html


def downloadImg(html):
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    # Name the folder after today's date, e.g. 2016-9-8
    t = time.localtime(time.time())
    foldername = '%d-%d-%d' % (t.tm_year, t.tm_mon, t.tm_mday)
    picpath = 'H:\\getpic\\pic\\%s' % foldername  # local download directory
    if not os.path.exists(picpath):  # create the path if it does not exist
        os.makedirs(picpath)
    x = 0
    image = None  # stays None when no image on the page matches
    for imgurl in imglist:
        target = picpath + '\\%s.jpg' % x
        print('Downloading image to location: ' + target + '\nurl=' + imgurl)
        image = urllib.request.urlretrieve(imgurl, target, schedule)
        x += 1
    return image


if __name__ == '__main__':
    print('''
    ***************************************
    **      Welcome to use Spider        **
    **     Created on  2016-09-08        **
    **       @author:dcx                 **
    ***************************************''')
    html = getHtml("http://tieba.baidu.com/p/2460150866")
    downloadImg(html)
    print("Download has finished.")
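
The two pieces of logic that do the real work here, pulling image URLs out of the HTML and turning the download callback's arguments into a percentage, can be tried offline without touching the network. The following is only a rough Python 3 sketch. The helper names `extract_image_urls` and `percent_done` and the sample HTML are made up for illustration. The pattern is also a slightly stricter variant of the one in the script (`[^"]+` instead of `.+?`), on the assumption that a match should never run past the closing quote of a `src` attribute into the next tag.

```python
import re

# Stricter variant of the script's pattern (an assumption, not the original):
# [^"]+ cannot cross a closing quote, so one match cannot span several tags.
IMG_PATTERN = re.compile(r'src="([^"]+\.jpg)" pic_ext')


def extract_image_urls(html):
    """Return every .jpg URL whose src attribute is followed by pic_ext."""
    return IMG_PATTERN.findall(html)


def percent_done(block_count, block_size, total_size):
    """Same arithmetic as schedule(): percentage downloaded, capped at 100."""
    if total_size <= 0:  # size unknown: no meaningful percentage
        return 0.0
    return min(100.0, 100.0 * block_count * block_size / total_size)


# Hypothetical HTML fragment, invented purely for this example.
sample_html = (
    '<img class="BDE_Image" src="http://example.com/a/cat.jpg" pic_ext="jpeg">'
    '<img src="http://example.com/static/logo.png">'
    '<img class="BDE_Image" src="http://example.com/b/dog.jpg" pic_ext="jpeg">'
)

for url in extract_image_urls(sample_html):
    print(url)
print('%.2f%%' % percent_done(3, 8192, 32768))
```

With the original `.+?` pattern, a match that starts at the `logo.png` tag can keep expanding until it reaches the next `.jpg" pic_ext`, swallowing markup along the way whenever the tags share a line. That is why the sketch restricts the capture to non-quote characters.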