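For more details, see Honker
Python | Scraping Wallhaven wallpapers with Python and uploading them to Baidu Netdisk
Here is a wallpaper site I highly recommend: wallhaven
When I first came across this site I was amazed, and it felt like striking gold. Browsing is smooth and every wallpaper is free to download. Compared with similar wallpaper sites in China, it is remarkably generous.
With this many wallpapers, downloading them with Python is the obvious choice.
Where to store them? Local disk space is limited, so a cloud drive fills the gap.
How to keep the crawl running? Deploy it on a server.
Writing the program
See the earlier blog post
Uploading to Baidu Netdisk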
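Because the wallpapers are uploaded to Baidu Netdisk, add the following code:

    from bypy import ByPy


    class Adapter:
        """bypy adapter.

        Prerequisite: `bypy info` has been run and the login succeeded.
        """

        def __init__(self):
            self._bp = ByPy()

        def upload(self, localpath, remotepath, **kwargs):
            """Upload a local file.

            :param localpath: path of the local file
            :param remotepath: remote path; /videos is actually stored as /bypy/videos
            :param kwargs: extra arguments passed through to ByPy.upload
            :return: None
            """
            self._bp.upload(localpath=localpath, remotepath=remotepath, **kwargs)

!!! Note: this code only works once `bypy info` has run successfully.
Then modify the function down_pic(image_url):

    def down_pic(image_url):
        try:
            # image_title is a module-level variable set by the calling loop
            path = 'temporary data/{}'.format(
                (image_title.split('/')[-1]) + (image_url.split('/')[-1]))
            print(path)
            opener = request.build_opener()
            opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36')]
            request.install_opener(opener)
            request.urlretrieve(image_url, path)  # download to the local staging folder
            adapter.upload(localpath=path, remotepath='image/wallhaven/')
            os.remove(path)  # the file is uploaded, so the local copy can go
        except Exception as m:
            print(m)
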
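- **!!! Note:** create the folder `temporary data` in the program's working directory beforehand.
- **!!! Note:** create the folder `image/wallhaven/` in Baidu Netdisk beforehand.
- os.remove(): once a wallpaper has been uploaded, the local copy can be deleted (if you have enough local storage, you can drop this line).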
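
The local folder requirement can also be handled in code instead of by hand. The following is a minimal sketch, assuming the same `temporary data` folder name; `ensure_workdir` and `build_local_path` are hypothetical helper names that do not appear in the original script. The Baidu Netdisk folder still has to be created separately.

```python
import os


def ensure_workdir(folder="temporary data"):
    """Create the local staging folder if it does not exist yet (hypothetical helper)."""
    os.makedirs(folder, exist_ok=True)
    return folder


def build_local_path(image_title, image_url, folder="temporary data"):
    """Build the local file path the way down_pic does:
    the last '/'-separated piece of the title plus the file name from the URL."""
    title_part = image_title.split('/')[-1]
    file_part = image_url.split('/')[-1]
    return os.path.join(folder, title_part + file_part)
```

Calling `ensure_workdir()` once at startup means a missing folder no longer causes every download to fail. Passing the title as an argument also removes the dependency on a global `image_title`.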
Final code

    from requests_html import HTMLSession  # requests and parses pages; more concise than the alternatives
    from urllib import request  # used here only to download and save images
    import os
    from bypy import ByPy


    class Adapter:
        """bypy adapter.

        Prerequisite: `bypy info` has been run and the login succeeded.
        """

        def __init__(self):
            self._bp = ByPy()

        def upload(self, localpath, remotepath, **kwargs):
            """Upload a local file.

            :param localpath: path of the local file
            :param remotepath: remote path; /videos is actually stored as /bypy/videos
            :param kwargs: extra arguments passed through to ByPy.upload
            :return: None
            """
            self._bp.upload(localpath=localpath, remotepath=remotepath, **kwargs)


    # Request headers, used to get past basic anti-scraping checks
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36'}
    session = HTMLSession()
    urls = []
    num_int = 2

    # Collect every link found on each search result page
    for i in range(1, num_int):
        # r = session.get('https://wallhaven.cc/toplist?page={}'.format(i))
        try:
            r = session.get('https://wallhaven.cc/search?categories=110&purity=100&topRange=1y&sorting=toplist&order=desc&page={}'.format(i), headers=headers)
            urls.extend(list(r.html.links))
            print(i, len(list(r.html.links)))
        except Exception as m:
            print(m)

    print(len(urls))
    adapter = Adapter()


    def down_pic(image_url):
        try:
            # image_title is a module-level variable set by the loop below
            path = 'temporary data/{}'.format(
                (image_title.split('/')[-1]) + (image_url.split('/')[-1]))
            print(path)
            opener = request.build_opener()
            opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36')]
            request.install_opener(opener)
            request.urlretrieve(image_url, path)  # download to the local staging folder
            adapter.upload(localpath=path, remotepath='image/wallhaven/')
            os.remove(path)  # delete the local copy after uploading
        except Exception as m:
            print(m)


    # Visit each collected link; pages without img#wallpaper raise and are skipped
    for url in urls:
        try:
            session1 = HTMLSession()
            r1 = session1.get(url)
            sr = r1.html.find("img#wallpaper", first=True)
            image_url = sr.attrs['src']
            image_title = sr.attrs['alt']
            print(image_url)
            print(image_title)
            down_pic(image_url)
        except Exception as e:
            print(e)

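
`r.html.links` returns every link on a search page, so the download loop also requests pages that have no `img#wallpaper` element and relies on the exception handler to skip them. One way to cut those wasted requests is to filter the links first. This is only a sketch: the URL pattern `https://wallhaven.cc/w/<id>` for wallpaper detail pages is an assumption, and `filter_wallpaper_links` is a hypothetical helper, not part of the original script.

```python
import re

# Assumption: wallpaper detail pages look like https://wallhaven.cc/w/<id>
WALLPAPER_PAGE = re.compile(r"^https://wallhaven\.cc/w/[0-9a-z]+$")


def filter_wallpaper_links(links):
    """Keep only links matching the assumed detail-page pattern, deduplicated and sorted."""
    return sorted({link for link in links if WALLPAPER_PAGE.match(link)})
```

With this in place, `urls = filter_wallpaper_links(urls)` could run just before the download loop. If the site uses a different URL scheme, the pattern needs to be adjusted accordingly.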
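Deploying to a server
- Log in to the server
- Upload the program
- Create the folder `temporary data`
- Run the command `nohup python3 <program name>.py &`
- Go to bed in peace and let the wallpapers fill up the cloud drive while you sleep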
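Results
Download the scraped wallpapers
Link (extraction code: 7p8q)
More than two thousand wallpapers were scraped in one night, and the crawl is still running.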