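A friend studies on a website and has to open the web page every time he wants to watch a lesson. He wanted to save the videos for local playback, but the site offers no download option, so he asked me to see whether they could be downloaded.
1. Analyze the page: open it, press F12 to bring up the browser developer tools, and collect the relevant request information.
2. Video information is usually stored in an m3u8 file, so search the captured requests for m3u8 directly.
3. Inspecting the m3u8 file shows that the ts files are encrypted with AES-128, and the key is fetched from the URL given in the playlist's #EXT-X-KEY tag.
4. After extracting that URL from the file, fetch the key, convert it to hexadecimal, and try decrypting a segment. This succeeded, so the approach works: collect the ts links from the m3u8 file and decrypt the segments one by one with the key.
5. Once all ts files have been downloaded, merge them from a Windows command prompt with a single command: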
copy /b *.ts videos.mp4
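One caveat with the wildcard: the script names the segments 0.ts, 1.ts, 2.ts and so on, and a name-ordered directory listing (typical on NTFS) places 10.ts before 2.ts, so with more than ten segments the merged video may come out in the wrong order. Below is a minimal Python sketch that concatenates the segments in numeric order instead. The function name merge_segments is hypothetical, and it assumes the decrypted segments are named with plain integers, as in the script in step 6.

```python
from pathlib import Path


def merge_segments(src_dir, out_path):
    """Concatenate N.ts files from src_dir into out_path in numeric order.

    Returns the names of the merged files, in the order they were written.
    """
    src = Path(src_dir)
    # Keep only files whose stem is a plain integer (0.ts, 1.ts, ...).
    segments = [p for p in src.glob("*.ts") if p.stem.isdecimal()]
    # Sort by integer value so that 2.ts comes before 10.ts.
    segments.sort(key=lambda p: int(p.stem))
    with open(out_path, "wb") as out:
        for seg in segments:
            out.write(seg.read_bytes())
    return [p.name for p in segments]
```

Calling merge_segments("videos", "videos.mp4") would play the same role as the copy command above, assuming the decrypted segments were written to the videos directory.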
6. Complete code
# -*- coding: utf-8 -*-
"""
Time : 2021-04-02 13:53
Name : 茅十八
File : spider_shipin.py
Topic : Scrape videos from 中国会计网 (chinaacc.com)
"""
import requests
import re
import json
import base64
import os
from Crypto.Cipher import AES


class Kuai_ji(object):
    def __init__(self):
        # Request headers copied from a logged-in browser session.
        self.header = {
            'Host': 'elearning.chinaacc.com',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Accept-Encoding': 'gzip, deflate, br',
            'Referer': 'https://member.chinaacc.com/',
            'Connection': 'keep-alive',
            'Cookie': (
                'hd_uid=CjsAJWBj98UjEPREAzzTAg==; '
                'clct_nuID=16171642301299115; '
                'bdp_uuid=25a70cc998-1ceae87102-e8be58078c; '
                'zg_did=%7B%22did%22%3A%20%22178867fdea4119-0154b88d38575c-4c3f237d-144000-178867fdea523d%22%7D; '
                'zg_9b4551cf447148b0845f31f91e8a524d=%7B%22sid%22%3A%201618188966620%2C%22updated%22%3A%201618189325497%2C%22info%22%3A%201617860882274%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22%22%2C%22cuid%22%3A%20%2282819233%22%7D; '
                'Hm_lvt_f1ca44b62370e4b7dc11d5937e51c2d6=1617929078,1617929186,1617955478,1618188971; '
                '_pk_id.member.chinaacc.com.e1fb=1cf2a9176d4ec1af.1617348690.1.1617348690.1617348690.; '
                'lastloginuser=m7677_41968; '
                'SelCourse=a|; '
                '_pk_id.www.chinaacc.com.e1fb=0be54a9159621948.1617929192.3.1618189311.1618189311.; '
                '_pk_ref.www.chinaacc.com.e1fb=%5B%22%22%2C%22%22%2C1618189311%2C%22https%3A%2F%2Fwww.chinaacc.com%2F%22%5D; '
                '_pk_id.www.chinaacc.com.eab1=2f935fa43caa4d22.1617929210.3.1618189325.1618189317.; '
                '_pk_ref.www.chinaacc.com.eab1=%5B%22%22%2C%22%22%2C1618189317%2C%22https%3A%2F%2Fwww.chinaacc.com%2F%22%5D; '
                'zg_ffaecff2118841b9866c8c549ea3c5a9=%7B%22sid%22%3A%201617958599807%2C%22updated%22%3A%201617960522067%2C%22info%22%3A%201617929338497%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22elearning.chinaacc.com%22%7D; '
                'trackerSdkVisitor_isNew=true; '
                'trackerSdkData={%22uid%22:%2281263452%22%2C%22platform_source%22:%22web%22%2C%22time%22:1618189325622%2C%22bdp_uuid%22:%2225a70cc998-1ceae87102-e8be58078c%22}; '
                'BIGipServermember.chinaacc.com=654392074.20480.0000; '
                'clientID=qJQvIBoumlK4DmXSbdByMvqZukMzCNGRuQHxZQDNXpBN8dzrqLDzUr99QJXCu3mladZpXkW9DwNR%0D%0A0gFt_OGTGbltFnm7o8c3MtWmo7jdFDY%0D%0A; '
                'client_ucToken=9F7D029DA3ADB8A8F60EBDE0D85B0312-6bab3e37de618bbc456b1a315b7ddfb1-01; '
                'Hm_lpvt_f1ca44b62370e4b7dc11d5937e51c2d6=1618189326; '
                'sid=e27f6440-c9bd-4a64-af08-1bcb7733c338; '
                'cdeluid=81263452; '
                'username=m7677_41968; '
                'JSESSIONID=5701254F6527C1F5E24B334186D48BE2; '
                '_pk_ses.www.chinaacc.com.e1fb=*; '
                '_pk_ses.www.chinaacc.com.eab1=*'
            ),
            'Upgrade-Insecure-Requests': '1',
            'Cache-Control': 'max-age=0',
        }
        self.url = "https://www.chinaacc.com/demo/h5/2/198/cware-39252/video-901.html"
        self.m3u8_url = self.getm3u8()

    def getm3u8(self):
        # The page embeds its video metadata as JSON.parse('...'); extract and unescape it.
        rous = requests.get(self.url, headers=self.header)
        data = re.findall(r"JSON.parse\('(.*)'\)", rous.text)[0].replace('\\', '')
        json_data = json.loads(data).get('videoPath')
        # videoPath is protocol-relative, so prepend the scheme.
        m3u8_url = 'https:' + json_data
        return m3u8_url

    def get_data(self):
        print(self.m3u8_url)
        data = requests.get(self.m3u8_url).text
        # Fetch the AES-128 key from the URI in the #EXT-X-KEY tag; the response is base64 text.
        aes_url = re.findall(r'#EXT-X-KEY:METHOD=AES-128,URI="(.*)"', data)[0]
        keys = requests.get(aes_url).text
        key = base64.b64decode(keys)
        # All-zero 16-byte IV.
        iv = '00000000000000000000000000000000'
        iv = bytes.fromhex(iv)
        ts_datas = re.findall(r'(/ssec.chinaacc.com/.*)\n', data)
        # Make sure the output directory for decrypted segments exists.
        os.makedirs('videos', exist_ok=True)
        i = 0
        for ts_data in ts_datas:
            ts_url = 'http:/' + ts_data
            print(ts_url)
            ts_rous = requests.get(ts_url)
            file_path = str(i) + ".ts"
            to_file_path = 'videos\\' + str(i) + '.ts'
            # Save the encrypted segment as downloaded.
            with open(file_path, 'wb') as f:
                f.write(ts_rous.content)
            with open(file_path, 'rb') as f:
                cryptor = AES.new(key, AES.MODE_CBC, iv)  # create the cipher instance
                plain_data = cryptor.decrypt(f.read())  # decrypt the segment data
                with open(to_file_path, 'wb') as w:
                    w.write(plain_data)
            i += 1


def main():
    kuai_ji = Kuai_ji()
    kuai_ji.get_data()


if __name__ == '__main__':
    main()
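
The script above matches segment lines by host name and uses an all-zero IV for every segment. As a more general sketch of the parsing in steps 2 to 4, the helpers below extract the key URI, the IV, and the segment URLs from playlist text using only the standard library. Per the HLS specification (RFC 8216), when the #EXT-X-KEY tag carries no IV attribute, each segment's media sequence number serves as its 16-byte big-endian IV, and AES-128 segments carry PKCS7 padding. In CBC mode a wrong IV only garbles the first 16-byte block of a segment, which may be why the zero IV still produced playable output. The names parse_playlist and strip_pkcs7 are hypothetical, and the sketch assumes a playlist with a single #EXT-X-KEY tag; it has not been run against the site's real playlist.

```python
import re
from urllib.parse import urljoin


def parse_playlist(text, base_url):
    """Return (key_url, segments) for an AES-128 HLS media playlist.

    segments is a list of (segment_url, iv_bytes) tuples.
    Assumes at most one #EXT-X-KEY tag applies to all segments.
    """
    key_url = None
    explicit_iv = None
    media_seq = 0  # default when #EXT-X-MEDIA-SEQUENCE is absent
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            media_seq = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-KEY:"):
            uri = re.search(r'URI="([^"]*)"', line)
            if uri:
                key_url = urljoin(base_url, uri.group(1))
            iv = re.search(r"IV=0[xX]([0-9A-Fa-f]+)", line)
            if iv:
                explicit_iv = bytes.fromhex(iv.group(1).zfill(32))
        elif line and not line.startswith("#"):
            # Any non-empty, non-tag line is a segment URI (possibly relative).
            urls.append(urljoin(base_url, line))
    segments = []
    for offset, url in enumerate(urls):
        if explicit_iv is not None:
            iv_bytes = explicit_iv
        else:
            # No IV attribute: use the segment's media sequence number.
            iv_bytes = (media_seq + offset).to_bytes(16, "big")
        segments.append((url, iv_bytes))
    return key_url, segments


def strip_pkcs7(data, block_size=16):
    """Remove PKCS7 padding; return data unchanged if the padding looks invalid."""
    if not data or len(data) % block_size:
        return data
    pad = data[-1]
    if 1 <= pad <= block_size and data.endswith(bytes([pad]) * pad):
        return data[:-pad]
    return data
```

In the script, each (url, iv) pair could then be passed to AES.new(key, AES.MODE_CBC, iv), with strip_pkcs7 applied to the decrypted bytes before they are written to disk.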