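In a little over 20 minutes I crawled ZOL for its popular phones: model, features, price, release date, and screen size. I then ran a simple statistic on the latest popular phones. If there is anything else you would like to know, leave me a comment. I have uploaded the code to GitHub, and you are welcome to download it. The repository also contains demos of LOL hero statistics and of statistics for the 看看豆 website.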
ZOL official site: http://mobile.zol.com.cn/
My GitHub: https://github.com/XiaoTommy/phpspider
The crawler code:
<?php
ini_set("memory_limit", "1024M");
require dirname(__FILE__).'/../core/init.php';

/* Do NOT delete this comment */
/* 不要删除这段注释 */

$configs = array(
    'name' => 'ZOL',
    'log_show' => false,
    'tasknum' => 1,
    //'save_running_state' => true,
    'domains' => array(
        'detail.zol.com.cn'
    ),
    // Entry page from which the crawl starts
    'scan_urls' => array(
        'http://detail.zol.com.cn/cell_phone_index/subcate57_list_1.html'
    ),
    // Listing pages to follow
    'list_url_regexes' => array(
        "http://detail.zol.com.cn/cell_phone_index/subcate57_0_list_1_0_1_2_0_\d.html"
    ),
    // Phone detail pages from which fields are extracted
    'content_url_regexes' => array(
        "http://detail.zol.com.cn/cell_phone/index\d+.shtml",
    ),
    'max_try' => 5,
    //'export' => array(
    //    'type' => 'csv',
    //    'file' => PATH_DATA.'/qiushibaike.csv',
    //),
    //'export' => array(
    //    'type' => 'sql',
    //    'file' => PATH_DATA.'/qiushibaike.sql',
    //    'table' => 'content',
    //),
    'export' => array(
        'type' => 'db',
        'table' => 'zol',
    ),
    // One entry per extracted column; selectors are XPath expressions
    'fields' => array(
        array(
            'name' => "mobile_name",
            'selector' => "//div[contains(@class,'wrapper')]//div[contains(@class,'page-title')]/h1",
            'required' => true,
        ),
        array(
            'name' => "mobile_intro",
            'selector' => "//div[contains(@class,'wrapper')]//div[contains(@class,'page-title')]/div[contains(@class,'subtitle')]",
            'required' => true,
        ),
        array(
            'name' => "consult_price",
            'selector' => "//div[contains(@class,'wrapper')]//div[contains(@class,'price price-normal')]//b[contains(@class,'price-type')]/text()",
            'required' => true,
        ),
        array(
            'name' => "showdate",
            'selector' => "//div[contains(@class,'config-section')]//span[contains(@class,'showdate')]",
            'required' => true,
        ),
        array(
            'name' => "score",
            'selector' => "//*[@id=\"totalPoint\"]//div[contains(@class,'score')]/strong",
            'required' => true,
        ),
        array(
            'name' => "screen_size",
            'selector' => "//span[contains(@class,'param-value')]",
            'required' => true,
        ),
        array(
            'name' => "brand",
            'selector' => "//div[contains(@class,'breadcrumb')]/a[3]",
            'required' => true,
        ),
    ),
);

$spider = new phpspider($configs);
$spider->start();
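To see which URLs the two patterns in the config would accept, here is a minimal sketch that tests them with plain `preg_match`. The helper `url_matches` and the sample URLs are hypothetical, made up for illustration. Wrapping the pattern in `#` delimiters is an assumption about how the framework applies it, not something taken from its source.

```php
<?php
// Patterns copied verbatim from the config.
$listRegex    = "http://detail.zol.com.cn/cell_phone_index/subcate57_0_list_1_0_1_2_0_\d.html";
$contentRegex = "http://detail.zol.com.cn/cell_phone/index\d+.shtml";

// Hypothetical helper: wraps the pattern in '#' delimiters (assumed, not
// confirmed from the framework) and reports whether the URL matches.
function url_matches(string $pattern, string $url): bool
{
    return preg_match('#' . $pattern . '#i', $url) === 1;
}

// Hypothetical sample URLs, shaped like the ones in the config.
$samples = array(
    'http://detail.zol.com.cn/cell_phone_index/subcate57_0_list_1_0_1_2_0_3.html',
    'http://detail.zol.com.cn/cell_phone_index/subcate57_0_list_1_0_1_2_0_12.html',
    'http://detail.zol.com.cn/cell_phone/index1234567.shtml',
);

foreach ($samples as $url) {
    printf(
        "%s\n  list: %s, content: %s\n",
        $url,
        var_export(url_matches($listRegex, $url), true),
        var_export(url_matches($contentRegex, $url), true)
    );
}
```

Under these assumptions the second sample, with a two-digit page number, does not match the list pattern, because `\d` accepts exactly one digit. Following listing pages past 9 would need `\d+` there. The unescaped dots also match any character, which is harmless here but looser than intended.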
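Here I analyzed only one aspect of the data, as a simple example.
The figure below shows the relationship between score and price among the popular products. Products priced under 2000 yuan are especially numerous, and their scores mostly fall between 3.5 and 4.5.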
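One way to put numbers behind a score-versus-price observation like this is to group the crawled rows into price bands and average the score in each. The sketch below is not the analysis used for the figure. The function names are hypothetical, the rows are made-up placeholders rather than crawled data, and the assumption that `consult_price` may carry a currency symbol or non-numeric text is mine.

```php
<?php
// Made-up placeholder rows using the crawler's field names; NOT real data.
$rows = array(
    array('consult_price' => '¥1999', 'score' => '4.2'),
    array('consult_price' => '¥1499', 'score' => '3.8'),
    array('consult_price' => '¥3299', 'score' => '4.6'),
    array('consult_price' => '¥5888', 'score' => '4.4'),
    array('consult_price' => '概念产品', 'score' => '4.0'), // no numeric price
);

// Hypothetical helper: strip everything but digits and dots, then
// return the price as a float, or null when nothing numeric remains.
function parse_price(string $raw): ?float
{
    $digits = preg_replace('/[^0-9.]/', '', $raw);
    return is_numeric($digits) ? (float) $digits : null;
}

// Hypothetical helper: group rows into price bands of $bucketSize and
// return the row count and mean score per band, keyed by band start.
function average_score_by_bucket(array $rows, int $bucketSize = 2000): array
{
    $sums = array();
    $counts = array();
    foreach ($rows as $row) {
        $price = parse_price($row['consult_price']);
        if ($price === null || !is_numeric($row['score'])) {
            continue; // skip rows without a usable price or score
        }
        $bucket = (int) floor($price / $bucketSize) * $bucketSize;
        $sums[$bucket] = ($sums[$bucket] ?? 0.0) + (float) $row['score'];
        $counts[$bucket] = ($counts[$bucket] ?? 0) + 1;
    }

    $result = array();
    foreach ($sums as $bucket => $sum) {
        $result[$bucket] = array(
            'count' => $counts[$bucket],
            'avg_score' => round($sum / $counts[$bucket], 2),
        );
    }
    ksort($result); // lowest price band first
    return $result;
}

$result = average_score_by_bucket($rows);
foreach ($result as $bucket => $stats) {
    printf("%d-%d: %d phones, average score %.2f\n",
        $bucket, $bucket + 1999, $stats['count'], $stats['avg_score']);
}
```

In practice the rows would come from the `zol` table that the crawler exports to, and the band width can be changed through the second argument.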