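Piwik, now renamed Matomo, is a well-known open-source web analytics system, comparable to Baidu Tongji or Google Analytics. The biggest difference is that its source code is open to read, which suits me perfectly: I have always been curious about how analytics systems work internally, so when I came across this one I tried it right away. Chinese-language material on it is scarce, and most of it only covers installation; I could not find any article that analyzes the source. What follows is a first, high-level pass, and later posts will go into more detail. I currently work as a front-end developer, so I will start with the tracking script and move on to the back-end code afterwards.
I. Overview
Piwik's official site is matomo.org. It is written in PHP, and since I used to be a PHP engineer, reading the code is no obstacle. The latest version at the time of writing is 3.6, and the GitHub repository is matomo-org/matomo; opening it shows the contents in the screenshot below (only the key part is captured).
Open the js folder; the piwik.js file inside it (boxed in red in the screenshot below) is the script analyzed here. It is fairly large, at 7838 lines.
First download the whole codebase, configure a virtual directory locally, and then start the installer. During installation you can choose a language, and Simplified Chinese is supported (see the part boxed in red in the screenshot below). The installer runs a series of steps (listed on the left of the screenshot), including checking whether the environment is suitable and creating the database. Just follow the prompts one step at a time; it is straightforward.
Once installation finishes you are redirected to the admin interface (screenshot below), which has charts and reports much like other common analytics systems. I have not looked at the features in detail yet, only skimmed them, but the interface is quite friendly.
The JavaScript snippet embedded in the page is also similar to that of other analytics systems. As shown below, it loads the script asynchronously; the one difference is that the request URL is not disguised as an image address (the tracker URL points at piwik.php; see the setTrackerUrl line).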
<script type="text/javascript">
    var _paq = _paq || [];
    /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
    _paq.push(['trackPageView']);
    _paq.push(['enableLinkTracking']);
    (function() {
        var u="//loc.piwik.cn/"; // custom value
        _paq.push(['setTrackerUrl', u+'piwik.php']);
        _paq.push(['setSiteId', '1']);
        var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
        g.type='text/javascript'; g.async=true; g.defer=true; g.src='piwik.js'; s.parentNode.insertBefore(g,s);
    })();
</script>
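After this snippet is embedded, each page refresh produces the request shown in the screenshot below. The request carries a long list of parameters, each of which is explained later in this article.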
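The _paq array in the snippet acts as a command queue: the page pushes [methodName, ...args] entries before piwik.js has finished loading, and the tracker replays them afterwards. Below is a minimal sketch of that pattern under my own assumptions, not Piwik's actual implementation; createQueueProcessor and fakeTracker are hypothetical names used only for illustration.

```javascript
// Minimal sketch of a command-queue pattern similar to _paq.
// createQueueProcessor and fakeTracker are hypothetical, not part of piwik.js.
function createQueueProcessor(tracker) {
  // Run one queued command of the form [methodName, arg1, arg2, ...].
  function apply(command) {
    var method = command[0];
    var args = command.slice(1);
    if (typeof tracker[method] === "function") {
      return tracker[method].apply(tracker, args);
    }
    return undefined; // unknown methods are ignored in this sketch
  }

  return {
    // Replay everything that was pushed before the tracker loaded.
    drain: function (queue) {
      for (var i = 0; i < queue.length; i++) {
        apply(queue[i]);
      }
      queue.length = 0;
    },
    // Once the tracker is loaded, push() executes commands immediately.
    push: apply
  };
}

// Usage: commands queued by the page, then replayed by the stand-in tracker.
var calls = [];
var fakeTracker = {
  setSiteId: function (id) { calls.push("site:" + id); },
  trackPageView: function () { calls.push("pageview"); }
};
var queue = [["setSiteId", "1"], ["trackPageView"]];
var processor = createQueueProcessor(fakeTracker);
processor.drain(queue);
processor.push(["trackPageView"]);
```

The benefit of this design is that the page never has to wait for the script: calls made early are simply recorded and executed later in order.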
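II. Splitting the Script
A script of more than 7000 lines cannot be read line by line; it first has to be split into modules that can be analyzed one at a time. The script is this large because it contains a great deal of code for compatibility with all kinds of browser versions, including ancient ones such as IE4, Firefox 1.0 and Netscape. I split the source into 6 parts: json, private, query, content-overlay, tracker and piwik, as boxed in red in the screenshot below. The piwik-all file contains the complete code for comparison. The code has been uploaded to GitHub.
json.js is the open-source JSON3 library, included for browsers that lack a native JSON object; its code can be studied on its own. private.js contains private variables and functions used throughout the script, such as aliases for system objects and type checks. query.js contains many methods for working with HTML elements, such as setting element attributes or finding elements with a given CSS class; it resembles a miniature jQuery, though with a number of features of its own. content-overlay.js consists of two parts: one covers content tracking and URL building, the other handles embedded pages; I have not looked at it closely. tracker.js contains just one function, Tracker(), but it is the largest part at over 4700 lines, and the main tracking logic lives here. piwik.js is short and contains initialization and plugin hooks; I have not yet looked at how the hooks operate.
Even split into 6 parts, each part is still sizable and the parts are interrelated, so it is hard to work out every detail in a short time. I picked the points that seemed most important to me and analyzed those first.
1) Three ways to send data
I previously knew only two ways to send data. One is Ajax. The other is to create an Image object and set its src attribute, passing the data to the server as URL parameters; this approach is widely supported and also sidesteps cross-origin restrictions. primus.js, a plugin I wrote earlier for collecting performance metrics, sends its data this way too. While reading the source I discovered a third way: the sendBeacon() method of the Navigator object.
MDN describes it as a method that "can be used to asynchronously transfer a small amount of data over HTTP to a web server". Although the method has compatibility limitations, I was still impressed, because it fits the analytics use case very well. MDN goes on to explain that analytics code typically sends data to the server just before the page closes (window.onunload), since sending it any earlier may miss data that could still be collected. However, guaranteeing that data is sent while the page is closing has always been difficult, because browsers usually ignore asynchronous requests made in unload handlers. With sendBeacon(), the browser sends the data asynchronously when it has the opportunity, without delaying the unload or affecting the load of the next page. According to MDN this solves the problems of submitting analytics data: it is reliable, asynchronous, does not affect loading of the next page, and the code is simpler. The code excerpt below comes from tracker.js (note the line that calls navigatorAlias.sendBeacon()).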
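To make the Image-based approach concrete, here is a minimal sketch. It is my own illustration rather than code from Piwik or primus.js; buildBeaconUrl, sendViaImage and the stats.example.com endpoint are hypothetical.

```javascript
// Minimal sketch of the image-beacon technique; names and URL are illustrative.
function buildBeaconUrl(endpoint, params) {
  var pairs = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
    }
  }
  // Use "?" or "&" depending on whether the endpoint already has a query string.
  var separator = endpoint.indexOf("?") === -1 ? "?" : "&";
  return pairs.length ? endpoint + separator + pairs.join("&") : endpoint;
}

function sendViaImage(endpoint, params) {
  var url = buildBeaconUrl(endpoint, params);
  // Image only exists in browsers; elsewhere this function just returns the URL.
  if (typeof Image === "function") {
    var img = new Image(1, 1);
    img.src = url; // assigning src triggers the GET request
  }
  return url;
}

var exampleUrl = buildBeaconUrl("//stats.example.com/collect", { page: "/home", t: 120 });
```

Because the data travels in the query string of a GET request, this approach suits small payloads only.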
function sendPostRequestViaSendBeacon(request) {
    var supportsSendBeacon =
        "object" === typeof navigatorAlias &&
        "function" === typeof navigatorAlias.sendBeacon &&
        "function" === typeof Blob;

    if (!supportsSendBeacon) {
        return false;
    }

    var headers = {
        type: "application/x-www-form-urlencoded; charset=UTF-8"
    };
    var success = false;

    try {
        var blob = new Blob([request], headers);
        success = navigatorAlias.sendBeacon(configTrackerUrl, blob);
        // returns true if the user agent is able to successfully queue the data for transfer,
        // Otherwise it returns false and we need to try the regular way
    } catch (e) {
        return false;
    }

    return success;
}
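The comment in the excerpt says that when sendBeacon() returns false the tracker has to "try the regular way". The sketch below shows one way such a fallback chain could be organized. It is an assumption on my part, not the logic in tracker.js; sendWithFallback and beaconTransport are hypothetical names, and the transports are injected so the chain can be exercised outside a browser.

```javascript
// Sketch of a "beacon first, then fall back" strategy with injected transports.
// sendWithFallback and beaconTransport are hypothetical names.
function sendWithFallback(payload, transports) {
  for (var i = 0; i < transports.length; i++) {
    try {
      // Each transport returns true when it accepted the payload.
      if (transports[i].send(payload) === true) {
        return transports[i].name;
      }
    } catch (e) {
      // A throwing transport is treated like an unsupported one.
    }
  }
  return null; // no transport could send the payload
}

// A beacon transport that mirrors the feature test in the excerpt above.
function beaconTransport(url) {
  return {
    name: "beacon",
    send: function (payload) {
      var supported =
        typeof navigator === "object" &&
        typeof navigator.sendBeacon === "function" &&
        typeof Blob === "function";
      if (!supported) {
        return false;
      }
      var blob = new Blob([payload], {
        type: "application/x-www-form-urlencoded; charset=UTF-8"
      });
      return navigator.sendBeacon(url, blob);
    }
  };
}
```

In a page, the list passed to sendWithFallback could start with beaconTransport(url) and be followed by an XMLHttpRequest or Image based transport.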
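2) Parameter reference
The method below (in tracker.js) collects the statistics for the current page and concatenates them into the query parameters of the tracking URL; these parameters are what is ultimately sent to the server.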
/**
 * Returns the URL to call piwik.php,
 * with the standard parameters (plugins, resolution, url, referrer, etc.).
 * Sends the pageview and browser settings with every request in case of race conditions.
 */
function getRequest(request, customData, pluginMethod, currentEcommerceOrderTs) {
    var i,
        now = new Date(),
        nowTs = Math.round(now.getTime() / 1000),
        referralTs,
        referralUrl,
        referralUrlMaxLength = 1024,
        currentReferrerHostName,
        originalReferrerHostName,
        customVariablesCopy = customVariables,
        cookieSessionName = getCookieName("ses"),
        cookieReferrerName = getCookieName("ref"),
        cookieCustomVariablesName = getCookieName("cvar"),
        cookieSessionValue = getCookie(cookieSessionName),
        attributionCookie = loadReferrerAttributionCookie(),
        currentUrl = configCustomUrl || locationHrefAlias,
        campaignNameDetected,
        campaignKeywordDetected;

    if (configCookiesDisabled) {
        deleteCookies();
    }

    if (configDoNotTrack) {
        return "";
    }

    var cookieVisitorIdValues = getValuesFromVisitorIdCookie();
    if (!isDefined(currentEcommerceOrderTs)) {
        currentEcommerceOrderTs = "";
    }

    // send charset if document charset is not utf-8. sometimes encoding
    // of urls will be the same as this and not utf-8, which will cause problems
    // do not send charset if it is utf8 since it's assumed by default in Piwik
    var charSet = documentAlias.characterSet || documentAlias.charset;

    if (!charSet || charSet.toLowerCase() === "utf-8") {
        charSet = null;
    }

    campaignNameDetected = attributionCookie[0];
    campaignKeywordDetected = attributionCookie[1];
    referralTs = attributionCookie[2];
    referralUrl = attributionCookie[3];

    if (!cookieSessionValue) {
        // cookie 'ses' was not found: we consider this the start of a 'session'

        // here we make sure that if 'ses' cookie is deleted few times within the visit
        // and so this code path is triggered many times for one visit,
        // we only increase visitCount once per Visit window (default 30min)
        var visitDuration = configSessionCookieTimeout / 1000;
        if (!cookieVisitorIdValues.lastVisitTs ||
            nowTs - cookieVisitorIdValues.lastVisitTs > visitDuration) {
            cookieVisitorIdValues.visitCount++;
            cookieVisitorIdValues.lastVisitTs = cookieVisitorIdValues.currentVisitTs;
        }

        // Detect the campaign information from the current URL
        // Only if campaign wasn't previously set
        // Or if it was set but we must attribute to the most recent one
        // Note: we are working on the currentUrl before purify() since we can parse the campaign parameters in the hash tag
        if (!configConversionAttributionFirstReferrer ||
            !campaignNameDetected.length) {
            for (i in configCampaignNameParameters) {
                if (Object.prototype.hasOwnProperty.call(configCampaignNameParameters, i)) {
                    campaignNameDetected = getUrlParameter(currentUrl, configCampaignNameParameters[i]);
                    if (campaignNameDetected.length) {
                        break;
                    }
                }
            }

            for (i in configCampaignKeywordParameters) {
                if (Object.prototype.hasOwnProperty.call(configCampaignKeywordParameters, i)) {
                    campaignKeywordDetected = getUrlParameter(currentUrl, configCampaignKeywordParameters[i]);
                    if (campaignKeywordDetected.length) {
                        break;
                    }
                }
            }
        }

        // Store the referrer URL and time in the cookie;
        // referral URL depends on the first or last referrer attribution
        currentReferrerHostName = getHostName(configReferrerUrl);
        originalReferrerHostName = referralUrl.length ? getHostName(referralUrl) : "";

        if (currentReferrerHostName.length && // there is a referrer
            !isSiteHostName(currentReferrerHostName) && // domain is not the current domain
            (!configConversionAttributionFirstReferrer || // attribute to last known referrer
                !originalReferrerHostName.length || // previously empty
                isSiteHostName(originalReferrerHostName))) { // previously set but in current domain
            referralUrl = configReferrerUrl;
        }

        // Set the referral cookie if we have either a Referrer URL, or detected a Campaign (or both)
        if (referralUrl.length || campaignNameDetected.length) {
            referralTs = nowTs;
            attributionCookie = [
                campaignNameDetected,
                campaignKeywordDetected,
                referralTs,
                purify(referralUrl.slice(0, referralUrlMaxLength))
            ];

            setCookie(cookieReferrerName, JSON_PIWIK.stringify(attributionCookie), configReferralCookieTimeout, configCookiePath, configCookieDomain);
        }
    }

    // build out the rest of the request
    request += "&idsite=" + configTrackerSiteId +
        "&rec=1" +
        "&r=" + String(Math.random()).slice(2, 8) + // keep the string to a minimum
        "&h=" + now.getHours() +
        "&m=" + now.getMinutes() +
        "&s=" + now.getSeconds() +
        "&url=" + encodeWrapper(purify(currentUrl)) +
        (configReferrerUrl.length ? "&urlref=" + encodeWrapper(purify(configReferrerUrl)) : "") +
        (configUserId && configUserId.length ? "&uid=" + encodeWrapper(configUserId) : "") +
        "&_id=" + cookieVisitorIdValues.uuid +
        "&_idts=" + cookieVisitorIdValues.createTs +
        "&_idvc=" + cookieVisitorIdValues.visitCount +
        "&_idn=" + cookieVisitorIdValues.newVisitor + // currently unused
        (campaignNameDetected.length ? "&_rcn=" + encodeWrapper(campaignNameDetected) : "") +
        (campaignKeywordDetected.length ? "&_rck=" + encodeWrapper(campaignKeywordDetected) : "") +
        "&_refts=" + referralTs +
        "&_viewts=" + cookieVisitorIdValues.lastVisitTs +
        (String(cookieVisitorIdValues.lastEcommerceOrderTs).length ? "&_ects=" + cookieVisitorIdValues.lastEcommerceOrderTs : "") +
        (String(referralUrl).length ? "&_ref=" + encodeWrapper(purify(referralUrl.slice(0, referralUrlMaxLength))) : "") +
        (charSet ? "&cs=" + encodeWrapper(charSet) : "") +
        "&send_image=0";

    // browser features
    for (i in browserFeatures) {
        if (Object.prototype.hasOwnProperty.call(browserFeatures, i)) {
            request += "&" + i + "=" + browserFeatures[i];
        }
    }

    var customDimensionIdsAlreadyHandled = [];
    if (customData) {
        for (i in customData) {
            if (Object.prototype.hasOwnProperty.call(customData, i) &&
                /^dimension\d+$/.test(i)) {
                var index = i.replace("dimension", "");
                customDimensionIdsAlreadyHandled.push(parseInt(index, 10));
                customDimensionIdsAlreadyHandled.push(String(index));
                request += "&" + i + "=" + customData[i];
                delete customData[i];
            }
        }
    }

    if (customData && isObjectEmpty(customData)) {
        customData = null;
        // we deleted all keys from custom data
    }

    // custom dimensions
    for (i in customDimensions) {
        if (Object.prototype.hasOwnProperty.call(customDimensions, i)) {
            var isNotSetYet = -1 === indexOfArray(customDimensionIdsAlreadyHandled, i);
            if (isNotSetYet) {
                request += "&dimension" + i + "=" + customDimensions[i];
            }
        }
    }

    // custom data
    if (customData) {
        request += "&data=" + encodeWrapper(JSON_PIWIK.stringify(customData));
    } else if (configCustomData) {
        request += "&data=" + encodeWrapper(JSON_PIWIK.stringify(configCustomData));
    }

    // Custom Variables, scope "page"
    function appendCustomVariablesToRequest(customVariables, parameterName) {
        var customVariablesStringified = JSON_PIWIK.stringify(customVariables);
        if (customVariablesStringified.length > 2) {
            return "&" + parameterName + "=" + encodeWrapper(customVariablesStringified);
        }
        return "";
    }

    var sortedCustomVarPage = sortObjectByKeys(customVariablesPage);
    var sortedCustomVarEvent = sortObjectByKeys(customVariablesEvent);

    request += appendCustomVariablesToRequest(sortedCustomVarPage, "cvar");
    request += appendCustomVariablesToRequest(sortedCustomVarEvent, "e_cvar");

    // Custom Variables, scope "visit"
    if (customVariables) {
        request += appendCustomVariablesToRequest(customVariables, "_cvar");

        // Don't save deleted custom variables in the cookie
        for (i in customVariablesCopy) {
            if (Object.prototype.hasOwnProperty.call(customVariablesCopy, i)) {
                if (customVariables[i][0] === "" || customVariables[i][1] === "") {
                    delete customVariables[i];
                }
            }
        }

        if (configStoreCustomVariablesInCookie) {
            setCookie(cookieCustomVariablesName, JSON_PIWIK.stringify(customVariables), configSessionCookieTimeout, configCookiePath, configCookieDomain);
        }
    }

    // performance tracking
    if (configPerformanceTrackingEnabled) {
        if (configPerformanceGenerationTime) {
            request += "&gt_ms=" + configPerformanceGenerationTime;
        } else if (performanceAlias &&
            performanceAlias.timing &&
            performanceAlias.timing.requestStart &&
            performanceAlias.timing.responseEnd) {
            request += "&gt_ms=" + (performanceAlias.timing.responseEnd - performanceAlias.timing.requestStart);
        }
    }

    if (configIdPageView) {
        request += "&pv_id=" + configIdPageView;
    }

    // update cookies
    cookieVisitorIdValues.lastEcommerceOrderTs =
        isDefined(currentEcommerceOrderTs) && String(currentEcommerceOrderTs).length
            ? currentEcommerceOrderTs
            : cookieVisitorIdValues.lastEcommerceOrderTs;
    setVisitorIdCookie(cookieVisitorIdValues);
    setSessionCookie();

    // tracker plugin hook
    request += executePluginMethod(pluginMethod, {
        tracker: trackerInstance,
        request: request
    });

    if (configAppendToTrackingUrl.length) {
        request += "&" + configAppendToTrackingUrl;
    }

    if (isFunction(configCustomRequestContentProcessing)) {
        request = configCustomRequestContentProcessing(request);
    }

    return request;
}
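Stripped of cookies, campaigns and custom variables, the core of getRequest() is string concatenation. The sketch below reproduces only the first few fixed parameters so the pattern is easier to see. buildBaseRequest is a hypothetical helper of mine, and it uses encodeURIComponent directly on the assumption that encodeWrapper behaves similarly; the clock and random source are passed in so the output is predictable.

```javascript
// Simplified sketch of how the fixed part of the query string is assembled.
// buildBaseRequest is a hypothetical helper, not a function from piwik.js.
function buildBaseRequest(siteId, pageUrl, now, randomFn) {
  // Same trick as the original: take a few digits of a random number as a cache buster.
  var rand = String(randomFn()).slice(2, 8);
  return (
    "idsite=" + siteId +
    "&rec=1" +
    "&r=" + rand +
    "&h=" + now.getHours() +
    "&m=" + now.getMinutes() +
    "&s=" + now.getSeconds() +
    "&url=" + encodeURIComponent(pageUrl)
  );
}

// Usage with a fixed local time and a fixed "random" value.
var exampleRequest = buildBaseRequest(
  1,
  "http://example.com/a?b=1",
  new Date(2018, 0, 2, 3, 4, 5),
  function () { return 0.123456789; }
);
```

In real use the last two arguments would simply be new Date() and Math.random.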
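The tracking code sends data on every call, and each request carries a long list of parameters, all of them abbreviated. A brief explanation follows (corrections are welcome); some parameters are not yet fully explained, for example how the UUID is generated. The parameters fall into two groups; the first is listed here:
1. idsite: site ID
2. rec: 1 (hard-coded)
3. r: random string
4. h: current hour
5. m: current minute
6. s: current second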
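7. url: the current page URL after purify(), which as far as I can tell only strips the hash fragment when hash discarding is enabled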
8. _id: the visitor's UUID
9. _idts: timestamp of when the visitor ID was created, i.e. the first visit (createTs in the code)
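10. _idvc: visit count
11. _idn: new visitor flag (currently unused)
12. _refts: timestamp of the referrer attribution
13. _viewts: timestamp of the previous visit
14. cs: character set of the current page (sent only when it is not utf-8)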
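15. send_image: whether the server should reply with an image; according to the Tracking HTTP API documentation, send_image=0 asks for an HTTP 204 response instead of a GIF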
16. gt_ms: time spent loading the content (response end time minus request start time)
17. pv_id: unique identifier of the page view
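When inspecting a captured request, it helps to decode the query string back into readable fields. The sketch below does that for a handful of the parameters listed above. It is only an illustration: PARAM_LABELS and describeRequest are hypothetical names, and the sample query string is made up.

```javascript
// Sketch: decode a tracking query string into labeled fields.
// PARAM_LABELS and describeRequest are hypothetical names for illustration.
var PARAM_LABELS = {
  idsite: "site ID",
  rec: "recording flag",
  r: "random string",
  h: "hour",
  m: "minute",
  s: "second",
  url: "page URL",
  _id: "visitor UUID",
  _idvc: "visit count",
  pv_id: "page view ID"
};

function describeRequest(queryString) {
  var result = {};
  // URLSearchParams takes care of percent-decoding the values.
  new URLSearchParams(queryString).forEach(function (value, key) {
    var known = Object.prototype.hasOwnProperty.call(PARAM_LABELS, key);
    var label = known ? PARAM_LABELS[key] : "unknown (" + key + ")";
    result[label] = value;
  });
  return result;
}

var described = describeRequest(
  "idsite=1&rec=1&h=9&url=http%3A%2F%2Fexample.com%2F&foo=bar"
);
```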
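The second group describes browser capabilities, obtained from properties of the Navigator object (mimeTypes, javaEnabled and so on) and of the Screen object (width and height).
1. pdf: whether the PDF file type is supported
2. qt: whether the QuickTime player is supported
3. realp: whether RealPlayer is supported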
4. wma: whether Windows Media is supported (MIME type application/x-mplayer2)
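5. dir: whether Macromedia Director is supported
6. fla: whether Adobe Flash Player is supported
7. java: whether Java is enabled
8. gears: whether Google Gears is installed
9. ag: whether Microsoft Silverlight is installed
10. cookie: whether cookies are enabled
11. res: screen width and height (not computed correctly for high-DPI displays)
The code that collects these 11 parameters is in the method below (also in tracker.js). Note the pluginMap variable: it holds several MIME types, which are used to detect whether a given plugin or feature is installed or enabled.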
/*
 * Browser features (plugins, resolution, cookies)
 */
function detectBrowserFeatures() {
    var i,
        mimeType,
        pluginMap = {
            // document types
            pdf: "application/pdf",

            // media players
            qt: "video/quicktime",
            realp: "audio/x-pn-realaudio-plugin",
            wma: "application/x-mplayer2",

            // interactive multimedia
            dir: "application/x-director",
            fla: "application/x-shockwave-flash",

            // RIA
            java: "application/x-java-vm",
            gears: "application/x-googlegears",
            ag: "application/x-silverlight"
        };

    // detect browser features except IE < 11 (IE 11 user agent is no longer MSIE)
    if (!new RegExp("MSIE").test(navigatorAlias.userAgent)) {
        // general plugin detection
        if (navigatorAlias.mimeTypes && navigatorAlias.mimeTypes.length) {
            for (i in pluginMap) {
                if (Object.prototype.hasOwnProperty.call(pluginMap, i)) {
                    mimeType = navigatorAlias.mimeTypes[pluginMap[i]];
                    browserFeatures[i] = mimeType && mimeType.enabledPlugin ? "1" : "0";
                }
            }
        }

        // Safari and Opera
        // IE6/IE7 navigator.javaEnabled can't be aliased, so test directly
        // on Edge navigator.javaEnabled() always returns `true`, so ignore it
        if (!new RegExp("Edge[ /](\\d+[\\.\\d]+)").test(navigatorAlias.userAgent) &&
            typeof navigator.javaEnabled !== "unknown" &&
            isDefined(navigatorAlias.javaEnabled) &&
            navigatorAlias.javaEnabled()) {
            browserFeatures.java = "1";
        }

        // Firefox
        if (isFunction(windowAlias.GearsFactory)) {
            browserFeatures.gears = "1";
        }

        // other browser features
        browserFeatures.cookie = hasCookies();
    }

    var width = parseInt(screenAlias.width, 10);
    var height = parseInt(screenAlias.height, 10);
    browserFeatures.res = parseInt(width, 10) + "x" + parseInt(height, 10);
}
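The MIME-type lookup at the heart of detectBrowserFeatures() can be tried in isolation. The sketch below is a reduced version that takes the navigator as an argument, so it can be run with a stand-in object; detectPlugins and fakeNavigator are hypothetical names, and the real method also handles Java, Gears, cookies and the screen resolution.

```javascript
// Sketch of MIME-type based plugin detection with an injectable navigator.
// detectPlugins and fakeNavigator are hypothetical names, not piwik.js APIs.
function detectPlugins(nav, pluginMap) {
  var features = {};
  if (!nav || !nav.mimeTypes || !nav.mimeTypes.length) {
    return features; // nothing to inspect, for example outside a browser
  }
  for (var key in pluginMap) {
    if (Object.prototype.hasOwnProperty.call(pluginMap, key)) {
      var mimeType = nav.mimeTypes[pluginMap[key]];
      // "1" only when the MIME type is registered and backed by an enabled plugin.
      features[key] = mimeType && mimeType.enabledPlugin ? "1" : "0";
    }
  }
  return features;
}

// A stand-in for window.navigator with one enabled plugin.
var fakeNavigator = {
  mimeTypes: {
    length: 1,
    "application/pdf": { enabledPlugin: {} }
  }
};
var detected = detectPlugins(fakeNavigator, {
  pdf: "application/pdf",
  fla: "application/x-shockwave-flash"
});
```

In a browser the first argument would be window.navigator; note that many modern browsers expose few or no entries in mimeTypes, so most flags would come back as "0" or be absent.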
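Beyond the 20-odd parameters above, the complete list can be found under "Tracking HTTP API" on the project website, although it is available in English only.
The code used in this article has been uploaded to https://github.com/pwstrick/mypiwik; feel free to download it if you need it.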