Jump to content

WikiCharts: Difference between revisions

From Meta, a Wikimedia project coordination wiki
Content deleted Content added
LeonWeber (talk | contribs)
No edit summary
Invadibot (talk | contribs)
m Bot: Applying protocol-relative URLs, https://meta.wikimedia.org?diff=5737506
 
(11 intermediate revisions by 7 users not shown)
Line 1: Line 1:
'''WikiCharts''' is a tool written by [[:de:User:LeonWeber]] for finding the most popular (most viewed) pages on a wiki. This is done using [[:w:JavaScript|javascript]] ([[:de:MediaWiki:Pagecounter.js|source code]]) for relaying the name of the viewed page to the [[m:toolserver|toolserver]]. The script is only executed with a probability of 1/''counter_factor'', to avoid overloading the toolserver. The data gathered using the javascript code is collected into a log file, IP addresses are not recorded. An entry of the log looks something like this:
'''WikiCharts''' was a tool written by [[:de:User:Leon|User:LeonWeber]] (and [[:de:User:Duesentrieb|User:Duesentrieb]]) for finding the most popular (most viewed) pages on a wiki. This was done using [[:w:JavaScript|javascript]] ([[:de:MediaWiki:Pagecounter.js|source code]]) for relaying the name of the viewed page to the [[m:toolserver|toolserver]]. The script is only executed with a probability of 1/''counter_factor'', to avoid overloading the toolserver. The data gathered using the javascript code is collected into a log file, IP addresses are not recorded. An entry of the log looks something like this:


[25/Aug/2006:10:54:35 +0000] "GET /index.png?ns=User&title=Anneke%20Wolf&factor=601&wiki=dewiki HTTP/1.1"
[25/Aug/2006:10:54:35 +0000] "GET /index.png?ns=User&title=Anneke%20Wolf&factor=601&wiki=dewiki HTTP/1.1"
Line 13: Line 13:
+----------+----------+---------------+------------+---------------+
+----------+----------+---------------+------------+---------------+


stats_hits is the number of occurences in the log multiplied by the factor, stats_samples is the number of occurences in the log.
stats_hits is the number of occurrences in the log multiplied by the factor, stats_samples is the number of occurrences in the log.


The insert script:
The insert script:
Line 58: Line 58:
This works, it's fast and I understand it :)
This works, it's fast and I understand it :)


The web page is created by a PHP-script, its source can be viewed here: [http://tools.wikimedia.de/~leon/stats/wikicharts/?code].
The web page is created by a PHP-script, its source could be viewed there: [http://tools.wikimedia.de/~leon/stats/wikicharts/?code].


I can enable this tool for any wiki that requests it. All you need is an admin on that wiki and about ten minutes on [[w:Wikipedia:Chat|IRC]] for us to determine the right setup.
I can enable this tool for any wiki that requests it. All you need is an admin on that wiki and about ten minutes on [[w:Wikipedia:Chat|IRC]] for us to determine the right setup.


== The live source ==
== List of wikis which have WikiCharts ==
*[http://tools.wikimedia.de/~leon/stats/wikicharts/?wiki=alswiki alswiki]
... could be viewed there: [http://tools.wikimedia.de/~leon/source/rawview.php/wikicharts]

*[http://tools.wikimedia.de/~leon/stats/wikicharts/ dewiki]
== Current successors ==
*[http://tools.wikimedia.de/~leon/stats/wikicharts/?wiki=enwiktionary enwiktionary]
* [//lists.wikimedia.org/pipermail/wikitech-l/2007-December/035435.html WikiStats] ''(raw data)''
*[http://tools.wikimedia.de/~leon/stats/wikicharts/?wiki=enwikiversity enwikiversity]
** [http://stats.grok.se/ Wikipedia article traffic stats] ''(per page)''
** [http://wikistics.falsikon.de/latest/ THEwikiStics] ''(toplists)''

[[Category:Statistics]]

Latest revision as of 18:57, 22 August 2013

WikiCharts was a tool written by User:LeonWeber (and User:Duesentrieb) for finding the most popular (most viewed) pages on a wiki. This was done using javascript (source code) for relaying the name of the viewed page to the toolserver. The script is only executed with a probability of 1/counter_factor, to avoid overloading the toolserver. The data gathered using the javascript code is collected into a log file, IP addresses are not recorded. An entry of the log looks something like this:

[25/Aug/2006:10:54:35 +0000] "GET /index.png?ns=User&title=Anneke%20Wolf&factor=601&wiki=dewiki HTTP/1.1"

On the toolserver (Zedler) I have an awk script which inserts the data from the log into a mysql database every hour.

The DB schema:

+----------+----------+---------------+------------+---------------+
| stats_id | stats_ns | stats_title   | stats_hits | stats_samples |
+----------+----------+---------------+------------+---------------+
|   177616 | User     | Anneke%20Wolf |        601 |             1 |
+----------+----------+---------------+------------+---------------+

stats_hits is the number of occurrences in the log multiplied by the factor, stats_samples is the number of occurrences in the log.

The insert script:

<?
$wikis = array(
        'dewiki' => '[1-9][0-9][0-9]',
        'enwiktionary' => '[1-9][0-9]',
        'enwikiversity' => '[1-9]'
        );
foreach( $wikis as $wiki => $factor_regexp )
{
        $cmd = "/usr/local/bin/gawk 
        'BEGIN { print 
                \"prepare pageviews_$wiki from \\\"INSERT INTO u_leon.wikicharts_cur_$wiki 
                SET stats_ns = ?, stats_title = ?, stats_hits = ?, stats_samples = 1 
                ON DUPLICATE KEY  UPDATE stats_hits=stats_hits+?, stats_samples=stats_samples+?;\\\";\\n\"
        }
        /\/index.png\?ns=.*\&title=.*\&factor=$factor_regexp\&wiki=$wiki/
        {a[$4]++}END{for(i in a) { 
                x=i;sub(/^.*\/index.png\?ns=/,\"\", x); 
                b=x; sub(/^.*&title=/,\"\", b);
                y=x;sub(/^.*&factor=/,\"\", y);
                sub(/&wiki=.*/, \"\", y);
                sub(/&title=.*/,\"\",x);
                sub(/&factor=.*/, \"\",b);
                sub(/\?.*/, \"\", x);
                gsub( \"/\\\"/\", \"%22\", x);
                gsub( \"\\\\\", \"%5C\", x);
                gsub( \"/\\\"/\", \"%22\", b);
                gsub( \"\\\\\", \"%5C\", b);
                print \"set @title = \\\"\"b\"\\\"; set @ns = \\\"\"x\"\\\";
                        set @hits = \"a[i]*y\"; set @factor = \"y\"; set @samples = \"a[i]\";
                        execute pageviews_$wiki using @ns, @title, @factor, @hits, @samples;\" 
        } }' /tmp/pgcount/access.log";
        $cmd .= " | /usr/local/mysql/bin/mysql";
        echo shell_exec( $cmd );
}
        $cmd = "cat /tmp/pgcount/access.log >> /home/leon/access.log; echo '' > /tmp/pgcount/access.log; echo '' > /tmp/pgcount/error.log;";
        echo shell_exec( $cmd );
?>

This works, it's fast and I understand it :)

The web page is created by a PHP-script, its source could be viewed there: [1].

I can enable this tool for any wiki that requests it. All you need is an admin on that wiki and about ten minutes on IRC for us to determine the right setup.

The live source[edit]

... could be viewed there: [2]

Current successors[edit]