User:Dcljr/Article counts: Difference between revisions

From Meta, a Wikimedia project coordination wiki
→‎Other possible article counting criteria: again, rethinking some things...
→‎Wiktionaries: starting complete rewrite of this section
Line 6: Line 6:


=== Wiktionaries ===
=== Wiktionaries ===
<div style="color:darkred">''Note:'' This information needs to be completely rewritten to reflect my latest understanding of how article counting is done.</div>


The table below can be sorted by any column (initial order of the rows is by "stats articles pct Δ"). You can also "hover" over any of the column headings in the table to get an explanation of that column (these mostly match the explanations in the key below, but the percents are explained in more detail). The wiki names are linked to the wikis themselves, while the dates are linked to subpages of this one containing the "raw statistics" upon which these numbers are based.
The table below can be sorted by any column (initial order of the rows is by "stats articles pct chg"). You can also "hover" over any of the column headings in the table to get an explanation of that column (these mostly match the explanations in the key below, but the percents are explained in more detail). The wiki names are linked to the wikis themselves, while the dates are linked to subpages of this one containing the "raw statistics" upon which these numbers are based<span style="color:darkred"> (which have not yet been updated to reflect all the stats now shown in the table)</span>.

<div style="margin-left:1em;font-size:smaller">
<div style="margin-left:1em;font-size:smaller">
''Table key:''
''Table key:''
* '''wiki''' – Wiki name
* '''wiki''' – Wiki name, linked to the wiki itself
* <span style="color:black; background-color:#ffa">'''date before'''</span> / <span style="color:black; background-color:#cfc">'''date after'''</span> – Dates dumps were made before / after the May 10th running of the "updateArticleCount.php" script (all other "before" and "after" cells are colored using the same scheme); note that dumps that happened on May 10th [e.g., Pashto] are included in the table as "before" or "after" only if their "stats articles" count [see next two items] was very close to a dump made on one side of May 10th and very different from the dump made on the other side of May 10th)
* <span style="color:black; background-color:#ffa">'''date before'''</span> / <span style="color:black; background-color:#cfc">'''date after'''</span> – Dates on which dumps were made before / after the May 10th running of the "updateArticleCount.php" script (linked to subpages showing "raw" stats); all other "before" and "after" cells are colored using the same scheme; note that dumps that happened on May 10th [e.g., Pashto] are classified as "before" or "after" based on the similarity of their "stats articles" count [see next two items] to dumps made on either side of May 10th (in most cases it was very obvious whether "updateArticleCount.php" had already been run by the time the May 10th dump was made)
* '''stats pages''' – Total page count given in "[[mw:Manual:Site_stats table|site_stats.sql]]" dump, which matches the on-wiki count of "Pages" in [[Special:Statistics]] and the one given by {{[[mw:Help:Magic words#Statistics|NUMBEROFPAGES]]}} on the wiki itself
* '''stats pages''' – Total page count given in "[[mw:Manual:Site_stats table|site_stats.sql]]" dump, which matches the on-wiki count of "Pages" in [[Special:Statistics]] and the one given by {{[[mw:Help:Magic words#Statistics|NUMBEROFPAGES]]}} on the wiki itself
* '''stats articles''' – Article count given in "site_stats.sql" dump, which matches the on-wiki count of "Content pages" in [[Special:Statistics]] and the one given by {{[[mw:Help:Magic words#Statistics|NUMBEROFARTICLES]]}} on the wiki itself
* '''stats articles''' – Article count given in the "site_stats.sql" dump, which matches the on-wiki count of "Content pages" in [[Special:Statistics]] and the one given by {{[[mw:Help:Magic words#Statistics|NUMBEROFARTICLES]]}} on the wiki itself
* '''% of pgs''' – Percent of pages that were non-redirects
* '''(%)''' – Unless otherwise explained below, the percents are based on the numbers in the previous two columns (e.g., the first one is the percent of "stats pages" that were "stats articles")
* '''stats articles Δ''' – Change in "stats articles" count from before to after May 10th
* '''stats articles chg''' – Change in "stats articles" count from before to after May 10th
* '''stats articles pct Δ''' – Percent change in "stats articles" count from before to after May 10th.
* '''stats articles pct chg''' – Percent change in "stats articles" count from before to after May 10th (as percent of count before).
* '''dumped pages''' – Total number of pages seen in "[[mw:Manual:Page table|page.sql]]" dump
* '''dumped pages''' – Total number of pages seen in "[[mw:Manual:Page table|page.sql]]" dump (note: this and all of the remaining stats are for after May 10th)
* '''ns0 pages''' – Number of pages in "page.sql" marked as being in the main namespace (which, BTW, matches the number of titles seen in the "[[mw:Special:Search/all-titles-in-ns0|all-titles-in-ns0]]" dump)
* '''ns0 pages''' – Number of pages in "page.sql" marked as being in the main namespace (the only content namespace for all Wiktionaries)
* '''ns0 non-redirs''' – Number of main-namespace pages in "page.sql" marked as not being redirects (in either that dump or "[[mw:Manual:Redirect table|redirect.sql]]")
* '''ns0 non-redirs''' – Number of main-namespace pages in "page.sql" marked as not being redirects (in either that dump or "[[mw:Manual:Redirect table|redirect.sql]]")
* '''std. article count''' – Article (i.e., entry) count based on currently used criteria: "non-redirect with at least one <nowiki>[[wikilink]]</nowiki> of any type"{{fn|1}} (this and the other "article count" percents are all out of the "ns0 non-redirs" [since all the wikis listed here consider only the main namespace as "content"])
* '''ns0 articles''' – Number of articles (i.e., entries) seen in "[[mw:Manual:Page table|page.sql]]" dump according to the current definition of an article: "content-namespace non-redirect containing at least one <nowiki>[[wikilink]]</nowiki> to another page on the same wiki"
* '''% stats off''' – Percent difference between "stats articles" count and "std. article count" (as percent of "std. article count")
* '''% of nrs''' – Percent of non-redirects that counted as articles
* '''off stats''' – "ns0 articles" count minus "stats articles after" count (should be zero, since they're counting the same thing)
* '''conserv. article count''' – Article count based on more conservative criteria: "non-redirect linked to another page on the same wiki or placed in a category" (percent is out of "ns0 non-redirs")
* '''liberal article count''' – Article count based on more liberal criteria: "non-redirect containing at least one of the following: link to another page on the same wiki, image/file link, category link, interlanguage link, interwiki link, or template call" [see note below for further explanation/context] (percent is out of "ns0 non-redirs")
* '''alt article count''' – Number of articles based on alternate definition: "content-namespace non-redirect containing at least one wikilink to another page on the same wiki, image/file link, category link, interlanguage link, or interwiki link"{{fn|1}} (this and all other "article count" percents are all out of the "ns0 non-redirs" count)
* '''pct off stats''' – Percent difference between "alt article count" and "stats articles after" count, as percent of "stats articles after"
* '''altern. article count''' – Article count based on Yet Another set of criteria: "non-redirect linked to another page on same wiki, placed in a category, or containing an image/file or a template call" (percent is out of "ns0 non-redirs")
* '''page-cat article count''' – Article count based on: "content-namespace non-redirect linked to another page on the same wiki or placed in a category"
* '''wl-templ article count''' – Article count based on: "content-namespace non-redirect containing at least one wikilink of any type (see "alt article count") or <nowiki>{{template call}}</nowiki>"


{{fnb|1}} Note that the "wikilink" in the current method of counting articles can apparently be anything starting with the string "<nowiki>[[</nowiki>", including a regular <nowiki>[[link]]</nowiki> to another page on the same wiki, a <nowiki>[[Category:]]</nowiki> link, an <nowiki>[[Image:]]</nowiki> (or <nowiki>[[File:]]</nowiki>) link, an interlanguage link (e.g., <nowiki>[[de:]]</nowiki>), an interwiki link (e.g., <nowiki>[[species:]]</nowiki>), or even a "hidden" link inside of an <nowiki><!-- HTML comment --></nowiki> or (perhaps?) one "deactivated" by <nowiki><nowiki></nowiki> tags. '''This analysis counts only "real" wikilinks''' (i.e., not "hidden" or "deactivated" links). To accomplish this, the dumps "[[mw:Manual:Pagelinks table|pagelinks.sql]]", "[[mw:Manual:Categorylinks table|categorylinks.sql]]", "[[mw:Manual:Imagelinks table|imagelinks.sql]]", "[[mw:Manual:Langlinks table|langlinks.sql]]", "[[mw:Manual:Iwlinks table|iwlinks.sql]]", and (only for the "liberal" and "alternate" counting methods) "[[mw:Manual:Templatelinks table|templatelinks.sql]]" were also examined.
{{fnb|1}} This definition is similar, but not identical, to the old method of counting articles, which included any page whose wikitext contained the string "<nowiki>[[</nowiki>", a very loose definition including regular <nowiki>[[links]]</nowiki> to other pages on the same wiki, <nowiki>[[Category:]]</nowiki> links, <nowiki>[[Image:]]</nowiki> (or <nowiki>[[File:]]</nowiki>) links, interlanguage links (e.g., <nowiki>[[de:]]</nowiki>), and interwiki links (e.g., <nowiki>[[species:]]</nowiki>), as well as "hidden" links inside of <nowiki><!-- HTML comments --></nowiki> and links "deactivated" by <nowiki><nowiki></nowiki> tags (indeed, even text containing "<nowiki>[[</nowiki>" but not "<nowiki>]]</nowiki>" was included). The new method of counting articles only counts page links. Because it is based on database dumps and not the raw wikitext of each page, '''this analysis counts only the 5 types of "real" wikilinks''' (not "hidden", "deactivated", or "incomplete" links). The different kinds of links have been accounted for by parsing the dumps "[[mw:Manual:Pagelinks table|pagelinks.sql]]", "[[mw:Manual:Categorylinks table|categorylinks.sql]]", "[[mw:Manual:Imagelinks table|imagelinks.sql]]", "[[mw:Manual:Langlinks table|langlinks.sql]]", "[[mw:Manual:Iwlinks table|iwlinks.sql]]", and (only for the so-called "liberal" method) "[[mw:Manual:Templatelinks table|templatelinks.sql]]".
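The difference between the old substring test and the dump-based counts described in this note can be sketched roughly as follows. This is only an illustration, not the script actually used for this analysis: the function names are hypothetical, and it assumes each link dump has already been reduced to a set of source page ids.

```python
def old_style_is_article(wikitext):
    """Old criterion as described above: any occurrence of "[[" counts,
    even inside an HTML comment or without a closing "]]"."""
    return "[[" in wikitext


def count_articles(non_redirect_ids, *link_sources):
    """Dump-based count: non-redirects that appear as a link source in at
    least one of the given link tables (each passed as a set of page ids)."""
    linked = set().union(*link_sources)
    return len(set(non_redirect_ids) & linked)


# Hypothetical toy data, not taken from any real dump.
non_redirs = {1, 2, 3, 4}
pagelinks = {1}
categorylinks = {2, 5}   # page 5 is not a main-namespace non-redirect
templatelinks = {3}

page_cat = count_articles(non_redirs, pagelinks, categorylinks)
wl_templ = count_articles(non_redirs, pagelinks, categorylinks, templatelinks)
```

With these toy sets, the page-plus-category count covers pages 1 and 2, while adding template calls also picks up page 3. A page containing only <nowiki><!-- [[hidden]] --></nowiki> would pass the old substring test but would not appear in any link table.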
</div>
</div>
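The percent columns described in the key can be reproduced with a few one-line formulas. The sketch below is an assumption about how they are computed (in particular the sign convention of "pct off stats", taken here as count minus stats); the function names are hypothetical.

```python
def pct_of(part, whole):
    """Percent that `part` is of `whole` (e.g. "% of pgs", "% of nrs")."""
    return 100.0 * part / whole


def pct_change(before, after):
    """"stats articles pct chg": change as a percent of the count before."""
    return 100.0 * (after - before) / before


def pct_off_stats(count, stats_after):
    """"pct off stats": difference from "stats articles after",
    as a percent of "stats articles after" (sign convention assumed)."""
    return 100.0 * (count - stats_after) / stats_after


# "% of nrs" for the alt article count, using two rows from the table:
nepali = round(pct_of(4833, 4937))   # Nepali: 4833 of 4937 non-redirects
pashto = round(pct_of(6076, 7644))   # Pashto: 6076 of 7644 non-redirects
```

For the Nepali and Pashto rows this gives 98 and 79, matching the "% of nrs" values shown for "alt article count" in the table.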


{| class="wikitable sortable collapsible collapsed" style="text-align:right"
{| class="wikitable sortable collapsible collapsed" style="text-align:right"
|+ Wiktionary statistics before and after May 10
|+ style="background-color:lightgray; border:solid 1pt darkgray" | Wiktionary statistics before and after running of "updateArticleCount.php" script on May 10
|-
|-
! id="wN" style="color:black; background-color:#ccc" | <span title="Wiki name">wiki</span>
! id="wN" style="color:black; background-color:#ccb" | <span title="Wiki name">wiki</span>
! id="dB" style="color:black; background-color:#ffa" | <span title="Date of last dump before May 10">date before</span>
! id="dB" style="color:black; background-color:#ffa" | <span title="Date of last dump before May 10 article-count update">date before</span>
! id="dA" style="color:black; background-color:#cfc" | <span title="Date of first dump after May 10">date after</span>
! id="dA" style="color:black; background-color:#cfc" | <span title="Date of first dump after May 10 article-count update">date after</span>
! id="spB" style="color:black; background-color:#ffa" | <span title="On-wiki total page count (={NUMBEROFPAGES}) before May 10">stats pages before</span>
! id="spB" style="color:black; background-color:#ffa" | <span title="On-wiki total page count (={NUMBEROFPAGES}) before May 10">stats pages before</span>
! id="saB" style="color:black; background-color:#ffa" | <span title="On-wiki article count (={NUMBEROFARTICLES}) before May 10">stats articles before</span>
! id="saB" style="color:black; background-color:#ffa" | <span title="On-wiki article count (={NUMBEROFARTICLES}) before May 10">stats articles before</span>
! id="sapB" style="color:black; background-color:#ffa" | <span title="Percent of pages that were articles, according to on-wiki stats, before May 10">(%)</span>
! id="sapB" style="color:black; background-color:#ffa" | <span title="Percent of pages that were articles, according to on-wiki stats, before May 10">% of pgs</span>
! id="spA" style="color:black; background-color:#cfc" | <span title="On-wiki total page count (={NUMBEROFPAGES}) after May 10">stats pages after</span>
! id="spA" style="color:black; background-color:#cfc" | <span title="On-wiki total page count (={NUMBEROFPAGES}) after May 10">stats pages after</span>
! id="saA" style="color:black; background-color:#cfc" | <span title="On-wiki article count (={NUMBEROFARTICLES}) after May 10">stats articles after</span>
! id="saA" style="color:black; background-color:#cfc" | <span title="On-wiki article count (={NUMBEROFARTICLES}) after May 10">stats articles after</span>
! id="sapA" style="color:black; background-color:#cfc" | <span title="Percent of pages that were articles, according to on-wiki stats, after May 10">(%)</span>
! id="sapA" style="color:black; background-color:#cfc" | <span title="Percent of pages that were articles, according to on-wiki stats, after May 10">% of pgs</span>
! id="saC" style="color:black; background-color:#ccc" | <span title="Change in 'stats articles' count from before to after May 10">stats articles Δ</span>
! id="saC" style="color:black; background-color:#ccb" | <span title="Change in 'stats articles' count from before to after May 10">stats articles chg</span>
! id="sapC" style="color:black; background-color:#ccc" | <span title="Percent change in 'stats articles' count from before to after May 10">stats articles pct Δ</span>
! id="sapC" style="color:black; background-color:#ccb" | <span title="Percent change in 'stats articles' count from before to after May 10">stats articles pct chg</span>
! id="dpB" style="color:black; background-color:#ffa" | <span title="Number of pages seen in 'page.sql' dump before May 10">dumped pages before</span>
! id="dpA" style="color:black; background-color:#cfc" | <span title="Number of pages seen in 'page.sql' dump after May 10">dumped pages</span>
! id="n0pB" style="color:black; background-color:#ffa" | <span title="Number of pages in 'page.sql' dump marked as being in main namespace, before May 10">ns0 pages before</span>
! id="n0pA" style="color:black; background-color:#cfc" | <span title="Number of pages in 'page.sql' dump marked as being in main namespace, after May 10">ns0 pages</span>
! id="n0nrB" style="color:black; background-color:#ffa" | <span title="Number of pages in 'page.sql' dump marked as not being redirects, either in that dump or in 'redirect.sql', before May 10">ns0 non-redirs before</span>
! id="n0nrA" style="color:black; background-color:#cfc" | <span title="Number of pages in 'page.sql' dump marked as not being redirects, either in that dump or in 'redirect.sql', after May 10">ns0 non-redirs</span>
! id="n0nrpB" style="color:black; background-color:#ffa" | <span title="Percent of pages in main namespace that were not redirects, before May 10">(%)</span>
! id="n0nrpA" style="color:black; background-color:#cfc" | <span title="Percent of pages in main namespace that were not redirects, after May 10">% of pgs</span>
! id="dpA" style="color:black; background-color:#cfc" | <span title="Number of pages seen in 'page.sql' dump after May 10">dumped pages after</span>
! id="n0aA" style="color:black; background-color:#cfc" | <span title="Count of articles by current definition 'non-redirect with wikilink to page on same wiki', using 'pagelinks.sql' dump, after May 10">ns0 articles</span>
! id="n0pA" style="color:black; background-color:#cfc" | <span title="Number of pages in 'page.sql' dump marked as being in main namespace, after May 10">ns0 pages after</span>
! id="n0apA" style="color:black; background-color:#cfc" | <span title="Percent of non-redirects in main namespace that qualified as articles by current definition, after May 10">% of nrs</span>
! id="n0nrA" style="color:black; background-color:#cfc" | <span title="Number of pages in 'page.sql' dump marked as not being redirects, either in that dump or in 'redirect.sql', after May 10">ns0 non-redirs after</span>
! id="n0aoA" style="color:black; background-color:#cfc" | <span title="Difference between 'ns0 articles' count and 'stats articles after' count (should be zero), after May 10">off stats</span>
! id="n0nrpA" style="color:black; background-color:#cfc" | <span title="Percent of pages in main namespace that were not redirects, after May 10">(%)</span>
! id="aaA" style="color:black; background-color:#cfc" | <span title="Count of articles in main namespace by alternate definition 'non-redirect with wikilink of any type', based on relevant dumps (see Note above) after May 10">alt article count</span>
! id="asB" style="color:black; background-color:#ffa" | <span title="Count of articles in main namespace by current criteria 'non-redirect with wikilink of any type', based on relevant dumps (see Note above) before May 10">std. article count before</span>
! id="aapA" style="color:black; background-color:#cfc" | <span title="Percent of non-redirects in main namespace that qualified as articles by alternate definition, after May 10">% of nrs</span>
! id="aspB" style="color:black; background-color:#ffa" | <span title="Percent of non-redirects in main namespace that qualified as articles by current criteria, before May 10">(%)</span>
! id="aapdA" style="color:black; background-color:#cfc" | <span title="Percent difference between 'alt article count' and 'stats articles after' count (as percent of 'stats articles after'), after May 10">pct off stats</span>
! id="sopB" style="color:black; background-color:#ffa" | <span title="Percent difference between 'stats articles' count and 'std. article count' (as percent of 'std. article count'), before May 10">% stats off before</span>
! id="apcA" style="color:black; background-color:#cfc" | <span title="Count of articles in main namespace by definition 'non-redirect with page wikilink or category link', based on relevant dumps (see Note above) after May 10">page-cat article count</span>
! id="asA" style="color:black; background-color:#cfc" | <span title="Count of articles in main namespace by current criteria 'non-redirect with wikilink of any type', based on relevant dumps (see Note above) after May 10">std. article count after</span>
! id="apcpA" style="color:black; background-color:#cfc" | <span title="Percent of non-redirects in main namespace that qualified as articles by 'page-cat' definition, after May 10">% of nrs</span>
! id="aspA" style="color:black; background-color:#cfc" | <span title="Percent of non-redirects in main namespace that qualified as articles by current criteria, after May 10">(%)</span>
! id="apcpdA" style="color:black; background-color:#cfc" | <span title="Percent difference between 'page-cat article count' and 'stats articles after' (as percent of 'stats articles after'), after May 10">pct off stats</span>
! id="sopA" style="color:black; background-color:#cfc" | <span title="Percent difference between 'stats articles' count and 'std. article count' (as percent of 'std. article count'), after May 10">% stats off after</span>
! id="awtA" style="color:black; background-color:#cfc" | <span title="Count of articles in main namespace by definition 'non-redirect with wikilink of any type or template call', based on relevant dumps (see Note above) after May 10">wl-templ article count</span>
! id="acB" style="color:black; background-color:#ffa" | <span title="Count of articles in main namespace by more conservative criteria 'non-redirect with page wikilink or category', based on dumps before May 10">conserv. article count before</span>
! id="awtpA" style="color:black; background-color:#cfc" | <span title="Percent of non-redirects in main namespace that qualified as articles by 'wl-templ' definition, after May 10">% of nrs</span>
! id="acpB" style="color:black; background-color:#ffa" | <span title="Percent of non-redirects in main namespace that qualified as articles by more conservative criteria, before May 10">(%)</span>
! id="awtpdA" style="color:black; background-color:#cfc" | <span title="Percent difference between 'wl-templ article count' and 'stats articles after' (as percent of 'stats articles after'), after May 10">pct off stats</span>
! id="acA" style="color:black; background-color:#cfc" | <span title="Count of articles in main namespace by more conservative criteria 'non-redirect with page wikilink or category', based on dumps after May 10">conserv. article count after</span>
! id="acpA" style="color:black; background-color:#cfc" | <span title="Percent of non-redirects in main namespace that qualified as articles by more conservative criteria, after May 10">(%)</span>
! id="alB" style="color:black; background-color:#ffa" | <span title="Count of articles in main namespace by more liberal criteria 'non-redirect with any wikilink or template call', based on dumps before May 10">liberal article count before</span>
! id="alpB" style="color:black; background-color:#ffa" | <span title="Percent of non-redirects in main namespace that qualified as articles by more liberal criteria, before May 10">(%)</span>
! id="alA" style="color:black; background-color:#cfc" | <span title="Count of articles in main namespace by more liberal criteria 'non-redirect with any wikilink or template call', based on dumps after May 10">liberal article count after</span>
! id="alpA" style="color:black; background-color:#cfc" | <span title="Percent of non-redirects in main namespace that qualified as articles by more liberal criteria, after May 10">(%)</span>
! id="aaB" style="color:black; background-color:#ffa" | <span title="Count of articles in main namespace by alternative criteria 'non-redirect with page, category, or image wikilink, or template call', based on dumps before May 10">alternate article count before</span>
! id="aapB" style="color:black; background-color:#ffa" | <span title="Percent of non-redirects in main namespace that qualified as articles by alternative criteria, before May 10">(%)</span>
! id="aaA" style="color:black; background-color:#cfc" | <span title="Count of articles in main namespace by alternative criteria 'non-redirect with page, category, or image wikilink, or template call', based on dumps after May 10">alternate article count after</span>
! id="aapA" style="color:black; background-color:#cfc" | <span title="Percent of non-redirects in main namespace that qualified as articles by alternative criteria, after May 10">(%)</span>
|-
|-
| headers="wN" | [[wikt:ne:|Nepali Wiktionary]]
| headers="wN" | [[wikt:ne:|Nepali Wiktionary]]
Line 82: Line 74:
| headers="saC" | −{{formatnum:4748}}
| headers="saC" | −{{formatnum:4748}}
| headers="sapC" | −98%
| headers="sapC" | −98%
| headers="dpB" | {{formatnum:5675}}
| headers="n0pB" | {{formatnum:4955}}
| headers="n0nrB" | {{formatnum:4937}}
| headers="n0nrpB" | 99.6%
| headers="dpA" | {{formatnum:5680}}
| headers="dpA" | {{formatnum:5680}}
| headers="n0pA" | {{formatnum:4955}}
| headers="n0pA" | {{formatnum:4955}}
| headers="n0nrA" | {{formatnum:4937}}
| headers="n0nrA" | {{formatnum:4937}}
| headers="n0nrpA" | 99.6%
| headers="n0nrpA" | 99.6%
| headers="asB" | {{formatnum:4833}}
| headers="n0aA" |
| headers="aspB" | 98%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | −0.00%
| headers="n0aoA" |
| headers="asA" | {{formatnum:4833}}
| headers="aaA" | {{formatnum:4833}}
| headers="aspA" | 98%
| headers="aapA" | 98%
| headers="sopA" style="background-color:#fdd" | −98%
| headers="aapdA" |
| headers="acB" | {{formatnum:4800}}
| headers="apcA" | {{formatnum:4800}}
| headers="acpB" | 97%
| headers="apcpA" | 97%
| headers="acA" | {{formatnum:4800}}
| headers="apcpdA" |
| headers="acpA" | 97%
| headers="awtA" | {{formatnum:4840}}
| headers="alB" | {{formatnum:4840}}
| headers="awtpA" | 98%
| headers="alpB" | 98%
| headers="awtpdA" |
| headers="alA" | {{formatnum:4840}}
| headers="alpA" | 98%
| headers="aaB" | {{formatnum:4808}}
| headers="aapB" | 97%
| headers="aaA" | {{formatnum:4808}}
| headers="aapA" | 97%
|-
|-
| headers="wN" | [[wikt:ps:|Pashto Wiktionary]]
| headers="wN" | [[wikt:ps:|Pashto Wiktionary]]
Line 120: Line 102:
| headers="saC" | −{{formatnum:3882}}
| headers="saC" | −{{formatnum:3882}}
| headers="sapC" | −95%
| headers="sapC" | −95%
| headers="dpB" | {{formatnum:9040}}
| headers="n0pB" | {{formatnum:8175}}
| headers="n0nrB" | {{formatnum:7640}}
| headers="n0nrpB" | 93%
| headers="dpA" | {{formatnum:9045}}
| headers="dpA" | {{formatnum:9045}}
| headers="n0pA" | {{formatnum:8179}}
| headers="n0pA" | {{formatnum:8179}}
| headers="n0nrA" | {{formatnum:7644}}
| headers="n0nrA" | {{formatnum:7644}}
| headers="n0nrpA" | 93%
| headers="n0nrpA" | 93%
| headers="asB" | {{formatnum:6074}}
| headers="n0aA" |
| headers="aspB" | 80%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | −33%
| headers="n0aoA" |
| headers="asA" | {{formatnum:6076}}
| headers="aaA" | {{formatnum:6076}}
| headers="aspA" | 79%
| headers="aapA" | 79%
| headers="sopA" style="background-color:#fdd" | −97%
| headers="aapdA" |
| headers="acB" | {{formatnum:1347}}
| headers="apcA" | {{formatnum:1347}}
| headers="acpB" | 18%
| headers="apcpA" | 18%
| headers="acA" | {{formatnum:1347}}
| headers="apcpdA" |
| headers="acpA" | 18%
| headers="awtA" | {{formatnum:6400}}
| headers="alB" | {{formatnum:6398}}
| headers="awtpA" | 84%
| headers="alpB" | 84%
| headers="awtpdA" |
| headers="alA" | {{formatnum:6400}}
| headers="alpA" | 84%
| headers="aaB" | {{formatnum:4049}}
| headers="aapB" | 53%
| headers="aaA" | {{formatnum:4049}}
| headers="aapA" | 53%
|-
|-
| headers="wN" | [[wikt:ts:|Tsonga Wiktionary]]
| headers="wN" | [[wikt:ts:|Tsonga Wiktionary]]
Line 158: Line 130:
| headers="saC" | −344
| headers="saC" | −344
| headers="sapC" | −95%
| headers="sapC" | −95%
| headers="dpB" | 796
| headers="n0pB" | 347
| headers="n0nrB" | 345
| headers="n0nrpB" | 99.4%
| headers="dpA" | 798
| headers="dpA" | 798
| headers="n0pA" | 347
| headers="n0pA" | 347
| headers="n0nrA" | 345
| headers="n0nrA" | 345
| headers="n0nrpA" | 99.4%
| headers="n0nrpA" | 99.4%
| headers="asB" | 88
| headers="n0aA" |
| headers="aspB" | 26%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | +313%
| headers="n0aoA" |
| headers="asA" | 88
| headers="aaA" | 88
| headers="aspA" | 26%
| headers="aapA" | 26%
| headers="sopA" style="background-color:#fdd" | −78%
| headers="aapdA" |
| headers="acB" | 19
| headers="apcA" | 19
| headers="acpB" | 6%
| headers="apcpA" | 6%
| headers="acA" | 19
| headers="apcpdA" |
| headers="acpA" | 6%
| headers="awtA" | 88
| headers="alB" | 88
| headers="awtpA" | 26%
| headers="alpB" | 26%
| headers="awtpdA" |
| headers="alA" | 88
| headers="alpA" | 26%
| headers="aaB" | 19
| headers="aapB" | 6%
| headers="aaA" | 19
| headers="aapA" | 6%
|-
|-
| headers="wN" | [[wikt:hi:|Hindi Wiktionary]]
| headers="wN" | [[wikt:hi:|Hindi Wiktionary]]
Line 196: Line 158:
| headers="saC" | −{{formatnum:99171}}
| headers="saC" | −{{formatnum:99171}}
| headers="sapC" | −94%
| headers="sapC" | −94%
| headers="dpB" | {{formatnum:119877}}
| headers="n0pB" | {{formatnum:112386}}
| headers="n0nrB" | {{formatnum:112199}}
| headers="n0nrpB" | 99.8%
| headers="dpA" | {{formatnum:119881}}
| headers="dpA" | {{formatnum:119881}}
| headers="n0pA" | {{formatnum:112389}}
| headers="n0pA" | {{formatnum:112389}}
| headers="n0nrA" | {{formatnum:112202}}
| headers="n0nrA" | {{formatnum:112202}}
| headers="n0nrpA" | 99.8%
| headers="n0nrpA" | 99.8%
| headers="asB" | {{formatnum:105380}}
| headers="n0aA" |
| headers="aspB" | 94%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | −0.00%
| headers="n0aoA" |
| headers="asA" | {{formatnum:105384}}
| headers="aaA" | {{formatnum:105384}}
| headers="aspA" | 94%
| headers="sopA" style="background-color:#fdd" | −94%
| headers="acB" | {{formatnum:105302}}
| headers="acpB" | 94%
| headers="acA" | {{formatnum:105304}}
| headers="acpA" | 94%
| headers="alB" | {{formatnum:105380}}
| headers="alpB" | 94%
| headers="alA" | {{formatnum:105384}}
| headers="alpA" | 94%
| headers="aaB" | {{formatnum:105306}}
| headers="aapB" | 94%
| headers="aaA" | {{formatnum:105308}}
| headers="aapA" | 94%
| headers="aapA" | 94%
| headers="aapdA" |
| headers="apcA" | {{formatnum:105304}}
| headers="apcpA" | 94%
| headers="apcpdA" |
| headers="awtA" | {{formatnum:105384}}
| headers="awtpA" | 94%
| headers="awtpdA" |
|-
|-
| headers="wN" | [[wikt:ti:|Tigrinya Wiktionary]]
| headers="wN" | [[wikt:ti:|Tigrinya Wiktionary]]
Line 234: Line 186:
| headers="saC" | −673
| headers="saC" | −673
| headers="sapC" | −86%
| headers="sapC" | −86%
| headers="dpB" | {{formatnum:1039}}
| headers="n0pB" | 564
| headers="n0nrB" | 561
| headers="n0nrpB" | 99.5%
| headers="dpA" | {{formatnum:1040}}
| headers="dpA" | {{formatnum:1040}}
| headers="n0pA" | 564
| headers="n0pA" | 564
| headers="n0nrA" | 561
| headers="n0nrA" | 561
| headers="n0nrpA" | 99.5%
| headers="n0nrpA" | 99.5%
| headers="asB" | 388
| headers="n0aA" |
| headers="aspB" | 69%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | +102%
| headers="n0aoA" |
| headers="asA" | 388
| headers="aaA" | 388
| headers="aspA" | 69%
| headers="aapA" | 69%
| headers="sopA" style="background-color:#fdd" | +45%
| headers="aapdA" |
| headers="acB" | 198
| headers="apcA" | 198
| headers="acpB" | 35%
| headers="apcpA" | 35%
| headers="acA" | 198
| headers="apcpdA" |
| headers="acpA" | 35%
| headers="awtA" | 388
| headers="alB" | 388
| headers="awtpA" | 69%
| headers="alpB" | 69%
| headers="awtpdA" |
| headers="alA" | 388
| headers="alpA" | 69%
| headers="aaB" | 198
| headers="aapB" | 35%
| headers="aaA" | 198
| headers="aapA" | 35%
|-
|-
| headers="wN" | [[wikt:ky:|Kyrgyz Wiktionary]]
| headers="wN" | [[wikt:ky:|Kyrgyz Wiktionary]]
Line 272: Line 214:
| headers="saC" | −{{formatnum:2515}}
| headers="saC" | −{{formatnum:2515}}
| headers="sapC" | −75%
| headers="sapC" | −75%
| headers="dpB" | {{formatnum:3909}}
| headers="n0pB" | {{formatnum:3271}}
| headers="n0nrB" | {{formatnum:3250}}
| headers="n0nrpB" | 99.4%
| headers="dpA" | {{formatnum:3929}}
| headers="dpA" | {{formatnum:3929}}
| headers="n0pA" | {{formatnum:3276}}
| headers="n0pA" | {{formatnum:3276}}
| headers="n0nrA" | {{formatnum:3255}}
| headers="n0nrA" | {{formatnum:3255}}
| headers="n0nrpA" | 99.4%
| headers="n0nrpA" | 99.4%
| headers="asB" | {{formatnum:3207}}
| headers="n0aA" |
| headers="aspB" | 99%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | +5%
| headers="n0aoA" |
| headers="asA" | {{formatnum:3209}}
| headers="aaA" | {{formatnum:3209}}
| headers="aspA" | 99%
| headers="aapA" | 99%
| headers="sopA" style="background-color:#fdd" | −73%
| headers="aapdA" |
| headers="acB" | {{formatnum:3186}}
| headers="apcA" | {{formatnum:3188}}
| headers="acpB" | 98%
| headers="apcpA" | 98%
| headers="acA" | {{formatnum:3188}}
| headers="apcpdA" |
| headers="acpA" | 98%
| headers="awtA" | {{formatnum:3210}}
| headers="alB" | {{formatnum:3208}}
| headers="awtpA" | 99%
| headers="alpB" | 99%
| headers="awtpdA" |
| headers="alA" | {{formatnum:3210}}
| headers="alpA" | 99%
| headers="aaB" | {{formatnum:3191}}
| headers="aapB" | 98%
| headers="aaA" | {{formatnum:3193}}
| headers="aapA" | 98%
|-
|-
| headers="wN" | [[wikt:bn:|Bengali Wiktionary]]
| headers="wN" | [[wikt:bn:|Bengali Wiktionary]]
Line 310: Line 242:
| headers="saC" | −477
| headers="saC" | −477
| headers="sapC" | −67%
| headers="sapC" | −67%
| headers="dpB" | {{formatnum:2253}}
| headers="n0pB" | 838
| headers="n0nrB" | 800
| headers="n0nrpB" | 95%
| headers="dpA" | {{formatnum:2261}}
| headers="dpA" | {{formatnum:2261}}
| headers="n0pA" | 840
| headers="n0pA" | 840
| headers="n0nrA" | 802
| headers="n0nrA" | 802
| headers="n0nrpA" | 95%
| headers="n0nrpA" | 95%
| headers="asB" | 721
| headers="n0aA" |
| headers="aspB" | 90%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | −1%
| headers="n0aoA" |
| headers="asA" | 721
| headers="aaA" | 721
| headers="aspA" | 90%
| headers="aapA" | 90%
| headers="sopA" style="background-color:#fdd" | −67%
| headers="aapdA" |
| headers="acB" | 699
| headers="apcA" | 699
| headers="acpB" | 87%
| headers="apcpA" | 87%
| headers="acA" | 699
| headers="apcpdA" |
| headers="acpA" | 87%
| headers="awtA" | 722
| headers="alB" | 722
| headers="awtpA" | 90%
| headers="alpB" | 90%
| headers="awtpdA" |
| headers="alA" | 722
| headers="alpA" | 90%
| headers="aaB" | 700
| headers="aapB" | 88%
| headers="aaA" | 700
| headers="aapA" | 87%
|-
|-
| headers="wN" | [[wikt:te:|Telugu Wiktionary]]
| headers="wN" | [[wikt:te:|Telugu Wiktionary]]
Line 348: Line 270:
| headers="saC" | −{{formatnum:26611}}
| headers="saC" | −{{formatnum:26611}}
| headers="sapC" | −54%
| headers="sapC" | −54%
| headers="dpB" | {{formatnum:58128}}
| headers="n0pB" | {{formatnum:54469}}
| headers="n0nrB" | {{formatnum:47959}}
| headers="n0nrpB" | 88%
| headers="dpA" | {{formatnum:58326}}
| headers="dpA" | {{formatnum:58326}}
| headers="n0pA" | {{formatnum:54527}}
| headers="n0pA" | {{formatnum:54527}}
| headers="n0nrA" | {{formatnum:47999}}
| headers="n0nrA" | {{formatnum:47999}}
| headers="n0nrpA" | 88%
| headers="n0nrpA" | 88%
| headers="asB" | {{formatnum:47670}}
| headers="n0aA" |
| headers="aspB" | 99.4%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | +4%
| headers="n0aoA" |
| headers="asA" | {{formatnum:47715}}
| headers="aaA" | {{formatnum:47715}}
| headers="aspA" | 99.4%
| headers="sopA" style="background-color:#fdd" | −51%
| headers="acB" | {{formatnum:47642}}
| headers="acpB" | 99.3%
| headers="acA" | {{formatnum:47688}}
| headers="acpA" | 99.4%
| headers="alB" | {{formatnum:47672}}
| headers="alpB" | 99.4%
| headers="alA" | {{formatnum:47717}}
| headers="alpA" | 99.4%
| headers="aaB" | {{formatnum:47660}}
| headers="aapB" | 99.4%
| headers="aaA" | {{formatnum:47704}}
| headers="aapA" | 99.4%
| headers="aapA" | 99.4%
| headers="aapdA" |
| headers="apcA" | {{formatnum:47688}}
| headers="apcpA" | 99.4%
| headers="apcpdA" |
| headers="awtA" | {{formatnum:47717}}
| headers="awtpA" | 99.4%
| headers="awtpdA" |
|-
|-
| headers="wN" | [[wikt:am:|Amharic Wiktionary]]
| headers="wN" | [[wikt:am:|Amharic Wiktionary]]
Line 386: Line 298:
| headers="saC" | −200
| headers="saC" | −200
| headers="sapC" | −54%
| headers="sapC" | −54%
| headers="dpB" | {{formatnum:1147}}
| headers="n0pB" | 446
| headers="n0nrB" | 398
| headers="n0nrpB" | 89%
| headers="dpA" | {{formatnum:1154}}
| headers="dpA" | {{formatnum:1154}}
| headers="n0pA" | 446
| headers="n0pA" | 446
| headers="n0nrA" | 398
| headers="n0nrA" | 398
| headers="n0nrpA" | 89%
| headers="n0nrpA" | 89%
| headers="asB" | 390
| headers="n0aA" |
| headers="aspB" | 98%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | −4%
| headers="n0aoA" |
| headers="asA" | 390
| headers="aaA" | 390
| headers="aspA" | 98%
| headers="aapA" | 98%
| headers="sopA" style="background-color:#fdd" | −56%
| headers="aapdA" |
| headers="acB" | 301
| headers="apcA" | 301
| headers="acpB" | 76%
| headers="apcpA" | 76%
| headers="acA" | 301
| headers="apcpdA" |
| headers="acpA" | 76%
| headers="awtA" | 390
| headers="alB" | 390
| headers="awtpA" | 98%
| headers="alpB" | 98%
| headers="awtpdA" |
| headers="alA" | 390
| headers="alpA" | 98%
| headers="aaB" | 301
| headers="aapB" | 76%
| headers="aaA" | 301
| headers="aapA" | 76%
|-
|-
| headers="wN" | [[wikt:chr:|Cherokee Wiktionary]]
| headers="wN" | [[wikt:chr:|Cherokee Wiktionary]]
Line 424: Line 326:
| headers="saC" | −198
| headers="saC" | −198
| headers="sapC" | −53%
| headers="sapC" | −53%
| headers="dpB" | 989
| headers="n0pB" | 354
| headers="n0nrB" | 302
| headers="n0nrpB" | 85%
| headers="dpA" | 996
| headers="dpA" | 996
| headers="n0pA" | 354
| headers="n0pA" | 354
| headers="n0nrA" | 302
| headers="n0nrA" | 302
| headers="n0nrpA" | 85%
| headers="n0nrpA" | 85%
| headers="asB" | 301
| headers="n0aA" |
| headers="aspB" | 99.7%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | +25%
| headers="n0aoA" |
| headers="asA" | 301
| headers="aaA" | 301
| headers="aspA" | 99.7%
| headers="aapA" | 99.7%
| headers="sopA" style="background-color:#fdd" | −41%
| headers="aapdA" |
| headers="acB" | 225
| headers="apcA" | 225
| headers="acpB" | 75%
| headers="apcpA" | 75%
| headers="acA" | 225
| headers="apcpdA" |
| headers="acpA" | 75%
| headers="awtA" | 301
| headers="alB" | 301
| headers="awtpA" | 99.7%
| headers="alpB" | 99.7%
| headers="awtpdA" |
| headers="alA" | 301
| headers="alpA" | 99.7%
| headers="aaB" | 227
| headers="aapB" | 75%
| headers="aaA" | 227
| headers="aapA" | 75%
|-
|-
| headers="wN" | [[wikt:dv:|Divehi Wiktionary]]
| headers="wN" | [[wikt:dv:|Divehi Wiktionary]]
Line 462: Line 354:
| headers="saC" | −59
| headers="saC" | −59
| headers="sapC" | −43%
| headers="sapC" | −43%
| headers="dpB" | 672
| headers="n0pB" | 185
| headers="n0nrB" | 172
| headers="n0nrpB" | 93%
| headers="dpA" | 673
| headers="dpA" | 673
| headers="n0pA" | 185
| headers="n0pA" | 185
| headers="n0nrA" | 172
| headers="n0nrA" | 172
| headers="n0nrpA" | 93%
| headers="n0nrpA" | 93%
| headers="asB" | 102
| headers="n0aA" |
| headers="aspB" | 59%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | +35%
| headers="n0aoA" |
| headers="asA" | 102
| headers="aaA" | 102
| headers="aspA" | 59%
| headers="aapA" | 59%
| headers="sopA" style="background-color:#fdd" | −23%
| headers="aapdA" |
| headers="acB" | 81
| headers="apcA" | 81
| headers="acpB" | 47%
| headers="apcpA" | 47%
| headers="acA" | 81
| headers="apcpdA" |
| headers="acpA" | 47%
| headers="awtA" | 102
| headers="alB" | 102
| headers="awtpA" | 59%
| headers="alpB" | 59%
| headers="awtpdA" |
| headers="alA" | 102
| headers="alpA" | 59%
| headers="aaB" | 87
| headers="aapB" | 51%
| headers="aaA" | 87
| headers="aapA" | 51%
|-
|-
| headers="wN" | [[wikt:lo:|Lao Wiktionary]]
| headers="wN" | [[wikt:lo:|Lao Wiktionary]]
Line 500: Line 382:
| headers="saC" | −{{formatnum:23072}}
| headers="saC" | −{{formatnum:23072}}
| headers="sapC" | −38%
| headers="sapC" | −38%
| headers="dpB" | {{formatnum:61933}}
| headers="n0pB" | {{formatnum:61100}}
| headers="n0nrB" | {{formatnum:60689}}
| headers="n0nrpB" | 99.3%
| headers="dpA" | {{formatnum:61936}}
| headers="dpA" | {{formatnum:61936}}
| headers="n0pA" | {{formatnum:61100}}
| headers="n0pA" | {{formatnum:61100}}
| headers="n0nrA" | {{formatnum:60689}}
| headers="n0nrA" | {{formatnum:60689}}
| headers="n0nrpA" | 99.3%
| headers="n0nrpA" | 99.3%
| headers="asB" | {{formatnum:60672}}
| headers="n0aA" |
| headers="aspB" | 99.97%
| headers="n0apA" |
| headers="sopB" style="background-color:#fdd" | +0.00%
| headers="n0aoA" |
| headers="asA" | {{formatnum:60673}}
| headers="aaA" | {{formatnum:60673}}
| headers="aspA" | 99.97%
| headers="aapA" | 99.97%
| headers="sopA" style="background-color:#fdd" | −38%
| headers="aapdA" |
| headers="acB" | {{formatnum:60663}}
| headers="apcA" | {{formatnum:60663}}
| headers="acpB" | 99.96%
| headers="apcpA" | 99.96%
| headers="acA" | {{formatnum:60663}}
| headers="apcpdA" |
| headers="acpA" | 99.96%
| headers="awtA" | {{formatnum:60673}}
| headers="alB" | {{formatnum:60672}}
| headers="awtpA" | 99.97%
| headers="alpB" | 99.97%
| headers="awtpdA" |
| headers="alA" | {{formatnum:60673}}
| headers="alpA" | 99.97%
| headers="aaB" | {{formatnum:60663}}
| headers="aapB" | 99.96%
| headers="aaA" | {{formatnum:60663}}
| headers="aapA" | 99.96%
|}
|}
<!-- todo:
<!-- todo:

Revision as of 01:03, 1 July 2012

Original information

On May 10, 2012, a bug report requesting that the "updateArticleCount.php" maintenance script be run on all Wiktionaries and Wikisources was acted upon, resulting in 60 of those wikis surpassing or falling below one or more of the article-count milestones tracked at Wikimedia News. Some of the changes were quite large and therefore questionable.

The tables below show a great many statistics for the Wiktionaries and Wikisources that showed the largest percent changes in article counts (up or down) on May 10.

Wiktionaries

The table below can be sorted by any column (initial order of the rows is by "stats articles pct chg"). You can also "hover" over any of the column headings in the table to get an explanation of that column (these mostly match the explanations in the key below, but the percents are explained in more detail). The wiki names are linked to the wikis themselves, while the dates are linked to subpages of this one containing the "raw statistics" upon which these numbers are based (which have not yet been updated to reflect all the stats now shown in the table).

Table key:

  • wiki – Wiki name, linked to the wiki itself
  • date before / date after – Dates on which dumps were made before / after the May 10th running of the "updateArticleCount.php" script (linked to subpages showing "raw" stats); all other "before" and "after" cells are colored using the same scheme. Note that dumps made on May 10th itself [e.g., Pashto] are classified as "before" or "after" based on how closely their "stats articles" count [see next two items] matches the dumps made on either side of May 10th (in most cases it was very obvious whether "updateArticleCount.php" had already been run by the time the May 10th dump was made)
  • stats pages – Total page count given in "site_stats.sql" dump, which matches the on-wiki count of "Pages" in Special:Statistics and the one given by {{NUMBEROFPAGES}} on the wiki itself
  • stats articles – Article count given in the "site_stats.sql" dump, which matches the on-wiki count of "Content pages" in Special:Statistics and the one given by {{NUMBEROFARTICLES}} on the wiki itself
  • % of pgs – Percent of pages that were non-redirects
  • stats articles chg – Change in "stats articles" count from before to after May 10th
  • stats articles pct chg – Percent change in "stats articles" count from before to after May 10th (as percent of count before)
  • dumped pages – Total number of pages seen in "page.sql" dump (note: this and all of the remaining stats are for after May 10th)
  • ns0 pages – Number of pages in "page.sql" marked as being in the main namespace (the only content namespace for all Wiktionaries)
  • ns0 non-redirs – Number of main-namespace pages in "page.sql" marked as not being redirects (in either that dump or "redirect.sql")
  • ns0 articles – Number of articles (i.e., entries) seen in "page.sql" dump according to the current definition of an article: "content-namespace non-redirect containing at least one [[wikilink]] to another page on the same wiki"
  • % of nrs – Percent of non-redirects that counted as articles
  • off stats – "ns0 articles" count minus "stats articles after" count (should be zero, since they're counting the same thing)
  • alt article count – Number of articles based on alternate definition: "content-namespace non-redirect containing at least one wikilink to another page on the same wiki, image/file link, category link, interlanguage link, or interwiki link" (see Note 1); this and all other "article count" percents are out of the "ns0 non-redirs" count
  • pct off stats – Percent difference between "alt article count" and "stats articles after" count, as percent of "stats articles after"
  • page-cat article count – Article count based on: "content-namespace non-redirect linked to another page on the same wiki or placed in a category"
  • wl-templ article count – Article count based on: "content-namespace non-redirect containing at least one wikilink of any type (see "alt. article count") or {{template call}}"

Note 1: This definition is similar, but not identical, to the old method of counting articles, which included any page whose wikitext contained the string "[[" — a very loose definition including regular [[links]] to other pages on the same wiki, [[Category:]] links, [[Image:]] (or [[File:]]) links, interlanguage links (e.g., [[de:]]), and interwiki links (e.g., [[species:]]), as well as "hidden" links inside of <!-- HTML comments --> and links "deactivated" by <nowiki> tags (indeed, even text containing "[[" but not "]]" was included). The new method of counting articles only counts page links. Because it is based on database dumps and not the raw wikitext of each page, this analysis counts only the 5 types of "real" wikilinks (not "hidden", "deactivated", or "incomplete" links). The different kinds of links have been accounted for by parsing the dumps "pagelinks.sql", "categorylinks.sql", "imagelinks.sql", "langlinks.sql", "iwlinks.sql", and (only for the so-called "liberal" method) "templatelinks.sql".
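The "ns0" columns in the key above can be illustrated with a minimal Python sketch. This is not the Perl script itself; the `PageRow` type, the `summarize_ns0` function, and the idea of passing in a ready-made set of page IDs that have at least one entry in "pagelinks.sql" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRow:
    """One row of a (hypothetical, already parsed) "page.sql" dump."""
    page_id: int
    namespace: int
    is_redirect: bool


def summarize_ns0(pages, pages_with_pagelinks, stats_articles_after):
    """Compute the dump-based columns for the main namespace (ns0).

    pages_with_pagelinks: set of page IDs that are the source of at least
    one page link (assumed to have been extracted from "pagelinks.sql").
    stats_articles_after: the "stats articles" count after May 10th.
    """
    ns0 = [p for p in pages if p.namespace == 0]
    non_redirs = [p for p in ns0 if not p.is_redirect]
    articles = [p for p in non_redirs if p.page_id in pages_with_pagelinks]
    pct_of_nrs = 100.0 * len(articles) / len(non_redirs) if non_redirs else 0.0
    return {
        "dumped pages": len(pages),
        "ns0 pages": len(ns0),
        "ns0 non-redirs": len(non_redirs),
        "ns0 articles": len(articles),
        "% of nrs": pct_of_nrs,
        # Should be zero if the dump and the on-wiki stats agree.
        "off stats": len(articles) - stats_articles_after,
    }
```

Redirects and pages outside ns0 are excluded before the link test is applied, so a redirect that happens to appear in the link set never counts as an article.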

(20 more Wiktionaries to add to table)

Wikisources

Note: This information needs to be completely rewritten to reflect my latest understanding of how article counting is done.

As with the Wiktionaries table above, this table is initially sorted by the "stats articles pct Δ" column, and the dates are linked to the full statistics collected for each wiki. The explanation of the columns is mostly the same as for the table above, with certain additions noted below. Note that some stats included in the other table have been omitted here to limit the size of the table.

Table key: (differences from above)

  • dumped articles – Article count across all content namespaces (main [ns0] and, if appropriate, "author", "page" and "index"; see other items below) using the current "non-redirect in content namespace with at least one wikilink" criterion (see Note 1 above), based on several relevant dumps (percent is out of "dumped pages" count)
  • author ns – Number of the namespace containing "Author:" pages, but only if that namespace exists and counts as content
  • author ns pages – Number of pages in "page.sql" dump marked as being in the author namespace (this and similar stats that follow are coded as "0" if the namespace is missing or not counted as content, so table sorting is not broken)
  • author ns non-redirs – Number of author-namespace pages in "page.sql" marked as not being redirects (in either that dump or "redirect.sql")
  • author ns articles – Number of author-namespace pages in "page.sql" that qualify as articles based on current criteria (percent is out of "author ns non-redirs")
  • page ns – Number of the namespace containing "Page:" pages, but only if that namespace exists and counts as content
  • page ns pages – Number of pages in "page.sql" dump marked as being in the page namespace
  • page ns non-redirs – Number of page-namespace pages in "page.sql" marked as not being redirects (in either that dump or "redirect.sql")
  • page ns articles – Number of page-namespace pages in "page.sql" that qualify as articles based on current criteria (percent is out of "page ns non-redirs")
  • index ns – Number of the namespace containing "Index:" pages (in some languages translated as "book"), but only if that namespace exists and counts as content
  • index ns pages – Number of pages in "page.sql" dump marked as being in the index namespace
  • index ns non-redirs – Number of index-namespace pages in "page.sql" marked as not being redirects (in either that dump or "redirect.sql")
  • index ns articles – Number of index-namespace pages in "page.sql" that qualify as articles based on current criteria (percent is out of "index ns non-redirs")

(11 more Wikisources to add to table)

Verifying the counts

How do I know that my script is giving correct article counts? Well, anyone who can program sufficiently well can redo the calculations themselves, based on the descriptions below of how I did it. For everyone else, I provide some "really raw" output showing the final, processed results of the "page hash" constructed by my script for the 2012-05-16 dump of the Tsonga Wiktionary:

/tswiktionary-20120516-raw

Some spot checks of this output didn't reveal any problems, as far as I could tell.

Note: Except there was a problem... I wasn't using the right definition of a "good" article. This "raw" output page will soon be updated to reflect the correct article count.


Complete rewrite...

On May 10, 2012, a bug report requesting that the "updateArticleCount.php" maintenance script be run on all Wiktionaries and Wikisources was acted upon, resulting in 60 of those wikis surpassing or falling below one or more of the article-count milestones tracked at Wikimedia News. Some of the changes were quite large and therefore questionable.

A preliminary investigation revealed only one obvious pattern in the count changes: most Wiktionaries lost articles while most Wikisources gained. The gains can be explained by the fact that most Wikisources now count more namespaces as "content" than they used to; in addition to articles in the main namespace ("ns0"), many Wikisources now count qualifying pages in 1, 2, or 3 additional namespaces (more about this later). The losses were harder to explain.

Neither the gains nor losses seemed to be related to the writing system the wiki was using (e.g., Latin script vs. Brahmic scripts, etc.), whether it was an older wiki or newer one, bigger or smaller, and so forth. Most worryingly, it wasn't at all clear whether the new or old counts were "more correct". I (User:Dcljr) tried to estimate the "true" article counts based on random samples of pages at each wiki (or as close to random as could be reasonably achieved). Sometimes the resulting count was closer to the new one, sometimes closer to the old, and sometimes it was right in the middle between them. This (incomplete) preliminary information is collected at Talk:Wikimedia News#May 10 article count updates.

To collect more in-depth and "reliable" information, I wrote a Perl script to download and parse relevant database dumps needed to count the articles for a given wiki. Initially it seemed that the "updateArticleCount.php" script was consistently undercounting articles, but it turns out I was using the wrong (or, more accurately, an out-of-date) definition of what counts as an article. Once I used the right definition, I started to get the same counts as those given by "updateArticleCount.php". (For more context, see bug 37291.)

But more about all that later. First, a summary of how article counts have been determined in the past, how they are determined now, and how the article counts actually changed when the "updateArticleCount.php" script was run on May 10, 2012.

How article counting used to be done

When wiki article counting first began, it was based on whether a page contained a comma or not. This worked fine for the English Wikipedia, but once other projects in other languages started up, people realized that this method would not work for all wikis. A very quick (one week!) discussion and vote was held here at Meta in March 2003, the details of which can be found at:

Based on the results of the vote, it was decided that a page would be counted as an article if it was:

a non-redirect in the main namespace (ns0), containing at least one [[wikilink]]

Unfortunately, the implementation of this definition left a little to be desired, and it ended up counting not only 5 different types of legitimate wikilinks (1–5 below), but two types of "false" wikilinks (6 and 7), and one type of non-wikilink (8):

  1. page links: e.g., [[Babel]] or [[Talk:Babel]], etc.
  2. category links: [[Category:Software]]
  3. image/file links: [[File:Yes.png]]
  4. interlanguage links: [[de:Wikipedia:Hauptseite]] or [[:de:Wikipedia:Hauptseite]]
  5. interwiki links: [[species:]]
  6. hidden links: <!-- [[don't look at me]] -->
  7. deactivated links: <nowiki>[[look at me]]</nowiki>
  8. any text containing the string "[[": wikilinks start with "[["...

(Note that links like [[:Category:Software]] and [[:File:Yes.png]], which start with an initial colon, are regular page links of type 1.)

In fact, number 8 describes exactly what was checked for to count a page as an article (assuming it wasn't a redirect and was in the main namespace)!

Eventually, this shortcoming led some wikis to routinely place "hidden" links (of type 6) on their main-namespace pages, just to get them counted as articles.
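To make the consequences of the "[["-string test concrete, here is a minimal Python sketch of the old check as described above. It is only an illustration (MediaWiki itself is written in PHP), and the function name and parameters are hypothetical.

```python
def counted_by_old_method(wikitext: str, namespace: int, is_redirect: bool) -> bool:
    """Old de facto test: main-namespace non-redirect whose source contains "[["."""
    return namespace == 0 and not is_redirect and "[[" in wikitext


# All of these main-namespace pages pass the old test, although only the
# first one contains a real page link:
samples = [
    "See [[Babel]].",                       # type 1: page link
    "<!-- [[don't look at me]] -->",        # type 6: hidden link
    "<nowiki>[[look at me]]</nowiki>",      # type 7: deactivated link
    "wikilinks start with [[ ...",          # type 8: bare "[[" string
]
for text in samples:
    print(counted_by_old_method(text, namespace=0, is_redirect=False))
```

Because the test looks only at the raw page source, a link supplied by a template is invisible to it, while a commented-out link is enough to pass.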

In June 2006, the $wgContentNamespaces configuration variable was introduced (in revision 14738) to enable namespaces other than the main one (ns0) to count as "content".

At this point, the de facto definition of an article was:

a non-redirect in a content namespace, containing the string "[["

In November 2007, bug 11868 was submitted requesting that links provided by templates be counted, too. In the course of the ensuing discussion, it was pointed out that links other than page links (types 2, 3, etc.) were being counted, and that in fact three different counting methods (all of which started with "non-redirect in a content namespace") were being employed at different places in the code:

  • every time a page was saved, the "[["-string criterion was used to see whether the page would count as an article
  • when the "initStats.php" maintenance script was run, it just checked to see whether the pages were non-empty
  • when the "updateArticleCount.php" maintenance script was run, it checked whether the "pagelinks.sql" table actually contained page links originating from each page in question (type 1 only, but also type 1 links provided by templates)

In addition, when pages were imported into a wiki, the article count was not updated correctly (see bugs 2483, 5703, and 6600).

These inconsistencies allowed the on-wiki article counts (e.g., {{NUMBEROFARTICLES}}) to diverge from the "correct" count (however that was defined!) over time.

At some point, the "meat" of the "updateArticleCount.php" script was moved elsewhere.

How article counting is done now

In May 2011, a developer finally acted to "rationalize" the way articles were counted, and in revision 88113 introduced the $wgArticleCountMethod configuration variable to specify which type of (non-empty) content-namespace non-redirect would count as articles: all such pages ("any"), only those containing a true page link ("link"), or only those containing a comma ("comma"). Article.php and SiteStats.php were modified to reflect this change.

So now, assuming $wgArticleCountMethod is set to "link" for a wiki (which it is for all but the English and Portuguese Wikibooks), a page counts as an article (presumably at all places in the MediaWiki code) if it is:

a non-redirect in a content namespace, containing (after parsing) at least one true [[wikilink]] to another page on the same wiki

Note how different this definition is from the one actually in effect before the change was made! Unfortunately, the extreme nature of the change wasn't apparent to most people until the article counts were recalculated on May 10, 2012.

Because of the "after parsing" part of the new definition, one can no longer tell whether a page will count as an article simply by examining its page source; if the page contains templates, it must be fully parsed first in order for any links created by those templates to be accounted for. Fortunately, this is done when pages are saved, so as long as the "pagelinks.sql" table is maintained correctly, the article count should no longer get "out of sync" as it did in the past.
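The three settings of $wgArticleCountMethod described above can be sketched in Python as follows. This is a simplified illustration, not MediaWiki's PHP code: the function and parameter names are hypothetical, `has_page_link` stands for "the parsed page has at least one true page link", and the "(non-empty)" condition is not modelled.

```python
def is_article(method, namespace, is_redirect, has_page_link, wikitext,
               content_namespaces=(0,)):
    """Decide whether a page counts as an article under a given method.

    method: "any", "link" or "comma" (the values named in the text above).
    has_page_link: True if the page, after parsing, links to another page
        on the same wiki (assumed to be looked up elsewhere).
    """
    if namespace not in content_namespaces or is_redirect:
        return False
    if method == "any":
        return True
    if method == "link":
        return has_page_link
    if method == "comma":
        return "," in wikitext
    raise ValueError(f"unknown article count method: {method!r}")
```

Under "link", a page whose only link comes from a template call still counts, provided the caller derives `has_page_link` from the parsed output rather than from the raw page source.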

Changes to article counts on May 10, 2012

Apart from isolated requests here and there (for example, bug 34184), the article counts of the various Wikimedia content wikis have not been updated to reflect all of these changes in how articles have been counted over time. The May 10 running of "updateArticleCount.php" on all the Wiktionaries and Wikisources was the first concerted effort to "fix" the article counts across an entire project. The changes in article counts seen on that day for these two projects are shown in the tables below. (Note that none of these counts are based on database dumps; see key below for details.)

Key for both tables:

  • wiki name – linked to the Main Page of the wiki
  • articles before / articles after – on-wiki article count at c. 00:30 UTC on 2012-05-10 and c. 00:30 UTC on 2012-05-11, respectively (collected via API request, equivalent to {{NUMBEROFARTICLES}} and the count seen at Special:Statistics on the given wiki)
  • change – after minus before
  • pct change – relative change in article count, as a percentage of the "before" count
  • level before / level after – which milestone level (tracked at Wikimedia News) the wiki would be at based on the article count
  • level change – whether there was a change in milestone level
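The derived columns in this key ("change", "pct change", and the milestone levels) can be computed as in the following Python sketch. The milestone thresholds shown are placeholders chosen for illustration, not the actual list tracked at Wikimedia News, and the function names are hypothetical.

```python
import bisect

# Placeholder thresholds (an assumption); substitute the real milestone list.
HYPOTHETICAL_MILESTONES = [100, 1_000, 10_000, 100_000]


def milestone_level(count, milestones=HYPOTHETICAL_MILESTONES):
    """Return the highest milestone reached by `count`, or 0 if none."""
    idx = bisect.bisect_right(milestones, count)
    return milestones[idx - 1] if idx else 0


def summarize_change(before, after, milestones=HYPOTHETICAL_MILESTONES):
    """Compute the columns described in the key from two article counts."""
    change = after - before
    pct_change = 100.0 * change / before if before else None
    level_before = milestone_level(before, milestones)
    level_after = milestone_level(after, milestones)
    return {
        "change": change,
        "pct change": pct_change,
        "level before": level_before,
        "level after": level_after,
        "level change": level_after != level_before,
    }
```

With these placeholder thresholds, a wiki going from 1,200 to 900 articles would show a change of −300 (−25%) and a drop from the 1,000 level to the 100 level.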

Note that the tables are initially shown "collapsed" (to expand one, select the "[show]" link) and are sorted by the "level after" column, then "level before", then (unfortunately) alphabetically by language code. To sort by a different column, click on the "up-down" arrows next to the column heading. For help with sorting on a "secondary sort key", see Help:Sorting#Secondary sortkey.

Wiktionary

Note: 8 Wiktionaries rose up to new milestone levels and 24 fell to lower milestone levels.

Wikisource

Note: 15 Wikisources rose up to new milestone levels and 13 fell to lower milestone levels.

Changes to article counts in other projects

Eventually the article counts will need to be updated on the other Wikimedia wikis. The tables below show the changes that would have occurred if the "updateArticleCount.php" script were run on each of the other "content wikis" on the day that wiki's database was most recently dumped (as of the time the tables were filled in). The columns are as in the previous section, except for the "date dumped" column, which should be self-explanatory. Unlike the tables above, initial sorting is by "articles before" in reverse numerical order.

Wikipedia

Note: Information to come...
Changes to Wikipedia article counts if they were updated on the indicated dates
wiki name | date dumped | articles before | articles after | change | pct change | level before | level after | level change

Note: The English Wikipedia is too large to include in this analysis.

Wikibooks

Things to note:

Wikiquote

Things to note:

  • 30 Wikiquotes would fall to lower milestone levels and none would rise to new levels.
  • The Alemannic Wikiquote exists as a separate namespace within that language's Wikipedia, so it is not included in this analysis.

Wikinews

Things to note:

  • 11 Wikinews languages would fall to lower milestone levels and none would rise to new levels.
  • The Alemannic Wikinews and Low German/Low Saxon Wikinews exist as separate namespaces within their respective language Wikipedias and so are not included in this analysis.

Wikiversity

Note: 1 Wikiversity would rise to a new milestone level and 2 would fall to lower levels.

Other possible article counting criteria

Clearly there are big differences between the old and new definitions of what constitutes an article. While the new definition may be closer to the original intent of the "Article count reform" voters (although even this is not entirely clear), people have gotten used to the old way of doing things and might be disturbed by large changes in article counts. In particular, some might consider it a "bug" in the new method that, say, category links are no longer considered.

For this reason, it might be time to think about what other criteria could be used to count articles.

Below is a table containing alternate article-count statistics for several (10 or 5) wikis from each project that either have shown (Wiktionaries and Wikisources) or would show if updated (the rest) the "most significant" changes in article counts. (The "significance" of a change in article count is defined as the percent change multiplied by the actual change; this is based on the idea that a large percentage change is more significant if it reflects a large actual change, and vice versa.)
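The "significance" measure just defined can be written as a short Python sketch. Whether the percent change is expressed as a percentage (as here) or as a fraction is an assumption; it rescales the values but does not affect how wikis rank against each other. The function name is hypothetical.

```python
def significance(before, after):
    """Percent change multiplied by actual change.

    The two factors always share a sign, so the product is non-negative
    whether the article count rose or fell.
    """
    change = after - before
    pct_change = 100.0 * change / before
    return pct_change * change


# Same percent change, very different significance:
print(significance(1000, 500))  # 25000.0
print(significance(100, 50))    # 2500.0
```

A 50% drop on a large wiki therefore outranks the same 50% drop on a small one, which matches the stated rationale.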

As alluded to earlier, these statistics were generated by a Perl script I (dcljr) wrote to download and parse relevant database dumps. Unlike the previous tables for Wiktionary and Wikisource, the statistics shown here are not based on API queries but instead are all based on dumps (the ones for the two aforementioned projects being before the May 10 change).

For convenience, I repeat here the list of different types of links (now including "template links") that have been used — or could possibly be used — to count articles, along with the associated SQL databases that currently track such links (note that "page.sql" contains the page IDs that each of these other databases refer to):

link type | examples | database
page (on same wiki) | [[Babel]], [[Talk:Babel]], [[:Category:Software]], [[:Image:Cat.jpg]], [[:File:Cat.jpg]] | pagelinks.sql
category | [[Category:Software]] | categorylinks.sql
image/file | [[Image:Cat.jpg]], [[File:Cat.jpg]] | imagelinks.sql
interlanguage | [[de:Wikipedia:Hauptseite]], [[:de:Wikipedia:Hauptseite]] | langlinks.sql
interwiki | [[species:]], [[wookieepedia:]] | iwlinks.sql
template | {{fact}}, {{fact|date=June 2012}} | templatelinks.sql
hidden (×) | <!-- [[don't look at me]] --> | (none)
deactivated (×) | <nowiki>[[look at me]]</nowiki> | (none)
any text containing "[[" (×) | Wikilinks start with two open-brackets ([[). | (none)

Note ×: Not a real wikilink, so not contained in any "links" database.

Note that a "template link" does not mean a wikilink provided by a template; it simply refers to any {{template call}}, regardless of whether the template provides any wikilinks (or, indeed, any content at all, since the target template may not even exist).

Now for the various definitions of what might constitute an article — all of which should be understood to begin with the phrase "non-redirect in a content namespace, containing at least one…":

  • P: "…page link" (the new definition, used since May 2011)
  • P-C: "…page or category link"
  • P-C-F: "…page, category, or image/file link"
  • ANY: "…page, category, image/file, interlanguage, or interwiki link" (this more or less stands in for the "old" way of counting articles, although it is not the same: like all of these dump-based counting methods, it counts wikilinks that are provided via template calls, which the old method couldn't do, and it doesn't count hidden or deactivated links, nor any text simply containing the string "[[")
  • P-C-T: "…page or category link, or any template call" (regardless of whether the template provides any links; the idea behind this definition is that all of these somehow refer to other pages definitely on the same wiki)
  • P-C-L: "…page, category, or interlanguage link" (the idea here is that since interlanguage links connect "equivalent" content in different languages, they should be treated similarly to links between articles on the same wiki [i.e., pagelinks])

If someone wants to suggest Yet Another definition, I can modify my Perl script to use it (as long as it uses some combination of the database-tracked link types listed above).
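The definitions listed above differ only in which link types they accept, so they can be expressed as sets of link types. The following Python sketch illustrates that idea; it is not the Perl script, and the `CRITERIA` table, the `count_articles` function, and the `links_by_type` input (page IDs grouped by the kind of link they contain, assumed to be extracted from the corresponding "…links.sql" dumps) are hypothetical.

```python
# Each criterion is the set of link types that qualify a page as an article.
CRITERIA = {
    "P": {"page"},
    "P-C": {"page", "category"},
    "P-C-F": {"page", "category", "file"},
    "ANY": {"page", "category", "file", "interlanguage", "interwiki"},
    "P-C-T": {"page", "category", "template"},
    "P-C-L": {"page", "category", "interlanguage"},
}


def count_articles(candidate_ids, links_by_type, criterion):
    """Count candidates having at least one link of an accepted type.

    candidate_ids: IDs of non-redirect pages in content namespaces.
    links_by_type: dict mapping a link type name to the set of page IDs
        that are the source of at least one link of that type.
    """
    qualifying = set()
    for link_type in CRITERIA[criterion]:
        qualifying |= links_by_type.get(link_type, set())
    return len(set(candidate_ids) & qualifying)
```

Adding a further definition then amounts to adding one more entry to the table, as long as it is some combination of the database-tracked link types.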

Article counts by various criteria for wikis with significant (actual or potential) article-count changes
project | wiki name | dump date | stats articles | P article count | pct diff | P-C article count | pct diff | P-C-F article count | pct diff | ANY article count | pct diff | P-C-T article count | pct diff | P-C-L article count | pct diff