Wikipedia:Wikipedia Signpost/2010-09-13/Sister projects
Latest revision as of 01:21, 6 January 2024
Update on the Death Anomalies collaboration
By WereSpielChequers, 13 September 2010
- WereSpielChequers is an editor on the English Wikipedia and occasionally elsewhere. He has been actively involved in various biography-related projects this year and collaborated with bot writer Merlissimo to launch the Death Anomalies project.
Just over a month ago, The Signpost published a story on the Death Anomalies project, which identifies cases where different language Wikipedias disagree as to whether an individual is dead or alive. The project was started in June, initially with just the German and English Wikipedias extracting reports of anomalies. Since then, the Latin, Swedish, and Slovenian Wikipedias have joined in, and hundreds of errors have been resolved. When The Signpost covered the project, readers pitched in and the number of anomalies on enwiki was slashed from 447 to 190 in just over a week. The English Wikipedia still has more than 100 anomalies listed at Wikipedia:Database reports/Living people on EN wiki who are dead on other wikis, with new reports coming in daily. However, most of the backlog stems from three sources: differences in the way projects treat missing people who (if alive) would be more than 100 years old, cross-wiki anomalies arising from unreferenced articles that show a person as dead, and issues that probably require a native speaker of the other language to resolve.
In July, only two projects were extracting data from the table, though it queried data from around 70. These have since been joined by the Swedish Wikipedia, which rapidly reduced 94 anomalies to 16, and the Latin Wikipedia, which has managed to reduce its anomalies to one. Earlier this month the Slovene Wikipedia became the fifth participating project, and went from requesting a report to clearing its backlog within a week.
Biographies of living people (BLPs) inevitably need to be updated when the subject dies, so all these reports are expected to be ongoing maintenance tasks. The bot relies on interwiki links and on categories that identify biographies as dead or living. Although it processes data from millions of biographies across different Wikipedias, fewer than a thousand anomalies have been identified so far. Some projects are ineligible for the program because they don't organise their articles in this way; for example, the Portuguese Wikipedia has lists of people who died in particular years (rather than categories).
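The comparison described above can be illustrated with a minimal sketch. It is not Merlissimo's bot, whose implementation is not described here; the `Biography` record, the `find_anomalies` function, the status labels and the sample titles are all hypothetical, and the sketch assumes each biography's living/dead status has already been derived from its local categories.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Biography:
    """Hypothetical record for one biography on one Wikipedia."""
    wiki: str    # language code, e.g. "en"
    title: str
    status: str  # "living", "dead" or "unknown", assumed to come from local categories
    # Interwiki links: language code -> title of the linked article on that wiki.
    interwiki: Dict[str, str] = field(default_factory=dict)


def find_anomalies(home_wiki: str, biographies: List[Biography]) -> List[Tuple[str, str, str]]:
    """List (home title, other wiki, other title) where the home wiki
    categorises a person as living but a linked wiki categorises them as dead."""
    index = {(b.wiki, b.title): b for b in biographies}
    anomalies = []
    for bio in biographies:
        if bio.wiki != home_wiki or bio.status != "living":
            continue
        # Follow each interwiki link and compare the two statuses.
        for lang, title in sorted(bio.interwiki.items()):
            other = index.get((lang, title))
            if other is not None and other.status == "dead":
                anomalies.append((bio.title, lang, title))
    return anomalies


# Made-up sample data for illustration only.
sample = [
    Biography("en", "Example Person", "living",
              {"de": "Beispiel Person", "sv": "Exempel Person"}),
    Biography("de", "Beispiel Person", "dead"),
    Biography("sv", "Exempel Person", "living"),
]
print(find_anomalies("en", sample))
```

A project that records deaths in lists rather than categories would leave `status` as "unknown" in this sketch, so no anomaly could be reported for it, which matches the eligibility limit noted above.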
In the future, the number of languages from which data is extracted and the number of languages requesting reports will hopefully increase; we have 66 Wikipedia language versions, including French, Spanish, Japanese, Polish and Russian, for which reports could be extracted almost immediately. Merlissimo (whom Jimbo Wales praised as a "rock star" for his work on the project) has a bot that updates the reports daily, and is willing to produce reports for other projects.
User responses
“The Swedish Wikipedia is fertile ground for a project of this kind. After some years of rapid growth in the number [of] articles, attention swung to quality and structure in 2008. Biographic articles were exhaustively categorized by gender in the [northern autumn] of 2008, revealing that there are four male biographies for each female one, and by years of birth and death in 2009. This is also when the category for living people and a WikiProject for living people were started. The "death anomalies" report was set up as a subpage to this WikiProject, named "possibly deceased" people.... The Swedish Wikipedia has also benefited from Check Wikipedia, a daily report of wiki-syntax errors, and would welcome similar projects.” (LA2)
“Although the Latin Wikipedia (la.wikipedia) uses a language with a long history, a large portion of its articles cover modern topics, including (of course) BLPs.... [Of] about 44,000 articles available in the Latin wikipedia today, about 4300 (roughly ten percent) are BLPs. The death anomalies table adds an extra level of reliability to BLPs on the [English, German, Swedish and Latin Wikipedias]. It is great to see more and more tools are available that permit semantic checks and analyses of information ... the future is not just isolated wikitext articles, but a flexible repository of semantic information. The death anomalies table shows a glimpse of what might be possible in the future, when we will have at our disposal not only (wiki)text but also rich, usefully structured information and data.” (UV)
“The Slovenian Wikipedia has a relatively large proportion of biographies, of which there are more than 8,000 in BLPs (almost 10% of total article count). Many of those articles have been added semi-automatically and we have a small community of active contributors. Consequently, [many articles] aren't regularly maintained, which is why this tool will certainly prove extremely useful for easing the burden of keeping the content up-to-date. This means less work when the focus shifts from adding content to improving the quality one day, and improved reliability of the work until then.” (Yerpo)
“The German Wikipedia has more than 340,000 articles about people that include machine-readable data usable by external projects. The local report covers all people (not only living people) and is forwarded to 150 WikiProjects filtered by subject area. The script runs on the toolserver and uses the sun grid engine for efficient resource handling. About 1.9 million interwiki relations are checked every day for creating reports on five Wikipedias.” (Merl)
Discuss this story
WereSpielChequers (the author) clearly was trying to force a humorous title where one does not belong. "Update on the Death Anomalies collaboration" would be a perfectly fine title on its own. Xenon54 (talk) 21:14, 13 September 2010 (UTC)
Number of languages
Merlissimo has added another 8 languages this week, which makes nearly 80 projects that are compared for anomalies, though currently only 5 are extracting reports. There are bound to be more anomalies emerging as more projects extract data or have data extracted from them. I also suspect that more anomalies will emerge as projects improve their categorisation; some projects have a lot of under-categorised articles. ϢereSpielChequers 12:19, 16 September 2010 (UTC)