
Research talk:Automated classification of article importance: Difference between revisions

From Meta, a Wikimedia project coordination wiki
Latest comment: 7 years ago by Nettrom in topic Importance of Wikidata Items
→Importance of Wikidata Items: re, great comments, thanks… here are some thoughts
Revision as of 19:44, 3 February 2017

Importance of Wikidata Items

It seems like the proposed notion of importance of articles to a language version of Wikipedia is restricted to articles that already exist in that language. But for smaller Wikipedias, there may be many important articles that have not yet been created. It would be really valuable to have a notion of importance for any entity in Wikidata that has an article in at least one language. In particular, it would be great to have a global, language-independent list of Wikidata items ranked by importance, as well as a separate importance ranking for each language. These rankings could be used not only to prioritize work on existing articles, but also to prioritize the creation of new articles and the filling of knowledge gaps. Ewulczyn (WMF) (talk) 00:53, 2 February 2017 (UTC)
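A minimal Python sketch of how such a global ranking could be turned into per-language suggestions for article creation. It assumes every Wikidata item already carries an importance score (how to compute that score is exactly the open question in this thread, so the scores here are placeholders) and a set of Wikipedia sitelinks keyed by site ID such as `enwiki`. The eligibility rule mirrors the filter mentioned in the reply (at least one article in at least one language). The names `Item`, `global_ranking` and `missing_in` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    qid: str          # Wikidata item ID
    score: float      # hypothetical importance score, computed elsewhere
    sitelinks: set[str] = field(default_factory=set)  # Wikipedia site IDs, e.g. "enwiki"


def global_ranking(items: list[Item]) -> list[Item]:
    """Items with >= 1 article in >= 1 language, most important first."""
    eligible = [it for it in items if len(it.sitelinks) >= 1]
    return sorted(eligible, key=lambda it: it.score, reverse=True)


def missing_in(items: list[Item], site: str, k: int = 10) -> list[str]:
    """IDs of the top-k globally ranked items that have no article on `site`."""
    return [it.qid for it in global_ranking(items) if site not in it.sitelinks][:k]
```

For example, with made-up scores on placeholder IDs, `missing_in(items, "svwiki")` returns the highest-ranked items that exist in some other Wikipedia but not in the Swedish one, which is one possible starting point for a creation worklist.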

Hi Ewulczyn (WMF), thank you for the insightful comments! You are right that the current proposal aims to determine the importance of already existing articles. I agree that being able to determine importance for articles that do not exist, or for any Wikidata entity for that matter, would be valuable. Other notions of importance could then be applied as filters on top of that calculation (e.g. "any Wikidata entity with >= 1 article in >= 1 language"). However, I do not think we know much about how importance works across different scopes. Some insight can be found in the discussions around List of articles every Wikipedia should have, where contributors argue about whether the list reflects a global perspective. There are also some research papers that look into cultural differences in importance (e.g. Research:Newsletter/2014/June#"Interactions of cultures and top people of Wikipedia from ranking of 24 language editions" reviews a preprint of a PLOS ONE paper that does this).
In summary, I am wondering if this suggests three potential research topics:
  1. Importance by scope
    • Are there meaningful differences in importance depending on the scope? In other words, if we ask a set of WikiProject members, will they agree with importance ratings that were gathered using an algorithm based on Wikidata? What are their reasons for agreeing or disagreeing?
  2. Importance in the context of Wikidata
    • How can we determine the importance of entities in Wikidata?
    • How does Wikidata importance relate to Wikipedia article importance?
  3. Recommendations for article creation
    • This should specifically target smaller Wikipedias. In other words, we are perhaps interested in determining some sort of base set of articles. This might turn out to be List of articles every Wikipedia should have, or it might turn out to be something different.
    • A key element would be to study what happens when local and global scope collide. In our WikiSym 2012 paper we did a rudimentary investigation of articles in a single language and found that they had a limited scope. Based on the PLOS ONE paper mentioned above, different language editions have slightly different focuses. It would therefore likely be useful to have a way of adding a local influence to a global scope, or something along those lines, to improve the quality of the recommendations.
I'm not certain to what extent these should be incorporated into the proposal. Pinging User:Halfak (WMF) so he knows about this thread. Thanks again for the comments! Cheers, Nettrom (talk) 19:44, 3 February 2017 (UTC)
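One simple way to add "a local influence to a global scope" is to interpolate linearly between a global importance score and a language-specific one. This is an illustrative assumption, not a method proposed in this thread; `blended_score`, `rerank`, the `weight` parameter, and scores normalized to [0, 1] are all hypothetical.

```python
def blended_score(global_score: float, local_score: float, weight: float = 0.3) -> float:
    """Linear interpolation: weight=0 is purely global, weight=1 purely local."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return (1.0 - weight) * global_score + weight * local_score


def rerank(global_scores: dict[str, float],
           local_scores: dict[str, float],
           weight: float = 0.3) -> list[str]:
    """Rank item IDs by blended score; items without a local score count as 0 locally."""
    blended = {
        qid: blended_score(g, local_scores.get(qid, 0.0), weight)
        for qid, g in global_scores.items()
    }
    return sorted(blended, key=blended.get, reverse=True)
```

With `weight=0.0` the ranking is purely global; raising it lets items that matter to a particular language edition move up. Choosing that weight, or replacing the linear blend with something better, is the kind of question the third research topic above would need to answer.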