Research:Wikimedia Summer of Research 2011: Difference between revisions

From Meta, a Wikimedia project coordination wiki
Steven (WMF) (talk | contribs)
all will be moved here soon :-) ...
m Reverted changes by 2603:7000:E300:4E0E:F4A1:936F:7233:F706 (talk) to last version by Tegel
 
(30 intermediate revisions by 12 users not shown)
Line 2: Line 2:
|title=Wikimedia Foundation Summer of Research 2011
|title=Wikimedia Foundation Summer of Research 2011
|contact={{Investigator|Diederik van Liere<br />Maryana Pinchuk<br />Steven Walling|Wikimedia Foundation}}
|contact={{Investigator|Diederik van Liere<br />Maryana Pinchuk<br />Steven Walling|Wikimedia Foundation}}
|cis={{Investigator|[[User:EpochFail|Aaron Halfaker]]|University of Minnesota}} {{Investigator|[[User:Staeiou|R. Stuart Geiger]]|UC Berkeley}} {{Investigator|Melanie Kill|University of Maryland}}{{Investigator|Fabian Kaelin|McGill University}} {{Investigator|Jonathan Morgan<br />Shawn Walker|University of Washington}} {{Investigator|Yusuke Matsubara|University of Tokyo}} {{Investigator|Giovanni Luca Ciampaglia|University of Lugano}}
|cis={{Investigator|Aaron Halfaker|University of Minnesota}} {{Investigator|R. Stuart Geiger|UC Berkeley}} {{Investigator|Melanie Kill|University of Maryland}}{{Investigator|Fabian Kaelin|McGill University}} {{Investigator|Jonathan Morgan<br />Shawn Walker|University of Washington}} {{Investigator|Yusuke Matsubara|University of Tokyo}} {{Investigator|Giovanni Luca Ciampaglia|University of Lugano}}
|start-year=2011
|start-year=2011
|start-month=06
|start-month=06
|status=in-progress
|status=completed
|field=md
|field=md
|open-data=yes
|open-data=yes
Line 13: Line 13:
}}
}}
{{Shortcut|R:WSOR11}}
{{Shortcut|R:WSOR11}}
The '''Wikimedia Foundation Summer of Research''' (WSoR) has brought eight academic researchers to study longterm participation trends in Wikipedia. In light of the results of the [[:strategy:Editor Trends Study|Editor Trends Study]] and the Board's [[:wmf:Resolution:Openness|resolution on openness]], this multidisciplinary team will be using week-long group sprints to answer detailed questions related to participation.
From June through August 2011, the '''Wikimedia Foundation Summer of Research''' (WSoR) brought a group of researchers to study long-term participation trends in Wikipedia using a multidisciplinary approach.


==The research team==
From June through August 31st of 2011, the project will run sprints that provide both qualitative and quantitative explanations of how new editors interact with Wikipedia, as well as data-driven recommendations for exactly what can be done to improve retention. You can read about the team in the [http://blog.wikimedia.org/2011/06/01/summerofresearchannouncement/ announcement].
Led by [[User:Drdee|Diederik van Liere]], [[User:Buickmackane|Maryana Pinchuk]], and [[User:Steven (WMF)|Steven Walling]], the Foundation had the pleasure of having the following researchers visit us for summer 2011:


* [[User:Staeiou|R. Stuart Geiger]] is a PhD candidate, UC Berkeley School of Information, focusing on knowledge production in distributed and decentralized environments -- specifically Wikipedia and scientific research networks. He has been a Wikipedia editor since 2004 has been studying the project as an ethnographer since 2007. His current research explores the relationship between technical infrastructures and social structures, and he has written on bots, vandal fighting, administration, and the history of Wikipedia.
If you're interested in an example of the kind of work that the summer will be producing, Maryana Pinchuk and Steven Walling produced three preliminary sprints on the topic of [[Research:Communication to New Editors 2004-2011|Communication to New Editors 2004-2011]]. Current sprints from the project will be listed on this page. Data and code produced will be released under a free license.
* [[User:EpochFail|Aaron Halfaker]] is a PhD candidate of Computer Science at the University of Minnesota, GroupLens Research, focusing on Computer-mediated human interaction. Aaron started editing Wikipedia four years ago and quickly found his niche creating user scripts to find ways of improving the collaborative experience. His research explores mechanisms for motivating and supporting volunteer collaboration.
* [http://www.cs.mcgill.ca/~fkaeli/ Fabian Kaelin] is a Master of Science candidate from McGill University, focused on machine learning.
* [http://melaniekill.com/ Melanie Kill] is Assistant Professor of English at the University of Maryland, specializing in digital rhetoric and genre studies. She is currently at work on a book on Wikipedia and the history of the genre of the encyclopedia. She earned her PhD in Rhetoric and Language Studies from University of Washington and previously has taught at Texas Christian University.
* [[User:Junkie.dolphin|Giovanni Luca Ciampaglia]] is a PhD candidate at the University of Lugano in Switzerland. Giovanni is a computer scientist who studies user involvement in commons-based peer production communities, group consensus and collective deliberation processes.
* [[User:Whym|Yusuke Matsubara]] is a PhD student, University of Tokyo (Japan), studying computational linguistics. His research focus is in analysing how people write and read from a computational and empirical point of view. Since 2008, he has been an occasional writer, translator and programmer for Wikimedia.
* [[User:Jtmorgan|Jonathan Morgan]] is a PhD candidate, University of Washington, studying social interaction on collaborative online creative environments. As a researcher, he is particularly interested in tracing connections between the things people say (and the way they say them) and their roles, goals and activities online. He also works on the design of tools for improving public deliberation on the web, and on practical tools for internet researchers.
* [http://students.washington.edu/stw3/ Shawn Walker] is a PhD candidate at the University of Washington iSchool, and studies digital government and public engagement.


== Research questions ==
== Research questions ==
In light of the results of the [[:strategy:Editor Trends Study|Editor Trends Study]] and the Board's [[:wmf:Resolution:Openness|resolution on openness]], the team used week-long group sprints to answer detailed questions related to participation. The following [[Research:Wikimedia Summer of Research 2011/Questions|lists of questions]] helped guide our inquiry, though a precise list of the topics covered are available below.
:[[Research:Wikimedia Summer of Research 2011/Questions]]
The draft set of research questions is still evolving, but is a good look at the set of topics this summer will focus on. Please feel free to comment on on the question set, especially looking to point out existing research and ways that the questions can be clearer to those outside the Summer of Research team.


==Sprints==
==Conclusions==
Please see our [[Research:Wikimedia Summer of Research 2011/Summary of Findings|Summary of Findings]].
To create a new sprint:
# Use this form to create the sprint page. Make sure to give your sprint a name.<inputbox>
type=create
editintro=Research:Projects/2011_Summer_of_Research/Sprints/Editintro
preload=Research:Projects/2011_Summer_of_Research/Sprints/Template
default=Research:Sprint Name
buttonlabel=Create Sprint
bgcolor=inherit
width=50
</inputbox>
# Add a link to the sprint page under the appropriate space below.
# Start researchin'


===Past research sprints===
==List of research project pages==
{{Commons category|Summer of Research 2011}}
May be completed, still in progress, or simply avenues of inquiry that have been ended for technical reasons or reoriented priorities. If you have questions please ask on the relevant Talk page.
May be completed, still in progress, or simply avenues of inquiry that have been ended for technical reasons or reoriented priorities. If you have questions please ask on the relevant Talk page.


Line 66: Line 63:
|image=Patrol_months.per_user.png
|image=Patrol_months.per_user.png
|contact=EpochFail
|contact=EpochFail
|description=The number of new pages that human editors patrol has been going down since 2007. This suggests that the workload of new page patrollers has also been deceasing.
|description=The number of new pages that human editors patrol has been going down since 2007. This suggests that the workload of new page patrollers has also been decreasing.
}}
}}
{{Sprint summary
{{Sprint summary
Line 94: Line 91:
}}
}}
{{Sprint summary
{{Sprint summary
|image=WhosGivingWikilove.png
|title=New user help requests
|title=Classifying wikilove messages
|image=UserHelpSpaces.png
|contact=Jtmorgan
|contact=Jtmorgan
|description=This project involves categorizing a large set of Wikilove messages in order to get a better idea of how the community is using this new tool, and using that dataset in order to train an active learning classifier to automatically detect the sentiment of Wikilove messages in the future.
|description= Fewer than 10% ask for help during their first 30 days. Of those that did, less than half received a response from a real person during that period. The places they asked for help were all over the map, but the most common place was their own talk page or someone else's. Some of the 'other' places they asked for help include: the Reference Desk (for both reference and traditional 'help' topics), article talk pages and edit summaries. More than half of those who asked for help received some sort of welcome on their user talk page with links to help resources. Very few of these users used the <nowiki>{{helpme}}</nowiki> template, even though many of them received Welcome templates that included 'Helpme' instructions''<nowiki>*</nowiki>''
}}
}}
{{Sprint summary
{{Sprint summary
Line 107: Line 104:
{{Sprint summary
{{Sprint summary
|title=Rhetoric of the welcome message
|title=Rhetoric of the welcome message
|image=Template_W-basic_screenshot.png
|image=Template W-basic screenshot.png
|contact=Drkill
|contact=Drkill
|description=This sprint asks what these messages have said and currently say, or don't say, to new editors about 1) Wikipedia and its larger mission, 2) the Wikipedian community, 3) the types of participation new editors are welcomed into.
|description=This sprint asks what these messages have said and currently say, or don't say, to new editors about 1) Wikipedia and its larger mission, 2) the Wikipedian community, 3) the types of participation new editors are welcomed into.
Line 118: Line 115:
{{Sprint summary
{{Sprint summary
|title=The Speed of Speedy Deletions
|title=The Speed of Speedy Deletions
|image=
|contact=Staeiou
|contact=Staeiou
|description=Speedy deletion is usually very fast with a large proportion of speedy deletion tagging taking place in the moments of creation, usually followed by deletion in an average of half an hour.
|description=Speedy deletion is usually very fast with a large proportion of speedy deletion tagging taking place in the moments of creation, usually followed by deletion in an average of half an hour.
}}
{{Sprint summary
|title=New user help requests
|image=UserHelpSpaces.png
|contact=Jtmorgan
|description= Fewer than 10% ask for help during their first 30 days. Of those that did, less than half received a response from a real person during that period. The places they asked for help were all over the map, but the most common place was their own talk page or someone else's. Some of the 'other' places they asked for help include: the Reference Desk (for both reference and traditional 'help' topics), article talk pages and edit summaries. More than half of those who asked for help received some sort of welcome on their user talk page with links to help resources. Very few of these users used the <nowiki>{{helpme}}</nowiki> template, even though many of them received Welcome templates that included 'Helpme' instructions''<nowiki>*</nowiki>'' Full research report on this and related sprints [[Research:NewUserHelp-FullResearchReport|here]].
}}
}}
{{Sprint summary
{{Sprint summary
Line 126: Line 130:
|contact=Jtmorgan
|contact=Jtmorgan
|contact2=Swalker
|contact2=Swalker
|description=Based on analysis of a small sample of Newbie comments, Newbies aren't good at knowing where to ask for help, and Wikipedia isn't good at spotting requests for help, particularly when newbies talk on their own talkpage.
|description=Based on analysis of a small sample of Newbie comments, Newbies aren't good at knowing where to ask for help, and Wikipedia isn't good at spotting requests for help, particularly when newbies talk on their own talkpage. Full research report on this and related sprints [[Research:NewUserHelp-FullResearchReport|here]].
}}
}}
{{Sprint summary
{{Sprint summary
Line 169: Line 173:
|description=About 30% of users register an account but do not perform their first edit immediately or within the same day. Our analysis shows that the time lag between registration and first edit can be weeks, months and even years long!
|description=About 30% of users register an account but do not perform their first edit immediately or within the same day. Our analysis shows that the time lag between registration and first edit can be weeks, months and even years long!
}}
}}
{{Sprint summary

|image=Added total nobots.png

|title=WikiPride
|contact=declerambaul
|description=This sprint is intended to show how byte count can be used as an alternative to edit count as a measure of Wikipedian contributions, by measuring the total bytes added to different namespaces over time by different yearly cohorts of Wikipedians.
}}
{{Sprint summary
|image=Edits to English Wikipedia trending articles.png
|title=Trending articles and new editors
|contact=Whym
|description=This research looks at the types of editors who start editing on very active or "trending" articles related to current news events compared with editors of less active pages (non-trending topics). The study found that trending articles did not attract any more new registered editors than average articles.
}}
{{Sprint summary
|image=
|title=Wikiproject Participation & Mentorship
|contact=jtmorgan
|contact2=swalker
|description=This sprint explores the world of Wikiprojects, describing the joining and participating patterns of new and old editors.
}}
{{Sprint summary
|image=2011 top20 byNewMembers.png
|title=Visualizing Wikiproject Activity
|contact=jtmorgan
|contact2=swalker
|description=In order to get a better sense of new user participation in Wikiprojects, as well as overall Wikiproject activity we have created a set of database tables that list various activity metrics for WikiProjects . We also conducted interviews with Wikiproject members in order to develop a set of design requirements for a proposed information visualization dashboard, WikiProject:Pulse.
}}
{{Sprint summary
|image=Redlinks-histo-full.png
|title=One Link, Two Links, Red Links, Blue Links
|contact=Staeiou
|description=This sprint explores the proportions of red- and blue-linked articles on English Wikipedia.
}}
{{Sprint summary
|title=Editor classes
|image=Editor classes recruitment.png
|contact=Zackexley
|description=This sprint studies the changing recruitment numbers of new editors who will go on to become light, moderate, or heavy editors.
}}
<br style="clear:both;"/>
<br style="clear:both;"/>


==Data and code==
===Current research sprints===
Where they are comprised of public, freely-licensed Wikipedia data, we will be releasing the datasets used to complete our summer's work, as well as the code/queries used to produce them.
* [[Research:Rhetoric of the welcome message|Rhetoric of the welcome message]] -- [[User:Drkill|Drkill]]

* [[Research:Measuring_overall_contribution_of_editors|Measuring overall contribution of editors]] -- [[User:declerambaul|declerambaul]] [[User:Drdee|Diederik]]
* [[WSoR datasets]]
** Results at: [[/WikiPride]]
* [[Research:Query Library]]
* [[Research:Projects/2011_Summer_of_Research/New_User_Participation_in_Deletion_Processes|New User Participation in Deletion Processes]] --[[User:Staeiou|Staeiou]]
* [[svn:trunk/tools/wsor/]]
* [[Research:MDM - The Magical Difference Engine|The Magical Difference Engine]] -- [[User:swalker|Swalker]] [[User:declerambaul|declerambaul]], [[User:Whym|Whym]]
* [[Research:Wikiproject_Participation_%26_Mentorship|Wikiproject Participation and Mentorship]] -- [[User:swalker|Swalker]] [[User:jtmorgan|Jtmorgan]]
* [[Research:Trending articles and new editors|Trending articles and new editors]] -- [[User:Whym|Whym]]
* [[Research:One_Link,_Two_Links,_Red_Links,_Blue_Links|One Link, Two Links, Red Links, Blue Links]] -- [[User:Staeiou|Staeiou]]

Latest revision as of 15:52, 26 June 2021

Contact: Diederik van Liere, Maryana Pinchuk, Steven Walling
This page documents a completed research project.
Shortcut: R:WSOR11

From June through August 2011, the Wikimedia Foundation Summer of Research (WSoR) brought a group of researchers to study long-term participation trends in Wikipedia using a multidisciplinary approach.

The research team

Led by Diederik van Liere, Maryana Pinchuk, and Steven Walling, the Foundation had the pleasure of having the following researchers visit us for summer 2011:

  • R. Stuart Geiger is a PhD candidate, UC Berkeley School of Information, focusing on knowledge production in distributed and decentralized environments -- specifically Wikipedia and scientific research networks. He has been a Wikipedia editor since 2004 and has been studying the project as an ethnographer since 2007. His current research explores the relationship between technical infrastructures and social structures, and he has written on bots, vandal fighting, administration, and the history of Wikipedia.
  • Aaron Halfaker is a PhD candidate of Computer Science at the University of Minnesota, GroupLens Research, focusing on Computer-mediated human interaction. Aaron started editing Wikipedia four years ago and quickly found his niche creating user scripts to find ways of improving the collaborative experience. His research explores mechanisms for motivating and supporting volunteer collaboration.
  • Fabian Kaelin is a Master of Science candidate from McGill University, focused on machine learning.
  • Melanie Kill is Assistant Professor of English at the University of Maryland, specializing in digital rhetoric and genre studies. She is currently at work on a book on Wikipedia and the history of the genre of the encyclopedia. She earned her PhD in Rhetoric and Language Studies from University of Washington and previously has taught at Texas Christian University.
  • Giovanni Luca Ciampaglia is a PhD candidate at the University of Lugano in Switzerland. Giovanni is a computer scientist who studies user involvement in commons-based peer production communities, group consensus and collective deliberation processes.
  • Yusuke Matsubara is a PhD student, University of Tokyo (Japan), studying computational linguistics. His research focus is in analysing how people write and read from a computational and empirical point of view. Since 2008, he has been an occasional writer, translator and programmer for Wikimedia.
  • Jonathan Morgan is a PhD candidate, University of Washington, studying social interaction on collaborative online creative environments. As a researcher, he is particularly interested in tracing connections between the things people say (and the way they say them) and their roles, goals and activities online. He also works on the design of tools for improving public deliberation on the web, and on practical tools for internet researchers.
  • Shawn Walker is a PhD candidate at the University of Washington iSchool, and studies digital government and public engagement.

Research questions

In light of the results of the Editor Trends Study and the Board's resolution on openness, the team used week-long group sprints to answer detailed questions related to participation. The following lists of questions helped guide our inquiry, though a precise list of the topics covered is available below.

Conclusions

Please see our Summary of Findings.

List of research project pages

May be completed, still in progress, or simply avenues of inquiry that have been ended for technical reasons or reoriented priorities. If you have questions please ask on the relevant Talk page.


Quality of PPI editor work -- Drdee This research project compared editors "recruited" via the Public Policy Initiative to other editors with similar edit counts. It concluded that the Wikipedians we recruit this way are just as good as the editors we get in other ways.

Newbie reverts and article length -- EpochFail Newbies are editing more complete encyclopedia articles than they used to, and edits to more complete articles have always been more likely to be reverted.

Newbie teaching strategy trends -- Staeiou, Drkill, Jtmorgan Wikipedian teaching strategies are shifting in two significant ways:

  • a significant drop in messages including praise and thanks corresponded with an increase in the overlap of teaching with criticism
  • a decline in personalized teaching corresponded with an increase in templated instruction


Patroller work load -- EpochFail The number of new pages that human editors patrol has been going down since 2007. This suggests that the workload of new page patrollers has also been decreasing.

Alternative lifecycles of new users -- Staeiou, Jtmorgan New users are receiving substantially more notifications that their articles and images are being deleted, but are participating substantially less in community processes, across almost all areas of activity.

Ignored period and retention -- Whym Some early interactions can have a negative impact on the retention of new editors. Contrary to the speculation that early messages motivate new editors to contribute, retained editors are found to have a shorter ignored period than departing editors after 2006.

Newbie reverts and subsequent editing behavior -- Swalker Editor retention has been decreasing over time. The negative effect of a revert has increased over time.

Deletion notifications to new users -- Staeiou There's a significant decline in the number of new users whose first message was a welcome and a rise in those whose first message was a warning. Receiving a deletion notification as a first message does not appear to predict whether or not a new editor will be retained 2-6 months later, but further study is recommended to compare retention metrics for new article creators who did and did not receive deletion notices.

Classifying wikilove messages -- Jtmorgan This project involves categorizing a large set of Wikilove messages in order to get a better idea of how the community is using this new tool, and using that dataset in order to train an active learning classifier to automatically detect the sentiment of Wikilove messages in the future.

Anonymous edits -- declerambaul IP editing is declining faster than editing by logged-in users, but in June 2011 it still accounts for a fifth of the edits on EN wiki.

Rhetoric of the welcome message -- Drkill This sprint asks what these messages have said and currently say, or don't say, to new editors about 1) Wikipedia and its larger mission, 2) the Wikipedian community, 3) the types of participation new editors are welcomed into.

Sentiment analysis tool of new editor interaction -- Whym This sprint represents the construction of a fundamental tool to be used to answer further research questions, a sentiment analysis classification algorithm. Ultimately the classifier was not accurate enough to be useful, but future work is planned to improve it.

The Speed of Speedy Deletions -- Staeiou Speedy deletion is usually very fast with a large proportion of speedy deletion tagging taking place in the moments of creation, usually followed by deletion in an average of half an hour.

New user help requests -- Jtmorgan Fewer than 10% ask for help during their first 30 days. Of those that did, less than half received a response from a real person during that period. The places they asked for help were all over the map, but the most common place was their own talk page or someone else's. Some of the 'other' places they asked for help include: the Reference Desk (for both reference and traditional 'help' topics), article talk pages and edit summaries. More than half of those who asked for help received some sort of welcome on their user talk page with links to help resources. Very few of these users used the {{helpme}} template, even though many of them received Welcome templates that included 'Helpme' instructions* Full research report on this and related sprints here.

New User Participation in Help Spaces -- Jtmorgan, Swalker Based on analysis of a small sample of Newbie comments, Newbies aren't good at knowing where to ask for help, and Wikipedia isn't good at spotting requests for help, particularly when newbies talk on their own talkpage. Full research report on this and related sprints here.

Software for quick processing of Wikidumps -- EpochFail A Python library was built and tested for processing XML dump files quickly. The January 1st, 2011 full history dump with text was processed in 20 hours.

New editor welcome wishlist -- Drkill Describes features new editors might find most useful in welcome message templates.

Vandal fighter work load -- EpochFail There is a steeper decline in the number of vandal fighters than all editors, with the steepest decline amongst less active vandal fighters. The number of vandal reverts completed by individual fighters also appears to be declining, suggesting that the overall workload of vandal fighters is decreasing.

First edit session -- EpochFail For newbies, the amount of their edits that are reverted or deleted is a powerful predictor of retention. Their initial investment is also a powerful predictor of retention. New editors show less initial investment now than they used to. The more initial investment, the more negative the effect of rejection.

WikiPride -- declerambaul A visualization method is presented that can be used to analyze trends for any cohort centric statistic. This visualization method is then used to show contributions of editor cohorts and how those contributions compare to previous years' cohorts.

Editor lifecycle -- Junkie.dolphin This research looks at the evolution of contributors' activity over the years by analyzing statistical regularities in collective patterns of editing activity.

Lag between registration and first edit -- Junkie.dolphin, Staeiou About 30% of users register an account but do not perform their first edit immediately or within the same day. Our analysis shows that the time lag between registration and first edit can be weeks, months and even years long!

WikiPride -- declerambaul This sprint is intended to show how byte count can be used as an alternative to edit count as a measure of Wikipedian contributions, by measuring the total bytes added to different namespaces over time by different yearly cohorts of Wikipedians.

Trending articles and new editors -- Whym This research looks at the types of editors who start editing on very active or "trending" articles related to current news events compared with editors of less active pages (non-trending topics). The study found that trending articles did not attract any more new registered editors than average articles.

Wikiproject Participation & Mentorship -- jtmorgan, swalker This sprint explores the world of Wikiprojects, describing the joining and participating patterns of new and old editors.

Visualizing Wikiproject Activity -- jtmorgan, swalker In order to get a better sense of new user participation in Wikiprojects, as well as overall Wikiproject activity, we have created a set of database tables that list various activity metrics for WikiProjects. We also conducted interviews with Wikiproject members in order to develop a set of design requirements for a proposed information visualization dashboard, WikiProject:Pulse.

One Link, Two Links, Red Links, Blue Links -- Staeiou This sprint explores the proportions of red- and blue-linked articles on English Wikipedia.

Editor classes -- Zackexley This sprint studies the changing recruitment numbers of new editors who will go on to become light, moderate, or heavy editors.

Data and code

Where they are comprised of public, freely-licensed Wikipedia data, we will be releasing the datasets used to complete our summer's work, as well as the code/queries used to produce them.