Research:Wikimedia Summer of Research 2011

This is an archived version of this page, as edited by EpochFail (talk | contribs) at 22:10, 27 July 2011 (→‎Past research sprints: BOLD replacement of complete projects list.). It may differ significantly from the current version.
Contact
Diederik van Liere
Maryana Pinchuk
Steven Walling

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

Shortcut:
R:WSOR11

The Wikimedia Foundation Summer of Research (WSoR) has brought eight academic researchers to study long-term participation trends in Wikipedia. In light of the results of the Editor Trends Study and the Board's resolution on openness, this multidisciplinary team will be using week-long group sprints to answer detailed questions related to participation.

From June through August 31st of 2011, the project will run sprints that provide both qualitative and quantitative explanations of how new editors interact with Wikipedia, as well as data-driven recommendations for exactly what can be done to improve retention. You can read about the team in the announcement.

If you're interested in an example of the kind of work that the summer will be producing, Maryana Pinchuk and Steven Walling produced three preliminary sprints on the topic of Communication to New Editors 2004-2011. Current sprints from the project will be listed on this page. Data and code produced will be released under a free license.

Research questions

Research:Wikimedia Summer of Research 2011/Questions

The draft set of research questions is still evolving, but it gives a good look at the topics this summer will focus on. Please feel free to comment on the question set, especially to point out existing research and ways that the questions can be made clearer to those outside the Summer of Research team.

Sprints

To create a new sprint:

  1. Use this form to create the sprint page. Make sure to give your sprint a name.
  2. Add a link to the sprint page under the appropriate space below.
  3. Start researchin'

Past research sprints

These sprints may be completed, still in progress, or simply avenues of inquiry that were ended for technical reasons or reoriented priorities. If you have questions, please ask on the relevant Talk page.


Quality of PPI editor work -- DrDee This research project compared editors "recruited" via the Public Policy Initiative to other editors with similar edit counts. It concluded that the Wikipedians we recruit this way are just as good as the editors we get in other ways.

Newbie reverts and article length -- EpochFail Newbies are editing more complete encyclopedia articles than they used to, and edits to more complete articles have always been more likely to be reverted.

Newbie teaching strategy trends -- Staeiou, Drkill, Jtmorgan Wikipedian teaching strategies are shifting in two significant ways:

  • a significant drop in messages including praise and thanks corresponded with an increase in the overlap of teaching with criticism
  • a decline in personalized teaching corresponded with an increase in templated instruction


Patroller work load -- EpochFail The number of new pages that human editors patrol has been going down since 2007. This suggests that the workload of new page patrollers has also been decreasing.

Alternative lifecycles of new users -- Staeiou, Jtmorgan New users are receiving substantially more notifications that their articles and images are being deleted, but are participating substantially less in community processes, across almost all areas of activity.

Ignored period and retention -- Whym Some early interactions can have a negative impact on the retention of new editors. Contrary to the speculation that early messages motivate new editors to contribute, retained editors are found to have a shorter ignored period than leaving editors do after 2006.

Newbie reverts and subsequent editing behavior -- Swalker Editor retention has been decreasing over time. The negative effect of a revert has increased over time.

Deletion notifications to new users -- Staeiou There's a significant decline in the number of new users whose first message was a welcome and a rise in those whose first message was a warning. Receiving a deletion notification as a first message does not appear to predict whether or not a new editor will be retained 2-6 months later, but further study is recommended to compare retention metrics for new article creators who did and did not receive deletion notices.

New user help requests -- Jtmorgan Fewer than 10% of new users asked for help during their first 30 days. Of those who did, fewer than half received a response from a real person during that period. The places they asked for help were all over the map, but the most common was their own talk page or someone else's. Other places they asked for help include the Reference Desk (for both reference and traditional 'help' topics), article talk pages, and edit summaries. More than half of those who asked for help received some sort of welcome on their user talk page with links to help resources. Very few of these users used the {{helpme}} template, even though many of them received Welcome templates that included 'Helpme' instructions.

Anonymous edits -- declerambaul IP editing is declining faster than editing by logged-in users, but in June 2011 it still accounted for a fifth of the edits on EN wiki.

Rhetoric of the welcome message -- Drkill This sprint asks what these messages have said and currently say, or don't say, to new editors about 1) Wikipedia and its larger mission, 2) the Wikipedian community, 3) the types of participation new editors are welcomed into.

Sentiment analysis tool of new editor interaction -- Whym This sprint built a fundamental tool for answering further research questions: a sentiment analysis classification algorithm. Ultimately the classifier was not accurate enough to be useful, but future work is planned to improve it.

The Speed of Speedy Deletions -- Staeiou Speedy deletion is usually very fast: a large proportion of speedy deletion tagging takes place within moments of creation, usually followed by deletion half an hour later on average.

New User Participation in Help Spaces -- Jtmorgan, Swalker Based on analysis of a small sample of newbie comments, newbies aren't good at knowing where to ask for help, and Wikipedia isn't good at spotting requests for help, particularly when newbies talk on their own talk page.

Software for quick processing of Wikidumps -- EpochFail A Python library was built and tested for processing XML dump files quickly. The January 1st, 2011 full history dump with text was processed in 20 hours.
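
For readers unfamiliar with dump processing, here is a minimal sketch of the general idea of streaming through an XML dump without loading it all into memory. It is not the library described above: it uses only Python's standard xml.etree.ElementTree.iterparse, and the function names iter_pages and local_name, the SAMPLE document, and its namespace URL are all illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, revision_count) for each <page> in a dump-like XML stream.

    `source` is a filename or a binary file object. Elements are cleared
    once handled so revision text does not accumulate in memory.
    """
    title, revisions = None, 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            revisions += 1
            elem.clear()  # drop the revision's text and child elements
        elif name == "page":
            yield title, revisions
            title, revisions = None, 0
            elem.clear()  # drop the page's remaining children


# A tiny hand-written stand-in for a dump file (assumed structure, not real data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Example</title>
    <revision><id>1</id><text>first</text></revision>
    <revision><id>2</id><text>second</text></revision>
  </page>
  <page>
    <title>Other</title>
    <revision><id>3</id><text>only</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, count in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title, count)
```

A real full-history dump is far larger and usually compressed, so a practical tool would likely also need decompression and parallelism, which this sketch leaves out.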

New editor welcome wishlist -- Drkill Describes features new editors might find most useful in welcome message templates.

File:Vandal revert 50 prop.by month.png

Vandal fighter work load -- EpochFail There is a steeper decline in the number of vandal fighters than in the number of all editors, with the steepest decline amongst less active vandal fighters. The number of vandal reverts completed by individual fighters also appears to be declining, suggesting that the overall workload of vandal fighters is decreasing.

First edit session -- EpochFail For newbies, the amount of their edits that are reverted or deleted is a powerful predictor of retention. Their initial investment is also a powerful predictor of retention. New editors show less initial investment now than they used to. The more initial investment, the more negative the effect of rejection.

Current research sprints