Research:Wikimedia Summer of Research 2011

This is an archived version of this page, as edited by Steven (WMF) (talk | contribs) at 18:37, 1 September 2011 (→‎Past research sprints). It may differ significantly from the current version.
Contact
Diederik van Liere
Maryana Pinchuk
Steven Walling

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

Shortcut:
R:WSOR11

The Wikimedia Foundation Summer of Research (WSoR) has brought eight academic researchers to study long-term participation trends in Wikipedia. In light of the results of the Editor Trends Study and the Board's resolution on openness, this multidisciplinary team will be using week-long group sprints to answer detailed questions related to participation.

From June through August 31st of 2011, the project will run sprints that provide both qualitative and quantitative explanations of how new editors interact with Wikipedia, as well as data-driven recommendations for exactly what can be done to improve retention. You can read about the team in the announcement.

If you're interested in an example of the kind of work that the summer will be producing, Maryana Pinchuk and Steven Walling produced three preliminary sprints on the topic of Communication to New Editors 2004-2011. Current sprints from the project will be listed on this page. Data and code produced will be released under a free license.

Research questions

Research:Wikimedia Summer of Research 2011/Questions

The draft set of research questions is still evolving, but it gives a good look at the topics this summer will focus on. Please feel free to comment on the question set, especially to point out existing research and ways the questions can be made clearer to those outside the Summer of Research team.

Sprints

To create a new sprint:

  1. Use this form to create the sprint page. Make sure to give your sprint a name.

  2. Add a link to the sprint page under the appropriate space below.
  3. Start researchin'

Past research sprints

The sprints listed here may be completed, still in progress, or simply avenues of inquiry that were ended for technical reasons or because priorities were reoriented. If you have questions, please ask on the relevant Talk page.


Quality of PPI editor work -- Drdee This research project compared editors "recruited" via the Public Policy Initiative to other editors with similar edit counts. It concluded that the Wikipedians we recruit this way are just as good as the editors we get in other ways.

Newbie reverts and article length -- EpochFail Newbies are editing more complete encyclopedia articles than they used to, and edits to more complete articles have always been more likely to be reverted.

Newbie teaching strategy trends -- Staeiou, Drkill, Jtmorgan Wikipedian teaching strategies are shifting in two significant ways:

  • a significant drop in messages including praise and thanks corresponded with an increase in the overlap of teaching with criticism
  • a decline in personalized teaching corresponded with an increase in templated instruction


Patroller work load -- EpochFail The number of new pages that human editors patrol has been going down since 2007. This suggests that the workload of new page patrollers has also been decreasing.

Alternative lifecycles of new users -- Staeiou, Jtmorgan New users are receiving substantially more notifications that their articles and images are being deleted, but are participating substantially less in community processes, across almost all areas of activity.

Ignored period and retention -- Whym Some early interactions can have a negative impact on the retention of new editors. Contrary to the speculation that early messages motivate new editors to contribute, retained editors are found to have a shorter ignored period than leaving editors do after 2006.

Newbie reverts and subsequent editing behavior -- Swalker Editor retention has been decreasing over time. The negative effect of a revert has increased over time.

Deletion notifications to new users -- Staeiou There's a significant decline in the number of new users whose first message was a welcome and a rise in those whose first message was a warning. Receiving a deletion notification as a first message does not appear to predict whether or not a new editor will be retained 2-6 months later, but further study is recommended to compare retention metrics for new article creators who did and did not receive deletion notices.

Classifying wikilove messages -- Jtmorgan This project involves categorizing a large set of Wikilove messages in order to get a better idea of how the community is using this new tool, and using that dataset in order to train an active learning classifier to automatically detect the sentiment of Wikilove messages in the future.

Anonymous edits -- declerambaul IP editing is declining faster than editing by logged-in users, but in June 2011 it still accounted for a fifth of the edits on EN wiki.

Rhetoric of the welcome message -- Drkill This sprint asks what these messages have said and currently say, or don't say, to new editors about 1) Wikipedia and its larger mission, 2) the Wikipedian community, 3) the types of participation new editors are welcomed into.

Sentiment analysis tool of new editor interaction -- Whym This sprint represents the construction of a fundamental tool to be used to answer further research questions, a sentiment analysis classification algorithm. Ultimately the classifier was not accurate enough to be useful, but future work is planned to improve it.

The Speed of Speedy Deletions -- Staeiou Speedy deletion is usually very fast: a large proportion of speedy deletion tagging takes place within moments of page creation, and deletion usually follows in half an hour on average.

New user help requests -- Jtmorgan Fewer than 10% of new users asked for help during their first 30 days. Of those who did, less than half received a response from a real person during that period. The places they asked for help were all over the map, but the most common place was their own talk page or someone else's. Some of the 'other' places they asked for help include the Reference Desk (for both reference and traditional 'help' topics), article talk pages and edit summaries. More than half of those who asked for help received some sort of welcome on their user talk page with links to help resources. Very few of these users used the {{helpme}} template, even though many of them received Welcome templates that included 'Helpme' instructions. Full research report on this and related sprints here.

New User Participation in Help Spaces -- Jtmorgan, Swalker Based on analysis of a small sample of Newbie comments, Newbies aren't good at knowing where to ask for help, and Wikipedia isn't good at spotting requests for help, particularly when newbies talk on their own talkpage. Full research report on this and related sprints here.

Software for quick processing of Wikidumps -- EpochFail A Python library was built and tested for processing XML dump files quickly. The January 1st, 2011 full history dump with text was processed in 20 hours.
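
As a rough illustration of what fast dump processing involves, here is a minimal sketch of streaming through a MediaWiki XML dump one revision at a time. It is not the library described above: it uses only the Python standard library, and the function names, the sample XML, and the namespace URI in the sample are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (page_title, revision_id, text_length) one revision at a time."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id = None
            text = ""
            # Only direct children: a nested <contributor><id> is skipped.
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text = child.text or ""
            yield title, rev_id, len(text)
            elem.clear()  # drop the revision text so memory stays bounded
        elif name == "page":
            elem.clear()


# Invented sample data, shaped like a tiny dump (illustrative namespace).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Example</title>
    <revision><id>1</id><contributor><id>99</id></contributor><text>Hello</text></revision>
    <revision><id>2</id><text>Hello, world</text></revision>
  </page>
</mediawiki>"""

revisions = list(iter_revisions(io.BytesIO(SAMPLE.encode("utf-8"))))
```

Streaming with `iterparse` and clearing each element after use keeps memory roughly constant regardless of dump size, which is one common reason such tools can handle full-history dumps at all.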

New editor welcome wishlist -- Drkill Describes features new editors might find most useful in welcome message templates.

File:Vandal revert 50 prop.by month.png

Vandal fighter work load -- EpochFail There is a steeper decline in the number of vandal fighters than all editors, with the steepest decline amongst less active vandal fighters. The number of vandal reverts completed by individual fighters also appears to be declining, suggesting that the overall workload of vandal fighters is decreasing.

First edit session -- EpochFail For newbies, the amount of their edits that are reverted or deleted is a powerful predictor of retention. Their initial investment is also a powerful predictor of retention. New editors show less initial investment now than they used to. The more initial investment, the more negative the effect of rejection.

WikiPride -- declerambaul A visualization method is presented that can be used to analyze trends for any cohort centric statistic. This visualization method is then used to show contributions of editor cohorts and how those contributions compare to previous years' cohorts.

Editor lifecycle -- Junkie.dolphin This research looks at the evolution of contributors' activity over the years by analyzing statistical regularities in collective patterns of editing activity.

Lag between registration and first edit -- Junkie.dolphin, Staeiou About 30% of users register an account but do not perform their first edit immediately or within the same day. Our analysis shows that the time lag between registration and first edit can be weeks, months and even years long!
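
A lag measure of this kind can be sketched in a few lines. The example below is only an assumption about how such a share might be computed: it treats "same day" as the same calendar date, and the function name and timestamps are invented for illustration rather than taken from the sprint.

```python
from datetime import datetime


def share_with_delayed_first_edit(users):
    """Fraction of users whose first edit falls on a later calendar day
    than their registration. `users` is a list of
    (registered_at, first_edit_at) datetime pairs."""
    delayed = sum(1 for registered, first_edit in users
                  if first_edit.date() > registered.date())
    return delayed / len(users)


# Invented timestamps for four hypothetical accounts.
users = [
    (datetime(2011, 6, 1, 10, 0), datetime(2011, 6, 1, 10, 5)),   # same day
    (datetime(2011, 6, 1, 23, 0), datetime(2011, 6, 1, 23, 30)),  # same day
    (datetime(2011, 6, 2, 9, 0), datetime(2011, 6, 2, 18, 0)),    # same day
    (datetime(2010, 1, 15, 12, 0), datetime(2011, 3, 1, 8, 0)),   # over a year
]

delayed_share = share_with_delayed_first_edit(users)
```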

WikiPride -- declerambaul This sprint is intended to show how byte count can be used as an alternative to edit count as a measure of Wikipedian contributions, by measuring the total bytes added to different namespaces over time by different yearly cohorts of Wikipedians.
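
To make the byte-count idea concrete, here is a minimal sketch of summing bytes added per yearly cohort and namespace. It assumes a cohort is defined by the year of an editor's first edit and that only positive byte changes count as "bytes added"; both choices, along with the function name and the sample data, are illustrative assumptions rather than the sprint's actual definitions.

```python
from collections import defaultdict


def bytes_added_by_cohort(revisions, first_edit_year):
    """Sum positive byte deltas per (cohort year, namespace).

    `revisions` is an iterable of (user, namespace, byte_delta) tuples;
    `first_edit_year` maps each user to their assumed cohort year.
    """
    totals = defaultdict(int)
    for user, namespace, byte_delta in revisions:
        if byte_delta > 0:  # removals are ignored in this sketch
            totals[(first_edit_year[user], namespace)] += byte_delta
    return dict(totals)


# Invented sample data: namespace 0 is articles, 1 is article talk.
first_edit_year = {"alice": 2006, "bob": 2010}
revisions = [
    ("alice", 0, 120),
    ("alice", 0, -40),
    ("bob", 0, 300),
    ("bob", 1, 50),
    ("alice", 1, 10),
]

totals = bytes_added_by_cohort(revisions, first_edit_year)
```

Unlike an edit count, this measure weights one 300-byte contribution more heavily than several small fixes, which is the contrast the sprint sets out to examine.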

Trending articles and new editors -- Whym This sprint explores the effect of being ignored (having no messages on user talk) on a new Wikipedian. The results show that while in the past being ignored tended to mean that new users would not stay on the site, new users who are ignored now tend to be retained more.

Wikiproject Participation & Mentorship -- jtmorgan, swalker This sprint explores the world of Wikiprojects, describing the joining and participating patterns of new and old editors.

Visualizing Wikiproject Activity -- jtmorgan, swalker In order to get a better sense of new user participation in Wikiprojects, as well as overall Wikiproject activity, we have created a set of database tables that list various activity metrics for WikiProjects. We also conducted interviews with Wikiproject members in order to develop a set of design requirements for a proposed information visualization dashboard, WikiProject:Pulse.

One Link, Two Links, Red Links, Blue Links -- Staeiou This sprint explores the proportions of red- and blue-linked articles on English Wikipedia.

Editor classes -- Zackexley This sprint studies the changing recruitment numbers of new editors who will go on to become light, moderate, or heavy editors.