Research:Wikimedia Summer of Research 2011/Summary of Findings

Summer research in progress
The WSOR team presented these findings at the Wikimedia Foundation in August

This is the summary of findings from the Wikimedia Foundation Summer of Research, a program in the Foundation's Community Department which brought together eight academic researchers from around the globe to study the dynamics of the Wikipedia editing community.

The multidisciplinary group employed a variety of qualitative and quantitative methodologies to tackle some of the most pressing questions about current and historic editor trends. Guided by the results of the Editor Trends Study, the project primarily focused on barriers to participation by new Wikipedians.

This page is intended as a summary of some of their most important findings, as well as a guide to the documentation pages for each topic, where the research methods and conclusions are described in more detail. Feel free to discuss anything you might have questions or comments about on the Talk page.

Please note: unless otherwise stated below, the majority of our studies focused on English Wikipedia in order to scale our analyses to the largest dataset available first, as well as to augment our work with qualitative data. The tools and data used to perform the summer's work are freely licensed, and there is documentation available if you would like to replicate any part of the summer's work.

How new Wikipedians join

While the majority of our team's work during the summer focused on what editors do after they register and become new Wikipedians, some of our research does address facets of anonymous contribution, the content new contributors encounter, and the process of moving from registration to actively editing for the first time. The following projects cover these aspects of how new Wikipedians enter the system:

Anonymous contributions

(Documentation)

Question
  • How does the overall editing pattern of anonymous contributors compare to registered editors, especially new ones?
Key conclusion
  • Editing by anonymous users is declining faster than editing by registered users (see charts), but as of today it still accounts for roughly a fifth of all edits on English Wikipedia.

Edits by anonymous users over time
Edits by registered users over time

Registration and first edits

(Documentation)

Questions
  • How many people who register ever edit?
  • How long does it take new Wikipedians to engage with the site through their first edit?
Key conclusions
  • Only 30% of registered users on English Wikipedia have ever edited. (see chart)
  • Most users who register an account and edit do so within an hour. (see chart)
  • There is another distinct cohort of users who wait months or even years before making their first edit. (see chart)

Average time between registration and first edit
English Wikipedia registration and edit counts
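
The two measures behind these conclusions, the share of registered accounts that ever edit and the delay between registration and first edit, can be illustrated with a minimal sketch. The sample records and the function name `first_edit_summary` are hypothetical and are not taken from the study's own tools or data; they only show the arithmetic.

```python
from datetime import datetime, timedelta

# Hypothetical sample: (registration time, first edit time or None if the user never edited).
users = [
    (datetime(2011, 1, 1, 12, 0), datetime(2011, 1, 1, 12, 20)),
    (datetime(2011, 1, 2, 9, 0), None),
    (datetime(2011, 1, 3, 8, 0), datetime(2011, 6, 1, 8, 0)),
    (datetime(2011, 1, 4, 15, 0), None),
]


def first_edit_summary(records, window=timedelta(hours=1)):
    """Return (share of registered users who ever edited,
    share of those editors whose first edit fell within `window`)."""
    # Delay from registration to first edit, for users who edited at all.
    delays = [first - reg for reg, first in records if first is not None]
    ever_edited = len(delays) / len(records) if records else 0.0
    within = sum(d <= window for d in delays) / len(delays) if delays else 0.0
    return ever_edited, within


ever_edited, within_hour = first_edit_summary(users)
print(f"ever edited: {ever_edited:.0%}, first edit within an hour: {within_hour:.0%}")
```

Widening `window` to months or years would surface the second cohort described above, the users who wait a long time before their first edit.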

Edits to trending articles

(Documentation)

Question
  • Are trending articles (i.e. those popular among readers, such as breaking news) an entry vector for new contributors?
Key conclusion
  • More than half of the edits to trending articles, whether they are semi-protected or not, come from experienced registered users and IP addresses with lots of prior editing history, not newly-registered editors or anonymous editors new to editing. (see charts)

Edits to trending articles
Edits to trending articles, excluding those semi-protected

The ratio of red links to blue links

(Documentation)

Questions
  • Red links are one measure of how complete the encyclopedia is, and can entice new editors to contribute. What is the ratio of red to blue links, and has this been changing over time?
  • What topics have the most incoming links of either kind?
Key conclusions
  • There are still hundreds of thousands of red-linked articles, many of them linked from multiple pages in the encyclopedia: 95,000 red-links are linked 30 or more times. (see chart)
  • The most-wanted articles measured by incoming red links are on foreign history, culture, and biographies.

Red-linked articles on English Wikipedia by number of times they are linked on the site

How Wikipedians contribute to the encyclopedia over time

The summer's work also continued the kind of large-scale quantitative analysis begun in the Editor Trends Study, this time looking at how Wikipedians, especially new Wikipedians, contribute over their lifetime as editors. These analyses have given us a new look into who exactly is contributing to the different namespaces of Wikipedia and how their patterns ebb and flow over time:

Cohort contribution analysis

Research project page

Question
  • How do different cohorts add and remove content from articles?
Key conclusions
  • A large proportion of the content (measured in bytes or characters, not counting deleted articles) in the main namespace is added by editors who joined recently. (see chart)
  • Similar to the conclusions of the Editor Trends Study, it appears there is a natural growth and then slow tapering off of main namespace contributions by a cohort.

Megabytes added to English Wikipedia's main namespace, by cohort
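
As an illustration of what grouping content by cohort means, here is a minimal sketch that sums bytes added per registration-year cohort. The revision tuples and the function `bytes_added_by_cohort` are hypothetical, and counting only positive byte deltas is an assumption made for the example; the research project page describes the method actually used.

```python
from collections import defaultdict

# Hypothetical revisions: (editor, year the editor registered, year of the edit, net bytes changed).
revisions = [
    ("A", 2005, 2010, 1200),
    ("B", 2009, 2010, 800),
    ("C", 2010, 2010, 1500),
    ("C", 2010, 2010, -300),  # removals are ignored when counting bytes added
    ("A", 2005, 2009, 400),   # made in a different year, so excluded below
]


def bytes_added_by_cohort(revs, edit_year):
    """Sum positive byte deltas made in `edit_year`, keyed by registration year."""
    totals = defaultdict(int)
    for _editor, cohort, year, delta in revs:
        if year == edit_year and delta > 0:
            totals[cohort] += delta
    return dict(totals)


totals_2010 = bytes_added_by_cohort(revisions, 2010)
for cohort in sorted(totals_2010):
    print(cohort, totals_2010[cohort])
```

Repeating the calculation for each edit year gives one series per cohort, which is the shape of data a chart like the one above would plot.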

Editor lifecycles

Research project page

Question
  • Are there identifiable patterns in the overall lifecycle of editing activity in the project?
Key conclusions
  • Editors go through a natural editing cycle: a period of ramp-up, steady contribution, and eventual tapering off of activity. (see charts)
  • For heavy editors, the initial period is longer, but the middle phase of steady contribution is also much longer and more productive than for light editors. (see charts)
  • Very active contributors have generally remained unchanged in their patterns over the years, while low-activity editors today contribute less than low-activity editors from previous years.

Activity rate over time for users whose first edit occurred in January 2006, with activity rate equal to 1e-4 and 1e-5 edits/sec, respectively
Activity over time for editors whose first edit occurred in January 2006; editing activity bins a = 1e-6, 1e-7

Community interaction with new Wikipedians

In order to better understand the causes of the decline in retention of new Wikipedians, it was vital that we closely study the interactions between experienced editors and those new to the project. The following studies have helped us get a better understanding of the current and historical dynamics of Wikipedia's community:

A new Wikipedian's first edit session

Research project page

Question
  • How does rejection of the first contributions by a new editor impact their participation?
Key conclusions
  • Having their contributions rejected (deleted or reverted) makes editors less likely to stick around on Wikipedia, independent of all other variables. (see chart)
  • Rejection tends to hurt highly invested new editors (those who make more edits in their first editing session) the most.
  • The rise of rejection correlates with the drop in new editor survival, and the two lines meet around 2007. (see chart)
  • New editors are demonstrating less initial investment (as measured by number of edits in their first editing session) in Wikipedia now than in the past.

Rejection and survival rates of new editors over time

Reverts and article length

Research project page

Question
  • We know the average article length is growing. How has this impacted new editors?
Key conclusions
  • Newbies are editing longer articles now than in the past. (see chart)
  • Over the entire history of Wikipedia, editing longer articles increases your likelihood of being reverted.

Length of pages edited by newbies, per year

The impact of being ignored

Research project page

Question
  • Does being ignored entirely make a difference for retention of new editors?
Key conclusions
  • Before 2006, new editors who spent a long time editing without receiving any messages on their talk page were less likely to stick around on Wikipedia. (see chart)
  • After 2006, this changed: new editors who received no talk page messages were more likely to continue editing longer than those who did. (see chart)
  • This change might be explained by the rise of template warnings around 2006. (see next set of findings)

Ignored period and new user retention

How we communicate with new Wikipedians

Research project page, Research project page, Research project page

Question
  • How has communication to new editors (on their user talk pages) changed over time?
Key conclusions
  • Since 2004, there has been a significant drop in messages including praise and thanks, accompanied by an increase in the overlap of teaching/instructional communication with criticism.
  • Currently, about 80% of messages to new users come from bots or semi-automated editing tools like Twinkle and Huggle. (see charts)
  • Currently, about 65% of the communications to new users are warning templates on their talk pages. (see charts)

First messages to new users over time, proportional
First messages to new users over time, raw numbers

Research project page

Question
  • How has the content added to the different namespaces of Wikipedia changed over time?
Key conclusions
  • From 2006 onward, there was a tremendous increase in bytes added to the user talk space. (see chart)
  • Template warnings were created in 2006 and rose steadily in number over the years, and they have contributed heavily to this byte increase in user talk.

Bytes added to English Wikipedia, by namespace

Research project page, Research project page

Question
  • Can rewriting templates to be more personalized and teach editors about the community have an impact on their retention and the quality of contributions in the future?
Key conclusions
  • Rhetorical analysis of the most commonly used welcome templates shows that, while their appearance has changed significantly over time, their style and content have not.
  • Virtually all templates, both welcoming and warning, are written in passive, institutionalized language that appears highly impersonal to new editors.
  • Changing the language of the standard Huggle warning template to be more personalized dramatically changed the editing patterns of new users who received a warning.
  • Blatant vandals who received a personalized warning were less likely to continue vandalizing.
  • Good-faith editors who received a personalized warning were more likely to ask constructive questions on the talk page of the user who warned them.
  • About 10% of the reverts were false positives: the edits should not have been reverted.

Bar chart of the proportion of editors who make good contact with the reverting editor after receiving various messages as part of the Huggle experiment.
Bar chart of the proportion of editors who continue editing after receiving various messages as part of the Huggle experiment.

The impact of deletion

Speedy deletion (CSD)

Research project page

Question
  • How does speedy deletion impact new editors?
Key conclusions
  • Speedy deletion accounts for about 60% of deletions.
  • The average time for a newly created article to be tagged for speedy deletion is 2 minutes.
  • The average article tagged for CSD is deleted in half an hour.
  • The largest share of articles tagged for speedy deletion, about 37%, are tagged A7: not notable.
  • The next most common CSD tag (G11: Unambiguous advertising or promotion) accounts for only 8% of all CSDs.

Articles for Deletion (AfD)

Research project page, Research project page, Research project page

Question
  • How do Articles for Deletion discussions and nominations impact new editors?
Key conclusions
  • The number of deletion notifications sent to new users has risen dramatically since 2004.
  • Despite an increase in notifications, the vast majority of article creators do not participate in Articles for Deletion discussions about articles they started. (see chart)

AFD participation by article creators

New Page Patrol

Research project page

Questions
  • Who does the majority of work in New Page Patrol?
  • Has the workload of individual New Page Patrollers increased or decreased over time?
Key conclusions
  • Patroller work on Wikipedia follows a power law curve, meaning a small number of people do a significant majority of the work.
  • The workload of patrollers (both the average number of patrolled pages per year and per month) has been decreasing since 2007, measured by patrolling actions in the logs.
  • Since 2008, about 30% of the top patrollers are bots.

Workload for the top 50 new page patrollers over time
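
The "small number of people do a significant majority of the work" reading can be checked with a simple concentration measure. The sketch below is illustrative only: the counts are invented, `top_share` is a hypothetical helper, and a high share for the top contributors is consistent with a power-law-like distribution but does not by itself establish one.

```python
def top_share(counts, top_fraction=0.1):
    """Share of all actions performed by the most active `top_fraction` of contributors."""
    ranked = sorted(counts, reverse=True)
    # Always include at least one contributor in the "top" group.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical patrol-action counts for ten patrollers.
patrol_counts = [5000, 1200, 300, 80, 40, 20, 10, 5, 3, 2]
share = top_share(patrol_counts, 0.1)
print(f"top 10% of patrollers performed {share:.0%} of patrol actions")
```

The same helper applies to any per-user activity count, such as the vandalism reverts discussed in the next section.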

Vandal-fighters

Research project page

Questions
  • How can we identify very active vandal-fighters?
  • Has the workload of individual vandal-fighters increased or decreased over time?
Key conclusions
  • As with patrolling, vandal-fighting follows a power law curve.
  • The workload of vandal-fighters is also decreasing.
  • This could be because vandalism overall has been decreasing since 2007, and because bots such as ClueBot and XLinkBot now do a large amount of the work.

Plot of the log10 number of vandal fighters (> 5 reverts for vandalism per month) and the log10 number of active editors (> 5 edits per month) from the English Wikipedia

How new Wikipedians learn and collaborate in community spaces

Last but not least, we took some time this summer to study how new Wikipedians become a part of community activities outside of articles and their associated discussion pages:

The Project namespace

Research project page

Question
  • How much do new Wikipedians participate in key community spaces outside the main namespace?
Key conclusions
  • Relatively few new Wikipedians add content to the Project (i.e. Wikipedia) namespace and its associated Talk pages (which include policies and guidelines, WikiProjects, and other maintenance material).

Megabytes added to English Wikipedia namespaces four and five

Research project page

Question
  • How much do new Wikipedians participate in key community spaces outside the main namespace?
Key conclusions
  • New users are participating in all community spaces less and less: about 5% of new users did so in 2008, compared with 20% in 2004.

New user participation in different community spaces over time

Help requests

Research project page

Question
  • How and where do new Wikipedians request assistance?
Key conclusions
  • New users do not often seek out help on-wiki.
  • When they do ask for help, they do not use traditional help spaces: most help requests seem to happen in Talk and User talk namespaces.

Locations of help requests from new users

Research project page

Question
  • What do new Wikipedians ask for help with?
Key conclusions
  • The most common type of help requested by new users is editing policy.
  • Technical editing questions are the second-most common.
  • Requests like asking for tasks or inquiring about behavior policy are very rare.

Help request topics

WikiProjects

Research project page, Research project page

Questions
  • How do Wikipedians indicate their membership in WikiProjects?
  • How has membership in WikiProjects changed over time?
  • Do new Wikipedians tend to join WikiProjects?
  • Which WikiProjects are best at attracting active members, especially new Wikipedians?
  • What impact does WikiProject membership have on contributions by both new and experienced Wikipedians?
Key conclusions
  • While some WikiProjects continue to thrive, many have ceased to be active.
  • Most users who join WikiProjects are not newbies.
  • However, newbies who join a WikiProject early in their editing lifecycle go on to make more edits, both in and out of the WikiProject space. (see chart)

Editing activity before and after joining a WikiProject
Current top 20 WikiProjects by new members