Research:Wikimedia Summer of Research 2011: Difference between revisions

From Meta, a Wikimedia project coordination wiki
Steven (WMF) (talk | contribs)
m Reverted changes by 2603:7000:E300:4E0E:F4A1:936F:7233:F706 (talk) to last version by Tegel
 
This summer the Wikimedia Foundation will be bringing in a handful of graduate students to work with the Community Department, led by [[:mw:User:Diederik|Diederik van Liere]] and [[User:Buickmackane|Maryana Pinchuk]], on a few months of rapid iterations on vital research questions related to the recruitment and retention of new editors on Wikipedia. This page is a placeholder for links to our announcements and preliminary research. If you have any questions or are a Wikimedian interested in participating in either quantitative or qualitative research, please comment on the Talk page.
{{WSOR11}}{{Research Project
|title=Wikimedia Foundation Summer of Research 2011
|contact={{Investigator|Diederik van Liere<br />Maryana Pinchuk<br />Steven Walling|Wikimedia Foundation}}
|cis={{Investigator|Aaron Halfaker|University of Minnesota}} {{Investigator|R. Stuart Geiger|UC Berkeley}} {{Investigator|Melanie Kill|University of Maryland}}{{Investigator|Fabian Kaelin|McGill University}} {{Investigator|Jonathan Morgan<br />Shawn Walker|University of Washington}} {{Investigator|Yusuke Matsubara|University of Tokyo}} {{Investigator|Giovanni Luca Ciampaglia|University of Lugano}}
|start-year=2011
|start-month=06
|status=completed
|field=md
|open-data=yes
|open-access=yes
|wmf-support={{WMF-support|ho}}
|featured=yes
}}
{{Shortcut|R:WSOR11}}
From June through August 2011, the '''Wikimedia Foundation Summer of Research''' (WSoR) brought a group of researchers to study long-term participation trends in Wikipedia using a multidisciplinary approach.


==The research team==
==Preliminary work==
Led by [[User:Drdee|Diederik van Liere]], [[User:Buickmackane|Maryana Pinchuk]], and [[User:Steven (WMF)|Steven Walling]], the Foundation had the pleasure of having the following researchers visit us for summer 2011:
These datasets and analyses are mostly test cases for the rest of the work for the summer, but do suggest some interesting trends nonetheless. Our basic methodologies are described below.
=== Assessing quality of the first edits made by new editors, 2004 and 2011 ===
How many contributions by new editors are made in good faith and are worth retaining or improving? Are most edits by newbies vandalism or spam, or are they made primarily in good faith?


* [[User:Staeiou|R. Stuart Geiger]] is a PhD candidate at the UC Berkeley School of Information, focusing on knowledge production in distributed and decentralized environments, specifically Wikipedia and scientific research networks. He has been a Wikipedia editor since 2004 and has been studying the project as an ethnographer since 2007. His current research explores the relationship between technical infrastructures and social structures, and he has written on bots, vandal fighting, administration, and the history of Wikipedia.
We selected a randomized sample of first edits by contributors who joined in April 2004 and in April 2011, derived via a simple SQL query run against the [[toolserver]]. We then analyzed these edits by hand, ranking each first edit on a 1-5 scale, with one being pure vandalism and five being a well-referenced content addition indistinguishable from the edit of an experienced contributor. We also noted when the first edit was not a mainspace contribution, and whether that was vandalism or not.
* [[User:EpochFail|Aaron Halfaker]] is a PhD candidate in computer science at the University of Minnesota (GroupLens Research), focusing on computer-mediated human interaction. Aaron started editing Wikipedia four years ago and quickly found his niche creating user scripts to find ways of improving the collaborative experience. His research explores mechanisms for motivating and supporting volunteer collaboration.
* [http://www.cs.mcgill.ca/~fkaeli/ Fabian Kaelin] is a Master of Science candidate from McGill University, focused on machine learning.
* [http://melaniekill.com/ Melanie Kill] is Assistant Professor of English at the University of Maryland, specializing in digital rhetoric and genre studies. She is currently at work on a book on Wikipedia and the history of the genre of the encyclopedia. She earned her PhD in Rhetoric and Language Studies from University of Washington and previously has taught at Texas Christian University.
* [[User:Junkie.dolphin|Giovanni Luca Ciampaglia]] is a PhD candidate at the University of Lugano in Switzerland. Giovanni is a computer scientist who studies user involvement in commons-based peer production communities, group consensus and collective deliberation processes.
* [[User:Whym|Yusuke Matsubara]] is a PhD student at the University of Tokyo (Japan), studying computational linguistics. His research focuses on analysing how people write and read from a computational and empirical point of view. Since 2008, he has been an occasional writer, translator and programmer for Wikimedia.
* [[User:Jtmorgan|Jonathan Morgan]] is a PhD candidate, University of Washington, studying social interaction on collaborative online creative environments. As a researcher, he is particularly interested in tracing connections between the things people say (and the way they say them) and their roles, goals and activities online. He also works on the design of tools for improving public deliberation on the web, and on practical tools for internet researchers.
* [http://students.washington.edu/stw3/ Shawn Walker] is a PhD candidate at the University of Washington iSchool, and studies digital government and public engagement.


== Research questions ==
Results are described at: "[https://blog.wikimedia.org/blog/2011/04/15/neweditorsquality/ How much do new editors actually improve Wikipedia?]"
In light of the results of the [[:strategy:Editor Trends Study|Editor Trends Study]] and the Board's [[:wmf:Resolution:Openness|resolution on openness]], the team used week-long group sprints to answer detailed questions related to participation. The following [[Research:Wikimedia Summer of Research 2011/Questions|lists of questions]] helped guide our inquiry, though a precise list of the topics covered is available below.


==Conclusions==
We'll publish the totals data shortly, but the actual samples will not be distributed to avoid calling out individual editors by name.
Please see our [[Research:Wikimedia Summer of Research 2011/Summary of Findings|Summary of Findings]].


==List of research project pages==
=== The type and tone of user talk page edits directed at new editors within their first 30 days ===
{{Commons category|Summer of Research 2011}}
As a follow-up experiment to the previous one, which gave us an idea of how many new editors made valuable contributions according to Wikipedia standards, we wanted to look at how these good faith contributors were being communicated with on their user talk pages early on.
The projects listed here may be completed, still in progress, or avenues of inquiry that were ended for technical reasons or reoriented priorities. If you have questions, please ask on the relevant Talk page.


====Process====
We prepared another random sample of several hundred edits made to user talk pages of new registered users on English Wikipedia from 2004 through 2011. These edits were made by other contributors within 30 days of a new person’s first edit.
{{Sprint summary
|image=Summer_of_Research_-_Comparing_PPI_editors_%26_regular_editors_by_cum._edit_count_other_ns.png
|title=Quality of PPI editor work
|contact=Drdee
|description=This research project compared editors "recruited" via the Public Policy Initiative to other editors with similar edit counts. It concluded that the Wikipedians we recruit this way are just as good as the editors we get in other ways.
}}
{{Sprint summary
|image=Length_of_page_for_newbie_edits.by_year.png
|title=Newbie reverts and article length
|contact=EpochFail
|description=Newbies are editing more complete encyclopedia articles than they used to, and edits to more complete articles have always been more likely to be reverted.
}}
{{Sprint summary
|title=Newbie teaching strategy trends
|image=Message-personal-template.png
|contact=Staeiou
|contact2=Drkill
|contact3=Jtmorgan
|description=Wikipedian teaching strategies are shifting in two significant ways:
* a significant drop in messages including praise and thanks corresponded with an increase in the overlap of teaching with criticism
* a decline in personalized teaching corresponded with an increase in templated instruction
}}
{{Sprint summary
|title=Patroller work load
|image=Patrol_months.per_user.png
|contact=EpochFail
|description=The number of new pages that human editors patrol has been going down since 2007. This suggests that the workload of new page patrollers has also been decreasing.
}}
{{Sprint summary
|title=Alternative lifecycles of new users
|image=Wsor-6jun-sprint-new-users-deletions.png
|contact=Staeiou
|contact2=Jtmorgan
|description=New users are receiving substantially more notifications that their articles and images are being deleted, but are participating substantially less in community processes, across almost all areas of activity.
}}
{{Sprint summary
|title=Ignored period and retention
|image=Ignored_period_and_retention_30_edits_EN_wiki.svg
|contact=Whym
|description=Some early interactions can have a negative impact on the retention of new editors. Contrary to the speculation that early messages motivate new editors to contribute, retained editors are found to have a shorter ignored period than leaving editors do after 2006.
}}
{{Sprint summary
|title=Newbie reverts and subsequent editing behavior
|image=Percent_retained_sbs.png
|contact=Swalker
|description=Editor retention has been decreasing over time. The negative effect of a revert has increased over time.
}}
{{Sprint summary
|title=Deletion notifications to new users
|image=Deletion_notifications_to_new_users.png
|contact=Staeiou
|description=There's a significant decline in the number of new users whose first message was a welcome and a rise in those whose first message was a warning. Receiving a deletion notification as a first message does not appear to predict whether or not a new editor will be retained 2-6 months later, but further study is recommended to compare retention metrics for new article creators who did and did not receive deletion notices.
}}
{{Sprint summary
|image=WhosGivingWikilove.png
|title=Classifying wikilove messages
|contact=Jtmorgan
|description=This project involves categorizing a large set of Wikilove messages in order to get a better idea of how the community is using this new tool, and using that dataset in order to train an active learning classifier to automatically detect the sentiment of Wikilove messages in the future.
}}
{{Sprint summary
|title=Anonymous edits
|image=Wsor-june13-enwiki-time-anon.png
|contact=declerambaul
|description= IP editing is declining faster than editing by logged-in users. But in June 2011 it still accounts for a fifth of the edits on EN wiki.
}}
{{Sprint summary
|title=Rhetoric of the welcome message
|image=Template W-basic screenshot.png
|contact=Drkill
|description=This sprint asks what these messages have said and currently say, or don't say, to new editors about 1) Wikipedia and its larger mission, 2) the Wikipedian community, 3) the types of participation new editors are welcomed into.
}}
{{Sprint summary
|title=Sentiment analysis tool of new editor interaction
|contact=Whym
|description=This sprint represents the construction of a fundamental tool to be used to answer further research questions, a sentiment analysis classification algorithm. Ultimately the classifier was not accurate enough to be useful, but future work is planned to improve it.
}}
{{Sprint summary
|title=The Speed of Speedy Deletions
|image=
|contact=Staeiou
|description=Speedy deletion is usually very fast: a large proportion of speedy deletion tagging takes place within moments of creation, usually followed by deletion in an average of half an hour.
}}
{{Sprint summary
|title=New user help requests
|image=UserHelpSpaces.png
|contact=Jtmorgan
|description= Fewer than 10% of new users asked for help during their first 30 days. Of those who did, fewer than half received a response from a real person during that period. The places they asked for help were all over the map, but the most common place was their own talk page or someone else's. Some of the 'other' places they asked for help include: the Reference Desk (for both reference and traditional 'help' topics), article talk pages and edit summaries. More than half of those who asked for help received some sort of welcome on their user talk page with links to help resources. Very few of these users used the <nowiki>{{helpme}}</nowiki> template, even though many of them received Welcome templates that included 'Helpme' instructions.''<nowiki>*</nowiki>'' Full research report on this and related sprints [[Research:NewUserHelp-FullResearchReport|here]].
}}
{{Sprint summary
|title=New User Participation in Help Spaces
|image=HelpRequestLocations.png
|contact=Jtmorgan
|contact2=Swalker
|description=Based on analysis of a small sample of newbie comments, newbies aren't good at knowing where to ask for help, and Wikipedia isn't good at spotting requests for help, particularly when newbies talk on their own talk page. Full research report on this and related sprints [[Research:NewUserHelp-FullResearchReport|here]].
}}
{{Sprint summary
|title=Software for quick processing of Wikidumps
|contact=EpochFail
|description=A Python library was built and tested for processing XML dump files quickly. The January 1st, 2011 full history dump with text was processed in 20 hours.
}}
{{Sprint summary
|title=New editor welcome wishlist
|contact=Drkill
|description=Describes features new editors might find most useful in welcome message templates.
}}
{{Sprint summary
|title=Vandal fighter work load
|image=Vandal_revert_50_prop.by_month.png
|contact=EpochFail
|description=There is a steeper decline in the number of vandal fighters than all editors, with the steepest decline amongst less active vandal fighters. The number of vandal reverts completed by individual fighters also appears to be declining, suggesting that the overall workload of vandal fighters is decreasing.
}}
{{Sprint summary
|title=First edit session
|image=Survival_rate.rejection_rate.by_year_and_edit_session.png
|contact=EpochFail
|description= For newbies, the amount of their edits that are reverted or deleted is a powerful predictor of retention. Their initial investment is also a powerful predictor of retention. New editors show less initial investment now than they used to. The more initial investment, the more negative the effect of rejection.
}}
{{Sprint summary
|title=WikiPride
|image=Added_percentage_ns0.png
|contact=declerambaul
|description=A visualization method is presented that can be used to analyze trends for any cohort centric statistic. This visualization method is then used to show contributions of editor cohorts and how those contributions compare to previous years' cohorts.
}}
{{Sprint summary
|title=Editor lifecycle
|image=Timechart_2006-1_1e-4_1e-5_comparison.pdf
|contact=Junkie.dolphin
|description=This research is looking at the evolution of contributors' activity over the years by analyzing statistical regularities in collective patterns of editing activity.
}}
{{Sprint summary
|title=Lag between registration and first edit
|image=Ns0all.png
|contact=Junkie.dolphin
|contact2=Staeiou
|description=About 30% of users register an account but do not perform their first edit immediately or within the same day. Our analysis shows that the time lag between registration and first edit can be weeks, months and even years long!
}}
{{Sprint summary
|image=Added total nobots.png
|title=WikiPride
|contact=declerambaul
|description=This sprint is intended to show how byte count can be used as an alternative to edit count as a measure of Wikipedian contributions, by measuring the total bytes added to different namespaces over time by different yearly cohorts of Wikipedians.
}}
{{Sprint summary
|image=Edits to English Wikipedia trending articles.png
|title=Trending articles and new editors
|contact=Whym
|description=This research looks at the types of editors who start editing on very active or "trending" articles related to current news events compared with editors of less active pages (non-trending topics). The study found that trending articles did not attract any more new registered editors than average articles.
}}
{{Sprint summary
|image=
|title=Wikiproject Participation & Mentorship
|contact=jtmorgan
|contact2=swalker
|description=This sprint explores the world of Wikiprojects, describing the joining and participating patterns of new and old editors.
}}
{{Sprint summary
|image=2011 top20 byNewMembers.png
|title=Visualizing Wikiproject Activity
|contact=jtmorgan
|contact2=swalker
|description=In order to get a better sense of new user participation in Wikiprojects, as well as overall Wikiproject activity, we have created a set of database tables that list various activity metrics for WikiProjects. We also conducted interviews with Wikiproject members in order to develop a set of design requirements for a proposed information visualization dashboard, WikiProject:Pulse.
}}
{{Sprint summary
|image=Redlinks-histo-full.png
|title=One Link, Two Links, Red Links, Blue Links
|contact=Staeiou
|description=This sprint explores the proportions of red- and blue-linked articles on English Wikipedia.
}}
{{Sprint summary
|title=Editor classes
|image=Editor classes recruitment.png
|contact=Zackexley
|description=This sprint studies the changing recruitment numbers of new editors who will go on to become light, moderate, or heavy editors.
}}
<br style="clear:both;"/>


==Data and code==
The sample was gathered using the [[Toolserver]], and the following query is an example of how the 2005 set was gathered. (If you want to run it on different years, simply change the timestamps.) In very early years, such as 2004, where there were fewer editors altogether, we limited the query to 500.
Where they consist of public, freely-licensed Wikipedia data, we will be releasing the datasets used to complete our summer's work, as well as the code/queries used to produce them.


* [[WSoR datasets]]
* [[Research:Query Library]]
* [[svn:trunk/tools/wsor/]]
{{collapse top|SQL query to get the sample}}
<source lang="SQL">
use enwiki_p;
-- Inner query (su): up to 500 users registered in February 2005 with an edit in
-- namespace 1 within 7 days of registration; t is the earliest such edit.
SELECT su.user_name, r.rev_id
FROM (SELECT u.user_id, u.user_name, u.user_registration, MIN(r.rev_timestamp) t
      FROM user u
      INNER JOIN revision r
        ON u.user_id = r.rev_user
      INNER JOIN page p
        ON r.rev_page = p.page_id
      WHERE u.user_registration BETWEEN '20050201000000' AND '20050301000000'
        AND u.user_id BETWEEN 135000 AND 235000
        AND UNIX_TIMESTAMP(r.rev_timestamp) - UNIX_TIMESTAMP(u.user_registration) < (60*60*24*7)
        AND page_namespace = 1
      GROUP BY u.user_id
      LIMIT 500) su
-- Outer query: revisions made by other users to each sampled user's talk page
-- (namespace 3) within 30 days of t.
INNER JOIN page p
  ON su.user_name = p.page_title
INNER JOIN revision r
  ON r.rev_page = p.page_id AND r.rev_user != su.user_id
WHERE p.page_namespace = 3
  AND UNIX_TIMESTAMP(r.rev_timestamp) - UNIX_TIMESTAMP(su.t) < (60*60*24*30);
</source>
{{collapse bottom}}
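
The two time windows in the query (an edit within 7 days of registration, then talk page edits within 30 days of that edit) can be restated outside SQL. The following Python is only a sketch for illustration; the function and constant names are hypothetical and are not part of the original tooling. It assumes timestamps in the 14-digit <code>YYYYMMDDHHMMSS</code> form used in the query.

```python
from datetime import datetime, timedelta

# 14-digit timestamp strings (YYYYMMDDHHMMSS), as in the query's BETWEEN clause.
MW_FORMAT = "%Y%m%d%H%M%S"

# Hypothetical names; the values mirror the query's arithmetic.
FIRST_EDIT_WINDOW = timedelta(days=7)     # 60*60*24*7 seconds
TALK_MESSAGE_WINDOW = timedelta(days=30)  # 60*60*24*30 seconds


def parse_mw(ts: str) -> datetime:
    """Parse a 14-digit timestamp string into a datetime."""
    return datetime.strptime(ts, MW_FORMAT)


def edited_soon_after_registration(registration: str, edit: str) -> bool:
    """True if the edit came less than 7 days after registration."""
    return parse_mw(edit) - parse_mw(registration) < FIRST_EDIT_WINDOW


def message_in_sample(first_edit: str, message: str) -> bool:
    """True if the talk page edit came less than 30 days after the first edit."""
    return parse_mw(message) - parse_mw(first_edit) < TALK_MESSAGE_WINDOW


if __name__ == "__main__":
    print(edited_soon_after_registration("20050201000000", "20050205120000"))
    print(message_in_sample("20050205120000", "20050301000000"))
```

Like the query, these checks use a strict upper bound and set no lower bound on the difference.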

The complete list of classification possibilities is below. If it was applicable, we noted multiple items per edit. For example: if the edit was the addition of a warning template, we marked "Template", "Tip, correction, or warning" and then assigned a tone depending on the contents of the template used.

* Content discussion and/or debate: any edit whose purpose was primarily to discuss or debate the content of encyclopedia articles.
* Template: any edit that was a template.
* Welcome: any edit that was obviously intended to welcome a new editor, either in template form or personalized.
* Tip, correction, or warning: any tip about future editing, correction about errors in past editing procedure or technique, or any warning to cease editing in violation of policy/guideline.
* Invitation: any invitation to edit a specific page or subject, such as a WikiProject invitation or a suggestion about an interesting topic.
* Praise: any form of praise, from personalized text to barnstars.
* Vandalism: any edit to the user talk page that was purely vandalism.
* Socializing: any edit that did not discuss the project directly, but instead consisted of socializing.
* Minor: any minor change in formatting, grammar, spelling, etc.
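
Because an edit could receive several labels at once, the coding scheme above behaves like a multi-label classification. The Python below is a minimal sketch of one way to record it, not the method actually used (the edits were coded by hand). The class and field names are hypothetical, and the tone value is an assumption, since the full set of tone labels is not listed here.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, Set


class Category(Enum):
    """The nine classification possibilities from the list above."""
    CONTENT_DISCUSSION = "Content discussion and/or debate"
    TEMPLATE = "Template"
    WELCOME = "Welcome"
    TIP_CORRECTION_WARNING = "Tip, correction, or warning"
    INVITATION = "Invitation"
    PRAISE = "Praise"
    VANDALISM = "Vandalism"
    SOCIALIZING = "Socializing"
    MINOR = "Minor"


@dataclass
class CodedEdit:
    """One hand-coded user talk page edit (hypothetical record layout)."""
    rev_id: int
    categories: Set[Category] = field(default_factory=set)
    tone: str = ""  # assumption: free-text tone label such as "negative"


def tally(edits: Iterable[CodedEdit]) -> Counter:
    """Count how many edits carry each category; an edit may count toward several."""
    counts: Counter = Counter()
    for edit in edits:
        counts.update(edit.categories)
    return counts


# The warning-template example from the text: two categories plus a tone.
# The rev_id values are placeholders, not real revisions.
warning = CodedEdit(
    rev_id=1,
    categories={Category.TEMPLATE, Category.TIP_CORRECTION_WARNING},
    tone="negative",
)
barnstar = CodedEdit(rev_id=2, categories={Category.PRAISE})
COUNTS = tally([warning, barnstar])
```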

====Results====
Results are described at: "[https://blog.wikimedia.org/blog/2011/05/02/neweditorwarnings/ The Rise of Warnings to New Editors on English Wikipedia]". The totals data for the two items compared is below, but the actual samples will not be distributed to avoid calling out individual editors by name.

{| class="wikitable"
|+ Two types of edits made to the user talk pages of good faith editors, correlated with tone analysis
! Year
! Edits that included praise
! Edits that added a template with a negative tone
! Total number of edits analyzed
|-
|2004
|36
|0
|251
|-
|2005
|23
|0
|223
|-
|2006
|26
|11
|243
|-
|2007
|5
|24
|347
|-
|2008
|7
|33
|235
|-
|2009
|13
|36
|176
|-
|2010
|3
|50
|209
|-
|2011
|6
|84
|244
|-
|}

{| class="wikitable"
! Year
! Edits that included praise
! Edits adding a template with a negative tone
|-
|2004
|14.34%
|0
|-
|2005
|10.31%
|0
|-
|2006
|10.70%
|4.53%
|-
|2007
|1.44%
|6.92%
|-
|2008
|2.98%
|14.04%
|-
|2009
|7.39%
|20.45%
|-
|2010
|1.44%
|23.92%
|-
|2011
|2.46%
|34.43%
|-
|}

The totals calculated as a percent of the whole (in the sample) resulted in the following chart:

[[File:Praise versus Negative templates, English Wikipedia 2004-2011.png|center|Praise versus Negative templates, English Wikipedia 2004-2011]]
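
The percentages are each count divided by that year's total number of edits analyzed. As a check, this short Python sketch recomputes them from the totals table; the variable and function names are hypothetical.

```python
# (edits with praise, edits adding a negative-tone template, total edits analyzed),
# copied from the totals table above.
TOTALS = {
    2004: (36, 0, 251),
    2005: (23, 0, 223),
    2006: (26, 11, 243),
    2007: (5, 24, 347),
    2008: (7, 33, 235),
    2009: (13, 36, 176),
    2010: (3, 50, 209),
    2011: (6, 84, 244),
}


def percent(part: int, whole: int) -> float:
    """Share of the sample as a percentage, rounded to two decimal places."""
    return round(100 * part / whole, 2)


# year -> (praise %, negative template %)
PERCENTAGES = {
    year: (percent(praise, total), percent(negative, total))
    for year, (praise, negative, total) in TOTALS.items()
}

if __name__ == "__main__":
    for year, (praise_pct, negative_pct) in PERCENTAGES.items():
        print(f"{year}: praise {praise_pct:.2f}%, negative templates {negative_pct:.2f}%")
```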

[[Category:Research]]

Latest revision as of 15:52, 26 June 2021

Contact: Diederik van Liere, Maryana Pinchuk, Steven Walling
This page documents a completed research project.
Shortcut: R:WSOR11

From June through August 2011, the Wikimedia Foundation Summer of Research (WSoR) brought a group of researchers to study long-term participation trends in Wikipedia using a multidisciplinary approach.

The research team

Led by Diederik van Liere, Maryana Pinchuk, and Steven Walling, the Foundation had the pleasure of having the following researchers visit us for summer 2011:

  • R. Stuart Geiger is a PhD candidate at the UC Berkeley School of Information, focusing on knowledge production in distributed and decentralized environments, specifically Wikipedia and scientific research networks. He has been a Wikipedia editor since 2004 and has been studying the project as an ethnographer since 2007. His current research explores the relationship between technical infrastructures and social structures, and he has written on bots, vandal fighting, administration, and the history of Wikipedia.
  • Aaron Halfaker is a PhD candidate in computer science at the University of Minnesota (GroupLens Research), focusing on computer-mediated human interaction. Aaron started editing Wikipedia four years ago and quickly found his niche creating user scripts to find ways of improving the collaborative experience. His research explores mechanisms for motivating and supporting volunteer collaboration.
  • Fabian Kaelin is a Master of Science candidate from McGill University, focused on machine learning.
  • Melanie Kill is Assistant Professor of English at the University of Maryland, specializing in digital rhetoric and genre studies. She is currently at work on a book on Wikipedia and the history of the genre of the encyclopedia. She earned her PhD in Rhetoric and Language Studies from University of Washington and previously has taught at Texas Christian University.
  • Giovanni Luca Ciampaglia is a PhD candidate at the University of Lugano in Switzerland. Giovanni is a computer scientist who studies user involvement in commons-based peer production communities, group consensus and collective deliberation processes.
  • Yusuke Matsubara is a PhD student at the University of Tokyo (Japan), studying computational linguistics. His research focuses on analysing how people write and read from a computational and empirical point of view. Since 2008, he has been an occasional writer, translator and programmer for Wikimedia.
  • Jonathan Morgan is a PhD candidate, University of Washington, studying social interaction on collaborative online creative environments. As a researcher, he is particularly interested in tracing connections between the things people say (and the way they say them) and their roles, goals and activities online. He also works on the design of tools for improving public deliberation on the web, and on practical tools for internet researchers.
  • Shawn Walker is a PhD candidate at the University of Washington iSchool, and studies digital government and public engagement.

Research questions

In light of the results of the Editor Trends Study and the Board's resolution on openness, the team used week-long group sprints to answer detailed questions related to participation. The following lists of questions helped guide our inquiry, though a precise list of the topics covered is available below.

Conclusions

Please see our Summary of Findings.

List of research project pages

The projects listed here may be completed, still in progress, or avenues of inquiry that were ended for technical reasons or reoriented priorities. If you have questions, please ask on the relevant Talk page.


Quality of PPI editor work -- Drdee This research project compared editors "recruited" via the Public Policy Initiative to other editors with similar edit counts. It concluded that the Wikipedians we recruit this way are just as good as the editors we get in other ways.

Newbie reverts and article length -- EpochFail Newbies are editing more complete encyclopedia articles than they used to, and edits to more complete articles have always been more likely to be reverted.

Newbie teaching strategy trends -- Staeiou, Drkill, Jtmorgan Wikipedian teaching strategies are shifting in two significant ways:

  • a significant drop in messages including praise and thanks corresponded with an increase in the overlap of teaching with criticism
  • a decline in personalized teaching corresponded with an increase in templated instruction


Patroller work load -- EpochFail The number of new pages that human editors patrol has been going down since 2007. This suggests that the workload of new page patrollers has also been decreasing.

Alternative lifecycles of new users -- Staeiou, Jtmorgan New users are receiving substantially more notifications that their articles and images are being deleted, but are participating substantially less in community processes, across almost all areas of activity.

Ignored period and retention -- Whym Some early interactions can have a negative impact on the retention of new editors. Contrary to the speculation that early messages motivate new editors to contribute, retained editors are found to have a shorter ignored period than leaving editors do after 2006.

Newbie reverts and subsequent editing behavior -- Swalker Editor retention has been decreasing over time. The negative effect of a revert has increased over time.

Deletion notifications to new users -- Staeiou There's a significant decline in the number of new users whose first message was a welcome and a rise in those whose first message was a warning. Receiving a deletion notification as a first message does not appear to predict whether or not a new editor will be retained 2-6 months later, but further study is recommended to compare retention metrics for new article creators who did and did not receive deletion notices.

Classifying wikilove messages -- Jtmorgan This project involves categorizing a large set of Wikilove messages in order to get a better idea of how the community is using this new tool, and using that dataset in order to train an active learning classifier to automatically detect the sentiment of Wikilove messages in the future.

Anonymous edits -- declerambaul IP editing is declining faster than editing by logged-in users. But in June 2011 it still accounts for a fifth of the edits on EN wiki.

Rhetoric of the welcome message -- Drkill This sprint asks what these messages have said and currently say, or don't say, to new editors about 1) Wikipedia and its larger mission, 2) the Wikipedian community, 3) the types of participation new editors are welcomed into.

Sentiment analysis tool of new editor interaction -- Whym This sprint represents the construction of a fundamental tool to be used to answer further research questions, a sentiment analysis classification algorithm. Ultimately the classifier was not accurate enough to be useful, but future work is planned to improve it.

The Speed of Speedy Deletions -- Staeiou Speedy deletion is usually very fast: a large proportion of speedy deletion tagging takes place within moments of creation, usually followed by deletion in an average of half an hour.

New user help requests -- Jtmorgan Fewer than 10% of new users asked for help during their first 30 days. Of those who did, fewer than half received a response from a real person during that period. The places they asked for help were all over the map, but the most common place was their own talk page or someone else's. Some of the 'other' places they asked for help include: the Reference Desk (for both reference and traditional 'help' topics), article talk pages and edit summaries. More than half of those who asked for help received some sort of welcome on their user talk page with links to help resources. Very few of these users used the {{helpme}} template, even though many of them received Welcome templates that included 'Helpme' instructions.* Full research report on this and related sprints here.

New User Participation in Help Spaces -- Jtmorgan, Swalker Based on analysis of a small sample of newbie comments, newbies aren't good at knowing where to ask for help, and Wikipedia isn't good at spotting requests for help, particularly when newbies talk on their own talk page. Full research report on this and related sprints here.

Software for quick processing of Wikidumps -- EpochFail A Python library was built and tested for processing XML dump files quickly. The January 1st, 2011 full history dump with text was processed in 20 hours.

New editor welcome wishlist -- Drkill Describes features new editors might find most useful in welcome message templates.


Vandal fighter work load -- EpochFail There is a steeper decline in the number of vandal fighters than all editors, with the steepest decline amongst less active vandal fighters. The number of vandal reverts completed by individual fighters also appears to be declining, suggesting that the overall workload of vandal fighters is decreasing.

First edit session -- EpochFail For newbies, the amount of their edits that are reverted or deleted is a powerful predictor of retention. Their initial investment is also a powerful predictor of retention. New editors show less initial investment now than they used to. The more initial investment, the more negative the effect of rejection.

WikiPride -- declerambaul A visualization method is presented that can be used to analyze trends for any cohort centric statistic. This visualization method is then used to show contributions of editor cohorts and how those contributions compare to previous years' cohorts.

Editor lifecycle -- Junkie.dolphin This research is looking at the evolution of contributors' activity over the years by analyzing statistical regularities in collective patterns of editing activity.

Lag between registration and first edit -- Junkie.dolphin, Staeiou About 30% of users register an account but do not perform their first edit immediately or within the same day. Our analysis shows that the time lag between registration and first edit can be weeks, months and even years long!

WikiPride -- declerambaul This sprint is intended to show how byte count can be used as an alternative to edit count as a measure of Wikipedian contributions, by measuring the total bytes added to different namespaces over time by different yearly cohorts of Wikipedians.

Trending articles and new editors -- Whym This research looks at the types of editors who start editing on very active or "trending" articles related to current news events compared with editors of less active pages (non-trending topics). The study found that trending articles did not attract any more new registered editors than average articles.

Wikiproject Participation & Mentorship -- jtmorgan, swalker This sprint explores the world of Wikiprojects, describing the joining and participating patterns of new and old editors.

Visualizing Wikiproject Activity -- jtmorgan, swalker In order to get a better sense of new user participation in Wikiprojects, as well as overall Wikiproject activity, we have created a set of database tables that list various activity metrics for WikiProjects. We also conducted interviews with Wikiproject members in order to develop a set of design requirements for a proposed information visualization dashboard, WikiProject:Pulse.

One Link, Two Links, Red Links, Blue Links -- Staeiou This sprint explores the proportions of red- and blue-linked articles on English Wikipedia.

Editor classes -- Zackexley This sprint studies the changing recruitment numbers of new editors who will go on to become light, moderate, or heavy editors.

Data and code

Where they consist of public, freely-licensed Wikipedia data, we will be releasing the datasets used to complete our summer's work, as well as the code/queries used to produce them.