Research:Post-edit feedback

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Steven (WMF) (talk | contribs) at 21:01, 24 August 2012 (→‎Methods). It may differ significantly from the current version.

The purpose of this experiment is to test whether various types of positive feedback after submission of an edit increase the productivity and retention of Wikipedia editors.

Background

Previous research has demonstrated that feedback mechanisms have a positive effect on motivating repeated contributions in collaborative communities.[1][2] According to the currently available research literature, providing feedback to wiki editors (whether that feedback is a simple confirmation, an expression of gratitude, or user contribution statistics) offers a potential retention incentive: a "feedback mechanism that encourage[s] individuals to continue contributing over time".[1]

Research questions

In this experiment, we are testing our hypotheses that:

  1. Providing any kind of feedback after a user successfully edits will motivate them to edit more, as compared to users who receive no feedback post-edit.
  2. Providing post-edit feedback specific to the user and their experience will be more effective at encouraging further participation, compared to generalized feedback or none at all.

Once we have confirmed or rejected the hypotheses above, we would also like to test more specific hypotheses and questions about how much editors are motivated by post-edit feedback. These include:

  1. Providing positive feedback will increase the number of edits a contributor makes in an edit session because their contributions are being acknowledged as they edit.
  2. Providing positive feedback will shorten the time it takes new contributors to reach editing milestones, especially the 10, 50, and 100 edit marks because their contributions are being acknowledged. In the historical feedback test, the value of achieving contribution levels is reflected back to the editor.
  3. Providing positive feedback improves the long-term retention of editors over 30, 60, and 90 days because new editors will be receiving constant affirmation for their contributions.
  4. At what stage in an editor's lifecycle is feedback most effective, as measured by edit count?

Assumptions

  1. Post-edit feedback will be most likely to have an impact on newbie editors, who have not yet formed the expectation that their edits will go unacknowledged.
  2. Newbie editors may not always know that they've saved an edit without receiving feedback.
  3. If we choose to display post-edit feedback for all edits, users will develop "blindness" to the message over time.
  4. Historical feedback, social ranking, and similar forms of more contextual, intermittent feedback will be more effective than simple repetitive gratitude.

In order to verify that bucketing and data collection are functioning as required, we need:

  • To be able to create accounts, in order to see that newly-registered editors (and only new editors) are being bucketed into control or experimental conditions.
  • To be able to make edits to any number or type of pages, and ensure all edits (not a sample) are logged along with normal revision data like who made them, the page, and the time. This data should include whether the editor was in the control or one of the two experimental conditions.

See also: #Data collection below

Overall metrics

  • The following will be collected for each editor in the experiment.
  • The event referred to can be any type of feedback, including none at all.
  • Each metric may be measured from a point before the experiment to some point after its conclusion. For an event at time t, let the measurement interval be [t - X, t + Y], where X and Y may be tuned for each metric.
Pre-Event Edit Counts (non-negative integer)
Edit counts over fixed period before event (where applicable).
Post-Event Edit Counts (non-negative integer)
Edit counts over fixed period after event.
Time to Milestone (non-negative integer)
The time taken by an editor to reach milestones (e.g. n edits) after the event.
Edits per session (non-negative real)
Rate of editing per session
k/n-Retention (binary)
Whether an editor has made at least k edit(s) a minimum of n days after the event.
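As an illustration of how the two time-based metrics above could be computed from per-editor revision timestamps, here is a minimal sketch. The function names and the interpretation of "at least k edits a minimum of n days after the event" are our own assumptions, not the project's actual analysis code.

```python
from datetime import datetime, timedelta

def k_n_retention(edit_times, event_time, k=1, n=30):
    """k/n-retention: whether the editor made at least k edits
    occurring at least n days after the event."""
    cutoff = event_time + timedelta(days=n)
    late_edits = [t for t in edit_times if t >= cutoff]
    return len(late_edits) >= k

def time_to_milestone(edit_times, event_time, milestone=10):
    """Days from the event until the editor's `milestone`-th
    post-event edit, or None if the milestone is never reached."""
    edits_after = sorted(t for t in edit_times if t > event_time)
    if len(edits_after) < milestone:
        return None
    return (edits_after[milestone - 1] - event_time).days
```

With edits on days 1, 31, 35, and 40 after an event, `k_n_retention(..., k=1, n=30)` is true (three edits fall at or beyond day 30), and `time_to_milestone(..., milestone=2)` is 31 days.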

Methodology

Activation

  • each experiment will honor an initial activation buffer period of a few hours after the deployment;
  • the eligible population of users for each experiment will be determined by considering users registered within a predefined eligibility period; the beginning of the eligibility period will fall after the end of the activation buffer;
  • for this experiment, there is no gap between the user registration time and the beginning of the treatment (the treatment starts immediately upon registration);
  • the treatment will remain active for a minimum duration for all eligible users, regardless of when they registered an account.

Key
t0 deployment completed
t1 beginning of eligibility period
t2 end of eligibility period
[t0,t1] activation buffer
[t1,t2] eligibility period
tR registration time
[tR,tR+N] duration of treatment
tR+N end of treatment
t4 = t2 + N end of the experiment
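The timeline above can be made concrete with a small sketch. The interval lengths used here are illustrative, and the variable names simply mirror the key; this is not the deployed activation code.

```python
from datetime import datetime, timedelta

t0 = datetime(2012, 7, 19, 0, 0)        # deployment completed (illustrative date)
activation_buffer = timedelta(hours=3)   # [t0,t1]
eligibility_period = timedelta(days=2)   # [t1,t2]
N = timedelta(days=1)                    # per-user treatment duration

t1 = t0 + activation_buffer              # eligibility period opens
t2 = t1 + eligibility_period             # eligibility period closes
t4 = t2 + N                              # end of the experiment

def treatment_window(tR):
    """Treatment runs [tR, tR + N] for a user registered at tR,
    provided tR falls inside the eligibility period; the treatment
    starts immediately upon registration."""
    if not (t1 <= tR <= t2):
        return None
    return (tR, tR + N)
```

Note that the treatment of the last eligible registrants (at t2) ends exactly at t4 = t2 + N, which is why every eligible user receives the full minimum duration.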

Sampling

  • we will run an eligibility check on all new users by checking their account registration against the eligibility period;
  • eligible users will then be assigned to a given experimental condition or to a control group via a hashing function applied to their user id;
  • a predefined eligibility period combined with a deterministic bucketing function based on user ids will allow us to establish, at any time, which experimental condition a given user was exposed to, without needing to store additional data.
  • Note: accounts will all be newly-registered on English Wikipedia, but may be registering automatically through the Single User Login system. Post-collection filtering for accounts that have edited on other Wikimedia projects will likely be a good step toward cleaning up the data.
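One way such a deterministic bucketing function might look is sketched below. The choice of hash and the bucket names are hypothetical; the point is only that hashing the user id makes assignment reproducible without storing it.

```python
import hashlib

# Hypothetical bucket names for a three-condition experiment.
CONDITIONS = ["control", "confirmation", "gratitude"]

def bucket(user_id: int) -> str:
    """Deterministically map a user id to a condition.
    Re-running this function at any later time recovers the same
    assignment, so no per-user assignment record needs to be kept."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return CONDITIONS[int(digest, 16) % len(CONDITIONS)]
```

Because the mapping depends only on the user id, anyone with the id list can reconstruct the full assignment after the fact, which is exactly the property the bullet above relies on.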

Data collection

  • we will rely on data collected via the MediaWiki DB (revision and user tables) and the clicktracking extension;
  • we will use the clicktracking extension to store events served for successfully completed edits. The event count in the log (unsampled) should match the number of revisions originating from users in the corresponding bucket. We will not capture any impression or click data.
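The integrity check described above amounts to comparing two per-bucket counts. A sketch, assuming the revision and event logs have each been reduced to a list of bucket labels (the input shape is our assumption):

```python
from collections import Counter

def check_event_revision_match(revision_buckets, event_buckets):
    """Per-bucket sanity check: the number of logged post-edit
    events should equal the number of revisions made by users in
    that bucket. Returns a dict of mismatches; empty means OK."""
    rev_counts = Counter(revision_buckets)
    event_counts = Counter(event_buckets)
    return {b: (rev_counts[b], event_counts[b])
            for b in set(rev_counts) | set(event_counts)
            if rev_counts[b] != event_counts[b]}
```

An empty result indicates that events and revisions line up per spec; any entry pinpoints the bucket where logging dropped or duplicated events.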

Iterations

Experiment #0 (Dry Run)

  • This experiment will be activated on 2012-07-19; re-deployed on 2012-07-23
  • Dry run the code in production with no visible change to the user, for the purpose of assessing whether we can:
  1. effectively bucket users
  2. activate experiments for eligible users based on registration time or other user data
  3. serve different experimental code by condition
  4. accurately match revision and log event data
  5. verify that bucketing works per spec
  • If the dry run meets the above criteria, we will proceed to the next iteration.
  • For the dry run we are planning to use the following intervals to describe the graph in the activation section:
[t0,t1] 3 hours
[t1,t2] 2 days
N 1 day
[t0,t4] 4 days
  • The dry run will give us the following samples for the purpose of the data integrity tests:
    • 8K eligible participants
    • 2.6K participants for each condition (33% of the eligible population)
    • 650 active editors (with at least 1 edit) per condition
    • 2K revisions

Experiment #1 - confirmation vs. gratitude

  • This experiment was activated on 2012-07-30
  • This experiment will end on 2012-08-11 and will be deactivated on 2012-08-13

Methods

The first phase of this research will test a simple confirmation message and a thank-you message against no post-edit feedback as a control.

  • There will be three buckets for users: the confirmation group, the gratitude group, and a control which receives no message
  • The experiment will only be delivered to editors who registered after the deployment date
  • The experiment will be delivered in all namespaces and for all edits, except page creation
  • We used the following intervals for PEF1:
[t0,t1] 3 hours
[t1,t2] 7 days
N 7 days
[t0,t4] 15 days

Experiment #2 - Historical Feedback

  • We anticipate activating this experiment on TBD

Methods

The second iteration of the experiment will test delivering a message only when editors reach certain milestones in their edit count.

  • There will be two buckets for users: the historical feedback group and a control which receives no message; this bucketing system will necessarily function differently than the bucketing system for experiment 1, as users will only see the feedback message when they reach the specific milestone triggers.
  • The experiment will run for approx. 2 weeks, depending on sample size.
  • The experiment will present a message for edits at count: 1, 5, 10, 25, 50, and 100.
  • The experiment will only be delivered to registered editors
  • The message will be delivered in all (editable) namespaces
  • A different message will be delivered for each edit milestone.
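The milestone trigger described above can be sketched as follows. The message copy is a placeholder; only the trigger counts come from the experiment design.

```python
# Edit counts at which a milestone message is served.
MILESTONES = {1, 5, 10, 25, 50, 100}

def milestone_message(edit_count: int):
    """Return a milestone-specific message when the editor's
    lifetime edit count hits a trigger, else None. Each milestone
    gets its own message, per the design above."""
    if edit_count in MILESTONES:
        return f"Congratulations on reaching {edit_count} edit(s)!"  # placeholder copy
    return None
```

Unlike experiment 1, where every saved edit produced feedback, here most edits (e.g. the 6th) produce no message at all, which is why the bucketing and logging must handle intermittent delivery.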

Feature requirements and user experience

See the documentation on MediaWiki.org

See also

References

  1. a b Cheshire, C., & Antin, J. (2008). The Social Psychological Effects of Feedback on the Production of Internet Information Pools. Journal of Computer-Mediated Communication, 13(3), 705-727. DOI PDF
  2. Mazarakis, A., & van Dinther, C. (2011) Feedback Mechanisms and their Impact on Motivation to Contribute to Wikis in Higher Education, WikiSym '11. DOI PDF