Research:Activity session: Difference between revisions

From Meta, a Wikimedia project coordination wiki

Revision as of 20:43, 24 August 2012

Figure 1. Estimated session length for Toby Bartels.  Edits made by “Toby Bartels” are plotted and annotated over time for a session he completed on September 4th, 2010.  The estimated session start time is plotted at 430 seconds before the user's first edit.

An edit session represents a sequence of edits performed by an editor during a "session" of wiki-work. Assuming that editors tend to work on the encyclopedia in bursts that appear in the log as a quick succession of edits to articles and other pages, the beginning and end of a work session can be approximated by the first and last edits recorded in such a sequence.

Definition

An edit session is a sequence of edits made by an editor where the difference between the times at which any two sequential edits occurred is less than or equal to a cutoff $\delta$. In other words, a sequence of edits is an edit session if:

$$t_{i+1} - t_i \leq \delta \quad \text{for every pair of sequential edits } i,\ i+1$$

Where:

  • $i$ = the index of an edit in a sequence of edits
  • $t_i$ = the time at which edit $i$ occurred, in seconds
  • $\delta$ = the maximum time between edits (commonly set to one hour)
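
As a minimal sketch of the definition above, the check below tests whether a list of edit timestamps forms a single edit session. It assumes the timestamps are in seconds and already sorted in ascending order. The function name `is_edit_session` and the sample timestamps are hypothetical, introduced only for illustration.

```python
ONE_HOUR = 3600  # seconds; the commonly used cutoff


def is_edit_session(timestamps, cutoff=ONE_HOUR):
    """Return True if every gap between sequential edits is <= cutoff.

    `timestamps` is assumed to be sorted ascending, in seconds.
    """
    return all(
        later - earlier <= cutoff
        for earlier, later in zip(timestamps, timestamps[1:])
    )


# Hypothetical timestamps: gaps of 600 and 1200 seconds, both within the cutoff
print(is_edit_session([0, 600, 1800]))
# Hypothetical timestamps: the second gap is 4000 seconds, over the cutoff
print(is_edit_session([0, 600, 4600]))
```

A sequence with a single edit has no gaps to check, so it trivially counts as a session under this reading.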

Construction

Constructing the edit sessions for a user is straightforward and runs in linear time. Iterate forward through a list of the editor's edits sorted by timestamp. When the difference in time between two sequential edits is larger than the cutoff $\delta$, draw a session boundary.

For example:

Python code
#!/usr/bin/env python
import calendar
import time

TIME_CUTOFF = 3600  # seconds = 1 hour
# Result of http://en.wikipedia.org/w/api.php?action=query&list=usercontribs&ucuser=EpochFail&ucdir=newer&ucprop=timestamp&uclimit=500&format=jsonfm
user_revisions = []

def wp_to_seconds(time_string):
    '''Converts the MediaWiki time format into a Unix Epoch, the number of seconds since Jan 1st, 1970 GMT'''
    # Use a name other than "time" so the time module is not shadowed
    parsed = time.strptime(time_string, '%Y-%m-%dT%H:%M:%SZ')
    return int(calendar.timegm(parsed))

last_timestamp = None
session = []
for rev in user_revisions:
    current_timestamp = wp_to_seconds(rev['timestamp'])  # Convert from string to seconds since Jan. 1st 1970

    # Continue the current session on the first edit or when the gap is within the cutoff
    if last_timestamp is None or current_timestamp - last_timestamp <= TIME_CUTOFF:
        session.append(rev)
    else:  # Dump out the last session and start a new one
        print(session)
        session = [rev]

    last_timestamp = current_timestamp  # Always compare against the previous edit

if len(session) > 0:
    print(session)
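
The same boundary rule can be sketched as a small function that works directly on timestamps, which makes it easy to try on toy data. The name `sessionize` and the sample edit times are hypothetical, and the input is assumed to be sorted ascending, in seconds.

```python
def sessionize(timestamps, cutoff=3600):
    """Split sorted timestamps (seconds) into a list of sessions."""
    sessions = []
    for ts in timestamps:
        # Start a new session on the first edit or when the gap
        # from the previous edit exceeds the cutoff.
        if not sessions or ts - sessions[-1][-1] > cutoff:
            sessions.append([ts])
        else:
            sessions[-1].append(ts)
    return sessions


# Hypothetical edit times in seconds
edits = [0, 300, 900, 10000, 10100, 20000]
for s in sessionize(edits):
    print(s)
```

With these sample values, the gaps of 9100 and 9900 seconds exceed the one-hour cutoff, so the six edits fall into three sessions.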

Session duration

The session duration approximates the amount of time an editor actually spent working on Wikipedia during an edit session. It rests on the assumption that an editor is performing legitimate wiki-work in between the edits they make, so their labor hours can be estimated by measuring the time taken to complete the session. A naive way to derive session duration from an edit session is to take the difference in time between the first and last edits in the session. However, this approach does not account for the time required to make the first edit in a session, and therefore sessions that contain only one edit would appear to have required zero labor hours.

One way to account for the time that the temporal bounds of an edit session do not capture is to calculate the average time between edits across all sessions that contain more than one edit (430 seconds in April 2012). Adding this mean inter-edit time to the difference in time between the first and last edits of the session produces an estimated session duration that accounts for the work needed to produce the first edit.
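
The estimate described above can be sketched as follows. This is one reading of the method, not a definitive implementation: it assumes the mean inter-edit time is taken over all gaps within multi-edit sessions and is added once per session. The function names and the sample sessions are hypothetical.

```python
MEAN_INTER_EDIT = 430  # seconds; the April 2012 figure quoted above


def mean_inter_edit_time(sessions):
    """Average gap between sequential edits, over sessions with more than one edit."""
    gaps = [
        later - earlier
        for session in sessions
        if len(session) > 1
        for earlier, later in zip(session, session[1:])
    ]
    return sum(gaps) / len(gaps) if gaps else None


def session_duration(session, first_edit_time=MEAN_INTER_EDIT):
    """Naive duration (last edit - first edit) plus an allowance for the first edit."""
    return (session[-1] - session[0]) + first_edit_time


# Hypothetical sessions, each a sorted list of edit times in seconds
sessions = [[0, 300, 900], [10000, 10100], [20000]]
print(mean_inter_edit_time(sessions))
print([session_duration(s) for s in sessions])
```

Under this sketch a single-edit session is estimated at 430 seconds rather than zero, and summing the per-session estimates gives a rough total of an editor's labor time.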

Limitations

The arbitrary value of the cutoff $\delta$

Figure 3. Vetting with inter-edit times. A histogram of the time between edits for registered editors is plotted with an overlay of a three-component Gaussian mixture model fit. The one-hour session cutoff is noted.
Figure 2. Inter-edit fit over time. The mean and standard deviation for the three sub-distributions of inter-edit times are plotted over time.

The value of the cutoff $\delta$ chosen for edit sessions is somewhat arbitrary. In the relevant research, a cutoff of one hour is commonly used under the assumption that some large edits may take up to an hour to complete and that time away from the wiki would generally last more than one hour. Any selected value involves a trade-off: a cutoff that is too long merges separate sessions together, while one that is too short splits a single session apart.

Although the choice of the cutoff $\delta$ is arbitrary, a value of one hour has held up to spot checking[1][2][3], and data analysis suggesting a multi-modal distribution of inter-edit times (see Figure 3) supports one hour as a reasonable value.

Usage

  • Grouping edits together
    • Halfaker used the notion of an edit session (cutoff $\delta$ = 1 hour) to measure the first experience of newcomers as registered editors[1]. He found that the number of revisions an editor makes in their first edit session (understood as editor "investment") is a strong predictor of long-term retention.
    • Halfaker et al. built on the previous study by using the edits from newcomers' first sessions (cutoff $\delta$ = 1 hour) as a dataset for distinguishing good-faith from bad-faith editors[2]. This analysis was used to show that The Decline is not caused by decreasing newcomer quality.
    • Panciera et al. measured the number of edits per session (cutoff $\delta$ = 1 hour) to control for editors who perform many small edits as opposed to those who package a large change into a single edit[3].
  • Measuring labor hours
    • Using the session duration measurement, the total number of hours that an editor has spent editing can be approximated. Ongoing research by Geiger & Halfaker builds such estimations across the encyclopedia.

References

  1. Aaron Halfaker (2011). First edit session, a technical document produced for the 2011 Wikimedia Summer of Research.
  2. Aaron Halfaker, R. Stuart Geiger, Jonathan Morgan & John Riedl (in press). The Rise and Decline of an Open Collaboration System: How Wikipedia's reaction to sudden popularity is causing its decline. American Behavioral Scientist.
  3. Katie Panciera, Aaron Halfaker & Loren Terveen (2009). Wikipedians are Born, Not Made: A study of power editors on Wikipedia. GROUP.