Grants:Evaluation/Evaluation reports/2013/en
- This overview page provides up-to-date information about who submitted data, and how we gathered that data and additional data, for evaluating programs in the Wikimedia movement.
Here you will find information about:
- Who the Program Evaluation and Design team is.
- The goals of program evaluation, some important definitions, and why we're doing this.
- Challenges we had with the data reported and any limitations we experienced.
- Any additional data that we had to mine in order to have more representative data.
- What kinds of questions we have and what priority goals program leaders have.
- Who was able to report on inputs and participation, content production and quality, recruitment and retention, and replication and shared learning.
- Our goals with this ongoing reporting.
- What all the numbers, evaluation words and graphs mean in this report.
- ... and finally next steps and recommendations — from tool development to what other kinds of data are needed to further the awesome reporting being done.
Program Evaluation and Design team
The team responsible for producing this report is the Wikimedia Foundation's Program Evaluation and Design team, which includes Frank Schulenburg, Senior Director of Programs; Dr. Jaime Anstee, Program Evaluation Specialist; interns Edward Galvez and Yuan Li; and Sarah Stierch, former Community Coordinator.
- Edward Galvez, Program Evaluation and Design intern.
- Yuan Li, Program Evaluation and Design intern.
- Sarah Stierch, Program Evaluation & Design Community Coordinator, April 2013-January 2014.
Introduction and background
Two important definitions
“Programs”
- What is a program?
A program is a group of projects and activities that share similar theories of change and often have the same mission or goals. These programs may take place frequently, on a repeated basis with returning or new sets of participants, or they may be one-shot events.
Our team has been looking into a specific set of Wikimedia programs: those that are organized and run by Wikimedia community members. These programs include edit-a-thons, editing workshops, GLAM content donations, on-wiki writing contests, photo upload competitions and events (i.e. Wiki Loves Monuments, Wiki Takes, Wiki Expeditions, etc.), and the Wikipedia Education Program. Some of these programs share similar theories of change and, most commonly, goals of editor recruitment, engagement, and retention, along with Wikimedia content creation and improvement. We plan to expand the number of programs we examine with each round of evaluation.
“Program Leader”
- What is a program leader?
A program leader is a person who plans, executes and, most of the time, evaluates programs. Sometimes programs have multiple program leaders. With regard to Wikimedia programs, program leaders might be individuals with no chapter affiliation. They may be the volunteer President of a chapter, or perhaps a paid employee of a chapter who designs and executes programs specifically for that chapter. They could be a member of an affiliate group recognized by the Wikimedia community. They could be a librarian who hosts workshops at their library to teach people how to edit Wikipedia. You might be a program leader!
Current evaluation initiative
Our team designed an evaluation initiative for this initial phase. This initiative comprises:
- Self-evaluation - Program leaders are responsible for evaluating their own programs. Our team is here to support that self-evaluation.
- Collaboration - Program evaluation and design is a new concept to many community members and Wikimedia Foundation staff members alike. We are in this together, and by learning and working together as an evaluation community, we can build a shared understanding about evaluation and design to maximize the impact of our programs in the movement.
- Capacity building - Our goal, as a team, is to provide program leaders with the necessary skills and tools to evaluate and design their programs. By doing this successfully, the community can have the capacity to evaluate comfortably, without fear or worry about the processes involved.
Community activities
Based on our current evaluation initiative, we continue to plan and execute community activities to engage, inspire and empower program leaders during this pilot period of program evaluation and design. Activities also allow our team, and the Wikimedia community, to get a better understanding of the "state of evaluation" in the community, and where we can grow together. Our activities thus far include:
The first Program Evaluation & Design Workshop, June 2013, Budapest
This pilot workshop brought together 21 program leaders from 15 different countries to learn the basics of program evaluation and theory, including theory of change and logic models. Between facilitated presentations and hands-on, team-based activities, attendees gained first-hand experience in creating and using these tools to further the impact of their programs.
- Learn more: "Finding out what works: first program evaluation workshop" via the Wikimedia Foundation blog.
Evaluation capability status survey
In August 2013, we asked over 100 program leaders around the world to take a survey. This survey allowed us and the Wikimedia community to understand what types of data program leaders were tracking and monitoring for edit-a-thons, workshops, GLAM content donations, photography contests, online editing contests, and the Wikipedia Education Program.
- Learn more: "Survey shows interest in evaluation in Wikimedia movement, with room to grow" via the Wikimedia Foundation blog.
Data collection survey
In October 2013, we sent out a follow-up survey requesting data from program leaders. Unlike the capability survey, which asked what kinds of data program leaders were collecting, this survey requested that program leaders voluntarily share the actual data they had collected for edit-a-thons, workshops, GLAM content donations, photography contests, online editing contests, and the Wikipedia Education Program. The results of this survey are included in this, our first Evaluation Report.
Wikimedia programs: Evaluation report (beta)
Purpose and goals
This initial version of the Evaluation Report aims to provide the Wikimedia community with a first look at data collected from community-run programs around the world and to identify opportunities to further support the community with program evaluation and design. The report includes data from the first Data Collection Survey in addition to data pulled from online tools that are also available to the community.
The goals of this initial report are:
- To serve as a baseline, or starting point, for metrics and data reporting, with the hope that program leaders in the Wikimedia community will be inspired to collect and report data that can assist programs in reaching their identified goals.
- To let the Program Evaluation and Design team and the Wikimedia community use this pilot report to explore methods for improving the collection and reporting of data, learnings that can be applied to the next data collection survey and report! We want to support program leaders to make evaluation and learning easy and fun.
Overall Response Rates and Limitations
Response Rate
23 program leaders voluntarily reported on 64 programs they produced. Our team removed six reported programs because we were unable to disaggregate and confirm the numbers that were shared, bringing the usable total to 58. In addition, one program leader sent the Program Evaluation & Design team a list of cohorts from six workshops they produced, for which we pulled the data ourselves; that data is included in this report. To expand collection, we mined data for 61 additional program implementations from information that was publicly available on wiki (i.e. reports, event pages). This mined data comprised 51% of the data used in this report and increased our collection of output and outcome data, helping us to fill in some gaps that were not covered by the survey responses. In total, 119 program implementations have been included in this report.
Data issues and limitations
- The survey had a low response rate, and thus a small amount of reported data with high variability.
Because of this, this report includes means, response ranges, standard deviations, and medians. Because of the wide range of numeric responses, and thus the rarity of repeated values, modes are reported only selectively, not for all the data in this report.
- Program leaders aren't consistently reporting program budgets and staff/volunteer program implementation hours.
Even those who have been tracking their inputs, outputs, and/or outcomes have done so with varying consistency and levels of analysis. For example, while many program leaders track their budgets, they often don't track the budget down to the details; they only track the overall budget. Thus details about how much certain parts of a program cost, and other specifics, are lacking. Out of the 59 programs reported on by program leaders, 64% included a budget report; however, 22% reported no budget but did report hours invested (12 of these 13 reports were provided directly).
Most program leaders who responded to the survey were able to estimate how many staff hours (51% of reports) and volunteer hours (81% of reports) went into implementing a program, but very few were able to report exact hours (7% for staff hours and 5% for volunteer hours). In total, 89% of program leaders reported some type of data about hours, with volunteer hours reported most often, for 86% of the implementations reported directly.
Out of the data we mined from public records, the only programs that had available budget information were the 24 Wiki Loves Monuments events (44% of mined data). However, the Wiki Loves Monuments data we mined did not provide any staff or volunteer hours. The other programs that we mined, an additional 30 programs, had no publicly reported budget or hours. In conclusion, input data is lacking for each of the six program types at this time, which also means that we were unable to complete any meaningful cost-benefit analysis.
- For content production metrics, only a minority of program leaders were able to report on most measures for their program events (e.g. edit counts, characters added, media uploaded, pages created).
The percentages below show how many program leaders reported each type of data:
- 63% - photos/media uploaded
- 39% - edit counts
- 27% - amount of text added to Wikipedia's article namespace (for most European languages 1 byte = 1 character)
- Finally, under half of the respondents (45%) were able to share data about the retention of new or existing editors, and 63% of reports contained partial or complete data for budget, hours, and content production.
The team also acknowledges the timing of its reporting requests, as many program leaders were busy wrapping up and reporting on Wiki Loves Monuments 2013.
Supplemental data mining
- We collected extra data to fill in some gaps caused by the low response rate.
In addition to collecting self-reported program data from program leaders, we worked hard to identify and locate potential sources for program data. Some program leaders provided us with cohort usernames, event dates, and times, which allowed us to look into their events and fill in certain data gaps. We collected additional data on the following programs:
- Edit-a-thons - Edit-a-thons were the most frequently self-reported program type. However, many program leaders did not record the usernames of participants, which is needed to track their contributions before, during, and after the event. We pulled additional data on 20 English Wikipedia edit-a-thons for which public records of participants were available on wiki. These names were used as cohorts to track user activity rates 30 days prior to the event, during the event, and 30 days after the event (a simple sketch of this window-based counting appears after this list). This allowed us to examine content production and user retention related to edit-a-thons. We also pulled 30-day prior and after data for two edit-a-thons submitted by program leaders through the survey.
- Editing workshops - One program leader submitted usernames, event dates, and program details for six workshops. This allowed us to create cohorts for those six workshops and pull data via Wikimetrics. We pulled data on the cohorts to examine the three- and six-month retention of new users in the cohort list. Some usernames could not be confirmed via Wikimetrics and additional research, but the majority yielded usable data.
- On-wiki writing contests - Additional data for on-wiki writing contests was pulled for six contests in three different language Wikipedias. This data, which was publicly available on wiki, included program dates, budget, number of participants, the content that was created or improved, and the quality of that content at the end of the contests. We worked with program leaders, when possible, to confirm volunteer hours and budget. We were unable to judge retention and characters added due to limitations in pulling only contest-specific data.
- Wiki Loves Monuments - We used data from three directly reported Wiki Loves Monuments events as well as data from 24 Wiki Loves Monuments implementations from 2012 and 2013 that had received Wikimedia Foundation grants (including those from the FDC) and had reported a specific budget for the program, for a total of 27 program implementations across two years. We used publicly collectable data regarding those 24 Wiki Loves Monuments events to gather the number of participants, photos added, photos used, and photos named as Featured, Quality, or Valued Images. This data was pulled using three community-built tools: the Wiki Loves Monuments tool by emijrp, and GLAMorous and CatScan 2, both created by Magnus Manske. We also contacted program leaders from the 24 Wiki Loves Monuments events to review and confirm the numbers gathered, and to contribute additional data regarding budget and donated resources.
- Other photo upload initiatives - We also gathered data for an additional five program implementations: three other Wiki Loves events, a Wiki Takes event, and the pilot project Festivalsommer 2013. These programs were selected to expand the amount of data regarding other photo upload events. Data was pulled from publicly available information and from direct reports by program leaders, and included the number of participants, photos uploaded, photos used, and photos named as Featured, Quality, or Valued Images. The team used the Wiki Loves Public Art tool created by Wikimedia Österreich to pull selected data. For the Festivalsommer project, additional data was acquired through direct interaction with the program organizer.
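To make the edit-a-thon cohort analysis above more concrete, here is a minimal sketch, in Python, of the 30-days-before / during / 30-days-after counting described for edit-a-thon cohorts. It assumes the edit timestamps for each cohort username have already been retrieved (for example, exported from Wikimetrics); the function and data names are illustrative assumptions, not the team's actual tooling.

```python
# Minimal sketch of the before/during/after windowing used for edit-a-thon
# cohorts. Assumes edit timestamps per username are already available
# (e.g. exported from Wikimetrics); all names here are illustrative only.
from collections import Counter
from datetime import datetime, timedelta

def activity_windows(cohort_edits, event_start, event_end, window_days=30):
    """Count cohort edits made 30 days before, during, and 30 days after an event.

    cohort_edits: dict mapping username -> list of edit datetimes
    """
    before_start = event_start - timedelta(days=window_days)
    after_end = event_end + timedelta(days=window_days)
    counts = Counter()
    for username, timestamps in cohort_edits.items():
        for ts in timestamps:
            if before_start <= ts < event_start:
                counts["before"] += 1
            elif event_start <= ts <= event_end:
                counts["during"] += 1
            elif event_end < ts <= after_end:
                counts["after"] += 1
    return counts

# Hypothetical one-day edit-a-thon held on 1 June 2013
cohort = {
    "ExampleUserA": [datetime(2013, 5, 20, 14, 0), datetime(2013, 6, 1, 15, 30)],
    "ExampleUserB": [datetime(2013, 6, 20, 9, 0)],
}
print(activity_windows(cohort, datetime(2013, 6, 1), datetime(2013, 6, 1, 23, 59)))
# Counter({'before': 1, 'during': 1, 'after': 1})
```

Comparing the three window counts is what allows statements about activity gained or lost around an event, before asking the longer-horizon retention questions addressed later in this report.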
Data and analysis
- We have a lot of questions, and the reported data has helped us begin to answer them:
- What do these programs cost in terms of dollars and hours invested, and what other costs may be hidden in donated resources used?
- What is the reach of these programs in terms of accessing new and existing editors/contributors?
- How much content do programs produce in terms of bytes, pages, or photos/media added?
- What are the costs in terms of dollars and hours input per unit of content (text pages or photos/media added) or per participant/recruit (for workshops which produce no content)?
- To what extent do program outputs increase the quality of Wikimedia projects?
- To what extent does program participation produce new active editors/contributors, or retain active editors, at the 3- and 6-month retention points?
- To what extent does the program have examples for easy sharing and replication?
Priority goal setting
We worked with the community to discover the most commonly seen goals for programs. The June pilot workshop in Budapest served as a way for our team to identify 18 commonly seen outcomes across programs, which were discovered through conversations, breakout sessions, and logic modeling. Participants in the Data Collection Survey were asked to select the outcomes and targets that the programs they reported on had. These 18 priority goals are:
- Building and engaging community
- Increasing accuracy and/or quality of contributions (e.g. clean, high-resolution photographs placed in the proper articles)
- Increasing people's awareness of Wikimedia projects
- Increasing people's buy-in for the free knowledge/open knowledge/culture movements
- Increasing contributions to the projects
- Increasing diversity of contributions and content
- Increasing diversity of contributors
- Increasing positive perceptions about Wikimedia projects
- Increasing reader satisfaction
- Increasing the usefulness, usability, and use of contributions
- Increasing the use and access to projects
- Increasing people's editing/contributing skills
- Increasing volunteer motivation and commitment
- Increasing respect for the projects (e.g. acceptance in higher education)
- Making contributing fun
- Making contributing easier
- Recruiting new editors/contributors
- Retaining existing editors/contributors
In the survey, program leaders could also write in other goals in a section titled "other". This set of 18, with the "other" option, was presented for each program that program leaders reported on, and they were asked to select priority goals for their reported programs. The number of goals program leaders selected ranged from five to 11, with an overall mean of nine selected priority goals per program. In general, program leaders demonstrated difficulty in "prioritizing": out of all reports, only 12.5% selected five or fewer priority goals.
Inputs and participation
Inputs
- The majority of program leaders reported some type of budget, but most did not report data about the hours it took to implement their programs.
Regarding inputs, program leaders were asked to report:
- Budget - how much it cost them to produce their program in US dollars
- Staff and volunteer hours - How many actual or estimated hours staff and volunteers put into their program from beginning to end
- Donated resources - Including equipment, prizes, give-aways, meeting space, and other similar things donated by organizations or individuals to support the program
Most program leaders reported budget data, while fewer provided data about hours. Across the 119 report responses:
- 55% included budget data (22% of budgets reported were zero dollars)
- 34% included staff hours (51% of staff hours reported were zero hours)
- 44% included volunteer hours (2% of volunteer hours reported were zero hours)
Participation
- The majority of program leaders could report how many people participated in their program, while a little over half were able to tell us how many new editors created accounts for their programs.
Regarding participation, program leaders were asked to report:
- Total number of program participants
- Number of participants that created new user accounts during the program
The majority of program leaders reported the total number of participants (98%), while a little over half (57%) reported the number of new user accounts created during their program.
GLAM content donations had a slightly different reporting request about participation:
- Total number of GLAM volunteers involved in the program (78% reported)
- Total number of GLAM staff involved in the program (89% reported)
Program leaders were also asked to provide the dates and, if applicable, times for their program.
Content production and quality improvement
Content production
- Most program leaders were able to tell us how much media was added during their program, but only a minority were able to report how many characters (bytes) were added during their events, let alone how many editors actually edited during their programs.
Regarding content production, program leaders were asked to provide various types of data about what happened during their program, depending on the level of data they were able to record and track (a rough sketch of how the text metrics can be tallied appears below). These data types were:
- Total number of characters added (33% reported)
- Average number of characters added (33% reported)
- Number of participants that added characters (8% reported)
- Number of photos/media added (80% reported)
- Number of Wikimedia project pages created or improved (50% reported)
Content production metrics were not requested of those who reported about editing workshops, since content production is not the main goal of that type of program.
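As a rough illustration of the text metrics above (total characters added, average characters added, and number of participants who added characters), the sketch below tallies them from per-edit byte deltas once the participating usernames are known. The input format and names are hypothetical assumptions for illustration, not a description of the survey's or any tool's actual implementation.

```python
# Rough sketch of tallying text metrics from per-edit byte deltas
# (for most European languages, 1 byte is roughly 1 character, as noted earlier).
# The input format is a hypothetical assumption: one (username, bytes_added)
# pair per edit made in the article namespace during the program.
def text_metrics(edits):
    per_user = {}
    for username, bytes_added in edits:
        if bytes_added > 0:  # count only text that was added, not removed
            per_user[username] = per_user.get(username, 0) + bytes_added

    total_added = sum(per_user.values())
    contributors = len(per_user)
    average_added = total_added / contributors if contributors else 0
    return {
        "total_characters_added": total_added,
        "average_characters_added": average_added,
        "participants_who_added_characters": contributors,
    }

# Hypothetical edit-a-thon: two participants added text, one only removed it
edits = [("UserA", 1200), ("UserA", 300), ("UserB", 450), ("UserC", -80)]
print(text_metrics(edits))
# {'total_characters_added': 1950, 'average_characters_added': 975.0,
#  'participants_who_added_characters': 2}
```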
Quality improvement
- Most program leaders were able to report how many of the images uploaded during their program were used in the projects after the program ended. However, most were unable to report on the quality of articles and images, and most of those who did report stated that no featured, good, or valued articles or images came out of their event.
The survey also asked that program leaders report on the quality of the content that was produced during the program. They could report:
- Total number of good articles (38% reported; 51% of those reported no good articles)
- Total number of featured articles (34% reported; 77% of those reported no featured articles)
- Use count of added photos on Wikimedia project pages (63% reported)
- Number of unique images used on Wikimedia project pages (63% reported; 9% of those reported that none were being used)
- Number of Quality Images (27% reported)
- Number of Valued Images (29% reported)
- Number of Featured Pictures (28% reported)
Those who reported about edit-a-thons and workshops were not asked to report about image use and quality.
Recruitment and retention
- Just over half of respondents were able to tell us how many of their participants were active 3 months following their program, and less than half were able to do so 6 months after. Tools like Wikimetrics can make this possible, which means tracking usernames is important to learning about retention. For edit-a-thons and workshops, the majority of those reported on did not retain new editors six months after the event ended. A retained "active" editor was one who had averaged five or more edits a month.[1]
Regarding the recruitment and retention of active editors, program leaders were asked to report two areas of data. An "active editor" is defined as one making 5+ edits a month.[2] (A sketch of how this retention check can be computed from a cohort's edit history appears after the list below.)
- Total number of contributors still active 3 months after the event (55% reported, 15% reported zero retained)
- Total number of contributors still active 6 months after the event (45% reported, 19% reported zero retained)
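For illustration, here is a minimal sketch of how a 3- or 6-month retention figure could be computed from a cohort's edit timestamps using the working definition above (an "active" editor averages five or more edits a month). It assumes the per-user edit history has already been pulled (for example, via Wikimetrics) and approximates a month as 30 days; the function and variable names are illustrative assumptions.

```python
# Sketch of the 3-/6-month retention check for a cohort, using the report's
# working definition of an "active" editor (5+ edits per month on average).
# Assumes per-user edit timestamps are already available; a month is
# approximated as 30 days for simplicity. Names are illustrative only.
from datetime import datetime, timedelta

ACTIVE_EDITS_PER_MONTH = 5

def is_retained(edit_timestamps, event_date, months_after):
    """True if the user averaged 5+ edits/month between the event and the
    retention point `months_after` months later."""
    window_end = event_date + timedelta(days=30 * months_after)
    edits_in_window = [t for t in edit_timestamps if event_date < t <= window_end]
    return len(edits_in_window) / months_after >= ACTIVE_EDITS_PER_MONTH

def retention_rate(cohort_edits, event_date, months_after):
    """Share of cohort members still 'active' at the given retention point.

    cohort_edits: dict mapping username -> list of edit datetimes
    """
    if not cohort_edits:
        return 0.0
    retained = sum(
        is_retained(timestamps, event_date, months_after)
        for timestamps in cohort_edits.values()
    )
    return retained / len(cohort_edits)

# Hypothetical workshop cohort, event held 1 June 2013
cohort = {
    "ExampleUserA": [datetime(2013, 7, day) for day in range(1, 20)],  # 19 edits
    "ExampleUserB": [datetime(2013, 6, 5)],                            # 1 edit
}
print(retention_rate(cohort, datetime(2013, 6, 1), months_after=3))  # 0.5
```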
If the program reported on was an edit-a-thon or workshop, program participants may have been split into two groups, new editors and existing editors, in order to learn the retention details for each cohort. This is important, since both edit-a-thons and workshops often attract new and experienced editors, unlike on-wiki writing contests, which generally target existing contributors, and the Wikipedia Education Program, which generally targets new editors.
In terms of recruitment and 6-month retention of new editors (those who made accounts at or for the event):
- Edit-a-thons (44% reported, 85% of those reported zero retained)
- Editing workshop programs (56% reported, 78% of those reported zero retained)
- Wiki Loves Monuments recruitment and retention data was mined using the entire set of uploader usernames for the 2012 events.
- We asked different questions about recruitment and retention for the Wikipedia Education Program and GLAM content donations.
We asked about the retention of partnerships with educational institutions or cultural organizations, instead of editor retention. Wikipedia Education Program respondents were asked to identify how many instructors were participating in the program, and GLAM content donation respondents were asked whether the GLAM they worked with would continue its partnership with Wikimedia and whether the content donation would lead to other GLAM partnerships.