Grants:IEG/Wikidata Toolkit: Difference between revisions

From Meta, a Wikimedia project coordination wiki
initial proposal (Part 1); more to come
 
<!--The templates below help maintain your proposal, please leave them, thank you!-->
{{IEG/Proposals/Button/2}}

== Part 2: The Project Plan ==

__TOC__<!--The sections below provide the required structure for part 2 of your project proposal, feel free to add to them but please keep the provided subsections.-->
==Project plan==
<!--Leave this blank.-->

''Temporary note: the rest of this proposal will be provided soon, but not today.'' --[[User:Markus Krötzsch|Markus Krötzsch]] ([[User talk:Markus Krötzsch|talk]]) 16:52, 27 September 2013 (UTC)

===Scope:===
====Scope and activities====
<!--What will you spend your time doing if this project is funded? What will be completed by the end of the project?-->


====Tools, technologies, and techniques====
<!--What kinds of things do you expect you’ll need most in order to complete your project? -->
<!--Examples: outreach, meetups, surveys, database queries, gadgets for wikipages, Wikimedia merchandise for volunteers, etc-->


===Budget:===
====Total amount requested====
<!--Please list the total grant amount you are requesting in USD or local currency (USD will be assumed if no other currency is specified).-->


====Budget breakdown====
<!--How will you use the funds you are requesting? You can create a table with the table button from the toolbar above, or just list bullet points.-->
<!--Examples: Travel roundtrip from Bangladesh to India: (Amount) USD, Visual design contractor: (Amount) USD, Project management: (Amount) USD, Wikimedia merchandise for volunteer giveaways: (Amount) USD, etc.-->


===Intended impact:===
====Target audience====
<!--Who will be better served or impacted as a result of this project?-->


====Fit with strategy====
<!--What crucial thing are you trying to change or have an effect on in the Wikimedia movement with this project? Please select a Wikimedia strategic priority that your project most directly aims to impact and explain how your project fits-->
<!--Examples of strategic priorities: Increasing Reach (more people are able to access Wikimedia projects), Improving Quality (better quality and quantity of content on Wikimedia projects), Increasing Participation (larger and more diverse groups of people are contributing to Wikimedia projects)-->


====Sustainability====
<!--What do you expect will happen to your project after the grant ends? How might the project be continued or grown in new ways afterwards?-->


====Measures of success====
<!--How will you know if the project is successful? What targets, metrics, or measurable criteria will you use to learn if you’ve met your goals?-->


==Participant(s)==
<!--Please list all prospective grantees for this project and tell us something about each of you. We’re particularly interested in hearing about any related skills, experience, or other background that you think will help make this idea successful.-->


==Discussion==
<!--Leave this blank.-->
===Community Notification:===
<!--You are responsible for notifying relevant communities of your proposal. Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc.-->
Please paste a link to where the relevant communities have been notified of this proposal, and to any other relevant community discussions, here.

===Endorsements:===
<!--This section is for community members to describe why they think a project idea is of value.-->
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. Other feedback, questions or concerns from community members are also highly valued, but please post them on the [[Grants talk:{{PAGENAME}}|talk page]] of this proposal.

*''Community member: add your name and rationale here.''


<!--You have finished creating all sections of your proposal. Have a great time continuing to develop your project idea, we look forward to your completed submission!-->

Revision as of 16:52, 27 September 2013

status: draft

Individual Engagement Grants

project:

Wikidata Toolkit


project contact:

markus(_AT_)semantic-mediawiki.org

participants:


grantees:

  • Markus Krötzsch is the creator of Semantic MediaWiki and data architect of Wikidata. He is a Departmental Lecturer at the University of Oxford and will be leading a research group at TU Dresden starting Nov 2013.
  • Research assistant (TBD): another member of Markus's research group at TU Dresden
  • Student assistant (TBD): a secondary goal of the project is to involve students in Wikipedia-related development and research


summary:

The project will develop a toolkit and web service to query and analyse information exported from Wikidata, providing a feature-rich query API based on a robust and scalable backend.





2013 round 2

Project idea

Problem: The Wikidata project collects large amounts of data, but understanding this data requires technical means for querying and analysis that are not currently available. Even skilled developers have hardly any basis for working with Wikidata.

Solution: A modular toolkit for loading, querying, and analysing Wikidata data will make it easy for developers to use Wikidata in their applications. A web service built on top of this toolkit will offer live query capabilities to a wider range of users. The work will draw heavily on prior experience and existing tools; the goal is to unify and improve existing partial solutions.

Motivation

Wikidata collects large amounts of data across all Wikipedia languages. The data comprises names, dates, coordinates, relationships, and URLs, as well as references for many statements. In contrast to Wikipedia, where the main way of accessing information is to read single pages, the information in Wikidata is most interesting when facts are viewed in a wider context, combining information across many subjects. For example, we can now answer the question of how the sex distribution of people with Wikipedia articles varies across languages. For Wikidata editors, complex questions are interesting for yet another reason: they can be used to check data quality by looking for patterns that should not normally occur. For instance, the mother of a person should normally be female, which is not always the case in the current data. This and many other insights about Wikidata can be gained by querying the data set for certain patterns, thus revealing the true potential of the project.
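The quality check mentioned above (a person's mother should normally be female) can be pictured as a simple pattern query over in-memory data. The following is only a minimal sketch: the record layout and the field names `sex` and `mother` are hypothetical stand-ins chosen for illustration, not Wikidata's actual identifiers or the toolkit's API.

```python
# Toy in-memory records. The item ids and the field names "sex" and
# "mother" are hypothetical, not Wikidata's actual identifiers.
people = {
    "item_a": {"label": "Alice", "sex": "female", "mother": None},
    "item_b": {"label": "Bob", "sex": "male", "mother": "item_a"},
    "item_c": {"label": "Carol", "sex": "female", "mother": "item_b"},
}


def find_non_female_mothers(items):
    """Return (child_id, mother_id) pairs where the mother is not recorded as female."""
    violations = []
    for item_id, record in items.items():
        mother_id = record.get("mother")
        if mother_id is None:
            continue  # no mother statement, nothing to check
        mother = items.get(mother_id)
        # Mothers missing from the data are skipped; a mother whose "sex"
        # value is absent or not "female" is reported.
        if mother is not None and mother.get("sex") != "female":
            violations.append((item_id, mother_id))
    return violations


violations = find_non_female_mothers(people)
print(violations)
```

A check of this kind only flags candidates for human review; whether a reported pair is really an error remains an editorial decision.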

Unfortunately, Wikidata does not support any advanced form of query. The basic API provided by the project is limited to retrieving elements by their label (or alias). It is not even possible to find pages that refer to another page, e.g., to find the albums recorded by a certain artist. MediaWiki's "What links here" feature is sometimes (ab)used as a workaround in cases where it is enough to know that another page is mentioned somewhere in the data. In all cases mentioned above, custom-made software is used to analyse the data from dumps. Each such analysis is a time-consuming offline process that often takes hours to complete. Even worse, the lack of technical support excludes the vast majority of users from analysing Wikidata. Even technically trained users who would be able to formulate, say, an SQL query are discouraged by the immense technological barrier of building their own query answering system.
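To illustrate the kind of lookup that is missing (finding the items that refer to a given item, such as the albums of an artist), one possible approach is a reverse index built once over all statements. This is a sketch under stated assumptions: the triples, the property name `performer`, and the function `build_reverse_index` are invented for the example and do not describe an existing Wikidata feature.

```python
from collections import defaultdict

# Hypothetical statements as (subject, property, value) triples.
statements = [
    ("album_1", "performer", "artist_x"),
    ("album_2", "performer", "artist_x"),
    ("album_3", "performer", "artist_y"),
    ("artist_x", "country", "country_z"),
]


def build_reverse_index(triples):
    """Map each (property, value) pair to the list of subjects that state it."""
    index = defaultdict(list)
    for subject, prop, value in triples:
        index[(prop, value)].append(subject)
    return index


reverse_index = build_reverse_index(statements)

# "Which items name artist_x as their performer?"
albums = reverse_index[("performer", "artist_x")]
print(albums)
```

Unlike the "What links here" workaround, the index key includes the property, so a mention of the artist in an unrelated statement does not produce a false match.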

The goal of this project is to develop the technical components needed to simplify query answering over up-to-date Wikidata data. The heart of this project is a robust and flexible query backend that provides an API for running a variety of queries. A web service to showcase the functionality will be created and set up to use current (or very recent) data. The main approach for achieving this is to develop a set of modular, re-usable, client-side components for in-memory query answering. While Wikidata is large (and growing quickly), its size is well within the range of modern main memory sizes, and the added flexibility of a memory-based model is essential to support a wider range of queries. Moreover, components for loading and updating data selectively can filter the information so that querying is possible even on machines with commodity memory sizes.
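The idea of selective loading can be sketched as a streaming filter that keeps only the properties a given analysis needs, so that the in-memory store stays small. The input format assumed here (one JSON object per line with `id` and `claims` fields) and the function `load_filtered` are hypothetical and chosen only for the example; they are not the actual dump format or a committed design.

```python
import io
import json


def load_filtered(lines, wanted_properties):
    """Stream records and keep only the wanted properties of each item."""
    store = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        kept = {
            prop: value
            for prop, value in record["claims"].items()
            if prop in wanted_properties
        }
        if kept:  # items without any wanted property are not stored at all
            store[record["id"]] = kept
    return store


# A tiny stand-in for a dump file (hypothetical format and field names).
dump = io.StringIO(
    '{"id": "item_1", "claims": {"sex": "female", "birthplace": "city_a"}}\n'
    '{"id": "item_2", "claims": {"birthplace": "city_b"}}\n'
    '{"id": "item_3", "claims": {"sex": "male"}}\n'
)

store = load_filtered(dump, {"sex"})
print(store)
```

Because the records are processed one at a time, peak memory use depends on the filtered result rather than on the size of the full export.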

Project goals

The project has two main technical outcomes:

(1)  Wikidata toolkit. A set of modular components for in-memory processing of information from Wikidata in a programmatic way
(2)  Query web service. A web service to run queries against current Wikidata content that is built on top of the toolkit

In addition, the project aims at a soft outcome to ensure sustainability beyond the initial grant:

(3)  Community engagement. Active involvement of volunteer developers and interested users

Outcome (1) is the heart of the project. Outcome (2) is a first application that will make (1) more tangible and help to evaluate project progress. Outcome (3) aims at increasing the long-term impact of the project. In view of (3), a particular focus of toolkit development will be maintainable code and an extensible architecture.

The general goals that these outcomes should help to achieve are:

  • Significantly lower barrier for using and analysing Wikidata content
  • Improved quality control mechanisms for Wikidata editors
  • Higher utility and visibility of Wikidata content, beyond direct use in Wikimedia projects
  • Increase in content-driven applications based on Wikidata content

The following are not goals of the project:

  • To develop new database management software (the project is read-only)
  • To replace future Wikidata query features (they address different needs and requirements)
  • To develop innovative user interfaces for queries or analysis (this might be a follow-up project)
  • To improve MediaWiki API access for programs (API access and bot frameworks are a different type of toolkit; the problems addressed in the present project are not addressed by Wikidata's current web API)


