Grants:IEG/PlanetMath Books Project: Difference between revisions

From Meta, a Wikimedia project coordination wiki

Revision as of 14:52, 27 September 2013

status: draft

Individual Engagement Grants

project:

PlanetMath Books Project

idea creator:

Arided

project contact:

holtzermann17(_AT_)gmail.com

participants:


grantees:

  • Joseph Corneli (Arided)
  • Raymond Puzio

volunteers:

  • Deyan Ginev

advisors:

  • Jon Borwein
  • Michael Kohlhase
  • Murdoch James Gabbay
  • Lee Worden
  • Christoph Lange

summary:

As an early wiki encyclopedia, PlanetMath provided lots of material for Wikipedia. We want to do it again, for Wikibooks.





2013 round 2

Project idea

PlanetMath is a free/open mathematics community that uses the same license as Wikipedia. It is best known for its encyclopedia, but in recent years, the software has been rebuilt with a focus on problem solving and building expository material for students, rather than encyclopedic material at the research reference level. We want to build a collection of free/open problem sets, course outlines, and textbooks, using material from PlanetMath, Wikipedia, and math.stackexchange.com, along with other free/open or public domain sources. PlanetMath's special-purpose software makes it an ideal place to assemble this content, but we also want to publish results "downstream", e.g. to Wikibooks (along the lines of the earlier WikiProject PlanetMath Exchange).

Project goals

The project has the following components:

  1. Technical platform improvements to PlanetMath's user interface to streamline the production of books
  2. Crawling and light-weight semantic linking to connect up existing free materials across the Web (e.g. questions and answers, problem sets, encyclopedia articles)
  3. Writing and editing work to transform this raw material into truly expository prose
  4. Transformation tools necessary for exporting to Wikibooks
  5. As an alternate route to generating content, we will work with off-the-shelf Optical Character Recognition (OCR) systems to re-typeset several public domain texts, incorporating that material wholesale, and adding the results to, e.g., WikiSource.

Concrete deliverables

During the first phase of the project, we will focus on platform issues and build a few subject-specific "outlines" along the lines of the Schaum's Outlines series. We will build on significant prior work with the Planetary platform and NNexus autolinker, both of which are in active use on PlanetMath currently.

Planned future work that will be made possible by our efforts

Subsequent phases of the project will seek "feature parity" with the Schaum's Outlines series as a whole. (One of these books, selected at random, had 97 expository sections, 877 problems, and 420 worked solutions; feature parity with the series as a whole would require approximately 6000 expository sections, 60000 problems, and 20000 solutions.)
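The scale of the parity target can be checked with simple arithmetic. The Python sketch below divides each series-wide target by the corresponding count from the sampled book. It uses only the figures quoted above; the variable names are illustrative.

```python
# Counts from the one sampled book, as quoted in the text.
sample_book = {
    "expository sections": 97,
    "problems": 877,
    "worked solutions": 420,
}

# Approximate series-wide targets, as quoted in the text.
series_targets = {
    "expository sections": 6000,
    "problems": 60000,
    "worked solutions": 20000,
}

# Ratio of target to sample: how many books of the sampled size each target implies.
implied_books = {
    kind: series_targets[kind] / sample_book[kind] for kind in sample_book
}

for kind, ratio in implied_books.items():
    print(f"{kind}: about {ratio:.0f} books of the sampled size")
```

The three ratios fall between roughly 48 and 68, so the approximate targets correspond to several dozen books of the sampled size.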


Get involved

Welcome, brainstormers! Your feedback on this idea is welcome. Please click the "Discussion" link at the top of the page to start the conversation and share your thoughts.


Part 2: The Project Plan

Project plan

Scope:

Scope and activities

The first phase of the project will focus on tuning the technical platform. Concretely, work will focus on custom modules for the Drupal content management system, on which Planetary is based, some improvements to the NNexus concept indexer and autolinker, and custom OCR pre- and post-processing routines. This technical work will be evaluated by producing several books demonstrating the workflow:

OCR+proofreading | web scraping | content reuse → semantic linking → content assembly → import → editing → downstream distribution.
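The workflow above has three alternative entry points followed by a fixed sequence of stages. As a minimal sketch of that shape only, the Python below represents it as data. The names `SOURCES`, `STAGES`, and `workflow_path` are hypothetical and do not correspond to any Planetary or NNexus code.

```python
# Alternative ways raw material enters the workflow (any one of these).
SOURCES = ("OCR+proofreading", "web scraping", "content reuse")

# Stages every piece of content then passes through, in order.
STAGES = (
    "semantic linking",
    "content assembly",
    "import",
    "editing",
    "downstream distribution",
)


def workflow_path(source):
    """Return the full ordered list of steps for content from the given source."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return [source, *STAGES]


path = workflow_path("web scraping")
print(" -> ".join(path))
```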

Tools, technologies, and techniques

This is a software-focused project: the main system is Planetary, a customized version of Drupal (currently version 7) that incorporates LaTeXML and MathJax for mathematics rendering, and which uses the NNexus concept indexer and autolinker to identify (weak) semantic links between user-contributed content.
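To give a rough feel for what concept autolinking means, here is a toy Python sketch. It is not NNexus's algorithm or API: it matches known concept phrases in a text, preferring longer phrases, and wraps them in wiki-style links. The function name `autolink`, the sample index, and the link targets are all assumptions made for illustration.

```python
import re


def autolink(text, concept_index):
    """Wrap known concept phrases in [[target|phrase]] links.

    concept_index maps lowercase phrases to link targets (hypothetical titles).
    """
    if not concept_index:
        return text
    # Try longer phrases first so "abelian group" wins over "group".
    phrases = sorted(concept_index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        phrase = match.group(1)
        target = concept_index[phrase.lower()]
        return f"[[{target}|{phrase}]]"

    return pattern.sub(replace, text)


index = {"abelian group": "AbelianGroup", "group": "Group"}
linked = autolink("Every cyclic group is an abelian group.", index)
print(linked)
```

A real autolinker also has to disambiguate terms by subject area and avoid over-linking. This sketch ignores both concerns.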

We have held weekly phone conferences for about a year; these are a good way to keep up to date with progress and to bring in newcomers.

The software projects are all free software and managed through public issue trackers and mailing lists.

Budget:

Total amount requested

$22000

Budget breakdown

  • $1000 will purchase a license for InftyReader, a state-of-the-art math-aware OCR package
  • $2000 will purchase a high-end computer for OCR and related processing
  • $2500 will pay for developing custom OCR helper tools
  • $500 will pay for time spent on OCR, proofreading, and testing the OCR helper
  • $10000 will pay for improvements to the Drupal components of the software system (including bulk upload facilities needed for efficient content assembly)
  • $4000 will pay for time spent doing content editing and assembling books
  • $2000 will pay for time spent developing export tools, moving content to Wikibooks, and other dissemination activities
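As a quick arithmetic check, the line items above can be summed and compared with the requested total. The Python sketch below uses only the figures listed in this breakdown; the shortened labels and variable names are illustrative.

```python
# Line items from the budget breakdown, in US dollars (labels shortened).
budget_items = {
    "InftyReader license": 1000,
    "High-end computer for OCR": 2000,
    "Custom OCR helper tools": 2500,
    "OCR, proofreading, and testing": 500,
    "Drupal component improvements": 10000,
    "Content editing and book assembly": 4000,
    "Export tools and dissemination": 2000,
}
total_requested = 22000

total = sum(budget_items.values())

# Share of the total for each item, as a percentage.
shares = {name: 100 * amount / total for name, amount in budget_items.items()}

print(f"Sum of line items: ${total} (requested: ${total_requested})")
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The line items sum to the requested $22000, with the Drupal work accounting for a little under half of the total.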

Intended impact:

Target audience

We want to build a free, interactive replacement for expensive and technically outmoded textbooks. Mathematics students and teachers will be the first beneficiaries of this work.

Fit with strategy

Increasing Reach and Improving Quality: Wikipedia has great mathematics content and is often a first stop for high-level mathematics questions. However, students still don't have a reliable "first stop" for learning how to do mathematics. Mathematics is best learned as part of an active process of problem solving. The content we're building will support this sort of active use.

Encourage innovation: We're working outside of Wikimedia, but we hope to find robust ways to work with Wikimedia, using alternative software to help solve a problem of importance in a project that is very much aligned with Wikimedia's mission.

Sustainability

PlanetMath is one of the earliest online wiki-like communities – it launched in 2001 and has been running successfully since then, powered by a 100% free/open software system, which has recently been re-built using the popular Drupal web content management system. A new workflow for producing mathematics textbooks will make PlanetMath an important tool for the next decade. In particular, the tools we will develop will help circulate content between other free/open platforms including Wikipedia, Wikibooks, and math.stackexchange.com.

Measures of success

In this phase of the project, we are focusing on technical implementation and only expect to produce a relatively small number of books. However, we will do pre-processing of material that would be useful for subsequent efforts: in particular, aligning the Wikipedia and PlanetMath encyclopedia coverage, and using the NNexus autolinker to crawl, index, and connect questions from math.stackexchange.com to this body of work. Metrics include the percentage (and distribution) of questions, answers, and articles linked in this fashion, as well as the amount of work required to transform the products of this "automated" content assembly phase into usable books.
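The linking metric could be computed along the following lines. This Python sketch assumes each item is recorded as a (kind, link count) pair, which is a hypothetical data layout. The records in `placeholder_items` are invented placeholders to exercise the function, not project measurements.

```python
from collections import Counter


def linked_percentages(items):
    """Return, per kind of item, the percentage that has at least one link.

    items: iterable of (kind, link_count) pairs (assumed layout).
    """
    totals = Counter()
    linked = Counter()
    for kind, link_count in items:
        totals[kind] += 1
        if link_count > 0:
            linked[kind] += 1
    return {kind: 100 * linked[kind] / totals[kind] for kind in totals}


# Placeholder records for illustration only; not real data.
placeholder_items = [
    ("question", 2),
    ("question", 0),
    ("answer", 1),
    ("article", 3),
    ("article", 0),
    ("article", 1),
]

percentages = linked_percentages(placeholder_items)
for kind, pct in percentages.items():
    print(f"{kind}: {pct:.1f}% linked")
```

The distribution mentioned above could be examined in the same way, for example by tallying the link counts per kind instead of reducing them to linked or not linked.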

Participant(s)

Joe is one of the main developers on the Planetary project, and has been responsible for most of the custom code on PlanetMath. He has submitted a Ph. D. thesis at the Knowledge Media Institute of the Open University (UK), entitled "Peer Produced Peer Learning: A mathematics case study". This thesis describes the primary considerations behind the PlanetMath rebuild/reboot and includes a report on preliminary user trials of the new software.

Ray is one of the top contributors to PlanetMath, where he also serves as the Operations Manager. He holds a Ph. D. in Physics from Yale.

Deyan, who will be participating in the project as a volunteer, is the current maintainer of the NNexus indexer and autolinker, and is also one of the core contributors to LaTeXML. He is working on a Ph. D. in computer science at Jacobs University Bremen, and is a board member at PlanetMath.

Discussion

Community Notification:

Please paste a link to where the relevant communities have been notified of this proposal, and to any other relevant community discussions, here.

Endorsements:

Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. Other feedback, questions or concerns from community members are also highly valued, but please post them on the talk page of this proposal.

  • Community member: add your name and rationale here.