RDF

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Duesentrieb (talk | contribs) at 18:43, 3 June 2005 (→‎Implementation: custom RDF). It may differ significantly from the current version.

This page describes an effort to provide an extensive RDF interface for MediaWiki. The idea is to create a flexible framework that goes beyond the capabilities of the code described at RDF metadata. I (User:Duesentrieb) am working on this together with User:Evan.

Scope

The idea is to provide a way to query different kinds of information about a wiki page in RDF format. This may include:

  • basic metadata as defined by the Dublin Core standard and/or the Creative Commons project.
  • List of all authors
  • List of all links to and from a page
  • List of keywords (Categories for a page)
  • List of members of a category
  • ...and a plugin interface for more.

Also, the RDF could be delivered in different notations, such as:

  • RDF/XML
  • N-Triples
  • Turtle
  • ...and maybe others.

This will probably be implemented as an extension that provides a special page which, when called without parameters, presents a form where the user can specify which data is wanted and in which format.
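As a rough sketch of how such a special page might interpret its parameters, the function below turns a request array into a list of data sets and an output format. The parameter names `models` and `format`, the format keys, the function name `parseRdfRequest` and the fallback to RDF/XML are all assumptions made for illustration; none of them is fixed by this proposal.

```php
<?php
# Hypothetical sketch: parameter names, format keys and defaults are assumptions.

# Output notations the page might offer, keyed by a short name.
$rdfFormats = array(
    'xml'      => 'RDF/XML',
    'ntriples' => 'N-Triples',
    'turtle'   => 'Turtle',
);

function parseRdfRequest(array $params, array $formats, array $defaultModels) {
    # No parameters at all: the caller should show the query form instead.
    if (count($params) === 0) {
        return null;
    }

    # "models" is read as a comma-separated list of data set names.
    $models = $defaultModels;
    if (isset($params['models']) && trim($params['models']) !== '') {
        $names = array_map('trim', explode(',', $params['models']));
        $models = array_values(array_filter($names, function ($name) {
            return $name !== '';
        }));
    }

    # Fall back to RDF/XML when the format is missing or unknown.
    $format = isset($params['format']) ? strtolower($params['format']) : 'xml';
    if (!isset($formats[$format])) {
        $format = 'xml';
    }

    return array('models' => $models, 'format' => $format);
}
```

A `null` result stands for "show the form"; any other result could be handed to the core function described under Implementation.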

Another possible feature would be to allow users to add custom RDF information to any page, in Turtle format, inside a <rdf>...</rdf> block.

Implementation

The implementation will probably be based on the RAP framework [1].

core

The core function will take a list of "models", i.e. datasets wanted, and return a RAP model which can then be serialized to create the actual output.

Here is the code for that function, as suggested by User:Evan:

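 function getRdf($article, $modelNames = null) {
    global $wgModelFunctions, $wgDefaultModelNames;

    # PHP default values must be constants, so fall back to the global list here
    if ($modelNames === null) {
        $modelNames = $wgDefaultModelNames;
    }

    # empty model (same Model class as in the example functions below)
    $fullModel = new Model();

    foreach ($modelNames as $modelName) {
        if (!isset($wgModelFunctions[$modelName])) {
            # unknown data set: print error and skip it
            continue;
        }
        $modelFunction = $wgModelFunctions[$modelName];
        $model = $modelFunction($article);
        # merge() is this proposal's name for adding one model's statements to another
        $fullModel->merge($model);
    }

    return $fullModel;
 }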
 

$wgModelFunctions may be changed to contain a human-readable description of each data set in addition to the function name. This description would then be displayed on the query form where the user chooses the data sets.
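One possible shape for such a registry is sketched here, under the assumption that each entry becomes an array with a `function` and a `description` key. These key names, the data set names `dc` and `links`, the description texts and the helper `getModelChoices` are invented for illustration.

```php
<?php
# Hypothetical sketch: the 'function' and 'description' keys are assumptions.
$wgModelFunctions = array(
    'dc' => array(
        'function'    => 'DublinCoreModel',
        'description' => 'Basic Dublin Core metadata',
    ),
    'links' => array(
        'function'    => 'LinkingModel',
        'description' => 'Links to and from the page',
    ),
);

# Build the name => description list that the query form would display.
function getModelChoices(array $modelFunctions) {
    $choices = array();
    foreach ($modelFunctions as $name => $entry) {
        $choices[$name] = $entry['description'];
    }
    return $choices;
}
```

With this layout, the core function would look up `$wgModelFunctions[$modelName]['function']` rather than using the entry itself as the function name.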

some example functions

Here's an example function for creating Dublin Core base data (code suggested by User:Evan):

 function DublinCoreModel($article) {
    global $DC_creator, $DC_date; # available from RAP system

    $model = new Model();

    $resource = getArticleResource($article); # Gets a RAP resource for the article; we'll have some utilities like this
    $user = getUserResource($article->getUser()); # another utility

    $model->add(new Statement($resource, $DC_creator, $user));
    $model->add(new Statement($resource, $DC_date, new Literal($article->getDate())));

    # etc.

    return $model;
 }
 

Example function for listing all links to and from a page (also suggested by User:Evan):

 function LinkingModel($article) {
 
    global $DCMES, $DCTERM, $DCMI_types;
 
    $model = new Model();
 
    $resource = getArticleResource($article); # here's that utility again
 
    $linkFromTitles = $article->links(); # actually, I'm pretty sure this doesn't exist, but it should. B-)
 
    foreach ($linkFromTitles as $linkFromTitle) {
        $model->add(new Statement($resource, $DCTERM['References'], titleToResource($linkFromTitle)));
    }
 
    $linkToTitles = $article->whatLinksHere(); # another function that never was
 
    foreach ($linkToTitles as $linkToTitle) {
        $model->add(new Statement($resource, $DCTERM['isReferencedBy'], titleToResource($linkToTitle)));
    }
 
    # ... more for Image links, etc.
 
    return $model;
 }
 

It may, however, be better to have a more generic function that builds such a list directly from an SQL query. This would make it very easy to add new datasets.
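A minimal sketch of that idea follows, assuming the query yields rows with a `subject` and an `object` URI column. To stay independent of RAP and of any database layer, it takes the rows as a plain array and returns triples as arrays rather than RAP statements; the function names `rowsToTriples` and `toNTriples` and the column names are hypothetical.

```php
<?php
# Hypothetical sketch: function names and row layout are assumptions.

# Turn query result rows into (subject, predicate, object) triples.
function rowsToTriples(array $rows, $predicateUri) {
    $triples = array();
    foreach ($rows as $row) {
        $triples[] = array($row['subject'], $predicateUri, $row['object']);
    }
    return $triples;
}

# Serialize URI-only triples as N-Triples lines.
function toNTriples(array $triples) {
    $lines = array();
    foreach ($triples as $t) {
        $lines[] = "<{$t[0]}> <{$t[1]}> <{$t[2]}> .";
    }
    return implode("\n", $lines);
}
```

Under this design, a new dataset would need only a query and a predicate URI; a real version would build RAP statements instead of arrays.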

custom RDF

Function for building a model from custom RDF embedded in the wiki page:

 function InTextTurtleModel($article) {

    $text = $article->getText();

    # Get the stuff between <rdf> tags; the "s" modifier lets "." match newlines
    preg_match_all('!<rdf>(.*?)</rdf>!s', $text, $matches);

    $turtle = implode("\n", $matches[1]); # ...and join it into one big string

    $turtleParser = new N3Parser(); # RAP's "N3" parser is really a Turtle parser

    $model = $turtleParser->parse2model($turtle); # FIXME: handle errors here

    return $model; # That's it!
 }
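The extraction step can be tried on its own, without RAP. The snippet below is only a sketch: the helper name `extractRdfBlocks` and the sample wikitext are made up for illustration.

```php
<?php
# Hypothetical helper: the name extractRdfBlocks is an assumption.
function extractRdfBlocks($text) {
    # "s" lets "." span newlines; ".*?" stops at the first closing tag.
    preg_match_all('!<rdf>(.*?)</rdf>!s', $text, $matches);
    return array_map('trim', $matches[1]);
}

# Sample page text with two separate <rdf> blocks.
$wikitext = "Intro <rdf>\n<> dc:title \"A\" .\n</rdf> more text <rdf><> dc:date \"2005\" .</rdf>";
echo implode("\n", extractRdfBlocks($wikitext)), "\n";
```

The non-greedy match matters when a page contains more than one block: a greedy `.*` would swallow the ordinary text between the first opening tag and the last closing tag.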
 

The code on the wiki page would use Turtle notation. The example below says that some of the text of the page was copied from another Wikipedia article, and that another part of the page was copied from some other random URL.

 <rdf>
    <> dc:source <http://www.example.com/some/upstream/document.txt>, Wikipedia:AnotherArticle .

    <http://www.example.com/some/upstream/document.txt>
      a cc:Work;
      dc:creator "Anne Example-Person", "Anne Uther-Person";
      dc:contributor "Yadda Nudda Person";
      dc:dateCopyrighted "14 Mar 2005";
      cc:license cc:by-sa-1.0 .
 </rdf>
 

Links