DBpedia: Difference between revisions

From Meta, a Wikimedia project coordination wiki
Updated page for the September 2007 release of the dataset (Chris Bizer).
m →DBpedia Ontology: link disambiguation
 
(18 intermediate revisions by 10 users not shown)
'''DBpedia''' is a community effort to extract structured information from [[Wikipedia]] and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to interlink other datasets on the Web with DBpedia data.


Information about the DBpedia project and dataset can be found here:
* [[:w:en:DBpedia|The DBpedia article on Wikipedia]]
* [http://dbpedia.org Project site]


== The DBpedia Dataset ==


Wikipedia articles consist mostly of free text, but also contain different types of structured information, such as infobox templates, categorisation information, images, geo-coordinates and links to external Web pages. This structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content.


===DBpedia on MetaWiki===
Several pages were created on MetaWiki to aid in extracting information from Wikipedia.
Semi-structured information can be found on Wikipedia in articles with infoboxes.
The main problem is that infoboxes are generally designed to be read by humans once rendered.
For machines it is much harder to access the information in infoboxes and extract meaningful and useful data.
To ease this process, two pages were created in the [[User:DBpedia-Bot]] namespace:


====DBpedia Ontology====
The DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums and 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages, 180,000 external links into other RDF datasets, 207,000 Wikipedia categories and 75,000 YAGO categories.
The [[User:DBpedia-Bot/ontology| DBpedia Ontology]] aims to be a common schema for articles (and especially infoboxes of articles) in Wikipedia.
It is a repository of unique names and identifiers (a vocabulary) for the strings used in infoboxes; it aims to merge strings that share a meaning and to separate strings with different meanings.


The DBpedia project uses the [http://en.wikipedia.org/wiki/Resource_Description_Framework Resource Description Framework] as a flexible data model for representing extracted information and for publishing it on the Web. As of September 2007, the DBpedia dataset consists of around 103 million RDF triples, which have been extracted from the English, German, French, Spanish, Italian, Portuguese, Polish, Swedish, Dutch, Japanese, Chinese, Russian, Finnish and Norwegian versions of Wikipedia.


A simple example:
In the English Wikipedia's infoboxes there are 27 different variants of [[User:DBpedia-Bot/ontology/birthPlace|birthPlace]], such as birthplace, placeOfBirth and born,
all of which have the same meaning: the place where a person was born.
On the other hand, in [[Wikipedia:Bjork]], for example, '''born''' is used for [[User:DBpedia-Bot/ontology/birthDate|birthDate]] and [[User:DBpedia-Bot/ontology/birthPlace|birthPlace]] at the same time.


The [[User:DBpedia-Bot/ontology| DBpedia Ontology]] provides a single name for each meaning and thus makes it easier for machines to extract information correctly.
Besides names for properties, it also contains classes, which provide a strict categorisation system for articles.


More about the ontology can be found on the [[User:DBpedia-Bot/ontology| DBpedia Ontology page]] and the [[User:DBpedia-Bot/ontology-overview| Ontology overview page]].


The ontology adheres to common Semantic Web standards such as [[Wikipedia:RDF| RDF]] and [[Wikipedia:Web Ontology Language| OWL]].


The DBpedia dataset is available under the terms of the GNU Free Documentation License.
The DBpedia dataset is interlinked at the RDF level with various other Open Data datasets on the Web. This enables applications to enrich DBpedia data with data from these datasets.
As of June 2007, DBpedia is interlinked with the following datasets: GeoNames, Musicbrainz, CIA World Fact Book, DBLP, Project Gutenberg, DBtune Jamendo and Eurostat as well as US Census data.
See the [http://dbpedia.org/docs/ DBpedia website] and the [http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData W3C SWEO Linking Open Data Community Project] for details about interlinked datasets.
====DBpedia Mapping====
As the [[User:DBpedia-Bot/ontology| DBpedia Ontology]] provides a repository of semantically well-defined identifiers, it is now possible to create a [[User:DBpedia-Bot/mapping| mapping]] from infoboxes and templates to the ontology.


Each infobox template (such as [[Wikipedia:Template:Infobox_Musical_artist]]) can be assigned an OWL class from the ontology.
Furthermore, each template parameter (such as Background, Born, Died, Origin) can be mapped to an ontology property.


More information about the mapping can soon be found on the [[User:DBpedia-Bot/mapping| mapping page]].


== Accessing the DBpedia Dataset ==
The DBpedia dataset can be accessed using three different mechanisms:
* SPARQL Endpoint. There is a [http://dbpedia.org/sparql public SPARQL endpoint] which enables you to query the dataset using the [http://en.wikipedia.org/wiki/SPARQL SPARQL] query language. You can use the [http://DBpedia.org/snorql SNORQL query explorer] to ask queries against the endpoint (does not work with Internet Explorer). Several example queries can be found on the [http://dbpedia.org/docs/ DBpedia website].
* Linked Data Interface. DBpedia is also served as [http://en.wikipedia.org/wiki/LinkedData Linked Data], meaning that you can use Semantic Web browsers like [http://www.w3.org/2005/ajar/tab Tabulator], [http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/disco/ DISCO] or the [http://demo.openlinksw.com/DAV/JS/rdfbrowser/index.html Open Link Data Browser] to navigate the dataset.
* Downloads. The DBpedia dataset can also be downloaded from the [http://dbpedia.org/docs/ DBpedia website].
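The first two access mechanisms listed above can be sketched from a client's point of view. This Python sketch uses only the standard library and sends no network traffic. It assumes that the endpoint follows the standard SPARQL protocol (the query is passed in a `query` parameter) and that resources live under `http://dbpedia.org/resource/`. The example query, the `Berlin` resource and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Public endpoint named in the list above.
SPARQL_ENDPOINT = "http://dbpedia.org/sparql"
# Assumed URI prefix for DBpedia resources (one resource per Wikipedia article).
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def sparql_get_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode a SPARQL query as a GET URL using the protocol's `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def linked_data_request(title: str) -> Request:
    """Build (but do not send) a request for a resource's RDF description.

    The Accept header asks for RDF/XML instead of the HTML page a browser gets.
    """
    uri = RESOURCE_PREFIX + quote(title.replace(" ", "_"))
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# Hypothetical example: up to ten property/value pairs describing Berlin.
QUERY = "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Berlin> ?p ?o } LIMIT 10"

if __name__ == "__main__":
    print(sparql_get_url(QUERY))
    print(linked_data_request("Berlin").full_url)
```

Passing either URL or the `Request` object to `urllib.request.urlopen` would perform the actual fetch. Whether the servers still answer at these addresses is not guaranteed.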
== External links ==
=== Web Pages ===
* [http://dbpedia.org/ DBpedia Project] - Official website
* [http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData W3C SWEO Linking Open Data Community Project]


=== Publications ===
* Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary Ives: [http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/Auer-Bizer-ISWC2007-DBpedia.pdf DBpedia: A Nucleus for a Web of Open Data]. 6th International Semantic Web Conference (ISWC 2007), Busan, Korea, November 2007.
* Christian Bizer et al.: [http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/DBpedia-WWW2007-draft-slides.pdf DBpedia - Querying Wikipedia like a Database]. Developers track presentation at WWW2007.
* Sören Auer, Jens Lehmann: [http://www.informatik.uni-leipzig.de/~auer/publication/ExtractingSemantics.pdf What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content]. Paper at ESWC 2007.
* Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum: [http://www2007.org/papers/paper391.pdf Yago: A Core of Semantic Knowledge - Unifying WordNet and Wikipedia]. Paper at WWW2007.
* Christian Bizer et al.: [http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkingOpenData.pdf Interlinking Open Data on the Web] ([http://linkeddata.org/documents/eswc2007-poster-linking-open-data.pdf Poster]). Poster at ESWC 2007.


====Synchronization====
The changes made to subpages of [[User:DBpedia-Bot/mapping]] and [[User:DBpedia-Bot/ontology]] are synchronized with a Semantic Web triple store and take effect within minutes.


The ontology can be downloaded with this [http://dbpedia-live.openlinksw.com/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org%2Fmeta&should-sponge=&query=+CONSTRUCT+{%3Fs+%3Fp+%3Fo}+where%0D%0A{%0D%0A%3Fb+%3Chttp%3A%2F%2Fdbpedia.org%2Fmeta%2Forigin%3E+%3Chttp%3A%2F%2Fdbpedia.org%2Fmeta%2FTBoxExtractor%3E+.%0D%0A%3Fb+owl%3AannotatedSource+%3Fs+.%0D%0A%3Fb+owl%3AannotatedProperty+%3Fp+.%0D%0A%3Fb+owl%3AannotatedTarget+%3Fo+.%0D%0AFilter+%28!+%28%3Fp+in+%28%0D%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fmeta%2Feditlink%3E%2C+%0D%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fmeta%2FeditLink%3E%2C+%0D%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fmeta%2Frevisionlink%3E%2C%0D%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fmeta%2FrevisionLink%3E%2C%0D%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fmeta%2Foaiidentifier%3E%29%29%29.%0D%0A}+&format=application%2Frdf%2Bxml&debug=on&timeout= link] and viewed in standard ontology editors such as [[Wikipedia:Protégé_(software)|Protégé]].


[[Category:Free software culture and documents]]
[[Category:Open access]]
[[Category:World Wide Web]]
[[Category:Semantic web]]


[[Category:DBpedia|*]]
[[Category:German engineering]]
[[Category:Research]]
[[category:Wikidata]]
[[Category:Wikimedia_projects]]

Latest revision as of 12:36, 27 August 2016

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to interlink other datasets on the Web with DBpedia data.

Information about the DBpedia project and dataset can be found here:
The DBpedia article on Wikipedia
Project site

DBpedia on MetaWiki

Several pages were created on MetaWiki to aid in extracting information from Wikipedia. Semi-structured information can be found on Wikipedia in articles with infoboxes. The main problem is that infoboxes are generally designed to be read by humans once rendered. For machines it is much harder to access the information in infoboxes and extract meaningful and useful data. To ease this process, two pages were created in the User:DBpedia-Bot namespace:

DBpedia Ontology

The DBpedia Ontology aims to be a common schema for articles (and especially infoboxes of articles) in Wikipedia. It is a repository of unique names and identifiers (a vocabulary) for the strings used in infoboxes; it aims to merge strings that share a meaning and to separate strings with different meanings.

A simple example: in the English Wikipedia's infoboxes there are 27 different variants of birthPlace, such as birthplace, placeOfBirth and born, all of which have the same meaning: the place where a person was born. On the other hand, in Wikipedia:Bjork, for example, born is used for birthDate and birthPlace at the same time.
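The normalization problem in this example can be illustrated with a minimal Python sketch. The alias table is hypothetical and lists only the variants named above (the text reports 27 variants of birthPlace alone). Keys such as born, whose meaning differs between articles, are reported for review instead of being mapped to a guessed property.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table: lower-cased infobox key -> DBpedia Ontology property.
ALIASES: Dict[str, str] = {
    "birthplace": "birthPlace",
    "placeofbirth": "birthPlace",
}

# Hypothetical list of keys whose meaning depends on the article
# ("born" may hold a date, a place, or both).
AMBIGUOUS: Dict[str, Tuple[str, ...]] = {
    "born": ("birthDate", "birthPlace"),
}


def normalize(infobox: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map raw infobox keys onto single ontology names.

    Returns the normalized properties and the keys that need a closer look.
    """
    normalized: Dict[str, str] = {}
    unresolved: List[str] = []
    for key, value in infobox.items():
        k = key.strip().lower()  # case-insensitive: placeOfBirth == placeofbirth
        if k in ALIASES:
            normalized[ALIASES[k]] = value.strip()
        elif k in AMBIGUOUS:
            unresolved.append(key)  # the key alone cannot decide birthDate vs birthPlace
        # all other keys are ignored in this sketch
    return normalized, unresolved
```

For instance, `normalize({"placeOfBirth": " Leipzig ", "born": "1 January 1900, Leipzig"})` returns `({"birthPlace": "Leipzig"}, ["born"])`. The second list is where a real extractor would need extra rules, for example splitting a date from a place.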

The DBpedia Ontology provides a single name for each meaning and thus makes it easier for machines to extract information correctly. Besides names for properties, it also contains classes, which provide a strict categorisation system for articles.

More about the ontology can be found on the DBpedia Ontology page and the Ontology overview page.

It adheres to common Semantic Web standards like RDF and OWL.

DBpedia Mapping

As the DBpedia Ontology provides a repository for semantically well-defined identifiers, it is now possible to create a mapping from infoboxes and templates to the ontology.

Each infobox template (such as Wikipedia:Template:Infobox_Musical_artist) can be assigned an OWL class from the ontology. Furthermore, each template parameter (such as Background, Born, Died, Origin) can be mapped to an ontology property.
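The mapping idea can be illustrated with a small Python sketch that turns one infobox instance into RDF-style (subject, predicate, object) triples. The template contributes an rdf:type triple for its OWL class, and each mapped parameter contributes one property triple. The class name MusicalArtist, the property names and both namespace URIs are assumptions made for illustration; they are not the content of the actual mapping page.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumed namespaces for resources and ontology terms (illustrative only).
RESOURCE_NS = "http://dbpedia.org/resource/"
ONTOLOGY_NS = "http://dbpedia.org/ontology/"

# Hypothetical mapping: template -> OWL class, and template parameter -> property.
TEMPLATE_CLASS: Dict[str, str] = {"Infobox_Musical_artist": "MusicalArtist"}
PARAMETER_PROPERTY: Dict[str, Dict[str, str]] = {
    "Infobox_Musical_artist": {"Born": "birthDate", "Died": "deathDate", "Origin": "origin"},
}


def extract(article: str, template: str, params: Dict[str, str]) -> List[Triple]:
    """Turn one infobox instance into (subject, predicate, object) triples."""
    owl_class = TEMPLATE_CLASS.get(template)
    if owl_class is None:
        return []  # template not mapped yet: emit nothing
    subject = RESOURCE_NS + article
    triples: List[Triple] = [(subject, RDF_TYPE, ONTOLOGY_NS + owl_class)]
    properties = PARAMETER_PROPERTY.get(template, {})
    for name, value in params.items():
        prop = properties.get(name)
        if prop is not None and value.strip():  # skip unmapped or empty parameters
            triples.append((subject, ONTOLOGY_NS + prop, value.strip()))
    return triples
```

In this sketch, unmapped parameters such as Background are dropped and values stay plain strings. A fuller extractor would presumably also parse dates and link places to other resources.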

More information about the mapping can soon be found on the mapping page.

Synchronization

The changes made to subpages of User:DBpedia-Bot/mapping and User:DBpedia-Bot/ontology are synchronized with a Semantic Web triple store and take effect within minutes.

The ontology can be downloaded with this link and viewed in standard ontology editors such as Wikipedia:Protégé_(software).
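The download link is a SPARQL CONSTRUCT query packed into a URL. The Python sketch below rebuilds an equivalent URL from readable parts, so the query can be inspected or adapted. The endpoint, parameter names and query text are taken from the link itself. The helper function is hypothetical, line breaks and spacing have been normalized, and the endpoint may no longer be online.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia-live.openlinksw.com/sparql"

# Decoded from the download link. It rebuilds each triple (?s ?p ?o) that is stored
# in reified form (owl:annotatedSource/Property/Target) with origin TBoxExtractor,
# skipping edit links, revision links and OAI identifiers.
ONTOLOGY_QUERY = """CONSTRUCT {?s ?p ?o} where
{
?b <http://dbpedia.org/meta/origin> <http://dbpedia.org/meta/TBoxExtractor> .
?b owl:annotatedSource ?s .
?b owl:annotatedProperty ?p .
?b owl:annotatedTarget ?o .
Filter (! (?p in (
<http://dbpedia.org/meta/editlink>,
<http://dbpedia.org/meta/editLink>,
<http://dbpedia.org/meta/revisionlink>,
<http://dbpedia.org/meta/revisionLink>,
<http://dbpedia.org/meta/oaiidentifier>))).
}"""


def ontology_download_url(query: str = ONTOLOGY_QUERY) -> str:
    """Rebuild the download URL; parameter order follows the original link."""
    params = {
        "default-graph-uri": "http://dbpedia.org/meta",
        "should-sponge": "",
        "query": query,
        "format": "application/rdf+xml",  # RDF/XML, which ontology editors can open
        "debug": "on",
        "timeout": "",
    }
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(ontology_download_url())
```

Keeping the query as a plain string makes it easy to change, for example to narrow the filter, before re-encoding it.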