Wikidata/Notes/Inclusion syntax v0.2

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Daniel Kinzler (WMDE) at 09:14, 29 May 2012 (Multiple Statements: Other parameters (e.g. ''unit'') are passed to the template as simple parameters.). It may differ significantly from the current version.
New Version (May 29 2012) This is a completely reworked proposal with little resemblance to the original draft!

This page describes the data inclusion syntax for the Wikibase client, by which the properties of data items can be included and rendered on a wiki page using templates.

Accessing Item Data

Properties of a Wikidata Item can be used via the #property parser function:

 {{#property:population}}
 

This will provide the value of the population property of the page's default item. The default item is the Wikidata item that is associated with this page by virtue of its Wikilink property.

To access properties of a different item, the item has to be specified explicitly, either by id or using the associated Wiki page:

 {{#property:population|item=id/q12345}}

or

 {{#property:population|item=en/Germany}}

Item data is cached in memory, so accessing several properties of the same item is efficient.

Value Formatting

The item value is returned as wikitext by default, with some formatting applied where appropriate. For instance, dates and numbers are formatted using the formatting rules of the page's content language, item references are turned into local wiki links, etc. The concrete formatting rules depend on the property's data type.

To access the raw value, e.g. to apply custom formatting, use the format option:

 {{#property:population|format=raw}}
 

This would return something like 1234567 instead of using e.g. the en-US formatted version 1,234,567.

Unit conversion can also be done on the fly:

 {{#property:area|unit=miles^2|precision=0.1}}

This would return the area in square miles, to one digit after the decimal point.
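As a rough illustration of what such an on-the-fly conversion involves, here is a minimal Python sketch. It is not part of the proposal: the function name `convert_area`, the conversion table, and the idea that `precision` is a rounding step given as a string are all assumptions made for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Conversion factors to square metres. The unit names mirror the
# "miles^2" spelling used above; the table itself is an assumption.
SQUARE_METRES_PER_UNIT = {
    "m^2": 1.0,
    "km^2": 1_000_000.0,
    "miles^2": 1609.344 ** 2,  # one international mile is 1609.344 m
}


def convert_area(value, from_unit, to_unit, precision="1"):
    """Convert an area and round it to the given precision step."""
    square_metres = value * SQUARE_METRES_PER_UNIT[from_unit]
    converted = square_metres / SQUARE_METRES_PER_UNIT[to_unit]
    return Decimal(str(converted)).quantize(
        Decimal(precision), rounding=ROUND_HALF_UP
    )


print(convert_area(100.0, "km^2", "miles^2", precision="0.1"))  # 38.6
```

Using `Decimal.quantize` keeps exactly one digit after the decimal point for a step of 0.1, which matches the behaviour described for `precision=0.1`.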

Statement Parts

In Wikidata, properties don't simply have one value. Instead, they have statements consisting of several parts (roughly equivalent to the "snaks" in the data model), one of which is (usually) "the" value. Some types of properties may not have a single value: e.g. a geo-location would have a latitude part and a longitude part, but no "value".

All parts of the statement can be accessed using the #property function:

 {{#property:population|part=source}}
 {{#property:population|part=accuracy}}
 {{#property:population|part=timestamp|precision=year}}
 

These would provide the source(s), the accuracy, and the timestamp of the statement about the population. The timestamp would be given as a year, even if provided in more detail. The source references are themselves complex objects that require templates for rendering (see below).

If the part is not given explicitly, the default (value) part of the statement is used (as in the examples in the previous section). If the statement has no default value part, and no part is explicitly specified, a warning message is returned instead of any property value.
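The fallback rule for parts can be sketched in Python as follows. This is only an illustration: modelling a statement as a plain dict, the function name `get_part`, the wording of the warnings, and the handling of an unknown part name are assumptions, not something this proposal defines.

```python
# A statement is modelled as a plain dict of parts; this layout is an
# assumption made for illustration only.
population = {"value": 263455, "accuracy": "+/-200", "timestamp": "2010", "source": "Foo"}
location = {"latitude": 52.52, "longitude": 13.40}  # no "value" part


def get_part(statement, part=None):
    """Return the requested part, falling back to the default value part."""
    name = "value" if part is None else part
    if name in statement:
        return statement[name]
    if part is None:
        # Case described above: no default part and none requested.
        return "Warning: statement has no default value part"
    # Assumed behaviour for an unknown part name (not covered by the text).
    return f"Warning: statement has no part named '{part}'"


print(get_part(population))            # 263455
print(get_part(population, "source"))  # Foo
print(get_part(location))              # Warning: statement has no default value part
```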

TBD: List of parts! part=exists, part=label (from property!), part=type (from property!), part=indicators?

Coalescing Values

Properties don't necessarily have a single value. They may have multiple values (more precisely, multiple statements), either because they are inherently multivalued, or because conflicting opinions exist about the value, or because the value changes over time.

Each statement has a "preference" value, which may be "preferred", "other", "unsourced" or "deprecated". When determining the statement to use for a call to the #property function, "preferred" statements are picked over "other" statements, "other" statements are preferred over "unsourced" ones, etc.

However, there may still be multiple statements that are equally "strong". In that case, these statements are combined or "coalesced" to form a single virtual statement: all parts of the statements are combined in a way appropriate for their respective data types. The lists of sources for the statements are concatenated, the worst accuracy is chosen, and the values are combined into ranges or lists, depending on their type (note that while there may be several statements with different values for a given property, they all have the same data type, namely, the type specified in the property declaration). For example, if there are three values for the population given by different sources, and there's no agreement on which source should be authoritative, this would be represented by three statements for a single property:

property    value   accuracy  timestamp     source  preference
population  263455  +/-200    2010          Foo     preferred
population  251104  +/-100    2011          Bar     preferred
population  268122  +/-1%     January 2010  Quux    preferred

There may be more non-preferred (e.g. older) values:

property    value   accuracy  timestamp  source  preference
population  261108  +/-200    2009       Foo     other
population  250104  +/-80     2009       Acme    other

In this case, the default values would automatically be coalesced into a range:

  {{#property:population}}

This would evaluate to something like the following (depending on the property definition):

251,104 – 268,122

Other parts are also coalesced, essentially forming a single statement:

property    value            accuracy  timestamp    source          preference
population  251104 - 268122  +/-1%     2010 - 2011  Foo, Bar, Quux  preferred

The way different parts are combined depends on their type and semantics: scalar values are combined into ranges; texts, sources and item references are combined into lists; the accuracy uses the maximum (i.e. the worst accuracy); the timestamp uses a range truncated to the lowest precision given (in this case, the "January" part is dropped because the other timestamps didn't provide a month), etc.
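The selection and coalescing rules above can be sketched in Python using the example data. This is a simplified illustration under stated assumptions, not the actual algorithm: the `Statement` class and the function names are hypothetical, accuracies are reduced to absolute numbers (so +/-1% of 268122 becomes about 2681), timestamps are reduced to years, and the input list is assumed to be non-empty.

```python
from dataclasses import dataclass

# Strongest first, in the order given in the text above.
PREFERENCE_ORDER = ["preferred", "other", "unsourced", "deprecated"]


@dataclass
class Statement:
    value: int
    accuracy: float  # absolute error; an assumed simplification
    year: int        # timestamp already cut to year precision
    source: str
    preference: str


def strongest(statements):
    """Keep only the statements with the strongest preference present."""
    best = min(PREFERENCE_ORDER.index(s.preference) for s in statements)
    return [s for s in statements if PREFERENCE_ORDER.index(s.preference) == best]


def coalesce(statements):
    """Merge equally strong statements into one virtual statement."""
    group = strongest(statements)
    values = [s.value for s in group]
    years = [s.year for s in group]
    return {
        "value": (min(values), max(values)),         # scalars become a range
        "accuracy": max(s.accuracy for s in group),  # worst accuracy wins
        "timestamp": (min(years), max(years)),       # range at year precision
        "source": [s.source for s in group],         # sources are concatenated
        "preference": group[0].preference,
    }


statements = [
    Statement(263455, 200, 2010, "Foo", "preferred"),
    Statement(251104, 100, 2011, "Bar", "preferred"),
    Statement(268122, 0.01 * 268122, 2010, "Quux", "preferred"),
    Statement(261108, 200, 2009, "Foo", "other"),
    Statement(250104, 80, 2009, "Acme", "other"),
]

merged = coalesce(statements)
low, high = merged["value"]
print(f"{low:,} - {high:,}")  # 251,104 - 268,122
print(merged["source"])       # ['Foo', 'Bar', 'Quux']
```

The two "other" statements never enter the result, because at least one "preferred" statement exists.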

Explicit Selection of Statements

To avoid automatic coalescing even if there are several equally strong statements, the desired statement can be selected explicitly, using its source as an identifier. (Note: this assumes that a single source does not make multiple contradicting claims. If it does, the source should be made more precise, e.g. by giving the page and paragraph.)

So, in the above example, this could be used:

  {{#property:population|source=Bar}}
  

This would pick the value given by source "Bar":

251,104

TBD: how source/reference identifiers are maintained.
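A minimal Python sketch of explicit selection, applied to the three equally strong statements from the example. The tuple layout, the function name `select_by_source`, and raising an error when a source matches zero or several statements are assumptions for illustration; the proposal leaves the handling of source identifiers open.

```python
# Statements as (value, source, preference) tuples; layout is assumed.
STATEMENTS = [
    (263455, "Foo", "preferred"),
    (251104, "Bar", "preferred"),
    (268122, "Quux", "preferred"),
]


def select_by_source(statements, source):
    """Return the single statement made by the given source."""
    matches = [s for s in statements if s[1] == source]
    if len(matches) != 1:
        # Assumed behaviour: the source must identify exactly one statement.
        raise LookupError(
            f"expected exactly one statement from {source!r}, found {len(matches)}"
        )
    return matches[0]


value, _, _ = select_by_source(STATEMENTS, "Bar")
print(f"{value:,}")  # 251,104
```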

Multiple Statements

If a property has multiple statements, it is sometimes desirable to simply list all or some of them in detail. With the help of a scripting language like Lua, this is easy enough using a for-loop. However, we need some help to achieve the same using traditional MediaWiki template syntax (i.e. parser functions).

To this end, the {{#property-values}} function can be used:

   {{#property-values:area|template=country-area-info|unit=miles^2|precision=0.1}}
   

This would call the template country-area-info for each statement of the property area. Inside the template, the respective property (e.g. area) would have only a single value. Other parameters (e.g. unit) are passed to the template as simple parameters.

If Template:Country-area-info was supposed to generate a single table row, its implementation could look something like this:

     |-
     | {{#property:area|unit={{{unit|km^2}}}|precision={{{precision|1}}}}}
     | {{#property:area|part=accuracy}}
     | {{#property:area|part=timestamp|precision=year}}
     | {{#property:area|part=source}}

No coalescing would take place here, since the additional statements for the property would be masked by the {{#property-values}} function.
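The per-statement iteration can be pictured with the following Python sketch. It is an analogy rather than the proposed implementation: `property_values` and `country_area_info` are hypothetical stand-ins for the parser function and the template, the statement values are made-up placeholders, and no unit conversion is performed (the unit is only printed as a label).

```python
# Made-up placeholder statements for an "area" property.
AREA_STATEMENTS = [
    {"value": 100, "accuracy": "+/-1", "timestamp": "2010", "source": "Foo"},
    {"value": 120, "accuracy": "+/-5", "timestamp": "2011", "source": "Bar"},
]


def country_area_info(statement, unit="km^2"):
    """Hypothetical stand-in for Template:Country-area-info: one table row."""
    return "\n".join([
        "|-",
        f"| {statement['value']} {unit}",
        f"| {statement['accuracy']}",
        f"| {statement['timestamp']}",
        f"| {statement['source']}",
    ])


def property_values(statements, template, **params):
    """Call the template once per statement; extra parameters pass through."""
    return "\n".join(template(statement, **params) for statement in statements)


print(property_values(AREA_STATEMENTS, country_area_info, unit="km^2"))
```

Because the template only ever sees one statement at a time, there is nothing left to coalesce inside it.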

Changing the Default Item

If a page or template wants to make a different item the default item, this can be done using the {{#data-item}} function. For instance, on the page Germany on the English language Wikipedia, the default item would (by definition) be en/Germany. So

 {{#property:population}}
 

would be shorthand for

 {{#property:population|item=en/Germany}}

To override this, the following syntax can then be used:

 {{#data-item:id/q54321}}
 

Now,

 {{#property:population}}
 

would be shorthand for

 {{#property:population|item=id/q54321}}  

This may be of limited use directly inside an article page (the need to do this would indicate that there's something wrong with the language links or the article's scope). But it is expected that this mechanism is quite useful inside templates. Maybe there's an extra article (and data item) about the German economy, with some overview data like GNP, etc. It may then be useful to show an infobox about the economy directly on the page about Germany itself. With the help of {{#data-item}}, we could do the following on the Germany page:

 {{national economy box|item=en/Economy of Germany}}
 

Inside Template:National_economy_box, the {{{item}}} parameter could now be used directly to fetch data properties from the desired item:

 {{#property:GNP|item={{{item}}}}}
 

But this is cumbersome (item={{{item}}} all over the place) and also error-prone (missing item={{{item}}} in some places). It's nicer to just set the default item for the template:

 {{#data-item:{{{item}}}}}
 

This sets the data item for the present scope (implemented using a preprocessor frame), so the item doesn't have to be given explicitly:

  {{#property:GNP}}
  

Note that {{#data-item}} sets the default item for the present scope (e.g. the template), not the entire page (which would be confusing)!
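The scoping behaviour can be sketched in Python as a chain of frames. This is an illustration only: the `Frame` class and its methods are hypothetical, and the assumption that a scope without its own {{#data-item}} call falls back to the enclosing scope's default item is a reading of the text, not something it states explicitly.

```python
class Frame:
    """One preprocessor scope (a page or a template call)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.item = None  # set by the equivalent of {{#data-item:...}}

    def set_default_item(self, item):
        # Affects this scope only, never the parent scope.
        self.item = item

    def default_item(self):
        # Walk up to the enclosing scope when this scope sets no item
        # (assumed fallback behaviour).
        frame = self
        while frame is not None:
            if frame.item is not None:
                return frame.item
            frame = frame.parent
        return None


page = Frame()
page.set_default_item("en/Germany")

box = Frame(parent=page)  # the call to the national economy box template
box.set_default_item("en/Economy of Germany")

print(box.default_item())   # en/Economy of Germany
print(page.default_item())  # en/Germany
```

Setting the item inside the template frame leaves the page's own default item untouched, which is the property the note above insists on.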

Special properties

TBD: Describe special properties like edit-link, id, alias, etc.

Formatting Sources

TBD: mechanisms for formatting sources and source references in compliance with the current citation system and existing citation templates.

Property Definitions

TBD: Describe what properties a property has, how they are maintained, etc.


Item properties reference the Property declaration. Property declarations are heavily cached on repo and client. They contain:

  • Datatype (also a reference, also heavily cached)
  • Labels in several languages
  • "no source needed" flag (don't assign "unsourced" status, even if no source is given)
  • "multi-value" flag (multiple default values don't constitute a conflict/dispute)
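The contents of a property declaration listed above could be modelled roughly as follows. This Python sketch is an assumption-laden illustration: the class and field names, the datatype names "quantity" and "text", and the exact rule in `effective_preference` (only "preferred" and "other" statements are downgraded to "unsourced") are hypothetical readings of the list, not a defined part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyDeclaration:
    datatype: str                               # reference to a datatype
    labels: dict = field(default_factory=dict)  # language code -> label
    no_source_needed: bool = False
    multi_value: bool = False


def effective_preference(declaration, preference, has_source):
    """Downgrade to "unsourced" unless the property waives sources."""
    needs_source = not declaration.no_source_needed
    if needs_source and not has_source and preference in ("preferred", "other"):
        return "unsourced"
    return preference


population = PropertyDeclaration("quantity", {"en": "population"})
motto = PropertyDeclaration("text", {"en": "motto"}, no_source_needed=True)

print(effective_preference(population, "preferred", has_source=False))  # unsourced
print(effective_preference(motto, "preferred", has_source=False))       # preferred
```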