Talk:Wikipedia DTD and Wikipedia DTD: Difference between pages
''There was a discussion on general aspects of what should be marked up in Wikipedia (names, places, dates/times...). Since [[Wikipedia DTD]] is about an XML representation of the current syntax, I moved it to [[Talk:Simple ideology of Wikitax]].'' --[[User:Nichtich|Nichtich]] 01:36 Feb 7, 2003 (UTC)

----

''<nowiki><![CDATA[do [not] parse <this>]]></nowiki>''

:I've never been quite clear on how CDATA sections work. If my data includes a raw "]]>", how do I encode it? --[[User:Brion VIBBER|Brion VIBBER]] 06:44 Jan 23, 2003 (UTC)

:*<nowiki><![CDATA[just]]></nowiki>''']]&gt;'''<nowiki><![CDATA[split]]></nowiki>
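The split answer above generalizes to any text. A CDATA section can never contain "]]>", so the text is cut at each occurrence, every piece gets its own CDATA section, and the delimiter itself is written outside the sections as "]]&gt;". A minimal Python sketch using only the standard library; `cdata_wrap` is a hypothetical helper name, not part of any Wikipedia tool:

```python
import xml.etree.ElementTree as ET

def cdata_wrap(text: str) -> str:
    """Wrap arbitrary text in CDATA sections, escaping every "]]>" it contains.

    Each piece between occurrences of "]]>" goes into its own CDATA section;
    the delimiter is emitted between the sections as escaped "]]&gt;".
    """
    pieces = text.split("]]>")
    return "]]&gt;".join(f"<![CDATA[{piece}]]>" for piece in pieces)

encoded = cdata_wrap("just]]>split")
print(encoded)  # <![CDATA[just]]>]]&gt;<![CDATA[split]]>

# Round trip: the parser joins the sections and the escaped text back together.
root = ET.fromstring(f"<nowiki>{encoded}</nowiki>")
print(root.text)  # just]]>split
```

Plain character escaping (&lt;, &amp;, &gt;) would encode the same data; CDATA only keeps the source more readable.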
----

''<link system="wiki" href="image:Wiki.png"/>''

Image links are functionally different from regular wiki links, as they embed images. It would be best to use a distinct tag. --[[User:Brion VIBBER|Brion VIBBER]] 07:03 Jan 23, 2003 (UTC)

:You're right. I suggest:

Link to the '''page''' of the image:

<tt><nowiki>[[:image:Wiki.png]] <link system="wiki" href="image:Wiki.png"/></nowiki></tt>

Embed image/media/file/...

<tt><nowiki>[[image:Wiki.png]] <media href="image:Wiki.png"/></nowiki></tt>

I prefer <tt>media</tt> because we only embed media objects, and <tt>embed</tt> could mean something like "embed the content of another page".

See also the discussion on special pages below.

----

''<nowiki><link system="wiki" href="Wikipedia FAQ"/></nowiki>''

...

''<nowiki><url href="http://www.wikipedia.org"/></nowiki>''

...

''<nowiki><mail to="webmaster@wikipedia.org"/></nowiki>''

These seem overcomplicated. Wouldn't it be simpler (in an XML way) to use the same tag for all links, and just have a wiki-specific URI? E.g.:

*local wiki link: [[Main Page]]
**<link href="wiki:Main_Page">Main Page</link>
*interwiki link: [[MeatBall:CommunityExpectation]]
**<link href="wiki://MeatBall/CommunityExpectation">MeatBall:CommunityExpectation</link>
*interlanguage link: <nowiki>[[eo:DTD de Vikipedio]]</nowiki>
**<link href="wiki://EsperantoWikipedia/DTD_de_Vikipedio" rel="language" lang="eo" />
*remote non-wiki link: [http://slashdot.org/ Slashdot]
**<nowiki><link href="http://slashdot.org/">Slashdot</link></nowiki>
*ISBN: ISBN 0-201-89683-4
**<nowiki><link href="isbn:0201896834">ISBN 0-201-89683-4</link></nowiki>

Upon (possible) reconversion to wiki syntax, the parser could use the most efficient form of representation available in that particular wiki syntax for that type of link.

----

*No redundancy please.
*An XML syntax should code information in tags and attributes. Parsing strings is ugly and less efficient.
*The difference between interwiki links and local wiki links depends on the application. Try to edit <tt>test:baz</tt>: right now it's a valid name, but maybe there will be a "test" wiki in the future.
*Interlanguage links are a special topic. We could use a special tag: <pre><interlanguage href="eo:DTD_de_Vikipedio"/></pre>
*How about <tt>link system="url"</tt> for external links instead of <tt>url</tt> and <tt>email</tt>?

--[[User:Nichtich|Nichtich]] 22:34 Feb 2, 2003 (UTC)

It could be useful to code

<pre>User:Foo => <link system="wiki" space="user" href="Foo"/>
Talk:Bar => <link system="wiki" space="talk" href="Bar"/></pre>

and in other languages

<pre>Benutzer:Foo => <link system="wiki" space="user" href="Foo"/>
Diskussion:Bar => <link system="wiki" space="talk" href="Bar"/></pre>

But how to handle a page like:

<pre>Talk:User:Foo</pre>

Also possible (for instance in the German Wikipedia):

<pre>Diskussion:Talk:User:Foo</pre>

--[[User:Nichtich|Nichtich]] 21:59 Feb 2, 2003 (UTC)

----

===notes on paragraphs===

Manual paragraphs with the <tt>p</tt> tag are pretty ugly to handle.

Try:

<pre><p>Hi! This is a paragraph

with an empty line in it.</p></pre>

You get:

<pre><p>Hi! This is a paragraph
<p>
with an empty line in it.</p>
</pre>

but the valid syntax is

<pre><p>Hi! This is a paragraph</p>
<p>with an empty line in it.</p></pre>

Why can't we just remove all invalid HTML tags? :-(

-----

IMHO the interlanguage link would be better like

<pre><interlanguage lang="eo" href="DTD_de_Vikipedio"/></pre>

That way you give more information without the need to parse the content of href. [[User:Lothar Kimmeringer|Lothar Kimmeringer]] 02:37, 15 Sep 2003 (UTC)

----

How about a definition of the term DTD right up front, as in this external link:<BR>
http://www.hyperdictionary.com/dictionary/Document+Type+Definition

== Need a rationale for creating a custom DTD ==

Why would we create and support a fit-to-purpose DTD, when everything we use in MediaWiki is already available in [http://www.oasis-open.org/docbook/xml/4.2/ DocBook XML] or even [http://nwalsh.com/docbook/simple/ Simplified DocBook]?

There are many excellent tools for converting DocBook XML to various other document formats: HTML, RTF, PostScript and PDF. It's a well-accepted standard for document markup, and it would thus be useful for readers and for downstream publishers.

Creating a custom DTD would mean that we'd have to create processors from scratch. If DocBook (or another existing XML document format) meets our needs, what's the point? --[[User:Evan|Evan]] 00:43, 16 Dec 2003 (UTC)
Revision as of 17:47, 1 December 2003
Introduction
This is a draft of Wikipedia DTD, an interchangeable XML representation of the content of Wikipedia articles. It contains elements for the content of an article (Wikitext DTD). It is also a contribution toward a formal wikitext standard, which is still lacking. Once such a standard exists, multiple software suites (some based on MediaWiki, some not) could support it, and each Wikipedia could choose between them while supporting the same portable open content.
Important notes
Wikipedia DTD is an interchange format. It is not meant for writing articles, nor is it meant to replace a database!
Up to now I (Jakob Voss) am the only author of the Wikipedia DTD, but it is meant to be a collaborative work in progress, so feel free to contribute (you should know at least the basics of XML). Please add comments on Talk:Wikipedia DTD. I am also working on simple WikipediaDTD-to-HTML and WikipediaDTD-to-Wikitax scripts that will be made public soon. The most important prerequisite for using Wikipedia DTD in practice is a Wikitax-to-XML parser, which is hard to write because wiki syntax may be mixed with invalid HTML.
On compatibility: Up to now there are Wikipedia articles containing data that will not fit into wikiarticle.dtd because they contain broken or ugly HTML. Some elements that are still allowed in Wikitax should be removed (for instance HTML-coded headings and horizontal lines, or the font element). This is a topic for discussion.
Still missing parts: table, dl, pre, div, ruby, font, var, as well as many attributes.
See also
- Wikitax - talk on wikitax
- Wikitext DTD - some more talk
- wikitext standard - talk about a standard of a wikitax representation of wikipedia/media articles.
- Wikipedia lexer - the Wikitax-to-structured-data-for-instance-XML stuff (work needed here!)
- XML syntax - talk on other XML/Wiki relations
- Wikipedia DTD/Overview - Element overview and comparison (out of date)
- Wikipedia DTD/Examples - Example documents
- Wikipedia.DTD - the DTD (as documented here) NOTE: wikitax-element missing!
- MediaWiki sourcecode, especially:
- includes\OutputPage.php (wikitax)
- SpecialExport.php
- XML import and export (XML export format)
- docs\schema.doc and maintenance\tables.sql (core structure)
- The XHTML standard
Advantages
You could also use an XML representation to
- produce valid XHTML
- produce printable output (for instance PDF with FO)
- export Wikitax to other formats (for instance DocBook)
- export other formats to Wikitax (for instance OpenOffice files)
- perform automatic analysis on structure and layout of articles
- exploit better XML-based tools to define and extend Wikitax
- integrate specialized DTDs for "who" (person_DTD), "when and where" (spacetime_DTD) easily, as long as the tag spaces are distinct
Wikipedia articles
Root element
So far the Wikipedia DTD only covers single articles, so the root element is article. It contains a metadata section (the title and other information) followed by the article content itself, which may be text, a redirect, or raw Wikitax.
<!ELEMENT article (meta, (wikitax | text | redirect))>
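The content model article (meta, (wikitax | text | redirect)) means: one meta element first, then exactly one of the three content elements. Python's standard library cannot validate against a DTD, so this sketch checks the child order by hand; the helper name and sample values are illustrative, and a validating XML parser would be the real-world choice:

```python
import xml.etree.ElementTree as ET

CONTENT_CHOICES = {"wikitax", "text", "redirect"}

def check_article(root: ET.Element) -> bool:
    """Hand-check the child sequence of article (meta, (wikitax | text | redirect))."""
    tags = [child.tag for child in root]
    return (
        root.tag == "article"
        and len(tags) == 2
        and tags[0] == "meta"
        and tags[1] in CONTENT_CHOICES
    )

good = """<article>
  <meta><title article="Main Page"/></meta>
  <text>Welcome to Wikipedia.</text>
</article>"""
# A redirect and a text body together break the (wikitax | text | redirect) choice.
bad = "<article><meta><title article='Foo'/></meta><redirect article='Bar'/><text>x</text></article>"

print(check_article(ET.fromstring(good)))  # True
print(check_article(ET.fromstring(bad)))   # False
```

Only the top-level order is checked; the contents of meta and text are not inspected.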
Linking model
Links (to other articles) are one of the most important things in Wikipedia. In most cases they simply consist of an article name, but they may also include a namespace. Wikipedia has 9 namespaces: default (none), talk, user, user-talk, wikipedia, wikipedia-talk, image, image-talk and special. The names of namespaces may differ between languages, but the namespaces themselves remain the same. For instance, there is no namespace Diskussion; it is only the German name for the talk namespace. Local names are not part of the Wikipedia DTD.
I prefer treating the talk property separately from the namespace.
<!ENTITY % possible-namespaces "(special | user | wikipedia | image)">
<!ENTITY % local-link-model "
    talk      (talk)                #IMPLIED
    namespace %possible-namespaces; #IMPLIED
    article   CDATA                 #REQUIRED
">
The attributes of the local-link-model parameter entity form a full local link destination.
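To make the split between talk, namespace and article concrete, here is a Python sketch that maps a full page title onto the three local-link-model attributes. The English prefixes (such as "User talk:") are an assumption, since the DTD deliberately leaves local names out, and only the first colon is treated as a namespace separator:

```python
# Hypothetical English prefixes; the DTD itself leaves local names out.
TALKABLE = {"user", "wikipedia", "image"}
NAMESPACES = TALKABLE | {"special"}   # special has no talk counterpart

def to_link_attributes(title: str) -> dict:
    """Map a full page title to the talk/namespace/article attributes of local-link-model."""
    prefix, sep, rest = title.partition(":")
    if sep:
        words = prefix.lower().replace("_", " ").split()
        is_talk = words[-1:] == ["talk"]
        base = " ".join(words[:-1] if is_talk else words)
        if is_talk and not base:                  # "Talk:Bar"
            return {"talk": "talk", "article": rest}
        if is_talk and base in TALKABLE:          # "User talk:Foo"
            return {"talk": "talk", "namespace": base, "article": rest}
        if not is_talk and base in NAMESPACES:    # "Image:Wiki.png"
            return {"namespace": base, "article": rest}
    # No recognised prefix (e.g. an interwiki prefix): the whole title is the article name.
    return {"article": title}

for t in ["Main Page", "Talk:Bar", "User talk:Foo", "Image:Wiki.png", "Talk:User:Foo"]:
    print(t, "->", to_link_attributes(t))
```

With this rule Talk:User:Foo gets talk="talk" with article "User:Foo"; a localized wiki would need its own prefix handling.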
Metadata elements
The metadata section contains information about title, status, version history and interwiki links. Only the title is obligatory. Elements for copyright information, category links and other metadata that is closer to Dublin Core than to article content may be added later.
<!ELEMENT meta (title, status?, interwiki*, history?)>
Title
The title of an article does not change, so it is not part of the article history. Since a title may contain a namespace, it is easiest to specify the full title as a link to the article itself. The interwiki attribute may specify the wiki the article comes from (normally a language).
<!ELEMENT title EMPTY>
<!ATTLIST title
    interwiki NMTOKEN #IMPLIED
    %local-link-model;
>
Interwiki links
There are several reasons why interwiki links belong in the metadata section. Interwiki links are relations of an article (there may be other relation types in the future), in contrast to normal links in the article content, which may mean several things. At the same time, the article content may also contain links to other Wikipedias; do not confuse the two. Interwiki links use the same linking model as other links, but they must specify a known language.
<!ELEMENT interwiki EMPTY>
<!ATTLIST interwiki
    language NMTOKEN #REQUIRED
    %local-link-model;
>
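Keeping interwiki links in meta means a tool can list an article's language relations without scanning the article text. A Python sketch that reads them into a dictionary; the sample data is illustrative and the German article name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Sample metadata; the "de" article name is hypothetical.
meta_xml = """<meta>
  <title article="Wikipedia DTD"/>
  <interwiki language="eo" article="DTD de Vikipedio"/>
  <interwiki language="de" article="Wikipedia-DTD"/>
</meta>"""

def interwiki_map(meta: ET.Element) -> dict:
    """Collect an article's interwiki relations as {language: article name}."""
    # Only direct interwiki children of meta are read; links inside the
    # article text are never consulted.
    return {link.get("language"): link.get("article") for link in meta.findall("interwiki")}

print(interwiki_map(ET.fromstring(meta_xml)))  # {'eo': 'DTD de Vikipedio', 'de': 'Wikipedia-DTD'}
```

The sketch keys on language alone and ignores the talk and namespace attributes that interwiki also inherits from local-link-model.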
Article Status
The status element contains status information, such as whether the article is protected or whether a table of contents should be shown. Since the status may change due to edits, it is also part of the article history.
<!ELEMENT status EMPTY>
<!ATTLIST status
    protected (protected) #IMPLIED
    counter   CDATA       #IMPLIED
    notoc     (notoc)     #IMPLIED
>
Note: In the current database the counter (the number of times a page has been viewed) is only saved for the current version. This should be changed in the future to obtain more usage information (for instance, to see how often a page has been viewed since the last edit, or to detect edit wars automatically).
Version history
The version history simply consists of a number of edits.
<!ELEMENT history (edit)+>
Each edit contains the edit information (user, timestamp, ...) and the status, interwiki links and content of the article after the edit. The article content is optional: if it is omitted, either it did not change or it was left out because it is not of interest.
<!ELEMENT edit (status?, interwiki*, (text | redirect)?)>
<!ATTLIST edit
    user      CDATA   #REQUIRED
    comment   CDATA   #IMPLIED
    timestamp CDATA   #IMPLIED
    minor     (minor) #IMPLIED
>
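Because an edit may leave out the article content, a reader of the history has to carry the last known content forward. A Python sketch of that replay, assuming the edits are listed oldest first (the DTD does not say) and that an omitted body means "unchanged"; it cannot tell that case apart from content left out on purpose. Users and timestamps are made up:

```python
import xml.etree.ElementTree as ET

# Made-up sample history, assumed to be in chronological order.
history_xml = """<history>
  <edit user="Alice" timestamp="2003-02-02T21:59:00Z"><text>First version.</text></edit>
  <edit user="Bob" comment="protect" minor="minor"><status protected="protected"/></edit>
  <edit user="Carol"><text>Second version.</text></edit>
</history>"""

def replay(history: ET.Element):
    """Yield (user, content) after each edit, carrying content forward when an edit omits it."""
    content = None
    for edit in history.findall("edit"):
        body = edit.find("text")
        if body is None:
            body = edit.find("redirect")
        if body is not None and body.tag == "redirect":
            content = ("redirect", body.get("article"))
        elif body is not None:
            # Flatten all character data of the text element, including inline children.
            content = ("text", "".join(body.itertext()))
        yield edit.get("user"), content

for user, content in replay(ET.fromstring(history_xml)):
    print(user, content)
```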
Redirects
An article contains either a redirect or text. Redirects are links to articles in the same wiki.
<!ELEMENT redirect EMPTY>
<!ATTLIST redirect
    %local-link-model;
>
Wikitax
Since there is no Wikitax-to-XML parser yet, the article content can also be transferred as Wikitax. Since Wikitax depends on the MediaWiki software and might change, version information should be provided.
<!ELEMENT wikitax (#PCDATA)>
<!ATTLIST wikitax
    version CDATA #REQUIRED
>
Wikitext
An XML representation of article content is the core of Wikipedia DTD. This part can also be used on its own.
<!ENTITY % wikitext-block "ul | ol | center | blockquote | pbr | hr | h1 | h2 | h3 | h4 | h5 | h6">
<!ENTITY % wikitext-inline-format "b | i | sub | sup | big | small | tt | u | br | nowiki">
<!ENTITY % wikitext-inline-special "math | wikivar | link | reference | url">
<!ENTITY % wikitext-inline "%wikitext-inline-format; | %wikitext-inline-special;">
Some elements that are not yet defined are still missing from these parameter entities.
<!ELEMENT text (#PCDATA | %wikitext-block; | %wikitext-inline;)*>
Block elements
Headings, horizontal line
In contrast to HTML there are no attributes.
<!ELEMENT h1 (#PCDATA | %wikitext-inline;)*>
<!ELEMENT h2 (#PCDATA | %wikitext-inline;)*>
<!ELEMENT h3 (#PCDATA | %wikitext-inline;)*>
<!ELEMENT h4 (#PCDATA | %wikitext-inline;)*>
<!ELEMENT h5 (#PCDATA | %wikitext-inline;)*>
<!ELEMENT h6 (#PCDATA | %wikitext-inline;)*>
<!ELEMENT hr EMPTY>
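In Wikitax a heading of level N is written with N equals signs on each side, and a line of four or more dashes is a horizontal rule. A minimal Python sketch mapping one such line to h1-h6 or hr; it looks at single lines only, and `block_element` is a hypothetical name:

```python
import re
import xml.etree.ElementTree as ET

# Balanced runs of 1-6 "=" around the heading text, e.g. "== Linking model ==".
HEADING = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")

def block_element(line: str):
    """Return an <hN> or <hr> element for one Wikitax line, or None for anything else."""
    if re.fullmatch(r"-{4,}\s*", line):
        return ET.Element("hr")
    match = HEADING.match(line)
    if match is None:
        return None
    heading = ET.Element(f"h{len(match.group(1))}")
    heading.text = match.group(2)   # inline markup inside the heading is not converted here
    return heading

for line in ["==Introduction==", "=== See also ===", "----", "plain text"]:
    element = block_element(line)
    print(repr(line), "->", None if element is None else ET.tostring(element, encoding="unicode"))
```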
Indented lines
<!ELEMENT indent (#PCDATA | %wikitext-inline;)*>
<!ATTLIST indent
    depth CDATA '1'
>
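Wikitax indents a line with leading colons, one per level, which maps naturally onto the depth attribute. A Python sketch (the function name is illustrative); since depth defaults to '1', the attribute is only written for deeper levels, although writing it always would be equally valid:

```python
import xml.etree.ElementTree as ET

def indent_element(line: str):
    """Turn a Wikitax line with leading colons into <indent depth="N">, else return None."""
    depth = len(line) - len(line.lstrip(":"))
    if depth == 0:
        return None
    element = ET.Element("indent")
    if depth != 1:   # depth defaults to '1' in the DTD
        element.set("depth", str(depth))
    element.text = line[depth:].strip()
    return element

print(ET.tostring(indent_element(":: quoted"), encoding="unicode"))  # <indent depth="2">quoted</indent>
```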
Lists
To avoid mixing #PCDATA and sublists, we define oli = li + ol and uli = li + ul.
<!ELEMENT ol (li | ol | uli)+>
<!ELEMENT ul (li | oli | uli)+>
<!ELEMENT oli (li | ol | uli)+>
<!ELEMENT uli (li | oli | uli)+>
<!ELEMENT li (#PCDATA | %wikitext-inline;)*>
TODO:
- attributes: "type", "start", "value", "compact"
- definition lists
Tables
TODO
- elements: table, tr, td, th, colgroup, col
- attributes: "summary", "width", "border", "frame", "rules", "cellspacing", "cellpadding", "valign", "char", "charoff", "span", "abbr", "axis", "headers", "scope", "rowspan", "colspan"
center, blockquote
<!ELEMENT center (#PCDATA | %wikitext-inline;)*>
<!ELEMENT blockquote (#PCDATA | %wikitext-inline;)*>
<!ATTLIST blockquote
    cite CDATA #IMPLIED
>
pre, div
TODO: What is allowed inside? Should div be allowed both as a block and as an inline element?
Paragraph breaks
Wikitax provides a way to separate paragraphs: just add an empty line. In Wikitext DTD this is represented by the empty tag <pbr/> (paragraph break). The possibility to create paragraphs with <p> should be abolished, because it leads to broken XML and we should reduce the number of allowed HTML tags.
<!ELEMENT pbr EMPTY>
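With pbr, a converter only has to turn each blank-line separator into an empty element, so no p element ever needs to be closed. A minimal Python sketch using ElementTree, where text that follows an element is stored in that element's tail; inline markup is not handled and the function name is illustrative:

```python
import re
import xml.etree.ElementTree as ET

def text_with_pbr(wikitax: str) -> ET.Element:
    """Build a <text> element, replacing each run of blank lines with an empty <pbr/>."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", wikitax.strip())]
    text = ET.Element("text")
    text.text = paragraphs[0]
    for paragraph in paragraphs[1:]:
        pbr = ET.SubElement(text, "pbr")
        pbr.tail = paragraph   # character data after <pbr/> belongs to its tail
    return text

source = "Hi! This is a paragraph\n\nwith an empty line in it."
print(ET.tostring(text_with_pbr(source), encoding="unicode"))
# <text>Hi! This is a paragraph<pbr />with an empty line in it.</text>
```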
Inline elements
Wikitext special elements
TODO: nowiki, media
nowiki parts
<!ELEMENT nowiki (#PCDATA)>
Links
See the linking model above for details.
<!ELEMENT link (#PCDATA | %wikitext-inline-format;)*>
<!ATTLIST link
    interwiki NMTOKEN #IMPLIED
    %local-link-model;
>
Math
The optional image attribute may provide an image representation of the formula.
<!ELEMENT math (#PCDATA)>
<!ATTLIST math
    image ENTITY #IMPLIED
>
URL
<!ELEMENT url (#PCDATA | %wikitext-inline-format;)*>
<!ATTLIST url
    href CDATA #REQUIRED
>
Reference
<!ELEMENT reference EMPTY>
<!ATTLIST reference
    system (email | RFC | ISBN) #REQUIRED
    value  CDATA                #IMPLIED
>
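With system="ISBN", the value attribute can carry the bare number. A Python sketch (Python 3.9+ for `str.removeprefix`) that normalizes Wikitax such as "ISBN 0-201-89683-4" and checks the ISBN-10 check digit (weights 10 down to 1, sum divisible by 11) before emitting a reference element. The checksum test is this example's addition rather than a DTD rule, and ISBN-13 is not handled:

```python
import re
import xml.etree.ElementTree as ET

def isbn10_reference(raw: str):
    """Return <reference system="ISBN" value="..."/> for a valid ISBN-10, else None."""
    digits = re.sub(r"[\s-]", "", raw.strip().removeprefix("ISBN")).upper()
    if not re.fullmatch(r"\d{9}[\dX]", digits):
        return None
    # Weighted sum with weights 10..1; a final X stands for 10.
    values = [10 if c == "X" else int(c) for c in digits]
    if sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 != 0:
        return None
    return ET.Element("reference", system="ISBN", value=digits)

print(ET.tostring(isbn10_reference("ISBN 0-201-89683-4"), encoding="unicode"))
# <reference system="ISBN" value="0201896834" />
```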
Images and other media files
<!ELEMENT media EMPTY>
<!ATTLIST media
    name CDATA  #REQUIRED
    data ENTITY #IMPLIED
>
bold/italic
The elements strong and em would hardly ever be used correctly, so b and i are used instead. No attributes are allowed.
<!ELEMENT b (#PCDATA | i | big | small | sub | sup | tt | u | br | %wikitext-inline-special;)*>
<!ELEMENT i (#PCDATA | b | big | small | sub | sup | tt | u | br | %wikitext-inline-special;)*>
Several HTML tags
Several HTML tags are also allowed in Wikitext DTD, but most of them are simplified in some way (for instance, fewer or no attributes). These tags are not HTML: they resemble the HTML tags of the same name, but they are not identical!
<!ELEMENT tt (#PCDATA | b | i | big | small | sub | sup | u | br | %wikitext-inline-special;)*>
<!ELEMENT u (#PCDATA | b | i | big | small | sub | sup | tt | br | %wikitext-inline-special;)*>
<!ELEMENT sub (#PCDATA | %wikitext-inline;)*>
<!ELEMENT sup (#PCDATA | %wikitext-inline;)*>
<!ELEMENT big (#PCDATA | %wikitext-inline;)*>
<!ELEMENT small (#PCDATA | %wikitext-inline;)*>
<!ELEMENT br EMPTY>
TODO: ruby-tags
Variables
Some dynamic variables can be used in Wikitax as {{VARNAME}}.
<!ELEMENT wikivar EMPTY>
<!ATTLIST wikivar
    name (CURRENTMONTH | CURRENTMONTHNAME | CURRENTDAY | CURRENTDAYNAME |
          CURRENTYEAR | CURRENTTIME | NUMBEROFARTICLES) #REQUIRED
>
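Because the allowed names form an enumeration, a converter can map {{NAME}} to an empty wikivar element for exactly those seven names and keep any other {{...}} markup as literal text. A Python sketch; the function names are illustrative and the rest of Wikitax parsing is ignored:

```python
import re
import xml.etree.ElementTree as ET

# The names enumerated in the wikivar ATTLIST.
WIKIVARS = {"CURRENTMONTH", "CURRENTMONTHNAME", "CURRENTDAY", "CURRENTDAYNAME",
            "CURRENTYEAR", "CURRENTTIME", "NUMBEROFARTICLES"}

def _append(parent: ET.Element, s: str) -> None:
    """Append character data after the last child, or to the parent's own text."""
    if len(parent):
        parent[-1].tail += s
    else:
        parent.text += s

def with_wikivars(line: str) -> ET.Element:
    """Build a <text> element, turning {{NAME}} into <wikivar name="NAME"/> for known names."""
    text = ET.Element("text")
    text.text = ""
    pos = 0
    for match in re.finditer(r"\{\{([A-Z]+)\}\}", line):
        name = match.group(1)
        if name not in WIKIVARS:
            continue                      # unknown names stay in the literal text
        _append(text, line[pos:match.start()])
        ET.SubElement(text, "wikivar", name=name).tail = ""
        pos = match.end()
    _append(text, line[pos:])
    return text

print(ET.tostring(with_wikivars("There are {{NUMBEROFARTICLES}} articles."), encoding="unicode"))
# <text>There are <wikivar name="NUMBEROFARTICLES" /> articles.</text>
```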
Open questions
- div (yes), font (no), var (why)
What to do with / where to allow universal HTML attributes (id, class, name, style)?
- remove id and name; allow class and style on some elements