API: Difference between revisions

From Meta, a Wikimedia project coordination wiki

Revision as of 16:32, 20 October 2006

Attention visitors

This page discusses the future API for MediaWiki software.

MediaWiki at present has four interfaces:

  • MediaWiki API - the new, partially implemented API described on this page.
  • Query API - older API for retrieving data (will be obsolete upon new API completion).
  • Special:Export feature (bulk export of xml formatted data)
  • Regular Web-based interface
 
This page should be moved to MediaWiki.org.
Please do not move the page by hand. It will be imported by a MediaWiki.org administrator with the full edit history. In the meantime, you may continue to edit the page as normal.

The goal of this API is to provide direct, high-level access to the data contained in the MediaWiki databases. The client program must be able to log in, get data, and post changes. The API must support thin web-based JavaScript clients such as Popup (link needed), applications running on the user's machine (vandal fighter), and access from other web sites (tool server's utilities).

All output will be available in a structured tree format such as XML, JSON, YAML, WDDX, or PHP serialized. A strongly typed RSS or WSDL-style format might also be implemented using wrappers.

Each API module uses a set of parameters. To prevent name collisions, each module has a two-letter abbreviation, and each parameter name begins with those two letters. For example, action=login uses the prefix lg for all of its parameters: lgname and lgpassword.
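As a rough illustration of the prefix convention, the sketch below prepends a module's two-letter abbreviation to each parameter name. The helper name prefix_params is hypothetical and is not part of MediaWiki.

```python
def prefix_params(prefix, params):
    """Return a copy of params with the module prefix prepended to every name."""
    if len(prefix) != 2:
        raise ValueError("module prefixes are two letters long")
    return {prefix + name: value for name, value in params.items()}


# Build the parameters for action=login, whose module prefix is "lg".
login = {"action": "login"}
login.update(prefix_params("lg", {"name": "Yurik", "password": "12345"}))
print(login)
```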

Using API internally by other code (done)

Sometimes other parts of the code may wish to use the data access and aggregation functionality of the API. Here are the steps needed to accomplish such usage:

1) Prepare request parameters using FauxRequest class. All parameters are the same as if making the request over the web.

$params = new FauxRequest(array (
	'action' => 'query',
	'list' => 'allpages',
	'apnamespace' => 0,
	'aplimit' => 10,
	'apprefix' => $search
));

2) Create and execute ApiMain instance. Because the parameter is an instance of a FauxRequest object, ApiMain will not execute any formatting printers, nor will it handle any errors. A parameter error or any other internal error will cause an exception that may be caught in the calling code.

$module = new ApiMain($params);
$module->execute();

3) Get the resulting data array.

$data = & $module->GetResultData();

Login / lg (done)

Login returns several tokens that the server needs in order to recognize a logged-in user. In every call to api.php, the three values must be passed either as additional parameters or as cookies within the request header. If any of the login values is given as part of the request, all cookie values are ignored. Please note that the user name is passed in as lgname but returned as the normalized lgusername. The first is used for authentication, whereas the second may be passed together with lgtoken and lguserid as tokens when making calls to other modules.

Note: In this and other examples, all parameters are passed in a GET request just for the sake of simplicity. In your application, make sure all large and/or security sensitive parameters are given as part of the POST request.

Request:
  api.php ? action=login & lgname=Yurik & lgpassword=12345 [& lgdomain=wikipedia.org]
Result:
  api:
    login:
      result: Success         Other values: NoName, Illegal, WrongPluginPass,
                                            NotExists, WrongPass, EmptyPass
      lgtoken: 123ABC         Also returned as a cookie (e.g. enwikiToken)
      lgusername: Yurik       Normalized lgname,
                              also returned as a cookie (e.g. enwikiUserName)
      lguserid: 12345         Also returned as a cookie (e.g. enwikiUserID)

To use the above values, pass them without alteration to any api.php call in addition to other parameters. Here, a rollback token is acquired for the Main Page (restricted operation):

api.php ? action=query & lgtoken=123ABC & lgusername=Yurik & lguserid=12345 
                       & prop=info & intokens=rollback & titles=Main Page
Example
http://en.wikipedia.org/w/api.php?action=login&lgname=user&lgpassword=password
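The note about POST requests and the reuse of the login values can be sketched in Python as follows. This is a minimal sketch that only builds the request and never sends it; the function names build_login_request and with_login are hypothetical, and the parameter names are the ones described in this section.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://en.wikipedia.org/w/api.php"


def build_login_request(name, password, domain=None):
    """Build (but do not send) a login request with the password in the POST body."""
    params = {"action": "login", "lgname": name, "lgpassword": password}
    if domain is not None:
        params["lgdomain"] = domain
    # Supplying a body makes urllib issue a POST instead of a GET.
    return Request(API_URL, data=urlencode(params).encode("utf-8"))


def with_login(params, login_result):
    """Copy the three login values, unaltered, into another call's parameters."""
    merged = dict(params)
    for key in ("lgtoken", "lgusername", "lguserid"):
        merged[key] = login_result[key]
    return merged


request = build_login_request("Yurik", "12345")
print(request.get_method())  # POST

login_result = {"lgtoken": "123ABC", "lgusername": "Yurik", "lguserid": "12345"}
print(with_login({"action": "query", "prop": "info"}, login_result))
```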

OpenSearch support (done)

This module provides web browsers (Firefox 2.0 at this time) with auto-suggest functionality in the search box. The module needs to be extremely fast, and provide a simple JSON-formatted output in the form of

["search", ["suggestion1", "suggestion2", ...]]

Since the server might be hit on every user keystroke, the potential server load may be heavy enough to require moving this feature to separate server(s).
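A client could read the two-element JSON array shown above along these lines. This is a small sketch using Python's standard json module; parse_suggestions is a hypothetical helper name.

```python
import json


def parse_suggestions(body):
    """Split the ["search", [suggestions...]] array into its two parts."""
    search, suggestions = json.loads(body)
    return search, list(suggestions)


term, suggestions = parse_suggestions('["search", ["suggestion1", "suggestion2"]]')
print(term, suggestions)
```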

WatchList RSS/ATOM feeds (done)

This module returns watchlist data in a feed format. The potential performance impact is still being evaluated.

Query - General

Overview

Query API module allows applications to get needed pieces of data from the MediaWiki databases, and is loosely based on the Query API interface currently available on all MediaWiki servers. All data modifications will first have to use query to acquire a token to prevent abuse from malicious sites.

Title Normalization (done)

Converts improper page titles to their proper form. Capitalizes first character, replaces '_' with ' ', changes canonical namespace names to their localized alternatives, etc.
Request: Note: articleA's first letter is not capitalized
  api.php ? action=query & titles=Project:articleA|ArticleB
Result:
  api:
    query:
      pages:
        Wikipedia:ArticleA:            Project: is converted to Wikipedia: when running on en-wiki.
          ns: 4                        Show title's namespace except when ns=0
        ArticleB:
      normalized:                      Any requested titles not in the "proper" form will be here
        Project:articleA: Wikipedia:ArticleA
Example
http://en.wikipedia.org/w/api.php?action=query&titles=Project:articleA%7CArticleB
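To make the rules above concrete, here is a deliberately simplified client-side approximation in Python. It covers only the three rules named in this section (underscores, first-letter capitalization, namespace localization); the server applies further rules hidden behind the "etc.", and the function names and the namespace mapping are assumptions made for illustration.

```python
def _upper_first(text):
    """Capitalize only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def normalize_title(title, localized_namespaces):
    """Approximate title normalization for a single title."""
    title = title.replace("_", " ").strip()
    prefix, sep, rest = title.partition(":")
    if sep and prefix in localized_namespaces:
        # Canonical namespace name -> localized name, e.g. Project -> Wikipedia.
        return localized_namespaces[prefix] + ":" + _upper_first(rest)
    return _upper_first(title)


def normalized_map(titles, localized_namespaces):
    """Mimic the 'normalized' element: list only the titles that changed."""
    result = {}
    for title in titles:
        proper = normalize_title(title, localized_namespaces)
        if proper != title:
            result[title] = proper
    return result


print(normalized_map(["Project:articleA", "ArticleB"], {"Project": "Wikipedia"}))
```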

Redirects (done)

Redirects can be resolved by the server, so that the target of the redirect is returned instead of the given title. This example is not very useful without an additional prop=... element, but it shows the usage of the redirect function. The 'redirects' section will contain the target of the redirect and a non-zero namespace code. Both normalization and redirection may take place. In case of a redirect to a redirect, all redirections will be resolved, and in case of a circular redirection, there might not be a page in the 'pages' section.
Request:
  api.php ? action=query & titles=Main page & redirects
Result:
  api:
    query:
      pages:
        Main Page:
      redirects:
        Main page: Main Page
The same request without the "redirects" parameter would treat "Main page" as a regular page, so revisions and other information may be obtained. In order to see that it is a redirect, the basic page info must be requested using prop=info.
Request:
  api.php ? action=query & titles=Main page & prop=info
Result:
  api:
    query:
      pages:
        Main page:
          id: 12342
          redirect:
Example
http://en.wikipedia.org/w/api.php?action=query&titles=Main%20page&redirects
http://en.wikipedia.org/w/api.php?action=query&titles=Main%20page

Circular Redirects (done)

Assume Page1 → Page2 → Page3 → Page1 (circular redirect). Also, in this example a non-normalized name 'page1' is used.
Request:
  api.php ? action=query & titles=page1 & redirects
Result:
  api:
    query:
      redirects:
        Page1: Page2      Redirects are present, but not the 'pages' element.
        Page2: Page3
        Page3: Page1
      normalized:
        page1: Page1
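The behaviour in the redirect examples can be modelled with a short Python sketch that follows a chain of redirects and gives up when it detects a cycle. The redirect table is a plain dictionary here, and resolve_redirect is a hypothetical name; this is not how the server itself is implemented.

```python
def resolve_redirect(title, redirects):
    """Follow redirects to the final target; return None for a circular chain."""
    seen = set()
    while title in redirects:
        if title in seen:
            return None  # circular redirect: no page ends up in 'pages'
        seen.add(title)
        title = redirects[title]
    return title


circular = {"Page1": "Page2", "Page2": "Page3", "Page3": "Page1"}
print(resolve_redirect("Page1", circular))                          # None
print(resolve_redirect("Main page", {"Main page": "Main Page"}))    # Main Page
```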

Limits

To prevent server overloads, each query imposes a limit on how many items it can process. Anonymous and logged-in users have one limit, while bots have a considerably higher limit as they are trusted by the community. At present, each query simply lists the maximum request size it allows. For example, the allpages list allows aplimit= to be set no higher than 500, or in the case of a bot, no higher than 5000.
Drawbacks: Currently all limits are additive, so if the user requests allpages and backlinks, the user will get 500 of each. This is undesirable, because the more items are combined into one request, the heavier the load on the server. Instead, some sort of weighted mechanism should be developed, where each request item has a certain "cost" associated with it, and each user is allocated a fixed allowance per request. The more information the user requests, the lower the limit becomes for that request. Unfortunately, that makes it very hard to determine the maximum limits before executing the query, so it might not be a workable solution.
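One possible reading of the weighted mechanism is sketched below in Python. The cost values, the allowance, and the rule of dividing the allowance by the total cost are all assumptions made for illustration; the page does not specify any of them.

```python
def per_item_limit(requested, costs, allowance):
    """Shrink the per-item limit as the combined cost of the request grows."""
    if not requested:
        raise ValueError("at least one request item is needed")
    total_cost = sum(costs[item] for item in requested)
    return allowance // total_cost


# Hypothetical costs: backlinks is assumed to be twice as expensive as allpages.
COSTS = {"allpages": 1, "backlinks": 2}

print(per_item_limit(["allpages"], COSTS, 500))               # 500
print(per_item_limit(["allpages", "backlinks"], COSTS, 500))  # 166
```

The second call shows the drawback mentioned above: the client cannot know the effective limit without first knowing every item's cost.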

Query - Meta-Information

Meta queries allow clients to retrieve data about the MediaWiki installation itself and its settings.

To get meta information, clients will use meta= parameter:

api.php ? action=query & meta=siteinfo|userinfo & ...
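Multi-valued parameters such as meta= are joined with '|', which has to be percent-encoded in a real URL. The Python sketch below builds such a URL with the standard urllib.parse module; build_query_url is a hypothetical helper and the base URL is the one used in this page's examples.

```python
from urllib.parse import quote, urlencode


def build_query_url(base_url, **params):
    """Build an action=query URL, joining list values with '|'."""
    flat = {"action": "query"}
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(item) for item in value)
        flat[name] = value
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(flat, quote_via=quote)


url = build_query_url("http://en.wikipedia.org/w/api.php",
                      meta=["siteinfo", "userinfo"])
print(url)
# http://en.wikipedia.org/w/api.php?action=query&meta=siteinfo%7Cuserinfo
```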

siteinfo / si (done)

Returns overall site information.
Parameters: siprop=namespaces|general
Example
http://en.wikipedia.org/w/api.php?action=query&meta=siteinfo

userinfo / ui

Returns information about the current user.
Parameters: uiprop=isblocked|hasmsg|rights|groups, uioptions=<opt name>|...
Example
http://en.wikipedia.org/w/api.php?action=query&meta=userinfo

Query - Page Information

Page information items are used to get various data about a list of pages specified with either the titles=, pageids=, or revids= parameters, or by using generators. Content, links, interwiki links, and other information may be obtained.

info / in (done except tokens)

Gets the basic page information such as pageid, last revid, redirect, last touched, etc. Limit: 500/5000.
Parameters: intokens=edit|rollback|delete|protect|move
Issues: Should tokens for rollback/delete/protect/move be available in this way, as opposed to having an action= for each task? There is a potential for abuse: someone might place a link to the wiki on their website, and that link could contain a "delete" action. If a logged-in admin clicks on that link, the API will recognize them because of their cookie and will allow the deletion.
Request: 
  api.php ? action=query & prop=info & titles=TitleA
Result:
  api:
    query:
      pages:
        TitleA:
          id: 12341
          lastrev: 23456
          touched: 20060908025739

categories / cl

Gets a list of all categories that the provided pages belong to. Limit: 200/1000.
Parameters: clprop=sortkey|timestamp
Request: 
  api.php ? action=query & prop=categories & titles=TitleA
Result:
  api:
    query:
      pages:
        TitleA:
          categories:
            Category:Cat1:
            Category:Cat2:

Content (done)

Requesting content should be done by requesting the last revision with content property.
api.php ? action=query & prop=revisions & rvprop=content & titles=ArticleA|ArticleB

imageinfo / ii

Gets image information for any titles in the image namespace (#6).
Parameters: iiprop=url|history|comment|stats|user|timestamp, iisource=local/shared/all (dflt=local)
url - path to the image, history - include all old image versions, stats - image size/type, user - uploader, iisource - look at the local or shared (commons) image repository, or both.
Example: Get comments for all image uploads, both local and in the commons repository. Here, ImageA was uploaded 3 times to the local wiki, and 2 times to the shared (commons) repository.
Request: 
  api.php ? action=query & prop=imageinfo & titles=Image:ImageA & iiprop=comment|history & iisource=all
Result:
  api:
    query:
      pages:
        Image:ImageA:
          ns: 6
          imageinfo:
            local:
              comment: last update comment
            localhistory:
              -                                        history is an unordered list of items
                comment: some update
              -
                comment: another update
            shared:
              comment: last update on commons
            sharedhistory:
              -
                comment: some update on commons

langlinks / ll

Gets a list of all language links (interwikis) from the provided pages to other languages. Limit: 200/1000.

links / pl

Gets a list of all links from the provided pages. Limit: 200/1000.
Parameters: plnamespace (flt).

templates / tl

Gets a list of all templates used on the provided pages. Limit: 200/1000.

imagelinks / il

In Query API interface, this command found pages that embedded the given image. It has been renamed to imgembeddedin.

Gets a list of all images used on the provided pages. Limit: 200/1000.

Query - Revisions (done)

Returns revisions for a given article based on the selection criteria. Revisions may be used with multiple titles only when working with the latest revision. When using rvlimit, rvdir=newer, rvstart, or rvend parameters, titles= must have only one title listed. By default, revisions shows only the id of the last revision.
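The single-title restriction described above could be checked on the client before sending a request, roughly as in this Python sketch. The function name is hypothetical, and the check covers only the parameters named in the paragraph above.

```python
# Parameters that, per the text above, require exactly one title.
SINGLE_TITLE_PARAMS = ("rvlimit", "rvstart", "rvend")


def check_revisions_request(titles, params):
    """Raise ValueError if a multi-title request uses a single-title parameter."""
    needs_single = any(name in params for name in SINGLE_TITLE_PARAMS)
    needs_single = needs_single or params.get("rvdir") == "newer"
    if needs_single and len(titles) != 1:
        raise ValueError("this combination of rv parameters needs exactly one title")


# Allowed: several titles, latest revision only.
check_revisions_request(["ArticleA", "ArticleB"], {"rvprop": "content"})
# Allowed: one title with a limit.
check_revisions_request(["ArticleA"], {"rvlimit": 10, "rvdir": "newer"})
```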

Request: 
  api.php ? action=query & prop=revisions & titles=ArticleA & rvprop=timestamp|user|comment|content
Result:
  api:
    query:
      pages:
        ArticleA:
          id: 12345
          lastrev: 67890
          revisions:
            67890:
              timestamp: 20060908025739
              user: UserX
              comment: ...change comment...
              content: ...raw revision content...
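Assuming the result tree has been decoded into nested dictionaries with the shape shown above (for instance from the JSON format), the revision text could be collected as in the following Python sketch. The sample data mirrors the example result, and revision_contents is a hypothetical name.

```python
def revision_contents(result):
    """Map (title, revision id) to the raw revision content."""
    contents = {}
    for title, page in result["api"]["query"]["pages"].items():
        for revid, revision in page.get("revisions", {}).items():
            contents[(title, revid)] = revision.get("content")
    return contents


# Sample data shaped like the example result in this section.
sample = {"api": {"query": {"pages": {"ArticleA": {
    "id": 12345,
    "lastrev": 67890,
    "revisions": {"67890": {
        "timestamp": "20060908025739",
        "user": "UserX",
        "comment": "...change comment...",
        "content": "...raw revision content...",
    }},
}}}}}

print(revision_contents(sample))
```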
Additional 'revisions' samples
Get the timestamps of up to 10 revisions, beginning at 2006-09-01 and moving forward in time.
  api.php ? action=query & prop=revisions & titles=ArticleA 
                         & rvprop=timestamp & rvlimit=10 & rvdir=newer & rvstart=20060901000000
Get the timestamps of all revisions for the entire month of September 2006. rvlimit is optional. If the number of revisions exceeds the limit, the 'revisions' element will contain 'continue':'rvstart=20060920122343' with the timestamp to continue from.
  api.php ? action=query & prop=revisions & titles=ArticleA 
                         & rvprop=timestamp & rvstart=20060901000000 & rvend=20061001000000
Get the timestamps of up to 10 revisions, beginning at 12345 and moving back in time. If more than 10 revisions are available, the 'revisions' element will contain 'continue':'revids=23512', where revids is the next revision id in order.
  api.php ? action=query & prop=revisions & revids=12345 
                         & rvprop=timestamp & rvlimit=10 & rvdir=older
Get the timestamps of all revisions between two given revision IDs. rvlimit is optional. If the number of revisions exceeds the limit, the 'revisions' element will contain 'continue':'rvstartid=23512' with the revid to continue from. Both rvstartid & rvendid must belong to the same title. The titles= parameter is not required, but if given, it must be set to the same title as revision IDs.
  api.php ? action=query & prop=revisions & rvprop=timestamp & rvstartid=12345 & rvendid=67890

Examples

Get data with content for the last revision of titles "API" and "Main Page"
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API%7CMain%20Page&rvprop=timestamp%7Cuser%7Ccomment%7Ccontent
Get last 5 revisions of the "Main Page"
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Main%20Page&rvlimit=5&rvprop=timestamp%7Cuser%7Ccomment
Get first 5 revisions of the "Main Page"
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Main%20Page&rvlimit=5&rvprop=timestamp%7Cuser%7Ccomment&rvdir=newer
Get first 5 revisions of the "Main Page" made after 2006-05-01
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Main%20Page&rvlimit=5&rvprop=timestamp%7Cuser%7Ccomment&rvdir=newer&rvstart=20060501000000

Query - Lists

Lists differ from other properties in two respects. First, instead of appending data to the elements under the 'pages' element, each list has its own separate branch under the 'query' element. Second, list output is limited by number of items and may be continued using a "paging" technique. Even when no limit is provided, the query will return only a set number of items, and will also provide a string marking the point from which to continue paging. See the allpages list for an example.

allpages / ap (done)

Returns a list of pages in a given namespace starting at apfrom, ordered by page title.
Parameters: apfrom (paging), apnamespace (dflt=0), apredirect (flt), aplimit (dflt=10, max=500/5000)
Example: Request a list of 3 pages from namespace 10 (templates) beginning at the first available page.
Request: 
  api.php ? action=query & list=allpages & apnamespace=10 & aplimit=3
Result:
  api:
    query:
      allpages:
        Template:A-Article:
          id: 12341
          ns: 10
        Template:B-Article:
          id: 12342
          ns: 10
        Template:C-Article:
          id: 12343
          ns: 10
    query-status:
      allpages:
        continue: apfrom=D-Article    The next item in this list would have been Template:D-Article.
The client may now make another request using the continue value as a parameter:
  api.php ? action=query & list=allpages & apnamespace=10 & aplimit=3 & apfrom=D-Article
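The paging technique can be sketched in Python as a loop that keeps feeding the continue value back into the request. The transport is passed in as a function so the sketch stays self-contained; fake_fetch stands in for a real HTTP call and only imitates the result shape shown above. All function names are hypothetical.

```python
def iter_list(fetch, list_name, params):
    """Yield (title, data) pairs from a list, following 'continue' values."""
    params = dict(params, action="query", list=list_name)
    while True:
        api = fetch(params)["api"]
        yield from api["query"][list_name].items()
        status = api.get("query-status", {}).get(list_name, {})
        if "continue" not in status:
            return
        # The continue value looks like "apfrom=D-Article".
        name, _, value = status["continue"].partition("=")
        params[name] = value


TITLES = ["A-Article", "B-Article", "C-Article", "D-Article", "E-Article"]


def fake_fetch(params):
    """Stand-in for an HTTP request: serves TITLES in pages of aplimit items."""
    start = params.get("apfrom", "")
    limit = int(params.get("aplimit", 10))
    matching = [t for t in TITLES if t >= start]
    page, rest = matching[:limit], matching[limit:]
    api = {"query": {"allpages": {"Template:" + t: {"ns": 10} for t in page}}}
    if rest:
        api["query-status"] = {"allpages": {"continue": "apfrom=" + rest[0]}}
    return {"api": api}


titles = [title for title, _ in
          iter_list(fake_fetch, "allpages", {"apnamespace": 10, "aplimit": 3})]
print(titles)
```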

backlinks / bl

Lists pages that link to the given page. Ordered by linking page title.
Parameters: bltitle, blfrom (paging), blnamespace (flt), blredirect (flt), bllimit (dflt=10, max=500/5000)
  api.php ? action=query & list=backlinks & bltitle=ArticleA

categorymembers / cm

List of pages that belong to a given category, ordered by page title.
Parameters: cmtitle (if title is in NS 0, treats it as category NS), cmfrom (paging), cmnamespace (flt), cmlimit (dflt=10, max=500/5000)
  api.php ? action=query & list=categorymembers & cmtitle=category:title

embeddedin / ei

Lists the pages that include the given page (for example, template:title) as a template using {{title}}. Ordered by including page title.
Parameters: eititle, eifrom (paging), einamespace (flt), eiredirect (flt), eilimit (dflt=10, max=500/5000)
  api.php ? action=query & list=embeddedin & eititle=template:title

imgembeddedin / ie

This was renamed from imagelinks to avoid confusion. imagelinks will now be used to get all images used on a given page.

List of pages that include a given image. Ordered by page title.
Parameters: ietitle (if image title is in NS 0, treats it as an image NS), iefrom (paging), ienamespace (flt), ielimit (dflt=10, max=500/5000)
  api.php ? action=query & list=imgembeddedin & ietitle=image:title

logevents / le (semi-complete)

List log events, filtered by time range, event type, user type, or the page it applies to. Ordered by event timestamp.
Parameters: letype (flt), lefrom (paging timestamp), leto (flt), ledirection (dflt=older), leuser (flt), letitle (flt), lelimit (dflt=10, max=500/5000)
  api.php ? action=query & list=logevents      - List last 10 events of any type

recentchanges / rc

Gets a list of pages recently changed, ordered by modification timestamp.
Parameters: rcfrom (paging timestamp), rcto (flt), rcnamespace (flt), rcminor (flt), rcusertype (dflt=not|bot), rcdirection (dflt=older), rclimit (dflt=10, max=500/5000)
  api.php ? action=query & list=recentchanges  - List last 10 changes

usercontribs / uc

Gets a list of pages modified by a given user, ordered by modification time.
Parameters: ucuser, ucfrom (paging timestamp), ucto (flt), ucnamespace (flt), ucminor (flt), uctop (flt), ucdirection (dflt=older), uclimit (dflt=10, max=500/5000)
  api.php ? action=query & list=usercontribs & ucuser=User:UserA   - List last 10 changes made by UserA

users / us

Gets a list of registered users, ordered by user name.
Parameters: usfrom (paging), uslimit (dflt=10, max=500/5000)

watchlist / wl (done)

Get a list of pages on the user's watchlist but only if they were changed within the given time period. Ordered by time of the last change of the watched page.
Parameters: wlfrom (paging timestamp), wlto (flt), wlnamespace (flt), wldirection (dflt=older), wllimit (dflt=10, max=500/5000)

Query - Generators (done)

A generator is a way to use one of the above lists instead of the titles= parameter. The output of the list must be a list of pages, whose titles are automatically used instead of the titles=/revids=/pageids= parameters. Other queries such as content, revisions, etc. will treat those pages as if they were provided by the user in the titles= parameter. Only one generator is allowed, and while it is possible to have both generator= and list= parameters in the same call, they may not contain the same values.
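The two restrictions in the paragraph above (a single generator, and no overlap between generator= and list=) could be checked by a client as in this Python sketch. check_generator is a hypothetical name, and the parameters are assumed to be given as a dictionary of '|'-separated strings.

```python
def check_generator(params):
    """Raise ValueError if the generator rules described above are violated."""
    generator = params.get("generator", "")
    if "|" in generator:
        raise ValueError("only one generator is allowed")
    lists = [name for name in params.get("list", "").split("|") if name]
    if generator and generator in lists:
        raise ValueError("generator= and list= may not contain the same value")


# Allowed: allpages as the generator, backlinks as an ordinary list.
check_generator({"generator": "allpages", "list": "backlinks"})
```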

Using allpages as generator

Use the allpages list as a generator, to get the links and categories for all titles returned by allpages.

Request: 
  api.php ? action=query & generator=allpages & apnamespace=10 & aplimit=10 & apfrom=A & prop=links|categories
Result:
  api:
    query:
      pages:
        Template:A-Article:
          id: 12341
          ns: 10
          links:
            Linked Article1:            Linked Article1 is in the main namespace
            Talk:Linked Article2:       For non-main ns, list it as a sub-element
              ns: 1
            ...
          categories:
            Category:Cat1:
            Category:Cat2:
            ...
        Template:B-Article:
          ...
        Template:C-Article:
          ...
    query-status:
      allpages:
        continue: apfrom=D-Article      The next item in this list would have been Template:D-Article.


Generators and redirects

Here, we use the "links" page property as a generator. This query will get all the links from all the pages that are linked from Title. For this example, assume that Title has links to TitleA and TitleB. TitleB is a redirect to TitleC. TitleA links to TitleA1, TitleA2, TitleA3; and TitleC links to TitleC1 & TitleC2. The redirect is resolved because of the "redirects" parameter.

The query will execute the following steps:
  1. Resolve titles parameter for redirects
  2. For all pages specified in the titles=...|... parameter, get all links, and substitute them for the original titles=...|... parameter.
  3. Resolve new titles list for redirects
  4. Execute regular prop=links query using the internally created list of titles.
Request: 
  api.php ? action=query & generator=links & titles=Title & prop=links & redirects
Result:
  api:
    query:
      pages:
        TitleA:
          links:
            TitleA1:
            TitleA2:
            TitleA3:
        TitleC:
          links:
            TitleC1:
            TitleC2:
      redirects:
        TitleB: TitleC
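The four steps can be simulated over in-memory data, as in the Python sketch below. The link table and redirect table reproduce the assumptions of this example; the function names are hypothetical and the code only imitates the described behaviour, not the server implementation.

```python
LINKS = {
    "Title": ["TitleA", "TitleB"],
    "TitleA": ["TitleA1", "TitleA2", "TitleA3"],
    "TitleC": ["TitleC1", "TitleC2"],
}
REDIRECTS = {"TitleB": "TitleC"}


def resolve(titles, redirects, found):
    """Replace redirects by their targets, recording each hop in found."""
    resolved = []
    for title in titles:
        seen = set()
        while title in redirects and title not in seen:
            seen.add(title)
            found[title] = redirects[title]
            title = redirects[title]
        resolved.append(title)
    return resolved


def links_generator_query(titles, links, redirects):
    found = {}
    titles = resolve(titles, redirects, found)                       # step 1
    generated = [l for t in titles for l in links.get(t, [])]        # step 2
    generated = resolve(generated, redirects, found)                 # step 3
    pages = {t: {"links": links.get(t, [])} for t in generated}      # step 4
    return {"pages": pages, "redirects": found}


result = links_generator_query(["Title"], LINKS, REDIRECTS)
print(result)
```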

Examples

Show info about 4 pages starting at the letter "T"
http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=4&gapfrom=T&prop=info
Show content of the first 2 non-redirect pages beginning at "Re"
http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=2&gapfilterredir=nonredirects&gapfrom=Re&prop=revisions&rvprop=content

Posting Data / needs major EditPage.php rewrite

Need Help

At present, user interface code is tightly woven with the database access code, making it unusable for the API. These two must be separated from one another – we need a clean data access layer without any UI logic. If you want to contribute by rewriting EditPage.php, and if you know PHP and MediaWiki, or you think you can learn it, please give us a hand at making this possible. --Yurik 15:07, 17 October 2006 (UTC)

action=submit allows data to be posted back to the MediaWiki servers. For this to work, the client must first obtain an edittoken by using a prop=info & intokens=edit query call. Both the lastrev and the token have to be sent to the server, together with the title of the page, its content, and the summary comment. The disablemerge parameter stops the save operation if the article has been modified after the query call. The testrun parameter attempts the save operation by merging the content with the newer changes (if needed) and returns how the page would look if it were saved, but without actually changing any data.

Note: The parameters should be modified to allow for a controlled merge. For example: rev #1 is received, an attempt is made to save changes to it, but rev #2 has been created in the meantime. The client decides to allow a merge with rev #2, but while the decision is being made, rev #3 is published. The client should have the option to allow merging only with rev #2, which it has verified, and not with rev #3, which it has not yet seen.

Request:
  api.php ? action=submit & title=Project:articleA & edittoken=abc123 & revid=12345
                          & summary=edit_comment & content=wikitext
                          [& minorEdit] [& disablemerge] [& testrun]
Result:
  api:
    save:
      status: Success             Other values: 'Prohibited', 'Conflict', 'DbLocked', 'BadToken', 'MergeRequired'
                                  (for the testrun: 'CanMerge', 'CanSaveAsIs')
      title: Wikipedia:ArticleA   Always returns normalized title
      ns: 4                       Show title's namespace except when ns=0
      id: 12345                   On success, the ID of the page
      revid: 67891                On success, the new latest revision id
      redirect:                   On success, when saved page is now treated as a redirect
      content: wiki content       When used with testrun, this field will be set to the merge result
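A client might assemble the proposed action=submit call as in the Python sketch below. Since this module is not implemented yet, everything here simply follows the request layout shown above; build_submit_body is a hypothetical helper, and sending the optional flags with an empty value is an assumption.

```python
from urllib.parse import urlencode


def build_submit_body(title, edittoken, revid, summary, content,
                      minor=False, disablemerge=False, testrun=False):
    """Return the url-encoded POST body for the proposed action=submit call."""
    params = {
        "action": "submit",
        "title": title,
        "edittoken": edittoken,
        "revid": revid,
        "summary": summary,
        "content": content,
    }
    # Optional flags are sent by name only; their value is left empty.
    for name, enabled in (("minorEdit", minor),
                          ("disablemerge", disablemerge),
                          ("testrun", testrun)):
        if enabled:
            params[name] = ""
    return urlencode(params)


body = build_submit_body("Project:articleA", "abc123", 12345,
                         "edit_comment", "wikitext", testrun=True)
print(body)
```

Page content can be large, so the body is meant to be sent in a POST request, in line with the earlier note on large parameters.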

Moving/Renaming Pages

Request
api.php ? action=move & mvfrom=OldTitle & mvto=NewTitle & mvtoken=123ABC [& mvoverride]


Implementation Strategy

See /Implementation Strategy.

Wikimania 2006 API discussion

See /Wikimania 2006 API discussion.

Useful Links

  • Proposed Database Schema Changes
  • Database layout (Help:Database layout)