Pywikibot/replace.py

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by John Vandenberg (talk | contribs) at 09:07, 24 September 2007 (→‎Examples: "append" example needs to be a multi-line match). It may differ significantly from the current version.

Replace.py is part of the Pywikipedia bot framework.

This bot will make direct text replacements. It will retrieve information on which pages might need changes either from an XML dump or a text file, or only change a single page.

Parameters

Local

You can run replace.py with the following parameters (for example, python replace.py -file:articles_list.txt "errror" "error").

Source
-xml Retrieve information from a local XML dump (pages_current, see http://download.wikimedia.org). Argument can also be given as "-xml:filename".
-file Work on all pages given in a local text file. Will read any wiki link and use these articles. Argument can also be given as "-file:filename".
-cat Work on all pages which are in a specific category. Argument can also be given as "-cat:categoryname".
-page Only edit a specific page. Argument can also be given as "-page:pagetitle". You can give this parameter multiple times to edit multiple pages.
-ref Work on all pages that link to a certain page. Argument can also be given as "-ref:referredpagetitle".
-filelinks Work on all pages that link to a certain image. Argument can also be given as "-filelinks:ImageName".
-links Work on all pages that are linked to from a certain page. Argument can also be given as "-links:linkingpagetitle".
-start Work on all pages in the wiki, starting at a given page. Choose "-start:!" to start at the beginning. Note: You are advised to use -xml instead of this option; this is meant for cases where there is no recent XML dump.
Replace parameters
-except:XYZ Ignore pages which contain XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.
-summary:XYZ Set the summary message text, bypassing the default edit summaries.
-fix:XYZ Perform one of the predefined replacement tasks, which are given in the dictionary 'fixes' defined inside the file fixes.py. The -regex argument and given replacements will be ignored if you use -fix. Currently available predefined fixes are:
  • HTML - convert HTML tags to wiki syntax, and fix XHTML.
  • syntax - try to fix bad wiki markup.
  • case-de - fix case errors in German.
  • grammar-de - fix grammar and typography in German.
-namespace:n Number of namespace to process. The parameter can be used multiple times. It works in combination with all other parameters except for the -start parameter. (If you want to change all pages in a particular namespace, add the namespace prefix; for example, -start:User:!.)
unnamed The first unnamed argument is the old text; the second is the new text. If the -regex argument is given, the first argument will be regarded as a regular expression, and the second argument may contain group references like \1 or \g<name>.
Options
-always Don't prompt you for each replacement.
-recursive Repeat the replacement for as long as the pattern still matches.
-nocase Use case insensitive regular expressions.
-allowoverlap When occurrences of the pattern overlap, replace all of them. Warning! Don't use this option if you don't know what you're doing, because it can easily lead to infinite loops.
-regex Make replacements using regular expressions. If this argument isn't given, the bot will make simple text replacements.
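
The way -regex, -nocase and -recursive interact can be sketched in a few lines of Python. This is only an illustration built on the standard re module, not the code replace.py actually runs; the function name apply_replacement and the max_passes cap are assumptions made for the example.

```python
import re


def apply_replacement(text, old, new, regex=False, nocase=False,
                      recursive=False, max_passes=100):
    """Illustrative stand-in for the option semantics; not replace.py's code."""
    # Without -regex the old text is matched literally, so escape it.
    pattern = old if regex else re.escape(old)
    compiled = re.compile(pattern, re.IGNORECASE if nocase else 0)
    # For literal replacements, return the new text untouched so that
    # backslashes in it are not read as group references.
    repl = new if regex else (lambda match: new)

    result = compiled.sub(repl, text)
    passes = 1
    # -recursive: keep going while a pass still changes the text.
    # max_passes is a hypothetical safety cap against endless growth.
    while recursive and passes < max_passes:
        again = compiled.sub(repl, result)
        if again == result:
            break
        result = again
        passes += 1
    return result


print(apply_replacement("An errror", "errror", "error"))
print(apply_replacement("a    b", "  ", " ", recursive=True))
```

A single pass over four spaces leaves two behind, which is why the second call needs the recursive behaviour to end up with one.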

Examples

If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the new syntax, e.g. {{Stub}}, download an XML dump file (page table) from http://download.wikimedia.org, then use this command:

   python replace.py -xml -regex "{{msg:(.*?)}}" "{{\1}}"
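
To check what this pattern does before running the bot, you can try it in plain Python. The sketch below assumes the bot hands the pattern to Python's re module unchanged; the sample wikitext is made up for the illustration.

```python
import re

wikitext = "{{msg:Stub}} and {{msg:Cleanup}}"

# Non-greedy group, as in the command above: each template is handled separately.
lazy = re.sub(r"{{msg:(.*?)}}", r"{{\1}}", wikitext)

# Greedy variant for comparison: one match spans from the first "{{msg:"
# to the last "}}", so only the first template is converted.
greedy = re.sub(r"{{msg:(.*)}}", r"{{\1}}", wikitext)

print(lazy)
print(greedy)
```

The comparison shows why the example uses (.*?) rather than (.*) when a line may contain more than one template.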

Note that you can match patterns across more than one line:

   python replace.py -regex -start:! "First line\nSecond line" ""

Replace.py can be used to insert or append text to a page (note the replacement text has an embedded newline; the > at the start of the second line is the shell's continuation prompt, not something you type):

   python replace.py -regex '(?ms)^(.*)$' "\1
    > [[Category:NewCat]]"

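The effect of the (?ms) flags in that pattern can be tried out directly in Python. This is a minimal sketch using the standard re module on a made-up page text, not the bot's own code path; count=1 is an addition here to keep the example to a single substitution.

```python
import re

page_text = "Some article text.\nA second line."

# (?s) lets "." match newlines, so the group captures the whole page;
# (?m) lets "^" and "$" also match at line boundaries.
new_text = re.sub(r"(?ms)^(.*)$", "\\1\n[[Category:NewCat]]", page_text, count=1)

print(new_text)
```

Because the group captures the entire page, the replacement writes the page back followed by the new line, which is what makes this an append.
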
If you have a dump called foobar.xml and want to fix typos, e.g. Errror -> Error, use this:

   python replace.py -xml:foobar.xml "Errror" "Error"

If you have a page called 'John Doe' and want to convert HTML tags to wiki syntax, use:

   python replace.py -page:John_Doe -fix:HTML

If you run the bot without giving the old and new text as arguments, you will be prompted for the replacements (you can enter more than one pair):

   python replace.py -file:blah.txt

The script asks the user before modifying an article. It is recommended to double-check the result to be sure that the bot did not introduce errors (especially with misspelled words). It is possible to specify a set of articles with an external text file containing wiki links:

 [[plane]]
 [[vehicle]]
 [[train]]
 [[car]]

The bot is then called using something like:

 python replace.py [global-arguments] -file:articles_list.txt "errror" "error"
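
To preview which titles a list file like the one above would yield, a rough approximation of "read any wiki link" can be written with a regular expression. The pattern and the function name titles_from_text are assumptions made for this sketch; the bot's own link parsing may differ in details.

```python
import re

# Capture the target of [[Target]] or [[Target|label]] links.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def titles_from_text(text):
    """Return the link targets found in text, in order of appearance."""
    return [match.group(1).strip() for match in WIKI_LINK.finditer(text)]


sample = " [[plane]]\n [[vehicle]]\n [[train]]\n [[car]]\n"
print(titles_from_text(sample))
```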