Wikipedia:AutoWikiBrowser/Typos/Guide: Difference between revisions

m →‎Writing typo rules: Don't create a rule where the regex does not match the value of the regex match, since AWB can't show such replacements in the edit summary.

Revision as of 13:19, 28 June 2010

These are the typo regular expressions for RegExTypoFix. Development has been open to the public since 2006.

Please add to or improve these regular expressions!

Description

These are regular expressions that catch and fix common misspellings. The syntax of the expressions is described in full on the MSDN website, though for the purposes of this page the Well House summary is probably easier to use.

Although this project was started with the aim of 100% accuracy, the less accurate but more inclusive list we have now is better.

It is the responsibility of everyone using RegExTypoFix, or any semi-automated tool, to use it responsibly. Check every edit before you make it. If in doubt, skip.

This typo list is also used by the in-browser editor and Wikipedia gadget wikEd.

Adding/changing a misspelling

  • If you don't know how to make a change, suggest it here, where a knowledgeable user will add it for you.
  • Avoid having a rule detect a correct spelling ("false positive").
  • Aim to have a single rule for each root word, prefix, and suffix.
  • Keep in mind that every additional alternative a rule has to consider uses more CPU and slows scanning.
  • Update the rule name if you change something that affects it.
  • When editing a rule, edit only the smallest appropriate section of this page. Editing the whole page taxes CPU and bandwidth.
  • Note that only words outside wikimarkup are fixed, so a rule to fix, say, a wiki template will not work.
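
As a rough illustration of "a single rule for each root word" that avoids false positives, here is a minimal Python sketch. The rule itself is hypothetical and is not taken from the live typo list, and the names FIND, REPLACE and fix are invented for this example. The list uses .NET syntax, where a replacement refers to a group as $1. Python writes the same reference as \1.

```python
import re

# Hypothetical rule, for illustration only (not copied from the typo list).
# One pattern covers several misspellings of the root "accommodat-" together
# with its common suffixes, instead of one rule per inflected form.
FIND = re.compile(r"\b([Aa])(?:ccomod|comm?od)at(e[ds]?|ing|ions?)\b")
REPLACE = r"\1ccommodat\2"  # .NET would write this as $1ccommodat$2


def fix(text: str) -> str:
    """Apply the single rule to a piece of text."""
    return FIND.sub(REPLACE, text)


# Misspellings with different suffixes are all handled by the one rule.
print(fix("accomodate"))      # accommodate
print(fix("Acommodations"))   # Accommodations
# Correct spellings must pass through unchanged (no false positive).
print(fix("accommodating"))   # accommodating
```

Keeping the capitalised first letter in a capture group lets one rule serve both sentence-initial and mid-sentence uses.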

Writing typo rules

  • Don't use the quantifiers * and + with anything but a single character. Avoid them entirely, if possible, as they put extra strain on the CPU and are apt to do something other than what you expect.
  • Don't expect rules to be applied in the order they appear.
  • Each rule must be completely independent.
  • Don't create a rule whose regex fails to match the matched text (the value of the regex match) when applied to that text alone, since AWB can't show such replacements in the edit summary. (For example, don't use a lookahead at the end of a regex.)
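
The lookahead case in the last point can be checked mechanically. The sketch below is in Python rather than .NET, and both patterns and the helper name matches_own_value are made up for illustration. It assumes that the edit summary is produced by re-applying the rule to the matched text alone, which is one reading of the guideline and not a description of AWB's actual code.

```python
import re


def matches_own_value(pattern: str, text: str) -> bool:
    """Return True if the pattern still matches when applied only to the
    text it matched, i.e. without the surrounding context."""
    m = re.search(pattern, text)
    if m is None:
        raise ValueError("pattern does not match the sample text")
    return re.search(pattern, m.group(0)) is not None


sample = "I saw teh cat"

# Hypothetical rule with a lookahead at the end: it matches "teh" in context,
# but the matched value "teh" alone no longer satisfies the lookahead.
with_lookahead = r"\bteh(?= cat)"

# Hypothetical rule whose match value contains everything the regex needs.
self_contained = r"\bteh\b"

print(matches_own_value(with_lookahead, sample))   # False
print(matches_own_value(self_contained, sample))   # True
```

A rule for which this check returns False is the kind the guideline asks you to avoid.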

Testing typo rules

  • Test rules with the AWB Regular Expression tester or something similar before adding them here.
  • Test them with AWB or wikEd immediately after you add them. If they don't work, remove them first and analyze later.
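
For a quick check before a rule goes anywhere near the live list, something like the following Python sketch may help. It is not a substitute for the AWB tester: Python's regex engine differs from .NET in places, the function check_rule is invented for this example, and the sample rule is hypothetical.

```python
import re


def check_rule(find, replace, should_fix, should_skip):
    """Run a rule against known cases and return a list of failure messages.

    should_fix  maps a misspelling to the expected correction.
    should_skip lists correct words the rule must not touch.
    An empty result means every case passed.
    """
    rx = re.compile(find)
    failures = []
    for wrong, right in should_fix.items():
        got = rx.sub(replace, wrong)
        if got != right:
            failures.append(f"{wrong!r}: expected {right!r}, got {got!r}")
    for word in should_skip:
        if rx.search(word):
            failures.append(f"false positive on {word!r}")
    return failures


# Hypothetical rule, written with Python's \1 where .NET would use $1.
failures = check_rule(
    find=r"\b([Rr])eciev(e[drs]?|ing)\b",
    replace=r"\1eceiv\2",
    should_fix={"recieve": "receive", "Recieved": "Received", "recieving": "receiving"},
    should_skip=["receive", "receiver", "relieve"],
)
print(failures or "all cases passed")
```

Listing correct look-alike words in should_skip is what catches false positives early.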

To do

  • Remove duplicates.
  • Expand rules to accept more suffixes (e.g., "-ing", "-ed", "-able") and prefixes.
    • Note that some regular expressions purposely correct only certain versions of a word to avoid false positives. These should be marked with an underscore character "_" at the beginning or end of the word= field.
  • Remove rare words.
  • Keep lists sorted alphabetically by root word; e.g., put "(Un)Equal" just before "(In)Equality" among the "E" words. Don't sort by, say, ASCII character value.
  • Ignore words surrounded by "." as in www.harvard.edu.
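
The sorting convention above (alphabetical by root word, not by ASCII value) can be sketched as a sort key. This Python example assumes that parentheses in a rule name only mark optional parts such as "(Un)" that should be ignored for ordering. The helper root_key and the sample names other than "(Un)Equal" and "(In)Equality" are invented for illustration.

```python
import re


def root_key(name: str) -> str:
    """Sort key: drop parenthesised parts such as "(Un)" and ignore case."""
    return re.sub(r"\([^)]*\)", "", name).casefold()


names = ["(In)Equality", "Zeal", "(Un)Equal", "apple"]

# Plain sorting compares character codes, so "(" sorts before letters and
# uppercase sorts before lowercase.
print(sorted(names))
# ['(In)Equality', '(Un)Equal', 'Zeal', 'apple']

# Sorting by root word puts "(Un)Equal" just before "(In)Equality".
print(sorted(names, key=root_key))
# ['apple', '(Un)Equal', '(In)Equality', 'Zeal']
```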

Typo list

All changes to this list are live. AWB loads directly from this list whenever someone invokes the RETF option.