Wikipedia:AutoWikiBrowser/Typos/Guide

From Wikipedia, the free encyclopedia

Revision as of 07:03, 20 March 2010

These are the typo regular expressions for RegExTypoFix. Development has been open to the public since 2006.

Please add to or improve these regular expressions!

Description

These are regular expressions that catch and fix common misspellings. The syntax of the expressions is described in full on the MSDN website, though for the purposes of this page the Well House summary is probably easier to use.

Although this project was started with the aim of 100% accuracy, the less accurate but more inclusive list we have now is more useful.

Everyone using RegExTypoFix, or any semi-automated tool, is responsible for every edit it makes. Check each edit before you save it. If in doubt, skip.

This typo list is also used by the in-browser editor and Wikipedia gadget wikEd.

Adding/changing a misspelling

  • If you don't know how to make a change, suggest it here, where a knowledgeable user will add it for you.
  • Avoid having a rule detect a correct spelling ("false positive").
  • Aim to have a single rule for each root word, prefix, and suffix.
  • Keep in mind that every added rule or word variant uses more CPU time and slows scanning.
  • Update the rule name if you change something that affects it.
  • When editing a rule, edit only the smallest relevant section of this page rather than the whole page; editing the whole page taxes CPU and bandwidth.
  • Note that only words outside wikimarkup are fixed, so a rule to fix, say, a wiki template will not work.
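As an illustration of having a single rule per root word while avoiding false positives, here is a minimal Python sketch. The rule for "recieve" and the `apply_rule` helper are hypothetical, and Python's `re` module stands in for the .NET regular expression engine that AWB actually uses; the pattern syntax shown is common to both, but the replacement `\g<1>` is Python's spelling of .NET's `$1`.

```python
import re

# Hypothetical rule for the root word "receive": a single pattern covers
# several suffixes instead of one rule per inflected form.
# The \b word boundaries keep it from firing inside longer words.
RECEIVE_RULE = (
    re.compile(r"\b([Rr])eciev(e|ed|er|ers|es|ing)\b"),
    r"\g<1>eceiv\g<2>",  # Python's form of .NET's "$1eceiv$2"
)


def apply_rule(rule, text):
    """Apply one (pattern, replacement) rule to text and return the result."""
    pattern, replacement = rule
    return pattern.sub(replacement, text)


print(apply_rule(RECEIVE_RULE, "She recieved it; Recievers were recieving."))
# prints: She received it; Receivers were receiving.

# The correct spelling does not match, so it is left alone (no false positive).
print(apply_rule(RECEIVE_RULE, "They receive mail."))
# prints: They receive mail.
```

The pattern uses no `*` or `+` quantifiers; the alternation lists the accepted suffixes explicitly, so it matches only the forms it names.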

Writing typo rules

  • Don't use the quantifiers * and + with anything but a single character. Avoid them entirely if possible: they put extra strain on the CPU and often match something other than what you expect.
  • Don't expect rules to be applied in the order they appear.
  • Each rule must be completely independent.
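Because rules may run in any order, one way to spot-check independence is to apply a set of rules in every possible order to some sample text and confirm that the results agree. The Python sketch below assumes each rule is a (compiled pattern, replacement) pair; the rules, samples, and helper names are hypothetical. It only tests the samples given, and since it tries every permutation it is practical only for a handful of rules at a time.

```python
import re
from itertools import permutations

# Hypothetical independent rules: neither creates or destroys the other's match.
RULES = [
    (re.compile(r"\b([Tt])eh\b"), r"\g<1>he"),
    (re.compile(r"\b([Rr])eciev(e|ed|es|ing)\b"), r"\g<1>eceiv\g<2>"),
]


def apply_in_order(rules, text):
    """Apply the rules one after another in the given order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def order_independent(rules, samples):
    """True if every ordering of the rules gives the same result on every sample."""
    for sample in samples:
        results = {apply_in_order(order, sample) for order in permutations(rules)}
        if len(results) > 1:
            return False
    return True


print(order_independent(RULES, ["Teh parcel", "we recieved teh note"]))  # prints: True

# A dependent pair: the second rule only matches text produced by the first.
DEPENDENT = [
    (re.compile(r"\b([Tt])eh\b"), r"\g<1>he"),
    (re.compile(r"\bthe the\b"), "the"),
]
print(order_independent(DEPENDENT, ["teh the"]))  # prints: False
```

In the dependent pair, running the "teh" rule first yields "the", while running it second leaves "the the", which is exactly the kind of order sensitivity the guidance above rules out.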

Testing typo rules

  • Test with the AWB Regular Expression tester, or something similar, before adding rules here.
  • Test with AWB or wikEd immediately after adding them. If a rule doesn't work, remove it first and analyze it later.
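Before adding a rule, it can help to keep a few sentences the rule should fix and a few it must leave alone. A minimal Python sketch of such a check follows; `check_rule` is a hypothetical helper, not part of AWB or wikEd, and Python's `re` stands in for the .NET engine.

```python
import re


def check_rule(pattern, replacement, should_fix, should_not_change):
    """Return a list of failure messages; an empty list means the rule passed.

    should_fix: (misspelled text, expected corrected text) pairs.
    should_not_change: texts that the rule must leave untouched.
    """
    failures = []
    for wrong, right in should_fix:
        got = re.sub(pattern, replacement, wrong)
        if got != right:
            failures.append(f"expected {right!r}, got {got!r}")
    for text in should_not_change:
        got = re.sub(pattern, replacement, text)
        if got != text:
            failures.append(f"false positive: {text!r} -> {got!r}")
    return failures


fix_cases = [("teh cat", "the cat"), ("Teh end", "The end")]
keep_cases = ["Tehran", "the cat"]

# Without word boundaries the rule also rewrites "Tehran" to "Theran".
print(check_rule(r"([Tt])eh", r"\g<1>he", fix_cases, keep_cases))
# prints: ["false positive: 'Tehran' -> 'Theran'"]

# With \b boundaries the same cases pass.
print(check_rule(r"\b([Tt])eh\b", r"\g<1>he", fix_cases, keep_cases))
# prints: []
```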

To do

  • Remove duplicates.
  • Expand rules to accept more suffixes (e.g., "-ing", "-ed", "-able") and prefixes.
    • Note that some regular expressions purposely correct only certain versions of a word to avoid false positives. These should be marked with an underscore character "_" at the beginning or end of the word= field.
  • Remove rare words.
  • Keep lists sorted alphabetically by root word; e.g., put "(Un)Equal" just before "(In)Equality" among the "E" words. Don't sort by, say, ASCII character value.
  • Ignore words surrounded by "." as in www.harvard.edu.
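For the last item, one possible approach is to add lookarounds so that a rule skips any word joined to a neighbouring word by a "." (as in a host name) while still fixing a word that ends a sentence. This is a Python sketch under that assumption; the rule and the `fix_receive` helper are hypothetical. The condition is slightly broader than "surrounded by '.'", since a single joining dot on either side is enough to skip the word.

```python
import re

# Hypothetical "recieve" rule with lookarounds added:
#   (?<!\w\.)  do not match right after "x." (e.g. the "www." of a host name)
#   (?!\.\w)   do not match right before ".x" (e.g. ".example.org")
# A sentence-final period is not followed by a word character, so the
# word before it is still fixed.
DOTTED_SAFE = re.compile(r"(?<!\w\.)\b([Rr])eciev(e|ed|es|ing)\b(?!\.\w)")


def fix_receive(text):
    """Fix 'recieve' and its suffixed forms, skipping parts of dotted names."""
    return DOTTED_SAFE.sub(r"\g<1>eceiv\g<2>", text)


print(fix_receive("They did not recieve."))        # prints: They did not receive.
print(fix_receive("See www.recieve.example.org"))  # prints: See www.recieve.example.org
```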

Typo list

All changes to this list are live. AWB loads directly from this list whenever someone invokes the RETF option.