Help:Special characters: Difference between revisions

From Meta, a Wikimedia project coordination wiki
Content deleted Content added
→‎Greek letters and math symbols: fixed garbled sentences in last paragraph
MaxEnt (talk | contribs)
remove duplicate instance of degree symbol from math characters
 
(247 intermediate revisions by more than 100 users not shown)
Line 1: Line 1:
{{H:h|editor toc}}
{{H:h|editor toc}}


From MediaWiki 1.5, all projects use '''[[w:Unicode|Unicode]] ([[w:UTF-8|UTF-8]])''' [[w:character encoding|character encoding]]. Many characters, including [[w:CJK|CJK]] characters, can be in the wikitext itself. They use a variable number of bytes per character.
==Systems for character encoding==
From MediaWiki 1.5, each project uses '''[[Unicode]] ([[UTF-8]])''' [[w:character encoding|character encoding]].


==Important special characters==
Until the end of June 2005, when this version became operational on Wikimedia projects, the English, Dutch, Danish, and Swedish Wikipedias used [[w:ISO 8859-1|ISO-8859-1]] (also called Latin-1). For some time after that, pages were converted on the fly when loaded for the first time.
Umlauts and accents:
À Á Â Ã Ä Å
Æ Ç È É Ê Ë
Ì Í Î Ï Ñ Ò
Ó Ô Œ Õ Ö Ø Ù
Ú Û Ü ß à á
â ã ä å æ ç
è é ê ë ì í
î ï ñ ò ó ô
œ õ ö ø ù ú
û ü ÿ


Punctuation:
*'''Unicode (UTF-8)'''
¿ ¡ « » § ¶
**a variable number of bytes per character
† ‡ • - – —
**special characters, including [[w:CJK|CJK]] characters, can be treated like normal ones: both the webpage and the edit box show the character itself. Multi-character codes can still be used, and they are not automatically converted in the edit box.
*'''ISO 8859-1'''
**one byte per character
**special characters that are not available in the limited character set are stored in the form of a multi-character code; there are usually two or three equivalent representations, e.g. for the character € the '''named character reference''' <code>&amp;euro;</code>, the '''decimal character reference''' <code>&amp;#8364;</code> and the '''hexadecimal character reference''' <code>&amp;#x20AC;</code>. The edit box shows the entered code, the webpage the resulting character. Unavailable characters which are copied into the edit box are first displayed as the character, and [[Help:Automatic conversion of wikitext|automatically converted]] to their decimal codes on Preview or Save.
**the most common special characters, such as é, are in the character set, so code like <code>&amp;eacute;</code>, although allowed, is not needed.
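
As a rough illustration of the two encodings compared above, the following Python sketch uses only the standard library. It shows the variable byte length of UTF-8, the one-byte limit of ISO 8859-1, and the three equivalent character references for €. The helper names <code>utf8_length</code> and <code>fits_latin1</code> are made up for this example and are not part of MediaWiki.

```python
import html

def utf8_length(char):
    """Return the number of bytes the character occupies in UTF-8."""
    return len(char.encode("utf-8"))

def fits_latin1(char):
    """Return True if the character exists in ISO 8859-1 (one byte per character)."""
    try:
        char.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

# UTF-8 needs a different number of bytes depending on the character.
for sample in ("A", "é", "€", "中"):
    print(sample, utf8_length(sample), fits_latin1(sample))

# The named, decimal and hexadecimal references all denote the same character.
references = ("&euro;", "&#8364;", "&#x20AC;")
print([html.unescape(ref) for ref in references])
```

A character for which <code>fits_latin1</code> returns False is one that an ISO 8859-1 wiki would have to store as a multi-character code.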


Commercial symbols:
Note that Special:Export exports using UTF-8 even if the database is encoded in ISO 8859-1; at least, this was already the case for the English Wikipedia when it used version 1.4.
™ © ® ¢ € ¥ £ ¤


Greek characters:
To find out which character set applies in a project, use the browser's "View Source" feature and look for a line such as this:
α β γ δ ε ζ
η θ ι κ λ μ ν
ξ ο π ρ σ ς
τ υ φ χ ψ ω
Γ Δ Θ Λ Ξ Π
Σ Φ Ψ Ω


Math characters:
<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />
∫ ∑ ∏ √ − ± ∞
≈ ∝ ≡ ≠ ≤ ≥
× · ÷ ∂ ′ ″
∇ ‰ ° ∴ ø
∈ ∩ ∪ ⊂ ⊃ ⊆ ⊇
¬ ∧ ∨ ∃ ∀ ⇒ ⇔
→ ↔ ↑ ℵ ∉


:For more, see [[w:Table of mathematical symbols]].
or


Subscripts and superscripts as special characters (here shown with x):
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
x₀ x₁ x₂ x₃ x₄
x₅ x₆ x₇ x₈ x₉
x⁰ x¹ x² x³ x⁴
x⁵ x⁶ x⁷ x⁸ x⁹


:Compare, as alternative and for other sub- and superscripts:
===Esperanto===
:*{{xpdplain|x<|sub>k<|/sub>|d=}}
MediaWiki installations configured for Esperanto use UTF-8 for storage and display. However, when editing, the text is converted to a form that is designed to be easier to edit with a standard keyboard.
:*{{xpdplain|x<|sup>k<|/sup>|d=}}
:*{{xpdop3c|#tag:math|x_k|d=}}
:*{{xpdop3c|#tag:math|x^k|d=}}


==Editing==
The characters for which this applies are: Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ, ŭ. You may enter these directly in the edit box if you have the facilities to do so. However, when you edit the page again you will see them encoded as Sx. This form is referred to as "x-sistemo" or "x-kodo". In order to preserve round trip capability when one or more x's follow these characters or their non-accented forms (C, G, H, J, S, U, c, g, h, j, s, u), the number of x's in the edit box is double the number in the actual stored article text.


Ways to enter a non-ASCII character into the wikitext:
<table border=1>
<tr><td>in edit box<td>in database and output</tr>
<tr><td>S<td>S</tr>
<tr><td>Sx<td>Ŝ</tr>
<tr><td>Sxx<td>Sx</tr>
<tr><td>Sxxx<td>Ŝx</tr>
<tr><td>Sxxxx<td>Sxx</tr>
<tr><td>Sxxxxx<td>Ŝxx</tr>
</table>
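
The doubling rule shown in the table can be sketched in Python as follows. This is only an illustration of the rule as described here, not MediaWiki's actual code: the function names <code>to_edit_box</code> and <code>to_stored</code> are hypothetical, and the sketch assumes that only a lowercase "x" takes part in the scheme.

```python
import re

# Base letters and their accented Esperanto forms.
ACCENTED = dict(zip("CGHJSUcghjsu", "ĈĜĤĴŜŬĉĝĥĵŝŭ"))
BASE = {accented: base for base, accented in ACCENTED.items()}

def to_edit_box(stored):
    """Convert stored text to the x-sistemo form shown in the edit box."""
    def repl(match):
        letter, xs = match.group(1), match.group(2)
        if letter in BASE:
            # Accented letter: base letter, one marker x, then the doubled x's.
            return BASE[letter] + "x" * (2 * len(xs) + 1)
        # Plain letter: only double the x's that follow it.
        return letter + "x" * (2 * len(xs))
    return re.sub(r"([CGHJSUcghjsuĈĜĤĴŜŬĉĝĥĵŝŭ])(x*)", repl, stored)

def to_stored(edited):
    """Convert x-sistemo edit box text back to the stored form."""
    def repl(match):
        letter, xs = match.group(1), match.group(2)
        count = len(xs)
        if count % 2:
            # An odd number of x's means the letter itself is accented.
            return ACCENTED[letter] + "x" * (count // 2)
        return letter + "x" * (count // 2)
    return re.sub(r"([CGHJSUcghjsu])(x*)", repl, edited)

for stored in ("S", "Ŝ", "Sx", "Ŝx", "Sxx", "Ŝxx", "Luxury car"):
    edited = to_edit_box(stored)
    print(repr(stored), "->", repr(edited), "->", repr(to_stored(edited)))
```

Because an even run of x's always decodes to a plain letter and an odd run to an accented one, the conversion is reversible, which is the round trip property the doubling is meant to preserve.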


* Use a link to a special character listed under the edit box to insert that character. Wikis need [[mw:Extension:CharInsert|Extension:CharInsert]] for this. Which characters are displayed depends on the wiki, and on user preference settings; sometimes lists are collapsible, or there is a menu to select a list.
* Copy the character from some list on a webpage, like that above, or from a locally stored page. The character should not be an image or part of an image, hence for example not an image produced by the [[Help:Displaying a formula|TeX feature]] of the wiki. Thus one can copy for example from the characters in the first column of [[w:Table of mathematical symbols]].
* Use a special keyboard function (or enter the character directly from a foreign keyboard).
* Use a special browser function.
* Use an [[Help:HTML in wikitext|HTML]] named character entity reference like <code>&amp;agrave;</code> or HTML numeric character reference like <code>&amp;#161;</code>, and copy the character from [[Help:preview|preview]]. In the past the code itself had to be stored in the wikitext. Such codes may still be present on some pages. Results of the internal search function may be affected by this. On the other hand, this search function cannot find some characters, including "→", while if it is coded as "&amp;rarr;", it can be found by searching for "rarr". See also [[Help:Searching]].


===Esperanto===
For example, the interlanguage link <nowiki>[[en:Luxury car]]</nowiki> to
<table class="wikitable" style="float: right; margin-left: .5em;">
[[en:Luxury car]] has to be entered in the edit box as <nowiki>[[en:Luxxury car]]</nowiki> on [[eo:]]. This has caused problems with interwiki update bots in the past.
<tr><td>in edit box<td>in database and output</tr>

<tr><td>S<td>S</tr>
==Ways to enter special characters==
<tr><td>Sx<td>Ŝ</tr>

<tr><td>Sxx<td>Sx</tr>
Many characters not in the repertoire of standard [[Wikipedia:en:ASCII|ASCII]] will be useful&mdash;even necessary&mdash;for [[w:Wiki|wiki]] pages, especially for foreign language textbooks. This page contains recommendations for which characters are safe to use and how to use them. There are three ways to enter a non-ASCII character into the wikitext:
<tr><td>Sxxx<td>Ŝx</tr>

<tr><td>Sxxxx<td>Sxx</tr>
# Enter the character directly from a foreign keyboard, or by cut and paste from a "character map" type application, or by some special means provided by the operating system or text editing application. Some browsers will change characters outside the charset of the wiki into HTML numeric character entities (see below).
<tr><td>Sxxxxx<td>Ŝxx</tr>
# Use an [[Wikipedia:en:HTML|HTML]] named character entity reference like <code>&amp;agrave;</code>. This is unambiguous even when the server does not announce the use of any special character set, and even when the character does not display properly on some browsers. However, it may cause difficulties with searches (see below).
# Use an HTML numeric character entity reference like <code>&amp;#161;</code>. Unfortunately some old browsers incorrectly interpret these as references to the native character set.<!--which ones?--> It is, however, the only way to enter [[Wikipedia:en:Unicode|Unicode]] values for which there is no named entity, such as the [[MediaWiki User's Guide: Creating special characters/Turkish|Turkish]] letters. Note that the code points 128 to 159 are reserved in both [[en:ISO-8859-1]] and [[Wikipedia:en:Unicode|Unicode]] for rare control codes that are illegal in HTML, so character references in that range such as <code>&amp;#131;</code> are illegal and ambiguous, though they are commonly used by many web sites. Also note that almost all browsers treat ISO-8859-1 as Windows-1252, which does have printable characters in that range; these often find their way into article titles on en, which causes confusion when trying to create interwiki links to those pages.

Generally speaking, Western European languages such as Spanish, French, and German pose few problems. For specific details about other languages, see: [[MediaWiki User's Guide: Creating special characters/Turkish|Turkish]]. (More will be added to this list as contributors in other languages appear.)

For the purpose of searching, a word with a special character is best written using the first method. If the second method is used, a word like Odili&euml;nberg can only be found by searching for Odili, euml and/or nberg; this is actually a bug that should be fixed&mdash;the entities should be folded into their raw character equivalents so all searches on them are equivalent. See also [[Help:Searching]].

== ISO-8859-1 Characters ==

The following [[Wikipedia:en:extended ASCII|extended ASCII]] characters are safe for use in all Wiki pages. The table below shows the character itself, lists the code for each character in hexadecimal and decimal, shows the HTML entity name, and gives the common name of the character.

<table border=1 cellpadding=5 cellspacing=0>
<tr><th>Literal<th>Hex<th>Dec<th>Entity<th>Character
<tr><td>&nbsp; <td>00A0 <td>0160 <td>&amp;nbsp; <td>[[w:no-break space|no-break space]]
<tr><td>&iexcl; <td>00A1 <td>0161 <td>&amp;iexcl; <td>[[w:inverted exclamation|inverted exclamation]]
<tr><td>&cent; <td>00A2 <td>0162 <td>&amp;cent; <td>[[w:cent sign|cent sign]]
<tr><td>&pound; <td>00A3 <td>0163 <td>&amp;pound; <td>[[pound sign]]
<tr><td>&curren;<td>00A4 <td>0164 <td>&amp;curren;<td>[[intl. currency sign]]
<tr><td>&yen; <td>00A5 <td>0165 <td>&amp;yen; <td>[[yen sign]]
<tr><td>&sect; <td>00A7 <td>0167 <td>&amp;sect; <td>[[section sign]]
<tr><td>&uml; <td>00A8 <td>0168 <td>&amp;uml; <td>[[diaeresis]] (umlaut)
<tr><td>&copy; <td>00A9 <td>0169 <td>&amp;copy; <td>[[copyright sign]]
<tr><td>&ordf; <td>00AA <td>0170 <td>&amp;ordf; <td>[[feminine ordinal]]
<tr><td>&laquo; <td>00AB <td>0171 <td>&amp;laquo; <td>[[left double-angle quote]]
<tr><td>&not; <td>00AC <td>0172 <td>&amp;not; <td>[[not sign]]
<tr><td>&reg; <td>00AE <td>0174 <td>&amp;reg; <td>[[registered trademark sign]]
<tr><td>&macr; <td>00AF <td>0175 <td>&amp;macr; <td>[[macron]]
<tr><td>&deg; <td>00B0 <td>0176 <td>&amp;deg; <td>[[degree sign]]
<tr><td>&plusmn;<td>00B1 <td>0177 <td>&amp;plusmn;<td>[[plus-minus sign]]
<tr><td>&acute; <td>00B4 <td>0180 <td>&amp;acute; <td>[[acute accent]]
<tr><td>&micro; <td>00B5 <td>0181 <td>&amp;micro; <td>[[micro sign]]
<tr><td>&para; <td>00B6 <td>0182 <td>&amp;para; <td>[[Wikipedia:en:pilcrow|pilcrow]] (paragraph) sign
<tr><td>&middot;<td>00B7 <td>0183 <td>&amp;middot;<td>[[middle dot (Georgian comma)]]
<tr><td>&cedil; <td>00B8 <td>0184 <td>&amp;cedil; <td>[[cedilla]]
<tr><td>&ordm; <td>00BA <td>0186 <td>&amp;ordm; <td>[[masculine ordinal]]
<tr><td>&raquo; <td>00BB <td>0187 <td>&amp;raquo; <td>[[right double-angle quote]]
<tr><td>&iquest;<td>00BF <td>0191 <td>&amp;iquest;<td>[[inverted question]]
<tr><td>&Agrave;<td>00C0 <td>0192 <td>&amp;Agrave;<td>[[A grave]]
<tr><td>&Aacute;<td>00C1 <td>0193 <td>&amp;Aacute;<td>[[A acute]]
<tr><td>&Acirc; <td>00C2 <td>0194 <td>&amp;Acirc; <td>[[A circumflex]]
<tr><td>&Atilde;<td>00C3 <td>0195 <td>&amp;Atilde;<td>[[A tilde]]
<tr><td>&Auml; <td>00C4 <td>0196 <td>&amp;Auml; <td>[[A diaeresis]]
<tr><td>&Aring; <td>00C5 <td>0197 <td>&amp;Aring; <td>[[A ring]]
<tr><td>&AElig; <td>00C6 <td>0198 <td>&amp;AElig; <td>[[AE ligature]]
<tr><td>&Ccedil;<td>00C7 <td>0199 <td>&amp;Ccedil;<td>[[C cedilla]]
<tr><td>&Egrave;<td>00C8 <td>0200 <td>&amp;Egrave;<td>[[E grave]]
<tr><td>&Eacute;<td>00C9 <td>0201 <td>&amp;Eacute;<td>[[E acute]]
<tr><td>&Ecirc; <td>00CA <td>0202 <td>&amp;Ecirc; <td>[[E circumflex]]
<tr><td>&Euml; <td>00CB <td>0203 <td>&amp;Euml; <td>[[E diaeresis]]
<tr><td>&Igrave;<td>00CC <td>0204 <td>&amp;Igrave;<td>[[I grave]]
<tr><td>&Iacute;<td>00CD <td>0205 <td>&amp;Iacute;<td>[[I acute]]
<tr><td>&Icirc; <td>00CE <td>0206 <td>&amp;Icirc; <td>[[I circumflex]]
<tr><td>&Iuml; <td>00CF <td>0207 <td>&amp;Iuml; <td>[[I diaeresis]]
<tr><td>&Ntilde;<td>00D1 <td>0209 <td>&amp;Ntilde;<td>[[N tilde]]
<tr><td>&Ograve;<td>00D2 <td>0210 <td>&amp;Ograve;<td>[[O grave]]
<tr><td>&Oacute;<td>00D3 <td>0211 <td>&amp;Oacute;<td>[[O acute]]
<tr><td>&Ocirc; <td>00D4 <td>0212 <td>&amp;Ocirc; <td>[[O circumflex]]
<tr><td>&Otilde;<td>00D5 <td>0213 <td>&amp;Otilde;<td>[[O tilde]]
<tr><td>&Ouml; <td>00D6 <td>0214 <td>&amp;Ouml; <td>[[O diaeresis]]
<tr><td>&Oslash;<td>00D8 <td>0216 <td>&amp;Oslash;<td>[[O stroke]]
<tr><td>&Ugrave;<td>00D9 <td>0217 <td>&amp;Ugrave;<td>[[U grave]]
<tr><td>&Uacute;<td>00DA <td>0218 <td>&amp;Uacute;<td>[[U acute]]
<tr><td>&Ucirc; <td>00DB <td>0219 <td>&amp;Ucirc; <td>[[U circumflex]]
<tr><td>&Uuml; <td>00DC <td>0220 <td>&amp;Uuml; <td>[[U diaeresis]]
<tr><td>&szlig; <td>00DF <td>0223 <td>&amp;szlig; <td>[[sharp s (ess-zed)]]
<tr><td>&agrave;<td>00E0 <td>0224 <td>&amp;agrave;<td>[[a grave]]
<tr><td>&aacute;<td>00E1 <td>0225 <td>&amp;aacute;<td>[[a acute]]
<tr><td>&acirc; <td>00E2 <td>0226 <td>&amp;acirc; <td>[[a circumflex]]
<tr><td>&atilde;<td>00E3 <td>0227 <td>&amp;atilde;<td>[[a tilde]]
<tr><td>&auml; <td>00E4 <td>0228 <td>&amp;auml; <td>[[a diaeresis]]
<tr><td>&aring; <td>00E5 <td>0229 <td>&amp;aring; <td>[[a ring]]
<tr><td>&aelig; <td>00E6 <td>0230 <td>&amp;aelig; <td>[[ae ligature]]
<tr><td>&ccedil;<td>00E7 <td>0231 <td>&amp;ccedil;<td>[[c cedilla]]
<tr><td>&egrave;<td>00E8 <td>0232 <td>&amp;egrave;<td>[[e grave]]
<tr><td>&eacute;<td>00E9 <td>0233 <td>&amp;eacute;<td>[[e acute]]
<tr><td>&ecirc; <td>00EA <td>0234 <td>&amp;ecirc; <td>[[e circumflex]]
<tr><td>&euml; <td>00EB <td>0235 <td>&amp;euml; <td>[[e diaeresis]]
<tr><td>&igrave;<td>00EC <td>0236 <td>&amp;igrave;<td>[[i grave]]
<tr><td>&iacute;<td>00ED <td>0237 <td>&amp;iacute;<td>[[i acute]]
<tr><td>&icirc; <td>00EE <td>0238 <td>&amp;icirc; <td>[[i circumflex]]
<tr><td>&iuml; <td>00EF <td>0239 <td>&amp;iuml; <td>[[i diaeresis]]
<tr><td>&ntilde;<td>00F1 <td>0241 <td>&amp;ntilde;<td>[[n tilde]]
<tr><td>&ograve;<td>00F2 <td>0242 <td>&amp;ograve;<td>[[o grave]]
<tr><td>&oacute;<td>00F3 <td>0243 <td>&amp;oacute;<td>[[o acute]]
<tr><td>&ocirc; <td>00F4 <td>0244 <td>&amp;ocirc; <td>[[o circumflex]]
<tr><td>&otilde;<td>00F5 <td>0245 <td>&amp;otilde;<td>[[o tilde]]
<tr><td>&ouml; <td>00F6 <td>0246 <td>&amp;ouml; <td>[[o diaeresis]]
<tr><td>&divide;<td>00F7 <td>0247 <td>&amp;divide;<td>[[divide sign]]
<tr><td>&oslash;<td>00F8 <td>0248 <td>&amp;oslash;<td>[[o stroke]]
<tr><td>&ugrave;<td>00F9 <td>0249 <td>&amp;ugrave;<td>[[u grave]]
<tr><td>&uacute;<td>00FA <td>0250 <td>&amp;uacute;<td>[[u acute]]
<tr><td>&ucirc; <td>00FB <td>0251 <td>&amp;ucirc; <td>[[u circumflex]]
<tr><td>&uuml; <td>00FC <td>0252 <td>&amp;uuml; <td>[[u diaeresis]]
<tr><td>&yuml; <td>00FF <td>0255 <td>&amp;yuml; <td>[[y diaeresis]]
</table>
</table>
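
The Hex, Dec and Entity columns of the ISO-8859-1 table can be derived mechanically from the character itself. The Python sketch below does this with the standard library's HTML 4 entity table; the function name <code>table_row</code> is made up for this illustration.

```python
from html.entities import codepoint2name

def table_row(char):
    """Return (literal, hex code, 4-digit decimal code, named entity) for one character."""
    code_point = ord(char)
    name = codepoint2name.get(code_point)
    entity = "&" + name + ";" if name else "(none)"
    # The decimal code is zero-padded to four digits, as in the table.
    return (char, f"{code_point:04X}", f"{code_point:04d}", entity)

for char in "¡©ßéÿ":
    print(table_row(char))
```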


MediaWiki installations configured for Esperanto use UTF-8 for storage and display. However, when editing, the text is converted to a form that is designed to be easier to edit with a standard keyboard.
These characters are a subset of the most common extended ASCII character set in use on the [[Wikipedia:en:Internet|Internet]], [[Wikipedia:en:ISO 8859-1|ISO 8859-1]]. MediaWiki pages are identified by the server as containing ISO-8859-1 text. The characters above are a subset selected to improve compatibility with other machines.


The characters for which this applies are: Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ, and ŭ. You may enter these directly in the edit box if you have the facilities to do so. However when you edit the page again you will see them encoded as Sx. This form is referred to as "x-sistemo" or "x-kodo". In order to preserve round trip capability when one or more x's follow these characters or their non-accented forms (C, G, H, J, S, U, c, g, h, j, s, u), the number of x's in the edit box is double the number in the actual stored article text.
For example, the [[Wikipedia:en:Apple Macintosh|Apple Macintosh]] is in common use on the Internet, is not limited to any specific language, and its native character set (which is not ISO-8859-1) contains many of the common international characters. Many Macintosh browsers will correctly translate ISO text into the native character set, as long as the characters used are available. So the table above is the subset of ISO-8859-1 characters that are also available on the native Macintosh character set. (This is the situation up through [[Wikipedia:en:Mac OS 9|Mac OS 9]].x, at any rate; [[Wikipedia:en:Mac OS X|Mac OS X]] appears to use Unicode as its native encoding.)


For example, the interlanguage link <nowiki>[[en:Luxury car]]</nowiki> to [[en:Luxury car]] has to be entered in the edit box as <nowiki>[[en:Luxxury car]]</nowiki> on [[eo:]]. This has caused problems with interwiki update bots in the past.
[[Wikipedia:en:Microsoft Windows|Microsoft Windows]] standard code page 1252 set is a superset of ISO-8859-1, so these characters will be readable as is on Windows machines. The most common Latin character sets other than ISO-8859-1 are MS-DOS (pre-Windows) code page 437, Macintosh Roman, and other ISO sets such as ISO-8859-2. The number of pre-Windows MS-DOS machines with web browsers is small and they are often dedicated-purpose machines that wouldn't be using MediaWiki anyway, so it is reasonably safe to sacrifice compatibility with them for the sake of needed foreign characters. Other ISO sets are generally intended to be read by other browsers using those same sets in the same country, and so those pages should use a language-specific set.


===Browser issues===
These characters can be entered either as HTML named character entity references such as '''&amp;agrave;''', directly from foreign keyboards, or with whatever facilities are available to the Wiki author for entering these characters. For example, Wiki authors using Windows machines can enter these by holding down the Alt key while typing the 4-digit decimal code of the character on the numeric pad of the keyboard. It is important that all 4 digits (including the leading 0) be typed; typing a 3-digit code will enter characters from the obsolete code page 437. Wiki authors using Macintosh machines should take care to either use special facilities to enter these in ISO-8859-1 format rather than with the native character set, or else use HTML named character entity references. Note that some Windows users may have trouble with versions of Microsoft Internet Explorer that use "Alt-Left-Arrow" and "Alt-Right-Arrow" for page movement. These will interfere with entering codes that contain the digits 4 and 6. Use HTML named character entity references in this case.
Some browsers are known to corrupt text in the edit box. Most commonly they convert it to an encoding native to the platform. (While the NT line of Windows is internally [[w:UTF-16/UCS-2|UCS-2LE]], the 2-byte subset of UTF-16, it has a complete duplicate set of APIs in the Windows ANSI code page, and many older applications tend to use these, especially for things like edit boxes.) They then let the user edit the text using a standard edit control and convert it back. As a result, any character that does not exist in the encoding used for editing is replaced with one that does (often a question mark, though at least one browser has been reported to actually transliterate text!).


====IE for the Mac====
The characters from the table above can be used directly as 8-bit characters in all Wiki pages, and are sufficient for all pages primarily in English, Spanish, French, German, and languages that require no more special characters than those (such as Catalan). These are also generally safe to use in titles, except for a few characters like double quotes, less than and greater than, and a few others.
This relatively common browser translates to [[w:mac-roman|mac-roman]] for the edit box with the result it munges most Unicode stuff (usually but not always by replacing them with a question mark). It also munges things that are in ISO-8859-1 but not mac-roman (specifically ¤ ¦ ¹ ² ³ ¼ ½ ¾ Ð × Ý Þ ð ý þ and the soft hyphen) so the problems it causes are not limited to Unicode wikis (though they tend to be much worse on Unicode wikis because they affect actual text and interwiki links rather than just fairly obscure symbols).


=== Unsafe characters ===
====Netscape 4.x====
Similar issues to IE Mac though the character set converted to and from will obviously not always be mac-roman.


====Console browsers====
Note especially what is missing here from the full ISO-8859-1 set: The broken bar (<code>0166=&amp;brvbar;</code> [&brvbar;]¹), soft hyphen (<code>0173=&amp;shy;</code> [&shy;]¹), superscript digits (<code>0178=&amp;sup2;</code> [&sup2;]¹, <code>0179=&amp;sup3;</code> [&sup3;]¹), vulgar fractions (<code>0188=&amp;frac14;</code> [&frac14;]¹, <code>0189=&amp;frac12;</code> [&frac12;]¹, <code>0190=&amp;frac34;</code> [&frac34;]¹), Old English (and [[Wikipedia:en:Icelandic language|Icelandic]] and [[Wikipedia:en:Old Norse language|Old Norse language]]) eth and thorn (<code>0208=&amp;ETH;</code> [&ETH;]¹, <code>0240=&amp;eth;</code> [&eth;]¹, <code>0222=&amp;THORN;</code> [&THORN;]¹, <code>0254=&amp;thorn;</code> [&thorn;]¹), and multiply sign (<code>0215=&amp;times;</code> [&times;]¹). These should be considered unsafe (and adequate substitutes are available for most of them).
Lynx, Links (in text mode) and W3M convert to the console character set (Lynx and Links actually using a transliteration engine) for editing and convert back on save. If the console character set is UTF-8 then these browsers are Unicode safe but if it isn't they aren't. With Lynx and Links a possible detection method would be to add another edit box to the login form but this won't work for W3M as it doesn't convert the text to the console character set until the user actually attempts to edit it.


====The workaround====
Special care should be taken with characters that do exist in the native character set of popular machines but not in the above set. These are not safe, even though they may display correctly to you when you use them. Characters from Windows code page 1252 not in ISO-8859-1 include the euro sign (<code>&amp;euro;</code> [&euro;]¹), dagger and double dagger (<code>&amp;dagger;</code> [&dagger;]¹, <code>&amp;Dagger;</code> [&Dagger;]¹), bullet (<code>&amp;bull;</code> [&bull;]¹), trade mark sign (<code>&amp;trade;</code> [&trade;]¹), typeset-style punctuation (see below), per mille sign (<code>&amp;permil;</code> [&permil;]¹), some Eastern European caron-accented letters, and the oe/OE ligatures. Characters from the Macintosh Roman set not in ISO-8859-1 include dagger and double dagger, bullet, trade mark sign, a few math symbols such as infinity (<code>&amp;infin;</code> [&infin;]¹) and not equal (<code>&amp;ne;</code> [&ne;]¹), a few commonly-used Greek letters such as pi (<code>&amp;pi;</code> [&pi;]¹), ligatures like oe/OE and fl, typeset-style punctuation, per mille sign, and lone accents such as the breve, [[Wikipedia:en:ogonek|ogonek]], and caron.
<table class="wikitable" style="float: right; margin-left: .5em;">
<tr>
<td>In database and edit<br>box for normal browsers</td>
<td>In editbox for<br />[[mw:Manual:$wgBrowserBlackList|trouble browsers]]</td>
</tr>
<tr>
<td>œ<td>&amp;#x153;</td>
</tr>
<tr>
<td>&amp;#x153;<td>&amp;#x0153;</td>
</tr>
<tr>
<td>&amp;#x0153;<td>&amp;#x00153;</td>
</tr>
</table>
After English Wikipedia switched to UTF-8 and interwiki bots started replacing HTML entities in interwikis with literal Unicode text, edits that broke Unicode characters became so common they could no longer be ignored. A workaround was developed to allow the problematic browsers to edit safely, provided that MediaWiki knows they have problems.


Browsers listed in the setting [[mw:Manual:$wgBrowserBlackList|$wgBrowserBlackList]] (a list of regexps that match against user agent strings) are supplied text for editing in a special form: existing hexadecimal HTML entities in the page have an extra leading zero added, and non-ASCII characters that are stored in the wikitext are represented as hexadecimal HTML entities with no leading zeros.
[http://www.w3.org/TR/html4/ HTML 4.0] defines named character entities for some Latin characters not in ISO-8859-1 that are used by popular languages, such as OE ligature (<code>&amp;OElig;</code> [&OElig;]¹, <code>&amp;oelig;</code> [&oelig;]¹), uppercase Y with diaeresis (<code>&amp;Yuml;</code> [&Yuml;]¹), and some Eastern European accented characters like <code>&amp;scaron;</code> [&scaron;]¹. These are also unsafe, though if they are entered as HTML named character entity references, they may display on some machines.


Currently the default settings only have IE Mac and a specific version of Netscape 4.x for Linux in the blacklist. Nevertheless, this seems to have stopped most of the problem.
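
The transformation used by the workaround can be sketched in Python as shown below. This is a minimal illustration of the rule as described in this section, not MediaWiki's own implementation; the names <code>to_safe_edit_text</code> and <code>from_safe_edit_text</code> are hypothetical, and the sketch does not handle malformed references typed by the user.

```python
import re

HEX_ENTITY = re.compile(r"&#x([0-9A-Fa-f]+);")

def to_safe_edit_text(stored):
    """Produce ASCII-only text for the edit box of a blacklisted browser."""
    # Step 1: give every hexadecimal entity already in the page an extra leading zero.
    padded = HEX_ENTITY.sub(lambda m: "&#x0" + m.group(1) + ";", stored)
    # Step 2: write each non-ASCII character as a hexadecimal entity without leading zeros.
    return "".join(ch if ord(ch) < 128 else "&#x%x;" % ord(ch) for ch in padded)

def from_safe_edit_text(edited):
    """Reverse the transformation when the edit is saved."""
    def repl(match):
        digits = match.group(1)
        if digits.startswith("0"):
            # A leading zero marks an entity that was in the page: drop one zero.
            return "&#x" + digits[1:] + ";"
        # No leading zero: this entity stands for a literal character.
        return chr(int(digits, 16))
    return HEX_ENTITY.sub(repl, edited)

for stored in ("œ", "&#x153;", "&#x0153;"):
    safe = to_safe_edit_text(stored)
    print(stored, "->", safe, "->", from_safe_edit_text(safe))
```

The leading zero is what keeps the two cases apart on the way back: an entity without one was a literal character in the database, while an entity with one was already an entity there.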
In short, don't assume that it is safe to use a special character just because it looks correct on your machine. Use the ones from the table above, and read and understand how to use others shown below.


==Viewing==
:<small>¹ sample in square brackets to see if they work on your configuration</small>
Most current browsers have some level of Unicode support, but some do it better than others. The most commonly encountered problem is that Internet Explorer relies on preconfigured font links in the registry rather than actually searching for a font that can display the character in question. This means that Internet Explorer often has to be forced to use particular fonts. On English Wikipedia there is a set of templates to do this: for example, {{tlw|unicode}} for general Unicode text, {{tlw|polytonic}} for [[w:polytonic Greek|polytonic Greek]] and {{tlw|IPA}} for the [[w:International Phonetic Alphabet|International Phonetic Alphabet]]. The characters in [[w:Windows Glyph List 4|Windows Glyph List 4]] should be safe to use without such special measures.


<nowiki><font face="Arial Unicode MS">...</font></nowiki> may work, but only for people with that font.
== Possibly usable non-ISO characters ==


==Displaying special characters==
Some characters not listed as safe above may still be usable when entered as named HTML character entity references, because web browsers will recognize them and render them correctly, perhaps by switching to alternate fonts as needed. All of these should be considered less safe to use than those above, but only in the sense that they may not display properly, though in the form of HTML character entity references they are unambiguous, and preserve data integrity.


To display Unicode or special characters on a web page, one or more of the [[w:List of typefaces#Unicode_fonts|Unicode fonts]] must first be installed on your computer. The ''setup'', ''configuration'' or ''settings'' of the browser may also need to be changed for this to work properly.
For many of these, adequate substitutes and workarounds are available, and should be used when the value of making the text available to users of older computers and software exceeds the value of good presentation to those with newer software (in the judgment of the author or editor).


The default font for Latin scripts in the [[w:Internet Explorer|Internet Explorer]] (IE) web browser for Windows is [[w:Times New Roman|Times New Roman]]. It doesn't include many [[w:Mapping of Unicode characters|Unicode blocks]]. To properly view special characters in IE, you must set your browser font settings to a font that includes many Unicode blocks, such as [[w:Lucida Sans Unicode|Lucida Sans Unicode]], which comes with Windows XP; [[w:DejaVu Sans|DejaVu Sans]], [[w:TITUS Cyberbit Basic|TITUS Cyberbit]] or [[w:GNU Unifont|GNU Unifont]], which are freely available; or [[w:Arial Unicode MS|Arial Unicode MS]], which comes with Microsoft Office. &nbsp;See subsection below for specific instructions.
=== Typeset-style Punctuation ===


Alternatively, the style sheet for the web page could use Unicode-range specifications to mark the gaps where ''Times New Roman'' has no glyphs for a Unicode block (for example, the Hawaiian [[w:Okina|‘okina]], a glottal stop), and thus force the browser to try the next fonts in the list to display those special characters.
Absent from the ISO-8859-1 character set, but commonly used and present in both Macintosh Roman and Windows code page 1252 character sets, are proper English quotation marks and dashes. These can be entered as character entity references, and should appear correctly on most machines running recent software. Even on ISO-based machines such as [[Wikipedia:en:Unix|Unix]]/[[Wikipedia:en:X Window System|X]], browsers should be able to interpret these references and make appropriate substitutes using plain ASCII straight quotes and hyphens. ([[Wikipedia:en:Mozilla|Mozilla]] does this correctly, for example.) These references were not present in older versions of HTML, so may not be recognized by older software. Since using these characters maintains data integrity even on those machines that may not display them correctly, it should be considered safe to use these unless proper display on old software is critical. German "low-9" quotation marks are a similar case, but are less commonly translated by browsing software, and so are not quite as safe. The table below shows these characters next to a capital letter "O" for better visibility:


Special symbols should display properly without further configuration with [[w:Mozilla Firefox|Mozilla Firefox]], [[w:Konqueror|Konqueror]], [[w:Opera (Internet suite)|Opera]], [[w:Safari (web browser)|Safari]] and most other recent browsers. For better (and correct) display of characters with [[w:Ligature (typography)|ligature]] forms and [[w:Combining character|combined characters]], an optional further step after those described above is to install [[w:Unicode#Multilingual_text-rendering_engines|rendering engine]] software.
<table border="1" cellspacing="0" cellpadding="3">
<tr><td>&lsquo;O</td><td>&amp;lsquo;</td><td>left single quote</td>
<td>&mdash;O</td><td>&amp;mdash;</td><td>em dash</td></tr>
<tr><td>&rsquo;O</td><td>&amp;rsquo;</td><td>right single quote</td>
<td>&ndash;O</td><td>&amp;ndash;</td><td>en dash</td></tr>
<tr><td>&ldquo;O</td><td>&amp;ldquo;</td><td>left double quote</td>
<td>&sbquo;O</td><td>&amp;sbquo;</td><td>single low-9 quote</td></tr>
<tr><td>&rdquo;O</td><td>&amp;rdquo;</td><td>right double quote</td>
<td>&bdquo;O</td><td>&amp;bdquo;</td><td>double low-9 quote</td></tr>
</table>


To use one of the available Unicode fonts for displaying special characters inside a [[w:HTML Table|table]], chart or box, specify '''class="Unicode"''' in the table's '''TR''' row tag (or in each TD tag, although using it in each TR is easier). In [[Help:Table|wiki table]] code, put it after the TR equivalent "'''&#124;-'''" (like '''&#124;-&nbsp;class="Unicode"''').
Many web sites targeted for a Windows-using audience use code page 1252 references for these characters: for example, using <code>&amp;#151;</code> for the em dash. This is not a recommended practice. To ensure future data integrity and maximum compatibility, recode these as named references such as <code>&amp;mdash;</code>. If you really want to use a number, you can use <code>&amp;#8212;</code>, the Unicode value of the em dash.
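
The recoding of code page 1252 references can be illustrated with a short Python sketch. It relies on the standard library's <code>cp1252</code> codec and HTML 4 entity table; the function name <code>recode_cp1252_reference</code> is made up for this example, and the few byte values that code page 1252 leaves unassigned will raise an error.

```python
from html.entities import codepoint2name

def recode_cp1252_reference(number):
    """Map a reference number in the range 128-159 to (named, numeric) Unicode references."""
    if not 128 <= number <= 159:
        raise ValueError("only numbers 128 to 159 need recoding")
    # Interpret the number as a Windows code page 1252 byte.
    # Unassigned bytes in that code page raise UnicodeDecodeError here.
    char = bytes([number]).decode("cp1252")
    code_point = ord(char)
    name = codepoint2name.get(code_point)
    named = "&" + name + ";" if name else None
    return named, "&#%d;" % code_point

# 151 is the em dash and 150 the en dash in code page 1252.
print(recode_cp1252_reference(151))
print(recode_cp1252_reference(150))
```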


To display an individual special character, the template code '''&#123;&#123;Unicode|'''''char'''''&#125;&#125;''' can be used for each character. HTML decimal or [[w:hexadecimal|hexadecimal]] numeric character references can be used in place of ''char''. If a paragraph with many special Unicode characters needs to be displayed, '''&lt;p&nbsp;class="Unicode">''' ... '''&lt;/p>''' or '''&lt;span&nbsp;class="Unicode">''' ... '''&#60;/span>''' can be used instead.
Be aware that if you edit text in a separate [[Wikipedia:en:word processor|word processor]] or other program and then cut and paste it into your browser, and the program "automatically" converts quotes to left and right "smart quotes", you may unknowingly mangle markup, either your own or already existing: the standard quotes in HTML tags &amp; attributes get replaced with smart quotes, which causes the tags to fail in various ways. Furthermore, some people consider the extra encoding of smart quotes, fancy "&amp;rsquo;" apostrophes in possessives and contractions, etc., to be a waste of bytes that could be put to better use, and will replace them with the standard single characters at will.


Use class="Unicode" in web pages, HTML or wiki tags where characters from a wide range of Unicode blocks need to be displayed. If the special characters mostly come from the smaller set of Unicode blocks related to [[w:Unicode Latin|Latin scripts]], '''class="latinx"''' can be used. For special characters or symbols of the [[w:International Phonetic Alphabet|International Phonetic Alphabet]], '''class="IPA"''' can be used. For [[w:Polytonic orthography|polytonic (Greek)]] characters or related symbols, '''class="polytonic"''' can be used.
Set your word processor's options, such as Auto Edit and Auto Correction, so that undesired replacements do not occur.


==== Changing Internet Explorer's (IE) default font ====
== Greek letters and math symbols ==


From the IE menu bar, follow this path''':''' &nbsp;{{nowrap|Tools -> Internet Options -> Fonts -> Webpage Font:}}<br>
Compare &amp;nabla; and <nowiki><math>\nabla</math></nowiki>, giving &nabla; and <math>\nabla</math>, respectively. Depending on [[Help:Preferences#Rendering_math|preferences]], the second may be the same as the first (HTML rendering), or an image. The HTML symbol depends on the font size and type, the image has a fixed size in terms of pixels. The color of symbol and background in the first case are those of text in general, according to the settings, and for the image they are black on white.
to a scrolling list of fonts. As indicated above, the default selection for Windows is [[w:Times New Roman|Times New Roman]]. For viewing of many special characters, select a different font, such as [[w:Lucida Sans Unicode|Lucida Sans Unicode]], and then select '''OK'''.


==Linking text with special characters==
:''Note: much of the text below regarding mathematical symbols is obsolete now that MediaWiki supports embedded [[Wikipedia:en:TeX|TeX]] within pages. Non-trivial mathematical equations are probably best notated in TeX using the MediaWiki math tags. See the page [[MediaWiki User's Guide: Editing mathematical formulae]] for more on this.''
Many users have settings giving underlined links. When linking a special character, in some cases the result may be mistaken for another character with a different meaning:


Linking + − < > ⊂ ⊃ gives [[+]] [[−]] [[Inequality|<]] [[Inequality|>]] [[⊂]] [[⊃]] which may look like ± = ≤ ≥ ⊆ ⊇. In such cases one can better use a separate link:
Web standards for writing about mathematics are very recent (MathML 2.0 was released only in February 2001), so many browsers made before these standards were in place try to compensate by at least allowing characters commonly used in mathematics, including most of the [[Wikipedia:en:Greek alphabet|Greek alphabet]]. These are necessarily entered as character entity references. Browsers might render these by switching to a "Symbol" font or something similar.
* A ⊂ B (see [[w:Subset|subset]])


There is less risk of confusion if more than one character is linked, e.g. [[x|''x'' > 3]].
Upper- and lowercase Greek letters simply use their full names for character entities. These should, of course, only be used for occasional Greek letters in primarily-Latin text. (Large quantities of Greek-language text should be written using an editor with native [[Wikipedia:en:UTF-8|UTF-8]] [[Wikipedia:en:Unicode|Unicode]] support to facilitate editing and reduce page bloat). Here are a few samples:


== Alt keycodes ==
<table border="1" cellspacing="0" cellpadding="3">
&#160;&#160;''See also : [[w:Alt codes|Alt codes]], [[w:Windows Alt keycodes|Windows Alt keycodes]]''
<tr><td>&alpha;</td><td>&amp;alpha;</td><td>&Gamma;</td><td>&amp;Gamma;</td></tr>
<tr><td>&beta;</td><td>&amp;beta;</td><td>&Lambda;</td><td>&amp;Lambda;</td></tr>
<tr><td>&gamma;</td><td>&amp;gamma;</td><td>&Sigma;</td><td>&amp;Sigma;</td></tr>
<tr><td>&pi;</td><td>&amp;pi;</td><td>&Pi;</td><td>&amp;Pi;</td></tr>
<tr><td>&sigma;</td><td>&amp;sigma;</td><td>&Omega;</td><td>&amp;Omega;</td></tr>
<tr><td>&sigmaf;</td><td colspan="3">&amp;sigmaf; (final sigma, lowercase only)</td></tr>
</table>


Many special characters whose decimal code is below 256 can be typed by holding down the '''Alt''' key and entering the decimal code on the numeric keypad ('''Alt + decimal code''').
Other common math symbols


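For example, the character '''é''' (small e with acute accent, HTML entity code "&amp;eacute;") can be obtained by pressing Alt + 130, its code in OEM code pages 437 and 850, or Alt + 0233, its code in Windows code page 1252 entered with a leading zero.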
<table border="1" cellspacing="0" cellpadding="3">
<tr><td>&ne;</td><td>&amp;ne;</td><td>&prime;</td><td>&amp;prime;</td></tr>
<tr><td>&le;</td><td>&amp;le;</td><td>&Prime;</td><td>&amp;Prime;</td></tr>
<tr><td>&ge;</td><td>&amp;ge;</td><td>&part;</td><td>&amp;part;</td></tr>
<tr><td>&equiv;</td><td>&amp;equiv;</td><td>&int;</td><td>&amp;int;</td></tr>
<tr><td>&asymp;</td><td>&amp;asymp;</td><td>&sum;</td><td>&amp;sum;</td></tr>
<tr><td>&infin;</td><td>&amp;infin;</td><td>&prod;</td><td>&amp;prod;</td></tr>
<tr><td>&radic;</td><td>&amp;radic;</td><td colspan="2">&nbsp;</td></tr>
</table>


That is, press and hold the "Alt" key with your left hand, press the digit keys 1, 3, 0 in sequence on the numeric keypad at the right-hand side of the keyboard, then release the Alt key.
It was once customary to use the Adobe Symbol character set to render Greek letters and mathematical symbols. Both Macintosh and Windows operating systems provided a Symbol font using this set; a compatible Symbol font was included in most laser printers along with external TrueType or PostScript versions for computer use; and public domain TrueType and PostScript symbol fonts using this set were easily found. However, in web use, characters greater than hex 7F often did not transfer consistently between operating systems.


However, a special character such as &lambda; (small lambda) cannot be obtained by entering its decimal code 955 or 0955 with the Alt key in Notepad or Internet Explorer ([[w:Internet Explorer|IE]]). You get the wrong character "╗" or "»" instead.
However, all of these characters were included in Unicode from the beginning and all are now firmly part of Unicode. Also many browsers no longer support separate Symbol fonts as their encoding methods break HTML rules. Accordingly use of the Symbol character set is strongly discouraged. Some products such as [[Wikipedia:en:TtH|TtH]] still use a special hacked Symbol font to render equations which can be viewed on such browsers as do not support a normal Symbol font, but you should be aware that if you create text requiring such a font, you are restricting your audience to users who also have this font. (Whether or not that's acceptable is a judgement you will have to make as an author.)


The WordPad editor (part of the Windows operating system) accepts decimal code point values of 256 or above with the Alt key, so it can be used to obtain special Unicode characters, which can then be copied and pasted where needed.
== Other common symbols ==


Another way to obtain special characters with decimal code point values of 256 or above is to type the hexadecimal code point and then press '''Alt+X'''. To do this, open an editing application such as ''WordPad'' or ''Word'' (the Alt+X method does not work in Internet Explorer, Notepad, etc.). Type '''3BB''', the hexadecimal code point of the character '''&lambda;''', then press Alt+X. The hex code ''3BB'' turns into the ''&lambda;'' character. If you press Alt+X again, the &lambda; character converts back to its hexadecimal code point, ''3BB''. The character can then be copied and pasted where you want to use it. Alternatively (in [[w:Internet Explorer|IE]]), use its HTML hexadecimal character reference &amp;#x3BB;, its decimal character reference &amp;#955;, or its named entity &amp;lambda;.
Some characters such as the bullet, [[Wikipedia:en:Euro|Euro]] currency sign, and trade mark sign are special cases. They are likely to be understood and rendered in some way by many browsers. Because they are important for international trade, many computers specifically add them to fonts at some non-standard location and render them when requested, or else render them in special ways that don't require them to be present in a font. See below for how your browser renders these:


==Characters and formulas which are not directly entered as wikitext==
<table border="1" cellspacing="0" cellpadding="3">
*{{xpdplain|x<|sub>k<|/sub>|d=}}
<tr><td>&bull;</td><td>&amp;bull;</td><td>[[Wikipedia:en:Bullet_(typography)|bullet]]</td></tr>
*{{xpdplain|x<|sup>k<|/sup>|d=}}
<tr><td>&euro;</td><td>&amp;euro;</td><td>euro currency sign</td></tr>
<tr><td>&trade;</td><td>&amp;trade;</td><td>trade mark sign</td></tr>
</table>


Alternative wikitext for characters that can directly be entered as wikitext:
Other somewhat less commonly used symbols include these:
*<code>&amp;rarr;</code> gives &rarr;, etc.


===Characters and formulas displayed as image===
<table border="1" cellspacing="0" cellpadding="3">
Displaying additional characters and also formulas:
<tr><td>&dagger;</td><td>&amp;dagger;</td><td>[[Wikipedia:en:dagger_(typography)|dagger]]</td>
<td ROWSPAN="7">&#12288;</td>
<td>&spades;</td><td>&amp;spades;</td><td>black spade suit</td></tr>
<tr><td>&Dagger;</td><td>&amp;Dagger;</td><td>[[Wikipedia:en:dagger_(typography)|double dagger]]</td>
<td>&clubs;</td><td>&amp;clubs;</td><td>black club suit</td></tr>
<tr><td>&loz;</td><td>&amp;loz;</td><td>lozenge</td>
<td>&hearts; ''or'' <font face="sans-serif" color="red">&hearts;</font></td><td>&amp;hearts; (see below)</td><td>red heart suit</td></tr>
<tr><td>&larr;</td><td>&amp;larr;</td><td>leftward arrow</td>
<td>&diams; ''or'' <font face="sans-serif" color="red">&diams;</font></td><td>&amp;diams; (see below)</td><td>red diamond suit</td></tr>
<tr><td>&uarr;</td><td>&amp;uarr;</td><td>upward arrow</td>
<td>&lsaquo;</td><td>&amp;lsaquo;</td><td>single left-pointing angle quote</td></tr>
<tr><td>&rarr;</td><td>&amp;rarr;</td><td>rightward arrow</td>
<td>&rsaquo;</td><td>&amp;rsaquo;</td><td>single right-pointing angle quote</td></tr>
<tr><td>&darr;</td><td>&amp;darr;</td><td>downward arrow</td>
<td>&permil;</td><td>&amp;permil;</td><td>per mille sign</td></tr>
</table>


For example: {{xpdop3c|#tag:math|\sqrt x|d=}}
These should be considered unsafe to use except perhaps on pages intended for a specific audience likely to have very up-to-date software on popular machines. Even then, in some cases, [[Wikipedia:en:Internet Explorer|IE]] 6.0 does not show the diamond symbol above. The regular diamond &diams; displays in IE 5 but not 6. The alternative code for the red diamond <font face="sans-serif" color="red">&diams;</font>, which works in IE 6 but not 5, is <nowiki><font face="sans-serif" color="red">&amp;diams;</font></nowiki>.


A [[Help:Preferences#Math|user preference setting]] controls to what extent HTML code is used, if possible, and to what extent images. See [[Help:Displaying a formula]].
== Unicode ==


----
The official [[Wikipedia:en:character set|character set]] of [http://www.w3.org/TR/html4/charset.html HTML 4.01] is the [[Wikipedia:en:ISO 10646|ISO 10646]] [[Wikipedia:en:UCS|Universal Character Set]], which is equivalent to the character set defined by [[Wikipedia:en:Unicode|Unicode]]. Many browsers, though, are only capable of displaying a small subset of the full UCS repertoire.


Egyptian hieroglyphs:
Numeric character entity references are the only way to enter these characters into a Wiki page at present.


For example: {{xpdop3c|#tag:hiero|a-p:t-q|d=}}
There are two ways:
*decimal, e.g. <code><b><font style="font-size:120%"> &amp;#1049;</font></b></code> giving <b><font style="font-size:120%"> &#1049;</font></b> on your browser
*hexadecimal, in this case <code><b><font style="font-size:120%"> &amp;#x419;</font></b></code> giving <b><font style="font-size:120%"> &#x419;</font></b>.


See [[mw:Extension:WikiHiero/Syntax]].
These should be the same. However, decimal encoding will increase the number of browsers on which they will work. [http://unicode.coeurlumiere.com/] shows for all possible values whether they work and how they look in your browser, using decimal code.

For example, the codes <code>&amp;#1049; &amp;#1511; &amp;#1605;</code> display on your browser as '''&#1049;''', '''&#1511;''', and '''&#1605;''', which ideally look like the [[Wikipedia:en:Cyrillic alphabet|Cyrillic]] letter "Short I", the [[Wikipedia:en:Hebrew alphabet|Hebrew]] letter "Qof", and the [[Wikipedia:en:Arabic alphabet|Arabic]] letter "Meem", respectively. It is unlikely that your computer has all of those fonts and will display them all correctly unless you have a Macintosh or have installed the fonts, though it may display a subset of them. Because they are encoded according to the standard, though, they ''will'' display correctly on any system that is compliant and has the characters available.

These characters should not be used in MediaWiki pages unless they make no
difference to the understanding of the text, and are just extra information.

See [[Wikipedia:en:Unicode and HTML|Unicode and HTML]] for character entities tables.

Most Wikimedia wikis have now switched to UTF-8, allowing direct entry of Unicode text. However, care must still be taken to avoid overuse of unusual Unicode characters in places where people are likely to be unable to see them.

== Advanced Entities ==

The following additional entities are available. On some browsers, these are converted to Unicode equivalents.

''[table missing]''

Special Note: The Del symbol ("&amp;nabla;"), among others, is not supported on Windows 95 or 98. On the English Wikipedia it has been uploaded as an image, and can there be referenced as <nowiki>[[Image:Del.gif]]</nowiki>, or here and some other projects as <nowiki>http://en.wikipedia.org/upload/d/db/Del.gif</nowiki>, and looks like this: http://en.wikipedia.org/upload/d/db/Del.gif. On projects where this does not work, upload a copy of the image to that project.

However, the del symbol is usually found in formulæ which are better facilitated using [[MediaWiki User's Guide: Editing mathematical formulae]].

==Egyptian Hieroglyphs==

E.g. <nowiki><hiero>P2</hiero></nowiki> gives <hiero>P2</hiero> See [[Help:WikiHiero syntax]].

This is not dependent on browser capabilities, because it uses images on the servers.

==Browser differences==

Not all characters are displayed in all browsers. Also, since the font in the edit box may well be different from that of the rendered page, the browser may show the characters properly in one of the two areas and not in the other. For each, try to choose fonts which show all characters you need.

In the case of ISO-8859-1 encoding, special characters in the edit box are converted to code that consists of the common characters &, #, digits and a semi-colon, which are always displayed properly.

In any case, the HTML source code shows the codes of both the characters that are displayed and those that are not. The HTML source code of a preview page also shows these for the wikitext.

Note that as a reader, it is best to use a browser with maximum capabilities, but as an author the least capable of the common browsers is a better guideline.

Alternatives include using a similar, more common symbol, or using an image, e.g. [[eo:&#348;ablono:El]]: http://eo.wikipedia.org/upload/d/db/Ikono_tero_malgranda.png.

You can also describe the character.


==See also==
*[[Help:Advanced editing#Special characters]]

*[[Help:Displaying a formula]]
*[[Help:Editing]] - [[Help:Editing#Character_formatting|character formatting]] - [[Help:Editing#subscript|subscript]]
*[[Help:Formula]]
*[[Help:URL]]
*[[Help:Romanian characters]]
*[[Help:Turkish characters]]
*[[w:Talk:Runic alphabet]]
*[[w:Help:Special characters]]
*[[w:en:Character encodings in HTML/all characters]]
*[[w:Mapping of Unicode characters|Mapping of Unicode characters]]
*[[w:Talk:Runic alphabet|Runic alphabet]]
*[[w:en:Alphabets derived from the Latin]]
*[[w:Alphabets derived from the Latin|Alphabets derived from the Latin]]
*[[w:Unicode#Input_methods|Unicode input methods]]
*[[w:Windows Alt keycodes|Windows Alt keycodes]] chart and Alt+X keycodes chart.
*[[w:Help:Wiki markup#Special characters]]
*[[w:User_talk:GregU/hotkeys.js|hotkeys.js]] – tool for easily entering special characters via definable Ctrl-key mappings


==External links==


*http://www.unicode.org/charts/ Unicode character charts; hexadecimal numbers only; PDF files showing all characters independent of browser capabilities
*http://www.unicode.org/help/display_problems.html Help for enabling Unicode support on most platforms
* [http://unicode.coeurlumiere.com/ Table of Unicode characters from 1 to 65535] - shows how the decimal character references look in one's browser
*[http://www.alanwood.net/demos/ent4_frame.html HTML 4.0 Character Entity References] - shows how the named and decimal character references look in one's browser
*[http://www.fileformat.info/info/unicode/block/index.htm FileFormat.Info] - details of many Unicode characters, including the named, decimal and hexadecimal character reference, showing how it should look and for each, how it looks in one's browser
*[http://www.alanwood.net/unicode/index.html Alan Wood's Unicode Resources] - comprehensive resource with character test pages for all Unicode ranges, as well as OS-specific Unicode support information and links to fonts and utilities.
*[http://www.tacowidgets.com/widgets/characterpal/ CharacterPal] - Free Mac OS X Dashboard Widget that displays key combinations for special characters.
* A [http://rishida.net/tools/conversion/ convertor] that helps you find the right escape sequence to use - helps when you need to escape ASCII/Unicode characters that are special characters in wiki markup


{{H:f|langs=|enname=Special characters}}

[[Category:Editor handbook]]

Latest revision as of 22:50, 30 October 2023

From MediaWiki 1.5, all projects use Unicode (UTF-8) character encoding. Many characters, including CJK characters, can be in the wikitext itself. They use a variable number of bytes per character.
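As a minimal illustration of the variable byte count, the following Python sketch encodes a few sample characters as UTF-8 and reports how many bytes each one takes. The sample characters and the helper name are chosen here for illustration only.

```python
def utf8_length(char: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(char.encode("utf-8"))


if __name__ == "__main__":
    # Latin A, e with acute, euro sign, a CJK ideograph, musical G clef
    samples = ["A", "\u00e9", "\u20ac", "\u4e2d", "\U0001d11e"]
    for ch in samples:
        print(f"U+{ord(ch):04X} {ch}: {utf8_length(ch)} byte(s)")
```

ASCII characters take one byte, while accented Latin letters, CJK characters and characters outside the Basic Multilingual Plane take two, three and four bytes respectively.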

Important special characters

Umlauts and accents: À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Œ Õ Ö Ø Ù Ú Û Ü ß à á â ã ä å æ ç è é ê ë ì í î ï ñ ò ó ô œ õ ö ø ù ú û ü ÿ

Punctuation: ¿ ¡ « » § ¶ † ‡ • - – —

Commercial symbols: ™ © ® ¢ € ¥ £ ¤

Greek characters: α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ ς τ υ φ χ ψ ω Γ Δ Θ Λ Ξ Π Σ Φ Ψ Ω

Math characters: ∫ ∑ ∏ √ − ± ∞ ≈ ∝ ≡ ≠ ≤ ≥ × · ÷ ∂ ′ ″ ∇ ‰ ° ∴ ø ∈ ∩ ∪ ⊂ ⊃ ⊆ ⊇ ¬ ∧ ∨ ∃ ∀ ⇒ ⇔ → ↔ ↑ ℵ ∉

For more, see w:Table of mathematical symbols.

Subscripts and superscripts as special characters (here shown with x): x₀ x₁ x₂ x₃ x₄ x₅ x₆ x₇ x₈ x₉ x⁰ x¹ x² x³ x⁴ x⁵ x⁶ x⁷ x⁸ x⁹

Compare, as alternative and for other sub- and superscripts:
  • x<sub>k</sub> → xₖ
  • x<sup>k</sup> → xᵏ
  • {{#tag:math|x_k}}
  • {{#tag:math|x^k}}

Editing

Ways to enter a non-ASCII character into the wikitext:

  • Use a link to a special character listed under the edit box to insert that character. Wikis need Extension:CharInsert for this. Which characters are displayed depends on the wiki, and on user preference settings; sometimes lists are collapsible, or there is a menu to select a list.
  • Copy the character from some list on a webpage, like that above, or from a locally stored page. The character should not be an image or part of an image, hence for example not an image produced by the TeX feature of the wiki. Thus one can copy for example from the characters in the first column of w:Table of mathematical symbols.
  • Use a special keyboard function (or enter the character directly from a foreign keyboard).
  • Use a special browser function.
  • Use an HTML named character entity reference like &agrave; or HTML numeric character reference like &#161;, and copy the character from preview. In the past the code itself had to be stored in the wikitext. Such codes may still be present on some pages. Results of the internal search function may be affected by this. On the other hand, this search function cannot find some characters, including "→", while if it is coded as "&rarr;", it can be found by searching for "rarr". See also Help:Searching.
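To see which character a given reference stands for without going through preview, a short Python sketch using the standard library's <code>html.unescape</code> can help. The two helper names are hypothetical; the second goes the other way and produces decimal references for non-ASCII characters.

```python
import html


def decode_references(text: str) -> str:
    """Replace named and numeric character references with the characters."""
    return html.unescape(text)


def to_decimal_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    print(decode_references("&agrave; &#161; &rarr; &#x3BB;"))
    print(to_decimal_references("\u2192"))
```

The first call decodes a named, a decimal and a hexadecimal reference in one pass; the second shows the decimal form of the rightwards arrow.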

Esperanto

in edit box    in database and output
S              S
Sx             Ŝ
Sxx            Sx
Sxxx           Ŝx
Sxxxx          Sxx
Sxxxxx         Ŝxx

MediaWiki installations configured for Esperanto use UTF-8 for storage and display. However, when editing, the text is converted to a form that is designed to be easier to edit with a standard keyboard.

The characters to which this applies are: Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ, and ŭ. You may enter these directly in the edit box if you have the facilities to do so. However, when you edit the page again you will see them encoded in the form Sx, that is, the unaccented letter followed by x. This form is referred to as "x-sistemo" or "x-kodo". To preserve round-trip capability when one or more x's follow these characters or their non-accented forms (C, G, H, J, S, U, c, g, h, j, s, u), the number of x's in the edit box is double the number in the actual stored article text.

For example, the interlanguage link [[en:Luxury car]] to en:Luxury car has to be entered in the edit box as [[en:Luxxury car]] on eo:. This has caused problems with interwiki update bots in the past.
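The conversion rule can be sketched in Python as follows. This is an illustration of the rule as described on this page, not MediaWiki's actual implementation; the function names are hypothetical, and the sketch assumes that only a lowercase x counts as a marker.

```python
import re

# Unaccented letters and their circumflex/breve counterparts.
ACCENTED = dict(zip("CGHJSUcghjsu", "ĈĜĤĴŜŬĉĝĥĵŝŭ"))
PLAIN = {accented: plain for plain, accented in ACCENTED.items()}


def edit_box_to_stored(text: str) -> str:
    """Convert x-system text from the edit box to the stored form."""
    def convert(match: re.Match) -> str:
        base, xs = match.group(1), match.group(2)
        pairs, odd = divmod(len(xs), 2)
        # An odd number of x's marks an accent; each remaining pair is one literal x.
        letter = ACCENTED[base] if odd else base
        return letter + "x" * pairs
    return re.sub(r"([CGHJSUcghjsu])(x+)", convert, text)


def stored_to_edit_box(text: str) -> str:
    """Convert stored text to the x-system form shown in the edit box."""
    def convert(match: re.Match) -> str:
        letter, xs = match.group(1), match.group(2)
        doubled = xs * 2  # literal x's are doubled in the edit box
        if letter in PLAIN:
            return PLAIN[letter] + "x" + doubled
        return letter + doubled
    return re.sub(r"([CGHJSUcghjsuĈĜĤĴŜŬĉĝĥĵŝŭ])(x*)", convert, text)


if __name__ == "__main__":
    print(stored_to_edit_box("[[en:Luxury car]]"))
    print(edit_box_to_stored("Sxxx"))
```

Under these assumptions the two functions reproduce the table above in both directions, as well as the interlanguage link example.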

Browser issues

Some browsers are known to do nasty things to text in the edit box. Most commonly they convert it to an encoding native to the platform. (Although the NT line of Windows is internally UCS-2LE, a 2-byte subset of UTF-16, it has a complete duplicate set of APIs in the Windows ANSI code page, and many older applications tend to use these, especially for things like edit boxes.) They then let the user edit the text using a standard edit control and convert it back. The result is that any characters that do not exist in the encoding used for editing are replaced with something that does (often a question mark, though at least one browser has been reported to actually transliterate text!).

IE for the Mac

This relatively common browser translates to mac-roman for the edit box, with the result that it munges most Unicode text (usually, but not always, by replacing characters with a question mark). It also munges characters that are in ISO-8859-1 but not mac-roman (specifically ¤ ¦ ¹ ² ³ ¼ ½ ¾ Ð × Ý Þ ð ý þ and the soft hyphen), so the problems it causes are not limited to Unicode wikis (though they tend to be much worse on Unicode wikis because they affect actual text and interwiki links rather than just fairly obscure symbols).

Netscape 4.x

Similar issues to IE for the Mac, though the character set converted to and from will not always be mac-roman.

Console browsers

Lynx, Links (in text mode) and W3M convert to the console character set (Lynx and Links actually use a transliteration engine) for editing and convert back on save. If the console character set is UTF-8 these browsers are Unicode-safe; otherwise they are not. With Lynx and Links a possible detection method would be to add another edit box to the login form, but this won't work for W3M, as it doesn't convert the text to the console character set until the user actually attempts to edit it.

The workaround

In database and edit box for normal browsers    In edit box for trouble browsers
œ                                               &#x153;
&#x153;                                         &#x0153;
&#x0153;                                        &#x00153;

After English Wikipedia switched to UTF-8 and interwiki bots started replacing HTML entities in interwiki links with literal Unicode text, edits that broke Unicode characters became so common that they could no longer be ignored. A workaround was developed to allow the problematic browsers to edit safely, provided that MediaWiki knows they have problems.

Browsers listed in the setting $wgBrowserBlackList (a list of regexps that match against user agent strings) are supplied text for editing in a special form. Existing hexadecimal HTML entities in the page have an extra leading zero added; non-ASCII characters that are stored in the wikitext are represented as hexadecimal HTML entities with no leading zeros.
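The two rules can be sketched in Python as a pair of functions. This is an illustration of the scheme as described here, not MediaWiki's own code; the function names are hypothetical, and the sketch assumes references written with a lowercase x that denote valid code points.

```python
import re

HEX_REFERENCE = re.compile(r"&#x([0-9A-Fa-f]+);")


def to_safe_form(text: str) -> str:
    """Produce the ASCII-only form supplied to a problematic browser."""
    # Rule 1: existing hexadecimal references get one extra leading zero.
    text = HEX_REFERENCE.sub(lambda m: f"&#x0{m.group(1)};", text)
    # Rule 2: non-ASCII characters become references with no leading zeros.
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):x};" for ch in text)


def from_safe_form(text: str) -> str:
    """Undo to_safe_form when the edited text is saved."""
    def convert(match: re.Match) -> str:
        digits = match.group(1)
        if digits.startswith("0"):
            # A leading zero marks a reference that was literal in the wikitext.
            return f"&#x{digits[1:]};"
        return chr(int(digits, 16))
    return HEX_REFERENCE.sub(convert, text)


if __name__ == "__main__":
    original = "\u0153 &#x153; &#x0153;"
    safe = to_safe_form(original)
    print(safe)
    print(from_safe_form(safe) == original)
```

Because a reference without a leading zero can only have come from a real character, the conversion is reversible, which matches the rows of the table above.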

Currently the default settings only have IE for the Mac and a specific version of Netscape 4.x for Linux in the blacklist. Nevertheless, this seems to have stopped most of the problem.

Viewing

Most current browsers have some level of Unicode support, but some do it better than others. The most commonly encountered problem is that Internet Explorer relies on preconfigured font links in the registry rather than actually searching for a font that can display the character in question. This means that Internet Explorer often has to be forced to use particular fonts. On English Wikipedia there is a set of templates to do this: for example, {{unicode}} for general Unicode text, {{polytonic}} for polytonic Greek and {{IPA}} for the International Phonetic Alphabet. Characters in Windows Glyph List 4 should be safe to use without such special measures.

<font face="Arial Unicode MS">...</font> may work, but only for people with that font.

Displaying special characters

To display Unicode or special characters on a web page, one or more Unicode fonts must first be installed on your computer. For everything to work properly, the settings of the web browser may also need to be changed.

The default font for Latin scripts in the Internet Explorer (IE) web browser for Windows is Times New Roman, which does not cover many Unicode blocks. To view special characters properly in IE, you must set your browser font settings to a font that covers many Unicode blocks, such as Lucida Sans Unicode, which comes with Windows XP; DejaVu Sans, TITUS Cyberbit or GNU Unifont, which are freely available; or Arial Unicode MS, which comes with Microsoft Office. See the subsection below for specific instructions.

Alternatively, the style sheet for the web page could use Unicode-range specifications to note the gaps where Times New Roman has no glyphs for certain Unicode characters (for example, the Hawaiian ‘okina, or glottal stop), and thus force the browser to check further down the list of fonts when displaying those special characters.

Special symbols should display properly without further configuration in Mozilla Firefox, Konqueror, Opera, Safari and most other recent browsers. For better (and correct) display of characters with ligature forms and combining characters, an optional further step, after the steps above have been followed, is to install rendering engine software.

To use one of the available Unicode fonts for displaying special characters inside a table, chart or box, specify class="Unicode" in the table's TR row tag (it can instead go in each TD tag, but setting it once per TR is easier). In wiki table code, put it after the TR equivalent "|-", as in |- class="Unicode".

To display an individual special character, the template code {{Unicode|char}} can be used for each character. HTML decimal or hexadecimal numeric character references can be used in place of char. If a paragraph with many special Unicode characters needs to be displayed, <p class="Unicode"> ... </p> or <span class="Unicode"> ... </span> can be used instead.

Use class="Unicode" in web pages, HTML or wiki tags where characters from a wide range of Unicode blocks need to be displayed. If the special characters mostly come from the smaller set of Unicode blocks related to Latin scripts, class="latinx" can be used. For special characters or symbols of the International Phonetic Alphabet, class="IPA" can be used. For polytonic (Greek) characters or related symbols, class="polytonic" can be used.

Changing Internet Explorer's (IE) default font

From the IE menu bar, follow this path: Tools -> Internet Options -> Fonts -> Webpage Font, which leads to a scrolling list of fonts. As indicated above, the default selection for Windows is Times New Roman. For viewing many special characters, select a different font, such as Lucida Sans Unicode, and then select OK.

Linking text with special characters

Many users have settings giving underlined links. When linking a special character, in some cases the result may be mistaken for another character with a different meaning:

Linking + − < > ⊂ ⊃ gives the same characters as underlined links, which may look like ± = ≤ ≥ ⊆ ⊇. In such cases it is better to use a separate link:

  • A ⊂ B (see subset)

There is less risk of confusion if more than one character is linked, e.g. x > 3.

Alt keycodes

  See also : Alt codes, Windows Alt keycodes

Many special characters whose decimal code is below 256 can be typed by holding down the Alt key and entering the decimal code on the numeric keypad (Alt + decimal code).

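For example, the character é (small e with acute accent, HTML entity code "&eacute;") can be obtained by pressing Alt + 130, its code in OEM code pages 437 and 850, or Alt + 0233, its code in Windows code page 1252 entered with a leading zero.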

That is, press and hold the "Alt" key with your left hand, press the digit keys 1, 3, 0 in sequence on the numeric keypad at the right-hand side of the keyboard, then release the Alt key.

However, a special character such as λ (small lambda) cannot be obtained by entering its decimal code 955 or 0955 with the Alt key in Notepad or Internet Explorer (IE). You get the wrong character "╗" or "»" instead.
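The two wrong characters are consistent with the code being reduced modulo 256 and then looked up in an 8-bit code page. The Python sketch below assumes that interpretation, and also assumes code page 437 as the OEM code page and code page 1252 as the Windows code page; both depend on the system locale. The function name is hypothetical.

```python
def alt_code_result(code: int, codepage: str) -> str:
    """Character obtained if an Alt code is reduced modulo 256 and
    interpreted as a single byte in the given 8-bit code page."""
    return bytes([code % 256]).decode(codepage)


if __name__ == "__main__":
    print(955 % 256)                       # the byte value actually used
    print(alt_code_result(955, "cp437"))   # typed as Alt + 955
    print(alt_code_result(955, "cp1252"))  # typed as Alt + 0955
```

The same function also accounts for the two ways of typing é: code 130 in code page 437 and code 233 in code page 1252.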

The WordPad editor (part of the Windows operating system) accepts decimal code point values of 256 or above with the Alt key, so it can be used to obtain special Unicode characters, which can then be copied and pasted where needed.

Another way to obtain special characters with decimal code point values of 256 or above is to type the hexadecimal code point and then press Alt+X. To do this, open an editing application such as WordPad or Word (the Alt+X method does not work in Internet Explorer, Notepad, etc.). Type 3BB, the hexadecimal code point of the character λ, then press Alt+X. The hex code 3BB turns into the λ character. If you press Alt+X again, the λ character converts back to its hexadecimal code point, 3BB. The character can then be copied and pasted where you want to use it. Alternatively (in IE), use its HTML hexadecimal character reference &#x3BB;, its decimal character reference &#955;, or its named entity &lambda;.
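The conversion between a hexadecimal code point and its character, in both directions, can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not of how the editing applications implement Alt+X, and the function names are hypothetical.

```python
def hex_to_char(hex_code: str) -> str:
    """Turn a hexadecimal code point such as '3BB' into its character."""
    return chr(int(hex_code, 16))


def char_to_hex(char: str) -> str:
    """Turn a character back into its hexadecimal code point."""
    return f"{ord(char):X}"


def numeric_references(char: str) -> tuple[str, str]:
    """Return the hexadecimal and decimal HTML references for a character."""
    return f"&#x{ord(char):X};", f"&#{ord(char)};"


if __name__ == "__main__":
    small_lambda = hex_to_char("3BB")
    print(small_lambda, char_to_hex(small_lambda))
    print(numeric_references(small_lambda))
```

For the small lambda, the two references differ only in base: hexadecimal 3BB and decimal 955 denote the same code point.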

Characters and formulas which are not directly entered as wikitext

  • x<sub>k</sub> → xₖ
  • x<sup>k</sup> → xᵏ

Alternative wikitext for characters that can directly be entered as wikitext:

  • &rarr; gives →, etc.
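As an illustration outside the wiki itself, Python's standard `html.unescape` function resolves character references in the same way a browser does, which makes it easy to check what a given reference produces. The decimal and hexadecimal forms shown here are the numeric equivalents of &rarr; (code point 8594, hexadecimal 2192).

```python
import html

# Named, decimal and hexadecimal references for the same character.
for reference in ("&rarr;", "&#8594;", "&#x2192;"):
    print(reference, "gives", html.unescape(reference))
```

All three lines end in the same → character.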

Characters and formulas displayed as image

Displaying additional characters and also formulas:

For example: {{#tag:math|\sqrt x}}

A user preference setting controls to what extent formulas are rendered as HTML, where that is possible, and to what extent as images. See Help:Displaying a formula.


Egyptian hieroglyphs:

For example: {{#tag:hiero|a-p:t-q}}

(The result is displayed as an image of the hieroglyphs.)

See mw:Extension:WikiHiero/Syntax.

See also

External links

  • http://www.unicode.org/charts/ Unicode character charts; hexadecimal numbers only; PDF files showing all characters independent of browser capabilities
  • http://www.unicode.org/help/display_problems.html Help for enabling Unicode support on most platforms
  • Table of Unicode characters from 1 to 65535 - shows how the decimal character references look in one's browser
  • HTML 4.0 Character Entity References - shows how the named and decimal character references look in one's browser
  • FileFormat.Info - details of many Unicode characters, including the named, decimal and hexadecimal character reference, showing how it should look and for each, how it looks in one's browser
  • Alan Wood's Unicode Resources - comprehensive resource with character test pages for all Unicode ranges, as well as OS-specific Unicode support information and links to fonts and utilities.
  • CharacterPal - Free Mac OS X Dashboard Widget that displays key combinations for special characters.
  • A converter that helps you find the right escape sequence to use - helps when you need to escape ASCII/Unicode characters that are special characters in wiki markup
