Main Page: Difference between revisions

From Meta, a Wikimedia project coordination wiki
<templatestyles src="Template:Main Page/styles.css" />
'''Internationalization & Localization'''
<div class="MainPage__row MainPage__header">
<div class="MainPage__block MainPage__intro">
<h1><translate><!--T:16--> Meta-Wiki</translate></h1>
<div id="mf-intro" class="MainPage__block_contents">
<translate>
<!--T:8-->
'''Welcome to [[<tvar name="m-about">Special:MyLanguage/Meta:About</tvar>|Meta-Wiki]]''', the global community site for the [[<tvar name="wmf">Special:MyLanguage/Wikimedia Foundation</tvar>|Wikimedia Foundation's]] [[<tvar name="wm-projects">Special:MyLanguage/Wikimedia projects</tvar>|projects]] and related projects, from coordination and documentation to planning and analysis.


<!--T:9-->
So, what is this all about? Folks, here is some information for developers and testers who develop and test web applications for the international market.
Other meta-focused wikis such as [[<tvar name=outreach-wiki>outreach:</tvar>|Wikimedia Outreach]] are specialized projects that have their roots in Meta-Wiki. Related discussions also take place on Wikimedia [[<tvar name="maillists">Special:MyLanguage/Mailing lists/Overview</tvar>|mailing lists]] (particularly '''[[<tvar name="wikimedia-l">Special:MyLanguage/Mailing lists#Wikimedia mailing list</tvar>|wikimedia-l]]''', with its low-traffic equivalent [[<tvar name="WikimediaAnnounce">Special:MyLanguage/Mailing lists#Wikimedia Announcements mailing list</tvar>|WikimediaAnnounce]]), [[<tvar name="irc">Special:MyLanguage/IRC</tvar>|IRC channels]] on Libera, individual wikis of [[<tvar name="affiliates">Special:MyLanguage/Wikimedia movement affiliates</tvar>|Wikimedia affiliates]], and other places.

</translate>
Hmm, what do i18n and l10n mean?
</div><!-- end of contents -->

</div><!-- end of block -->
Here is what I know. i18n stands for internationalization: "i" is the first letter of the word, "n" is the last one, and there are 18 letters in between, hence i18n. It's just an easy way to remember (and type) it. Similarly, l10n stands for localization: "l", 10 letters, then "n".
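The abbreviation rule is simple arithmetic: keep the first and last letters and replace everything in between with its length. A small Java sketch of it follows; the class name Numeronym is purely illustrative, and it assumes plain ASCII words (text with surrogate pairs would need code-point-aware counting).

```java
/** Builds numeronyms such as "i18n": first letter, number of letters in between, last letter. */
public class Numeronym {
    static String of(String word) {
        if (word.length() < 4) {
            return word; // too short to be worth abbreviating
        }
        int between = word.length() - 2; // letters strictly between the first and the last
        return Character.toLowerCase(word.charAt(0)) + Integer.toString(between)
                + Character.toLowerCase(word.charAt(word.length() - 1));
    }

    public static void main(String[] args) {
        System.out.println(of("Internationalization")); // i18n
        System.out.println(of("Localization"));         // l10n
    }
}
```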
<div class="MainPage__block MainPage__menu">

<div class="MainPage__block_contents">
Here is something for you: we have compiled some useful content.
<ul>

<li>[[Special:MyLanguage/Mission|<translate><!--T:1--> Mission</translate>]]</li>

<li>[[Special:MyLanguage/Wikimedia projects|<translate><!--T:2--> Projects</translate>]]<br> ([[Special:MyLanguage/Complete list of Wikimedia projects|<translate><!--T:3--> complete list</translate>]])</li>
Read on.
<li>[[Special:MyLanguage/Research:Index|<translate><!--T:4--> Research</translate>]]</li>

<li>[[Special:MyLanguage/Meta:Babylon|<translate><!--T:5--> Translation</translate>]]<br> ([[Translation requests|<translate><!--T:6--> requests</translate>]])</li>
'''Definitions:'''
<li>[[Special:MyLanguage/Vision|<translate><!--T:7--> Vision</translate>]]</li>

</ul>
'''What is Internationalization?'''
</div><!-- end of contents -->
Internationalization is the practice of developing applications and products that are locale-independent: all language-specific and market-specific content resides outside the core application. The term covers both the changes made to an existing application and the steps taken when developing a new one, so that the application can be localized and all content can be presented in the formats the end user is accustomed to.
</div><!-- end of block -->
'''What is Localization?'''
</div><!-- end of row -->
Localization is the process of adapting an application or product to a particular country or region. It includes translation but is not limited to it.
<div class="MainPage__row">
Translation is the process of carrying meaning from one language to another. It is not a literal word-for-word process; rather, it conveys the meaning behind the words in the source language so that the text in the target language expresses the same meaning.
<div class="MainPage__block MainPage__current_events">

<div class="MainPage__block_heading">[[File:Wikimedia Community Logo optimized.svg|20x20px|link=|alt=]] <!--
'''Development Guidelines'''
--><translate><!--T:13--> Current events</translate>
Many programs are not internationalized when first written. These programs may have started as prototypes, or perhaps they were not intended for international distribution. If you must internationalize an existing program, take the following steps:
</div><!-- end of heading -->

<div class="MainPage__block_contents">
Note: All the demo programs discussed are specific to Java and are available in the zip file attached at the end of this document.
{{ Template:Main Page/WM News
'''Checklist'''
| lang = <translate><!--T:15--> en</translate> | dir = {{dir|{{int:lang}}}}<!-- TO BE MIGRATED -->
The following checklist will help you when developing an application for the international market.
}}

</div><!-- end of contents -->
'''''Identify Culturally Dependent Data'''''
</div><!-- end of block -->
Text messages are the most obvious form of data that varies with culture. However, other types of data may vary with region or language. The following list contains examples of culturally dependent data:
<div class="MainPage__block MainPage__requests">
• Messages
<div class="MainPage__block_heading">[[File:Wikimedia Community Logo optimized.svg|20x20px|link=|alt=]] <!--
• Labels on GUI components
--><translate><!--T:14--> Requests</translate>
• Online help
</div><!-- end of heading -->
• Sounds
<div class="MainPage__block_contents">
• Colors
<div class="floatright">{{Translate|Template:Main Page/Requests|+/&minus;}}</div>
• Graphics
{{Main Page/Requests}}
• Icons
</div><!-- end of contents -->
• Dates
</div><!-- end of block -->
• Times
</div><!-- end of row -->
• Numbers
<div class="MainPage__row">
• Currencies
<div class="MainPage__block MainPage__community_communication">
• Measurements
<div class="MainPage__block_heading">[[File:Wikimedia Community Logo optimized.svg|20x20px|link=|alt=]] <!--
• Phone numbers
--><translate><!--T:10--> Community and communication</translate>
• Honorifics and personal titles
</div><!-- end of heading -->
• Postal addresses
<div class="MainPage__block_contents">
• Page layouts
<div class="floatright">{{Translate|Template:Main Page/Community and communication|+/&minus;}}</div>
• Tooltips and alerts
{{Main Page/Community and communication}}

</div><!-- end of contents -->
'''''Isolate Translatable Text in Resource Bundles'''''
</div><!-- end of block -->
Translation is costly. You can help reduce costs by isolating the text that must be translated in ResourceBundle objects. Translatable text includes status messages, error messages, log file entries, and GUI component labels. In programs that haven't been internationalized, this text is hard-coded. You need to locate all occurrences of hard-coded text that is displayed to end users. For example, you should clean up code like this:
<div class="MainPage__block MainPage__issues_collaboration">

<div class="MainPage__block_heading">[[File:Wikimedia Community Logo optimized.svg|20x20px|link=|alt=]] <!--
String buttonLabel = "OK";
--><translate><!--T:11--> Core issues and collaboration</translate>
...
</div><!-- end of heading -->
Button okButton = new Button(buttonLabel);
<div class="MainPage__block_contents">
See the section Isolating Locale-Specific Data for details.
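As a minimal sketch of that clean-up, the label below is looked up by key in a bundle instead of being written into the code as "OK". The class ButtonLabels and the helper okLabel are hypothetical names; the bundle is instantiated directly only to keep the example self-contained, whereas a real program would obtain it through ResourceBundle.getBundle.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical bundle holding the labels that used to be hard-coded. */
public class ButtonLabels extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            {"OkKey", "OK"},
            {"CancelKey", "Cancel"},
        };
    }

    /** Fetches the OK label by key; translators change the bundle, not this code. */
    static String okLabel(ResourceBundle bundle) {
        return bundle.getString("OkKey");
    }

    public static void main(String[] args) {
        ResourceBundle labels = new ButtonLabels(); // normally ResourceBundle.getBundle(...)
        String buttonLabel = okLabel(labels);
        System.out.println(buttonLabel); // OK
    }
}
```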
<div class="floatright">{{Translate|Template:Main Page/Core issues and collaboration|+/&minus;}}</div>
''Deal with Compound Messages''
{{Main Page/Core issues and collaboration}}
Compound messages contain variable data. In the message "The disk contains 1100 files.", the integer 1100 may vary. This message is difficult to translate because the position of the integer in the sentence is not the same in all languages. A message built by concatenating its parts is not translatable, because the order of the sentence elements is then hard-coded.
</div><!-- end of contents -->

</div><!-- end of block -->
Whenever possible, you should avoid constructing compound messages, because they are difficult to translate. However, if your application requires compound messages, you can handle them with the techniques described in the section Messages.
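In Java, the usual tool for compound messages is java.text.MessageFormat, which lets the pattern, rather than the code, decide where each variable goes. The sketch below contrasts concatenation with a MessageFormat pattern. DiskStatus and its methods are illustrative names, and in a real application the pattern string would be fetched from a ResourceBundle so that translators can move the placeholders.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class DiskStatus {
    /** Not translatable: the word order is fixed by the concatenation. */
    static String concatenated(int fileCount) {
        return "The disk contains " + fileCount + " files.";
    }

    /** Translatable: placeholders {0}, {1}, ... may appear in any order in the pattern. */
    static String formatted(String pattern, Locale locale, Object... args) {
        MessageFormat formatter = new MessageFormat(pattern, locale);
        return formatter.format(args); // numbers are formatted with the locale's conventions
    }

    public static void main(String[] args) {
        System.out.println(concatenated(1100));
        System.out.println(formatted("The disk contains {0} files.", Locale.US, 1100));
        // Hypothetical pattern that places the second argument first.
        System.out.println(formatted("{1}: {0} files", Locale.US, 1100, "MyDisk"));
    }
}
```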
</div><!-- end of row -->

<div class="MainPage__row">
''Format Numbers and Currencies''
<div class="MainPage__block MainPage__sister_projects">
If your application displays numbers and currencies, you must format them with locale-sensitive formatting classes so that your code stays locale-independent. Code that simply converts a number to a String with toString() is not internationalized, because it will not display the number correctly in all countries.
<div class="MainPage__block_heading">[[File:Wikimedia Community Logo optimized.svg|20x20px|link=|alt=]] <!--
You should replace the code that displays a number or currency with a routine that formats it correctly. These routines are discussed in the section Numbers and Currencies.
--><translate><!--T:12--> Wikimedia Foundation, Meta-Wiki, and its sister projects</translate>

</div><!-- end of heading -->
''Format Dates and Times''
<div class="MainPage__block_contents">
Date and time formats differ with region and language. If your code contains statements like the following, you need to change it:
<div class="floatright">{{Translate|Template:Main Page/Wikimedia Foundation|+/&minus;}}</div><div>{{Main Page/Wikimedia Foundation}}</div>{{clear}}

<div class="floatright">{{Translate|Template:Main Page/Sisterprojects|+/&minus;}}</div><div>{{Main Page/Sisterprojects}}</div>
Date currentDate = new Date();
</div><!-- end of contents -->
TextField dateField;
</div><!-- end of block -->
...
</div><!-- end of row -->
String dateString = currentDate.toString();
<div class="MainPage__row" style="font-size: smaller;"><languages/></div>
dateField.setText(dateString);
__NOTOC__

__NOEDITSECTION__
If you use the date-formatting classes, your application can display dates and times correctly around the world. For examples and instructions, see the section Dates and Times.

''Use Unicode Character Properties''
The following code tries to verify that a character is a letter:
char ch;
...
if ((ch >= 'a' && ch <= 'z') ||
(ch >= 'A' && ch <= 'Z')) // WRONG!


Watch out for code like this, because it won't work with languages other than English. For example, the if statement misses the character ü in the German word Grün.
The Character comparison methods use the Unicode standard to identify character properties. Thus you should replace the previous code with the following:
char ch;
...
if (Character.isLetter(ch))

For more information on the Character comparison methods, see the section Checking Character Properties.
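The difference is easy to see in a small Java sketch (LetterCheck is an illustrative name). countLetters walks the string by code point, so it also copes with characters outside the Basic Multilingual Plane.

```java
public class LetterCheck {
    /** The ASCII-only check from the example above. */
    static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    /** Counts letters using the Unicode-aware Character.isLetter(int). */
    static int countLetters(String word) {
        return (int) word.codePoints().filter(Character::isLetter).count();
    }

    public static void main(String[] args) {
        char u = '\u00FC'; // the letter u with umlaut, as in "Gr\u00FCn"
        System.out.println(isAsciiLetter(u));          // false: misses the umlaut
        System.out.println(Character.isLetter(u));     // true
        System.out.println(countLetters("Gr\u00FCn")); // 4
    }
}
```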

''Comparing Strings Properly''

When sorting text, we often compare strings. If the text is displayed to end users, you shouldn't use methods that perform binary comparisons, such as String.compareTo. See the Comparing Strings section for more details.
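To see why binary comparison is a problem, the sketch below sorts the same words with String's natural ordering and with a java.text.Collator for Locale.US. SortWords is an illustrative name; the collation rules themselves come from the JDK's locale data.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SortWords {
    /** Binary ordering by UTF-16 code units, which is what String.compareTo does. */
    static List<String> binarySort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(null); // null comparator = natural ordering
        return copy;
    }

    /** Locale-sensitive ordering using a Collator. */
    static List<String> collatorSort(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("peach", "apple", "Banana", "\u00E9clair");
        System.out.println(binarySort(words));              // [Banana, apple, peach, éclair]
        System.out.println(collatorSort(words, Locale.US)); // [apple, Banana, éclair, peach]
    }
}
```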




'''''Setting the locale'''''

An internationalized program can display information differently throughout the world. For example, the program will display different messages in Paris, Tokyo, and New York. If the localization process has been fine-tuned, the program will display different messages in New York and London to account for the differences between American and British English. How does an internationalized program identify the appropriate language and region of its end users? Easy. It references a Locale object.

A Locale object is an identifier for a particular combination of language and region. If a class varies its behavior according to Locale, it is said to be locale-sensitive. For example, the NumberFormat class is locale-sensitive; the format of the number it returns depends on the Locale. Thus NumberFormat may return a number as 902 300 (France), or 902.300 (Germany), or 902,300 (United States). Locale objects are only identifiers. The real work, such as formatting and detecting word boundaries, is performed by the methods of the locale-sensitive classes.
The following sections explain how to work with Locale objects:

''Creating a locale''

To create a Locale object, you typically specify the language code and the country code.

For example:

In Java:

Locale loc = new Locale("en", "US");

The first argument is the language code and the second is the country code.
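Besides the constructor (which recent JDKs deprecate in favor of factory methods such as Locale.of), a Locale can be obtained from a predefined constant, from a BCP 47 language tag, or with Locale.Builder, which validates its input. A short sketch; LocaleBasics and build are illustrative names.

```java
import java.util.Locale;

public class LocaleBasics {
    /** Builds a Locale from language and region codes; ill-formed codes throw IllformedLocaleException. */
    static Locale build(String language, String region) {
        return new Locale.Builder().setLanguage(language).setRegion(region).build();
    }

    public static void main(String[] args) {
        Locale fromBuilder = build("en", "US");
        Locale fromTag = Locale.forLanguageTag("en-US");

        System.out.println(fromBuilder.equals(Locale.US)); // true
        System.out.println(fromTag.equals(Locale.US));     // true
        System.out.println(fromBuilder.getLanguage());     // en
        System.out.println(fromBuilder.getCountry());      // US
        System.out.println(fromBuilder);                   // en_US
        System.out.println(fromBuilder.toLanguageTag());   // en-US
    }
}
```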

''Identifying available locales''

Objects whose behavior depends on the locale are called locale-sensitive objects.
A Locale is just an identifier; it is the job of the locale-sensitive object to behave according to the Locale it has been given.

Locale-sensitive classes typically do not support every locale. For example, in Java you can find out which locales the DateFormat class supports by running the following code snippet:

import java.util.*;
import java.text.*;

public class Available {
static public void main(String[] args) {
Locale list[] = DateFormat.getAvailableLocales();
for (Locale aLocale : list) {
System.out.println(aLocale.toString());
}
}
}

''Locale-Sensitive Services SPI (Specific to Java)''

This feature enables the plug-in of locale-dependent data and services. In this way, third parties are able to provide implementations of most locale-sensitive classes in the java.text and java.util packages.
The implementation of SPIs (Service Provider Interface) is based on abstract classes and Java interfaces that are implemented by the service provider. At runtime the Java class loading mechanism is used to dynamically locate and load classes that implement the SPI.
You can use the locale-sensitive services SPI to provide the following locale-sensitive implementations:
• BreakIterator objects
• Collator objects
• Language code, country code, and variant name for the Locale class
• Time zone names
• Currency symbols
• DateFormat objects
• DateFormatSymbols objects
• NumberFormat objects
• DecimalFormatSymbols objects
The corresponding SPIs are contained in the java.util.spi and java.text.spi packages:
java.util.spi:
• CurrencyNameProvider
• LocaleNameProvider
• LocaleServiceProvider
• TimeZoneNameProvider
java.text.spi:
• BreakIteratorProvider
• CollatorProvider
• DateFormatProvider
• DateFormatSymbolsProvider
• DecimalFormatSymbolsProvider
• NumberFormatProvider
For example, if you would like to provide a NumberFormat object for a new locale, you extend the abstract class java.text.spi.NumberFormatProvider and implement its methods, together with getAvailableLocales(), which it inherits from LocaleServiceProvider:
• getCurrencyInstance(Locale locale)
• getIntegerInstance(Locale locale)
• getNumberInstance(Locale locale)
• getPercentInstance(Locale locale)

Application code does not call the provider directly; it keeps calling the usual NumberFormat factory methods:
Locale loc = new Locale("da", "DK");
NumberFormat nf = NumberFormat.getNumberInstance(loc);
These methods first check whether the Java runtime environment supports the requested locale; if so, they use that support. Otherwise, they call the getAvailableLocales() methods of the installed providers for the appropriate interface to find a provider that supports the requested locale.
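A sketch of such a provider follows, under stated assumptions: the locale xx-XX is hypothetical, and its formats simply use a space for grouping and a comma for decimals. For NumberFormat.getNumberInstance to pick it up, the class would also have to be registered as a service (a META-INF/services/java.text.spi.NumberFormatProvider entry) and, as we understand it, on JDK 9 and later the java.locale.providers system property must include SPI. The example sidesteps registration by calling the provider's methods directly.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.spi.NumberFormatProvider;
import java.util.Locale;

/** Sketch of a provider for a hypothetical locale "xx-XX". */
public class ExampleNumberFormatProvider extends NumberFormatProvider {
    static final Locale EXAMPLE = Locale.forLanguageTag("xx-XX");

    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] { EXAMPLE };
    }

    /** Symbols for the hypothetical locale; rejects locales this provider does not claim. */
    private DecimalFormatSymbols symbols(Locale locale) {
        if (locale == null) {
            throw new NullPointerException("locale");
        }
        if (!EXAMPLE.equals(locale)) {
            throw new IllegalArgumentException("Unsupported locale: " + locale);
        }
        DecimalFormatSymbols s = new DecimalFormatSymbols(Locale.ROOT);
        s.setGroupingSeparator(' ');
        s.setDecimalSeparator(',');
        return s;
    }

    @Override
    public NumberFormat getNumberInstance(Locale locale) {
        return new DecimalFormat("#,##0.###", symbols(locale));
    }

    @Override
    public NumberFormat getIntegerInstance(Locale locale) {
        DecimalFormat format = new DecimalFormat("#,##0", symbols(locale));
        format.setParseIntegerOnly(true);
        return format;
    }

    @Override
    public NumberFormat getPercentInstance(Locale locale) {
        return new DecimalFormat("#,##0%", symbols(locale));
    }

    @Override
    public NumberFormat getCurrencyInstance(Locale locale) {
        // \u00A4 is the pattern placeholder for the currency symbol.
        return new DecimalFormat("#,##0.00 \u00A4", symbols(locale));
    }
}
```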

'''''Isolating Locale-Specific data'''''

Locale-specific data must be tailored according to the conventions of the end user's language and region. The text displayed by a user interface is the most obvious example of locale-specific data. For example, an application with a Cancel button in the U.S. will have an Abbrechen button in Germany. In other countries this button will have other labels. Obviously you don't want to hardcode this button label. Wouldn't it be nice if you could automatically get the correct label for a given Locale? Fortunately you can, provided that you isolate the locale-specific objects.
In Java, this is done with the ResourceBundle class.

''The ResourceBundle Class''
A ResourceBundle object contains locale specific objects. When you need a locale-specific object, you fetch it from a ResourceBundle, which returns the object that matches the end user's Locale.
Conceptually each ResourceBundle is a set of related subclasses that share the same base name. The list that follows shows a set of related subclasses. ButtonLabel is the base name. The characters following the base name indicate the language code, country code, and variant of a Locale. ButtonLabel_en_GB, for example, matches the Locale specified by the language code for English (en) and the country code for Great Britain (GB).
ButtonLabel
ButtonLabel_de
ButtonLabel_en_GB
ButtonLabel_fr_CA_UNIX

To select the appropriate ResourceBundle, invoke the ResourceBundle.getBundle method. The following example selects the ButtonLabel ResourceBundle for the Locale that matches the French language, the country of Canada, and the UNIX platform.
Locale currentLocale = new Locale("fr", "CA", "UNIX");
ResourceBundle introLabels =
ResourceBundle.getBundle("ButtonLabel", currentLocale);

The Java platform provides two subclasses of ResourceBundle: PropertyResourceBundle and ListResourceBundle.
A PropertyResourceBundle is backed by a properties file. A properties file is a plain-text file that contains translatable text.
If you need to store other types of objects, use a ListResourceBundle instead.

The ListResourceBundle class manages resources with a convenient list. Each ListResourceBundle is backed by a class file. You can store any locale-specific object in a ListResourceBundle. To add support for an additional Locale, you create another source file and compile it into a class file.

ResourceBundle objects contain an array of key-value pairs. You specify the key, which must be a String, when you want to retrieve the value from the ResourceBundle. The value is the locale-specific object. The keys in the following example are the OkKey and CancelKey strings:
class ButtonLabel_en extends ListResourceBundle {
// English version
public Object[][] getContents() {
return contents;
}
static final Object[][] contents = {
{"OkKey", "OK"},
{"CancelKey", "Cancel"},
};
}
To retrieve the OK String, invoke getString with the appropriate key on the ResourceBundle returned by getBundle, such as introLabels in the earlier example:
String okLabel = introLabels.getString("OkKey");
A properties file contains key-value pairs. The key is on the left side of the equal sign, and the value is on the right. Each pair is on a separate line. The values may represent String objects only. The following example shows the contents of a properties file named ButtonLabel.properties:
OkKey = OK
CancelKey = Cancel

''Preparing to use ResourceBundle''
If your application has a user interface, it contains many locale-specific objects. To get started, you should go through your source code and look for objects that vary with Locale. Your list might include objects instantiated from the following classes:

• String
• Image
• Color
• AudioClip
You'll notice that this list doesn't contain objects representing numbers, dates, times, or currencies. The display format of these objects varies with Locale, but the objects themselves do not. For example, you format a Date according to Locale, but you use the same Date object regardless of Locale. Instead of isolating these objects in a ResourceBundle, you format them with special locale-sensitive formatting classes. You'll learn how to do this in the Dates and Times section of the Formatting lesson.
In general, the objects stored in a ResourceBundle are predefined and ship with the product. These objects are not modified while the program is running. For instance, you should store a Menu label in a ResourceBundle because it is locale-specific and will not change during the program session. However, you should not isolate in a ResourceBundle a String object the end user enters in a TextField. Data such as this String may vary from day to day. It is specific to the program session, not to the Locale in which the program runs.
Usually most of the objects you need to isolate in a ResourceBundle are String objects. However, not all String objects are locale-specific. For example, if a String is a protocol element used by interprocess communication, it doesn't need to be localized, because the end users never see it.
The decision whether to localize some String objects is not always clear. Log files are a good example. If a log file is written by one program and read by another, both programs are using the log file as a buffer for communication. Suppose that end users occasionally check the contents of this log file. Shouldn't the log file be localized? On the other hand, if end users rarely check the log file, the cost of translation may not be worthwhile. Your decision to localize this log file depends on a number of factors: program design, ease of use, cost of translation, and supportability.

''Backing a ResourceBundle with Properties Files''

''Create the Default Properties File''
A properties file is a simple text file. You can create and maintain a properties file with just about any text editor.
You should always create a default properties file. The name of this file begins with the base name of your ResourceBundle and ends with the .properties suffix. In the PropertiesDemo program the base name is LabelsBundle. Therefore the default properties file is called LabelsBundle.properties. This file contains the following lines:
# This is the default LabelsBundle.properties file
s1 = computer
s2 = disk
s3 = monitor
s4 = keyboard
Note that in the preceding file the comment lines begin with a pound sign (#). The other lines contain key-value pairs. The key is on the left side of the equal sign and the value is on the right. For instance, s2 is the key that corresponds to the value disk.
The key is arbitrary. We could have called s2 something else, like msg5 or diskID. Once defined, however, the key should not change because it is referenced in the source code. The values may be changed. In fact, when your localizers create new properties files to accommodate additional languages, they will translate the values into various languages.
''Create Additional Properties Files as Needed''
To support an additional Locale, your localizers will create a new properties file that contains the translated values. No changes to your source code are required, because your program references the keys, not the values.
For example, to add support for the German language, your localizers would translate the values in LabelsBundle.properties and place them in a file named LabelsBundle_de.properties. Notice that the name of this file, like that of the default file, begins with the base name LabelsBundle and ends with the .properties suffix. However, since this file is intended for a specific Locale, the base name is followed by the language code (de). The contents of LabelsBundle_de.properties are as follows:
# This is the LabelsBundle_de.properties file
s1 = Computer
s2 = Platte
s3 = Monitor
s4 = Tastatur
The PropertiesDemo sample program ships with three properties files:
LabelsBundle.properties
LabelsBundle_de.properties
LabelsBundle_fr.properties

''Specify the Locale''

The PropertiesDemo program creates the Locale objects as follows:
Locale[] supportedLocales = {
Locale.FRENCH,
Locale.GERMAN,
Locale.ENGLISH
};
These Locale objects should match the properties files created in the previous two steps. For example, the Locale.FRENCH object corresponds to the LabelsBundle_fr.properties file. Locale.ENGLISH has no matching LabelsBundle_en.properties file, so the default file is used (provided the JVM's default locale has no matching file either, because getBundle tries the default locale before falling back to the default file).

''Create the ResourceBundle''

This step shows how the Locale, the properties files, and the ResourceBundle are related. To create the ResourceBundle, invoke the getBundle method, specifying the base name and Locale:
ResourceBundle labels =
ResourceBundle.getBundle("LabelsBundle", currentLocale);
The getBundle method first looks for a class file that matches the base name and the Locale. If it can't find a class file, it then checks for properties files. In the PropertiesDemo program we're backing the ResourceBundle with properties files instead of class files. When the getBundle method locates the correct properties file, it returns a PropertyResourceBundle object containing the key-value pairs from the properties file.
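To experiment with this lookup without creating files on the classpath, you can pass getBundle a custom ResourceBundle.Control that serves properties content from memory. This is a sketch, not how PropertiesDemo works: the in-memory map stands in for LabelsBundle.properties and LabelsBundle_de.properties (there is deliberately no French entry), the Control considers only the properties format, disables caching, and returns null from getFallbackLocale so the JVM's default locale is never consulted. InMemoryBundles and label are illustrative names, and getBundle with a Control may only be called from code outside named modules.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class InMemoryBundles {
    // Hypothetical in-memory stand-ins for the properties files, keyed by bundle name.
    private static final Map<String, String> FILES = Map.of(
            "LabelsBundle", "s1 = computer\ns2 = disk\ns3 = monitor\ns4 = keyboard\n",
            "LabelsBundle_de", "s1 = Computer\ns2 = Platte\ns3 = Monitor\ns4 = Tastatur\n");

    private static final ResourceBundle.Control CONTROL = new ResourceBundle.Control() {
        @Override
        public List<String> getFormats(String baseName) {
            return FORMAT_PROPERTIES; // only look for properties-style bundles
        }

        @Override
        public Locale getFallbackLocale(String baseName, Locale locale) {
            return null; // skip the JVM default locale so results are deterministic
        }

        @Override
        public long getTimeToLive(String baseName, Locale locale) {
            return TTL_DONT_CACHE; // always reload; acceptable for a demo
        }

        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload) throws IOException {
            // toBundleName maps (LabelsBundle, de) to "LabelsBundle_de" and ROOT to "LabelsBundle".
            String text = FILES.get(toBundleName(baseName, locale));
            return text == null ? null : new PropertyResourceBundle(new StringReader(text));
        }
    };

    static String label(Locale locale, String key) {
        ResourceBundle labels = ResourceBundle.getBundle("LabelsBundle", locale, CONTROL);
        return labels.getString(key);
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] { Locale.GERMAN, Locale.FRENCH, Locale.ENGLISH }) {
            System.out.println("Locale = " + locale + ", key = s2, value = " + label(locale, "s2"));
        }
    }
}
```

With no French entry in the map, Locale.FRENCH falls back to the default bundle and yields "disk".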

''Fetch the Localized Text''

To retrieve the translated value from the ResourceBundle, invoke the getString method as follows:
String value = labels.getString(key);
The String returned by getString corresponds to the key specified. The String is in the proper language, provided that a properties file exists for the specified Locale.

''Iterate through All the Keys''

This step is optional. When debugging your program, you might want to fetch values for all of the keys in a ResourceBundle. The getKeys method returns an Enumeration of all the keys in a ResourceBundle. You can iterate through the Enumeration and fetch each value with the getString method. The following lines of code, which are from the PropertiesDemo program, show how this is done:
ResourceBundle labels =
ResourceBundle.getBundle("LabelsBundle", currentLocale);
Enumeration<String> bundleKeys = labels.getKeys();
while (bundleKeys.hasMoreElements()) {
String key = bundleKeys.nextElement();
String value = labels.getString(key);
System.out.println("key = " + key + ", " +
"value = " + value);
}
''Run the Demo Program''
Running the PropertiesDemo program generates the following output. The first three lines show the values returned by getString for various Locale objects. The program displays the last four lines when iterating through the keys with the getKeys method.
Locale = fr, key = s2, value = Disque dur
Locale = de, key = s2, value = Platte
Locale = en, key = s2, value = disk

key = s4, value = Clavier
key = s3, value = Moniteur
key = s2, value = Disque dur
key = s1, value = Ordinateur

''Using a ListResourceBundle''

''Create the ListResourceBundle Subclasses''

A ListResourceBundle is backed by a class file. Therefore the first step is to create a class file for every supported Locale. In the ListDemo program the base name of the ListResourceBundle is StatsBundle. Since ListDemo supports three Locale objects, it requires the following three class files:
StatsBundle_en_CA.class
StatsBundle_fr_FR.class
StatsBundle_ja_JP.class
The StatsBundle class for Japan is defined in the source code that follows. Note that the class name is constructed by appending the language and country codes to the base name of the ListResourceBundle. Inside the class the two-dimensional contents array is initialized with the key-value pairs. The keys are the first element in each pair: GDP, Population, and Literacy. The keys must be String objects and they must be the same in every class in the StatsBundle set. The values can be any type of object. In this example the values are two Integer objects and a Double object.
import java.util.*;
public class StatsBundle_ja_JP extends ListResourceBundle {
public Object[][] getContents() {
return contents;
}
private Object[][] contents = {
{ "GDP", Integer.valueOf(21300) },
{ "Population", Integer.valueOf(125449703) },
{ "Literacy", Double.valueOf(0.99) },
};
}
''Specify the Locale''

The ListDemo program defines the Locale objects as follows:
Locale[] supportedLocales = {
new Locale("en", "CA"),
new Locale("ja", "JP"),
new Locale("fr", "FR")
};
Each Locale object corresponds to one of the StatsBundle classes. For example, the Japanese Locale, which was defined with the ja and JP codes, matches StatsBundle_ja_JP.class.
''Create the ResourceBundle''
To create the ListResourceBundle, invoke the getBundle method. The following line of code specifies the base name of the class (StatsBundle) and the Locale:
ResourceBundle stats =
ResourceBundle.getBundle("StatsBundle", currentLocale);
The getBundle method searches for a class whose name begins with StatsBundle and is followed by the language and country codes of the specified Locale. If the currentLocale is created with the ja and JP codes, getBundle returns a ListResourceBundle corresponding to the class StatsBundle_ja_JP, for example.
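The default ResourceBundle.Control can list the candidate bundle names that getBundle tries, from most to least specific. A small sketch follows (CandidateNames is an illustrative name). It shows only the candidates for the requested locale; getBundle may additionally try the JVM's default locale before settling on the base bundle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class CandidateNames {
    /** Bundle names derived from the default Control's candidate locales, most specific first. */
    static List<String> candidateBundleNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [StatsBundle_ja_JP, StatsBundle_ja, StatsBundle]
        System.out.println(candidateBundleNames("StatsBundle", Locale.JAPAN));
    }
}
```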
''Fetch the Localized Objects''
Now that the program has a ListResourceBundle for the appropriate Locale, it can fetch the localized objects by their keys. The following line of code retrieves the literacy rate by invoking getObject with the Literacy key parameter. Since getObject returns an object, cast it to a Double:
Double lit = (Double)stats.getObject("Literacy");

''Run the Demo Program''
The ListDemo program prints the data it fetched with the getBundle method:
Locale = en_CA
GDP = 24400
Population = 28802671
Literacy = 0.97

Locale = ja_JP
GDP = 21300
Population = 125449703
Literacy = 0.99

Locale = fr_FR
GDP = 20200
Population = 58317450
Literacy = 0.99

'''''Formatting'''''

This section explains how to format numbers, currencies, dates, times, and text messages. Because end users can see these data elements, their format must conform to various cultural conventions. Following the examples in this lesson will teach you how to:
• Format data elements in a locale-sensitive manner
• Keep your code locale-independent
• Avoid the need to write formatting routines for specific locales

''Numbers & Currencies''
Programs store and operate on numbers in a locale-independent way. Before displaying or printing a number, a program must convert it to a String that is in a locale-sensitive format. For example, in France the number 123456.78 should be formatted as 123 456,78, and in Germany it should appear as 123.456,78. In this section, you will learn how to make your programs independent of the locale conventions for decimal points, thousands-separators, and other formatting properties.
''Using Predefined Formats (Specific to Java)''
By invoking the methods of the NumberFormat class, you can format numbers and currencies according to the locale.

''Number''
You can use the NumberFormat methods to format primitive-type numbers, such as double, and their corresponding wrapper objects, such as Double.
The following code example formats a Double according to Locale. Invoking the getNumberInstance method returns a locale-specific instance of NumberFormat. The format method accepts the Double as an argument and returns the formatted number in a String.
Double amount = Double.valueOf(345987.246);
NumberFormat numberFormatter;
String amountOut;

numberFormatter = NumberFormat.getNumberInstance(currentLocale);
amountOut = numberFormatter.format(amount);
System.out.println(amountOut + " " +
currentLocale.toString());
The output from this example shows how the format of the same number varies with Locale:
345 987,246 fr_FR
345.987,246 de_DE
345,987.246 en_US
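The snippet above relies on a currentLocale variable defined elsewhere. A self-contained version that loops over the three locales might look like this; NumberDemo and formatNumber are illustrative names, and the exact French grouping character (a regular, no-break, or narrow no-break space) varies between JDK versions.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class NumberDemo {
    /** Formats a number with the conventions of the given locale. */
    static String formatNumber(double amount, Locale locale) {
        NumberFormat numberFormatter = NumberFormat.getNumberInstance(locale);
        return numberFormatter.format(amount);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.FRANCE, Locale.GERMANY, Locale.US };
        for (Locale currentLocale : locales) {
            System.out.println(formatNumber(345987.246, currentLocale) + " " + currentLocale);
        }
    }
}
```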

''Currencies''
If you're writing business applications, you'll probably need to format and to display currencies. You format currencies in the same manner as numbers, except that you call getCurrencyInstance to create a formatter. When you invoke the format method, it returns a String that includes the formatted number and the appropriate currency sign.
This code example shows how to format currency in a locale-specific manner:
Double currency = Double.valueOf(9876543.21);
NumberFormat currencyFormatter;
String currencyOut;

currencyFormatter = NumberFormat.getCurrencyInstance(currentLocale);
currencyOut = currencyFormatter.format(currency);
System.out.println(currencyOut + " " +
currentLocale.toString());
The output generated by the preceding lines of code is as follows. (This output predates the euro; a current JDK shows the € sign for the fr_FR and de_DE locales.)
9 876 543,21 F fr_FR
9.876.543,21 DM de_DE
$9,876,543.21 en_US
At first glance this output may look wrong to you, because the numeric values are all the same. Of course, 9 876 543,21 F is not equivalent to 9.876.543,21 DM. However, bear in mind that the NumberFormat class is unaware of exchange rates. The methods belonging to the NumberFormat class format currencies but do not convert them.
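To make the point concrete, the sketch below formats the same amount for the U.S. locale, first in the locale's default currency and then with the currency explicitly set to euros via NumberFormat.setCurrency: only the symbol changes, never the digits. CurrencyDemo and its methods are illustrative names.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class CurrencyDemo {
    /** Formats an amount in the locale's default currency. */
    static String formatCurrency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    /** Formats the same amount with an explicit currency; no exchange-rate conversion happens. */
    static String formatIn(double amount, Locale locale, String currencyCode) {
        NumberFormat formatter = NumberFormat.getCurrencyInstance(locale);
        formatter.setCurrency(Currency.getInstance(currencyCode));
        return formatter.format(amount);
    }

    public static void main(String[] args) {
        System.out.println(formatCurrency(9876543.21, Locale.US));   // $9,876,543.21
        System.out.println(formatIn(9876543.21, Locale.US, "EUR"));  // same digits, euro symbol
    }
}
```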

''Dates & Times''

Date objects represent dates and times. You cannot display or print a Date object without first converting it to a String that is in the proper format. Just what is the "proper" format? First, the format should conform to the conventions of the end user's Locale. For example, Germans recognize 20.4.98 as a valid date, but Americans expect that same date to appear as 4/20/98. Second, the format should include the necessary information. For instance, a program that measures network performance may report on elapsed milliseconds. An online appointment calendar probably won't display milliseconds, but it will show the days of the week.
This section explains how to format dates and times in various ways and in a locale-sensitive manner. If you follow these techniques your programs will display dates and times in the appropriate Locale, but your source code will remain independent of any specific Locale.

''Using Predefined Formats (Specific to Java)''

The DateFormat class lets you format dates and times according to the conventions of a specific locale.

''Dates''
Formatting dates with the DateFormat class is a two-step process. First, you create a formatter with the getDateInstance method. Second, you invoke the format method, which returns a String containing the formatted date. The following example formats today's date by calling these two methods:
Date today;
String dateOut;
DateFormat dateFormatter;

dateFormatter = DateFormat.getDateInstance(DateFormat.DEFAULT,
currentLocale);
today = new Date();
dateOut = dateFormatter.format(today);

System.out.println(dateOut + " " + currentLocale.toString());
The output generated by this code follows. Notice that the formats of the dates vary with Locale. Since DateFormat is locale-sensitive, it takes care of the formatting details for each Locale.
9 avr 98 fr_FR
9.4.1998 de_DE
09-Apr-98 en_US
The preceding code example specified the DEFAULT formatting style. The DEFAULT style is just one of the predefined formatting styles that the DateFormat class provides, as follows:
• DEFAULT
• SHORT
• MEDIUM
• LONG
• FULL
The following table shows how dates are formatted for each style with the U.S. and French locales:
{| class="wikitable"
|+ Sample Date Formats
! Style !! U.S. Locale !! French Locale
|-
| DEFAULT || 10-Apr-98 || 10 avr 98
|-
| SHORT || 4/10/98 || 10/04/98
|-
| MEDIUM || 10-Apr-98 || 10 avr 98
|-
| LONG || April 10, 1998 || 10 avril 1998
|-
| FULL || Friday, April 10, 1998 || vendredi, 10 avril 1998
|}
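A minimal sketch that prints one fixed date in each predefined style follows. The class name DateStyleDemo and its helpers are hypothetical, and the date is built with GregorianCalendar in the JVM's default time zone so that the output does not depend on the current day. A current JDK will probably print some styles (for example MEDIUM and DEFAULT) differently from the sample formats listed above, since the bundled locale data has been revised over time.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo: prints a fixed date in each predefined DateFormat style.
final class DateStyleDemo {

    // April 10, 1998 in the default time zone (Calendar months are zero-based).
    static Date sampleDate() {
        return new GregorianCalendar(1998, Calendar.APRIL, 10).getTime();
    }

    // Formats the date with a DateFormat style constant and a locale.
    static String formatDate(Date date, int style, Locale locale) {
        return DateFormat.getDateInstance(style, locale).format(date);
    }

    public static void main(String[] args) {
        int[] styles = { DateFormat.DEFAULT, DateFormat.SHORT, DateFormat.MEDIUM,
                         DateFormat.LONG, DateFormat.FULL };
        String[] names = { "DEFAULT", "SHORT", "MEDIUM", "LONG", "FULL" };
        Date date = sampleDate();
        for (int i = 0; i < styles.length; i++) {
            System.out.println(names[i] + ": "
                    + formatDate(date, styles[i], Locale.US) + " | "
                    + formatDate(date, styles[i], Locale.FRANCE));
        }
    }
}
```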

''Times''

Date objects represent both dates and times. Formatting times with the DateFormat class is similar to formatting dates, except that you create the formatter with the getTimeInstance method, as follows:
DateFormat timeFormatter =
DateFormat.getTimeInstance(DateFormat.DEFAULT,
currentLocale);
The table that follows shows the various predefined format styles for the U.S. and German locales:
{| class="wikitable"
|+ Sample Time Formats
! Style !! U.S. Locale !! German Locale
|-
| DEFAULT || 3:58:45 PM || 15:58:45
|-
| SHORT || 3:58 PM || 15:58
|-
| MEDIUM || 3:58:45 PM || 15:58:45
|-
| LONG || 3:58:45 PM PDT || 15:58:45 GMT+02:00
|-
| FULL || 3:58:45 o'clock PM PDT || 15.58 Uhr GMT+02:00
|}
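The time formatter can be tried out with a sketch like the one below. TimeStyleDemo and its helpers are hypothetical names. The sketch sticks to the SHORT and MEDIUM styles because LONG and FULL include a time zone name that depends on the JVM's default time zone; recent JDKs may also separate the U.S. time from "PM" with a narrow no-break space instead of an ordinary space.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo: formats a fixed time of day with getTimeInstance.
final class TimeStyleDemo {

    // 3:58:45 PM on April 10, 1998, in the default time zone.
    static Date sampleTime() {
        return new GregorianCalendar(1998, Calendar.APRIL, 10, 15, 58, 45).getTime();
    }

    // Formats only the time portion with the given style and locale.
    static String formatTime(Date time, int style, Locale locale) {
        return DateFormat.getTimeInstance(style, locale).format(time);
    }

    public static void main(String[] args) {
        Date time = sampleTime();
        System.out.println(formatTime(time, DateFormat.SHORT, Locale.US));
        System.out.println(formatTime(time, DateFormat.SHORT, Locale.GERMANY));
        System.out.println(formatTime(time, DateFormat.MEDIUM, Locale.GERMANY));
    }
}
```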

''Both Dates and Times''

To display a date and time in the same String, create the formatter with the getDateTimeInstance method. The first parameter is the date style, the second is the time style, and the third is the Locale. Here's a quick example:
DateFormat formatter =
DateFormat.getDateTimeInstance(DateFormat.LONG,
DateFormat.LONG,
currentLocale);
The following table shows the date and time formatting styles for the U.S. and French locales:
{| class="wikitable"
|+ Sample Date and Time Formats
! Style !! U.S. Locale !! French Locale
|-
| DEFAULT || 25-Jun-98 1:32:19 PM || 25 jun 98 22:32:20
|-
| SHORT || 6/25/98 1:32 PM || 25/06/98 22:32
|-
| MEDIUM || 25-Jun-98 1:32:19 PM || 25 jun 98 22:32:20
|-
| LONG || June 25, 1998 1:32:19 PM PDT || 25 juin 1998 22:32:20 GMT+02:00
|-
| FULL || Thursday, June 25, 1998 1:32:19 o'clock PM PDT || jeudi, 25 juin 1998 22 h 32 GMT+02:00
|}
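The following sketch wraps getDateTimeInstance in a small helper. DateTimeDemo and its methods are hypothetical. Both the date and the formatter use the JVM's default time zone, so the printed clock time matches the one used to build the date; the separator between the date and time parts may differ between JDK versions.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo: combines a date style and a time style in one formatter.
final class DateTimeDemo {

    // 1:32:19 PM on June 25, 1998, in the default time zone.
    static Date sampleDateTime() {
        return new GregorianCalendar(1998, Calendar.JUNE, 25, 13, 32, 19).getTime();
    }

    // dateStyle and timeStyle are DateFormat constants such as SHORT or LONG.
    static String formatDateTime(Date date, int dateStyle, int timeStyle, Locale locale) {
        return DateFormat.getDateTimeInstance(dateStyle, timeStyle, locale).format(date);
    }

    public static void main(String[] args) {
        Date date = sampleDateTime();
        for (Locale currentLocale : new Locale[] { Locale.US, Locale.FRANCE }) {
            System.out.println(formatDateTime(date, DateFormat.SHORT, DateFormat.SHORT, currentLocale)
                    + " " + currentLocale);
        }
    }
}
```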


''Messages''

We all like to use programs that let us know what's going on. Programs that keep us informed often do so by displaying status and error messages. Of course, these messages need to be translated so they can be understood by end users around the world. The section Isolating Locale-Specific Data discusses translatable text messages. Usually, you're done after you move a message String into a ResourceBundle. However, if you've embedded variable data in a message, you'll have to take some extra steps to prepare it for translation.
A compound message contains variable data. In the following list of compound messages, the variable data is shown in italics:
The disk named ''MyDisk'' contains ''300'' files.
The current balance of account ''#34-98-222'' is ''$2,745.72''.
''405,390'' people have visited your website since ''January 1, 1998''.
Delete all files older than ''120'' days.
You might be tempted to construct the last message in the preceding list by concatenating phrases and variables as follows:
double numDays;
ResourceBundle msgBundle;
...
String message = msgBundle.getString("deleteolder")
+ String.valueOf(numDays)
+ msgBundle.getString("days");
This approach works fine in English, but it won't work for languages in which the verb appears at the end of the sentence. Because the word order of this message is hardcoded, your localizers won't be able to create grammatically correct translations for all languages.
How can you make your program localizable if you need to use compound messages? You can do so by using the MessageFormat class, which is the topic of this section.
________________________________________
Caution: Compound messages are difficult to translate because the message text is fragmented. If you use compound messages, localization will take longer and cost more. Therefore you should use compound messages only when necessary.
________________________________________

''Dealing with Compound Messages''

A compound message may contain several kinds of variables: dates, times, strings, numbers, currencies, and percentages. To format a compound message in a locale-independent manner, you construct a pattern that you apply to a MessageFormat object, and store this pattern in a ResourceBundle.
The following steps walk through an example program (specific to Java).
1. Identify the Variables in the Message
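Suppose that you want to internationalize the following message:
At 1:15 PM on April 13, 1998, we detected 7 spaceships on the planet Mars.
The variable data in this message is the time and date (represented by a Date object), the number of spaceships (an Integer), and the name of the planet (a String).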
2. Isolate the Message Pattern in a ResourceBundle
Store the message in a ResourceBundle named MessageBundle, as follows:
ResourceBundle messages =
ResourceBundle.getBundle("MessageBundle", currentLocale);
This ResourceBundle is backed by a properties file for each Locale. Since the ResourceBundle is called MessageBundle, the properties file for U.S. English is named MessageBundle_en_US.properties. The contents of this file are as follows:
template = At {2,time,short} on {2,date,long}, we detected \
{1,number,integer} spaceships on the planet {0}.
planet = Mars
The first line of the properties file contains the message pattern. If you compare this pattern with the message text shown in step 1, you'll see that an argument enclosed in braces replaces each variable in the message text. Each argument starts with a digit called the argument number, which matches the index of an element in an Object array that holds the argument values. Note that in the pattern the argument numbers are not in any particular order. You can place the arguments anywhere in the pattern. The only requirement is that the argument number have a matching element in the array of argument values.
The next step describes the argument value array, but first let's look at each of the arguments in the pattern. The following table provides some details about the arguments:
{| class="wikitable"
|+ Arguments for template in MessageBundle_en_US.properties
! Argument !! Description
|-
| {2,time,short} || The time portion of a Date object. The short style specifies the DateFormat.SHORT formatting style.
|-
| {2,date,long} || The date portion of a Date object. The same Date object is used for both the date and time variables. In the Object array of arguments the index of the element holding the Date object is 2. (This is described in the next step.)
|-
| {1,number,integer} || A Number object, further qualified with the integer number style.
|-
| {0} || The String in the ResourceBundle that corresponds to the planet key.
|}
3. Set the Message Arguments
The following lines of code assign values to each argument in the pattern. The indexes of the elements in the messageArguments array match the argument numbers in the pattern. For example, the Integer element at index 1 corresponds to the {1,number,integer} argument in the pattern. Because it must be translated, the String object at element 0 will be fetched from the ResourceBundle with the getString method. Here is the code that defines the array of message arguments:

Object[] messageArguments = {
messages.getString("planet"),
Integer.valueOf(7),
new Date()
};
4. Create the Formatter
Next, create a MessageFormat object. You set the Locale because the message contains Date and Number objects, which should be formatted in a locale-sensitive manner.

MessageFormat formatter = new MessageFormat("");
formatter.setLocale(currentLocale);
5. Format the Message Using the Pattern and the Arguments
This step shows how the pattern, message arguments, and formatter all work together. First, fetch the pattern String from the ResourceBundle with the getString method. The key to the pattern is template. Pass the pattern String to the formatter with the applyPattern method. Then format the message using the array of message arguments, by invoking the format method. The String returned by the format method is ready to be displayed. All of this is accomplished with just two lines of code:

formatter.applyPattern(messages.getString("template"));
String output = formatter.format(messageArguments);
6. Run the Demo Program
The demo program prints the translated messages for the English and German locales and properly formats the date and time variables. Note that the English and German verbs ("detected" and "entdeckt") are in different locations relative to the variables:
currentLocale = en_US
At 1:15 PM on April 13, 1998, we detected 7 spaceships
on the planet Mars.
currentLocale = de_DE
Um 13.15 Uhr am 13. April 1998 haben wir 7 Raumschiffe
auf dem Planeten Mars entdeckt.
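Steps 3 through 5 can be condensed into one self-contained sketch. SpaceshipMessageDemo, compose and TEMPLATE_EN are hypothetical names, and the English pattern is hard-coded (copied from MessageBundle_en_US.properties) instead of being loaded with getString, so that the example runs without a properties file. Note that setLocale is called before applyPattern: the subformats are created when the pattern is applied, so changing the locale afterwards would not affect them.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo of steps 3-5. A real program would fetch the pattern
// with messages.getString("template") instead of using a constant.
final class SpaceshipMessageDemo {

    static final String TEMPLATE_EN =
            "At {2,time,short} on {2,date,long}, we detected "
            + "{1,number,integer} spaceships on the planet {0}.";

    // Formats the pattern for the locale; argument indexes match the pattern.
    static String compose(String pattern, Locale locale,
                          String planet, int count, Date when) {
        MessageFormat formatter = new MessageFormat("");
        formatter.setLocale(locale);          // must precede applyPattern
        formatter.applyPattern(pattern);
        Object[] messageArguments = { planet, count, when };  // count is autoboxed
        return formatter.format(messageArguments);
    }

    public static void main(String[] args) {
        Date when = new GregorianCalendar(1998, Calendar.APRIL, 13, 13, 15).getTime();
        System.out.println(compose(TEMPLATE_EN, Locale.US, "Mars", 7, when));
    }
}
```

On recent JDKs the space between "1:15" and "PM" may be a narrow no-break space, so comparisons against expected text should not assume an ordinary space there.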

''Handling Plurals''

The words in a message may vary if both plural and singular word forms are possible. With the ChoiceFormat class, you can map a number to a word or a phrase, allowing you to construct grammatically correct messages.
In English the plural and singular forms of a word are usually different. This can present a problem when you are constructing messages that refer to quantities. For example, if your message reports the number of files on a disk, the following variations are possible:

There are no files on XDisk.
There is one file on XDisk.
There are 2 files on XDisk.

The fastest way to solve this problem is to create a MessageFormat pattern like this:
There are {0,number} file(s) on {1}.
Unfortunately the preceding pattern results in incorrect grammar:
There are 1 file(s) on XDisk.
1. Define the Message Pattern
First, identify the variables in the message: the number of files and the name of the disk.
Next, replace the variables in the message with arguments, creating a pattern that can be applied to a MessageFormat object:
There {0} on {1}.
The argument for the disk name, which is represented by {1}, is easy enough to deal with. You just treat it like any other String variable in a MessageFormat pattern. This argument matches the element at index 1 in the array of argument values.
Dealing with argument {0} is more complex, for a couple of reasons:
• The phrase that this argument replaces varies with the number of files. To construct this phrase at run time, you need to map the number of files to a particular String. For example, the number 1 will map to the String containing the phrase is one file. The ChoiceFormat class allows you to perform the necessary mapping.
• If the disk contains multiple files, the phrase includes an integer. The MessageFormat class lets you insert a number into a phrase.
2. Create a ResourceBundle
Because the message text must be translated, isolate it in a ResourceBundle:
ResourceBundle bundle =
ResourceBundle.getBundle("ChoiceBundle", currentLocale);
The sample program backs the ResourceBundle with properties files.
ChoiceBundle_en_US.properties contains the following:
pattern = There {0} on {1}.
noFiles = are no files
oneFile = is one file
multipleFiles = are {2} files
The contents of this properties file show how the message will be constructed and formatted. The first line contains the pattern for MessageFormat.
The other lines contain phrases that will replace argument {0} in the pattern. The phrase for the multipleFiles key contains the argument {2}, which will be replaced by a number.
Here is the French version of the properties file, ChoiceBundle_fr_FR.properties:

pattern = Il {0} sur {1}.
noFiles = n'y a pas de fichiers
oneFile = y a un fichier
multipleFiles = y a {2} fichiers

3. Create a Message Formatter
In this step you instantiate MessageFormat and set its Locale:
MessageFormat messageForm = new MessageFormat("");
messageForm.setLocale(currentLocale);

4. Create a Choice Formatter
The ChoiceFormat object allows you to choose, based on a double number, a particular String. The range of double numbers, and the String objects to which they map, are specified in arrays:
double[] fileLimits = {0,1,2};
String [] fileStrings = {
bundle.getString("noFiles"),
bundle.getString("oneFile"),
bundle.getString("multipleFiles")
};
ChoiceFormat maps each element in the double array to the element in the String array that has the same index. In the sample code the 0 maps to the String returned by calling bundle.getString("noFiles"). By coincidence the index is the same as the value in the fileLimits array. If the code had set fileLimits[0] to seven, ChoiceFormat would map the number 7 to fileStrings[0].
You specify the double and String arrays when instantiating ChoiceFormat:
ChoiceFormat choiceForm = new ChoiceFormat(fileLimits,
fileStrings);

5. Apply the Pattern
Remember the pattern you constructed in step 1? It's time to retrieve the pattern from the ResourceBundle and apply it to the MessageFormat object:
String pattern = bundle.getString("pattern");
messageForm.applyPattern(pattern);

6. Assign the Formats
In this step you assign to the MessageFormat object the ChoiceFormat object created in step 4:
Format[] formats = {choiceForm, null,
NumberFormat.getInstance()};
messageForm.setFormats(formats);
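The setFormats method assigns Format objects to the format elements of the message pattern, in the order in which those elements appear in the pattern. You must invoke the applyPattern method before you call the setFormats method. Because this pattern contains only {0} and {1}, the third element of the array is not used: the {2} argument inside the phrase selected by the ChoiceFormat is formatted when MessageFormat processes that phrase as a pattern of its own. The following table shows how the elements of the Format array correspond to the arguments:
{| class="wikitable"
|+ The Format Array of the ChoiceFormatDemo Program
! Array Element !! Pattern Argument
|-
| choiceForm || {0}
|-
| null || {1}
|-
| NumberFormat.getInstance() || (not used; this pattern has no third format element)
|}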

7. Set the Arguments and Format the Message
At run time the program assigns the variables to the array of arguments it passes to the MessageFormat object. The elements in the array correspond to the arguments in the pattern. For example, messageArguments[1] maps to pattern argument {1}, which is a String containing the name of the disk. In the previous step the program assigned a ChoiceFormat object to argument {0} of the pattern. Therefore the number assigned to messageArguments[0] determines which String the ChoiceFormat object selects. If messageArguments[0] is greater than or equal to 2, the String containing the phrase are {2} files replaces argument {0} in the pattern. The number assigned to messageArguments[2] will be substituted in place of pattern argument {2}. Here's the code that tries this out:
Object[] messageArguments = {null, "XDisk", null};
for (int numFiles = 0; numFiles < 4; numFiles++) {
messageArguments[0] = Integer.valueOf(numFiles);
messageArguments[2] = Integer.valueOf(numFiles);
String result = messageForm.format(messageArguments);
System.out.println(result);
}
8. Run the Demo Program
Compare the messages displayed by the program with the phrases in the ResourceBundle of step 2. Notice that the ChoiceFormat object selects the correct phrase, which the MessageFormat object uses to construct the proper message. The output of the ChoiceFormatDemo program is as follows:
currentLocale = en_US
There are no files on XDisk.
There is one file on XDisk.
There are 2 files on XDisk.
There are 3 files on XDisk.

currentLocale = fr_FR
Il n'y a pas de fichiers sur XDisk.
Il y a un fichier sur XDisk.
Il y a 2 fichiers sur XDisk.
Il y a 3 fichiers sur XDisk.
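The eight steps can be collapsed into a short runnable sketch. ChoiceMessageSketch and describe are hypothetical names, and the English phrases from ChoiceBundle_en_US.properties are hard-coded rather than loaded from a ResourceBundle. As a deliberate substitution, the sketch attaches the ChoiceFormat with setFormatByArgumentIndex, which targets argument {0} by its index instead of relying on the position of format elements in the pattern.

```java
import java.text.ChoiceFormat;
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical sketch of the ChoiceFormat steps with hard-coded English phrases.
final class ChoiceMessageSketch {

    static String describe(int numFiles, String diskName) {
        // 0 -> "are no files", 1 -> "is one file", 2 or more -> "are {2} files".
        double[] fileLimits = {0, 1, 2};
        String[] fileStrings = {"are no files", "is one file", "are {2} files"};
        ChoiceFormat choiceForm = new ChoiceFormat(fileLimits, fileStrings);

        MessageFormat messageForm = new MessageFormat("");
        messageForm.setLocale(Locale.US);
        messageForm.applyPattern("There {0} on {1}.");
        messageForm.setFormatByArgumentIndex(0, choiceForm);

        // Element 0 selects the phrase; element 2 fills the {2} inside it.
        Object[] messageArguments = {numFiles, diskName, numFiles};
        return messageForm.format(messageArguments);
    }

    public static void main(String[] args) {
        for (int numFiles = 0; numFiles < 4; numFiles++) {
            System.out.println(describe(numFiles, "XDisk"));
        }
    }
}
```

MessageFormat also accepts the choice directly inside a pattern, as in There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files} on {1}., which keeps the whole message in a single bundle entry at the cost of a denser string for translators.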




'''''Working With Text'''''

Nearly all programs with user interfaces manipulate text. In an international market the text your programs display must conform to the rules of languages from around the world.

''Checking Character Properties''

You can categorize characters according to their properties. For instance, X is an uppercase letter and 4 is a decimal digit. Checking character properties is a common way to verify the data entered by end users. If you are selling books online, for example, your order entry screen should verify that the characters in the quantity field are all digits.
Developers who aren't used to writing global software might determine a character's properties by comparing it with character constants. For instance, they might write code like this:
char ch;
...

// This code is WRONG!

if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'))
// ch is a letter
...
if (ch >= '0' && ch <= '9')
// ch is a digit
...
if ((ch == ' ') || (ch =='\n') || (ch == '\t'))
// ch is a whitespace
The preceding code is wrong because it works only with English and a few other languages. To internationalize the previous example, replace it with the following statements (Specific to Java):

char ch;
...

// This code is OK!

if (Character.isLetter(ch))
...
if (Character.isDigit(ch))
...
if (Character.isWhitespace(ch)) // also true for '\t' and '\n'

The Character methods rely on the Unicode Standard for determining the properties of a character. Unicode is a character encoding standard that supports the world's major languages. In the Java programming language a char value is a 16-bit UTF-16 code unit: most characters fit in a single char, but supplementary characters (code points above U+FFFF) need a pair of char values, and the Character methods provide overloads that take an int code point for them. If you check the properties of a character with the appropriate Character method, your code will work with all major languages. For example, the Character.isLetter method returns true if the character is a letter in Chinese, German, Arabic, or another language.
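For the quantity-field check mentioned at the start of this section, a locale-neutral validator might look like the following sketch. QuantityFieldCheck and isAllDigits are hypothetical names. The method walks the string by code point, so it also handles supplementary characters, and it accepts any Unicode decimal digit, including Arabic-Indic digits.

```java
// Hypothetical helper for validating a quantity field.
final class QuantityFieldCheck {

    // True if the input is non-empty and every code point is a decimal digit.
    static boolean isAllDigits(String input) {
        return !input.isEmpty() && input.codePoints().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("123"));                 // ASCII digits
        System.out.println(isAllDigits("\u0661\u0662\u0663"));  // Arabic-Indic digits
        System.out.println(isAllDigits("12a"));                 // contains a letter
    }
}
```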

''Comparing Strings''

Applications that sort through text perform frequent string comparisons. For example, a report generator performs string comparisons when sorting a list of strings in alphabetical order.
If your application audience is limited to people who speak English, you can probably perform string comparisons with the String.compareTo method. The String.compareTo method performs a binary comparison of the Unicode characters within the two strings. For most languages, however, this binary comparison cannot be relied on to sort strings, because the Unicode values do not correspond to the relative order of the characters.

''Performing Locale-Independent Comparisons (Specific to Java)''

Collation rules define the sort sequence of strings. These rules vary with locale, because various natural languages sort words differently. You can use the predefined collation rules provided by the Collator class to sort strings in a locale-independent manner.
To instantiate the Collator class invoke the getInstance method. Usually, you create a Collator for the default Locale, as in the following example:
Collator myDefaultCollator = Collator.getInstance();
You can also specify a particular Locale when you create a Collator, as follows:
Collator myFrenchCollator = Collator.getInstance(Locale.FRENCH);
The getInstance method returns a RuleBasedCollator, which is a concrete subclass of Collator. The RuleBasedCollator contains a set of rules that determine the sort order of strings for the locale you specify. These rules are predefined for each locale. Because the rules are encapsulated within the RuleBasedCollator, your program won't need special routines to deal with the way collation rules vary with language.
You invoke the Collator.compare method to perform a locale-independent string comparison. The compare method returns an integer less than, equal to, or greater than zero when the first string argument is less than, equal to, or greater than the second string argument. The following table contains some sample calls to Collator.compare:
{| class="wikitable"
|+ Collator.compare Examples
! Example !! Return Value !! Explanation
|-
| myCollator.compare("abc", "def") || -1 || "abc" is less than "def"
|-
| myCollator.compare("rtf", "rtf") || 0 || the two strings are equal
|-
| myCollator.compare("xyz", "abc") || 1 || "xyz" is greater than "abc"
|}
You use the compare method when performing sort operations.
The following program shows what can happen when you sort the same list of words with two different collators:
Collator fr_FRCollator = Collator.getInstance(new Locale("fr","FR"));

Collator en_USCollator = Collator.getInstance(new Locale("en","US"));
The method for sorting, called sortStrings, can be used with any Collator. Notice that the sortStrings method invokes the compare method:
public static void sortStrings(Collator collator,
String[] words) {
String tmp;
for (int i = 0; i < words.length; i++) {
for (int j = i + 1; j < words.length; j++) {
if (collator.compare(words[i], words[j]) > 0) {
tmp = words[i];
words[i] = words[j];
words[j] = tmp;
}
}
}
}
The English Collator sorts the words as follows:
peach
péché
pêche
sin
According to the collation rules of the French language, the preceding list is in the wrong order. In French péché should follow pêche in a sorted list. The French Collator sorts the array of words correctly, as follows:
peach
pêche
péché
sin
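Because Collator implements Comparator, it can also be passed straight to Arrays.sort instead of a hand-written sort such as sortStrings. The sketch below, with the hypothetical class CollatorSortDemo, contrasts this with the binary ordering of String.compareTo: the binary sort places "éclair" after "zoo" because the code of é is larger than that of z, while the collator places it between "apple" and "zoo".

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo contrasting String.compareTo with a Collator.
final class CollatorSortDemo {

    // Sorts a copy of the words with the collation rules of the locale.
    static String[] sortWith(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    // Sorts a copy with the binary comparison of String.compareTo.
    static String[] sortBinary(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"zoo", "\u00e9clair", "apple"};
        System.out.println(Arrays.toString(sortBinary(words)));
        System.out.println(Arrays.toString(sortWith(Locale.US, words)));
    }
}
```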


''Detecting Text Boundaries''

Applications that manipulate text need to locate boundaries within the text. For example, consider some of the common functions of a word processor: highlighting a character, cutting a word, moving the cursor to the next sentence, and wrapping a word at a line ending. To perform each of these functions, the word processor must be able to detect the logical boundaries in the text. Fortunately you don't have to write your own routines to perform boundary analysis. Instead, you can take advantage of the methods provided by the BreakIterator class.

''About the BreakIterator Class (Java only)''

The BreakIterator class is locale-sensitive, because text boundaries vary with language. For example, the syntax rules for line breaks are not the same for all languages. To determine which locales the BreakIterator class supports, invoke the getAvailableLocales method, as follows:
Locale[] locales = BreakIterator.getAvailableLocales();
You can analyze four kinds of boundaries with the BreakIterator class: character, word, sentence, and potential line break. When instantiating a BreakIterator, you invoke the appropriate factory method:
• getCharacterInstance
• getWordInstance
• getSentenceInstance
• getLineInstance
Each instance of BreakIterator can detect just one type of boundary. If you want to locate both character and word boundaries, for example, you create two separate instances.
A BreakIterator has an imaginary cursor that points to the current boundary in a string of text. You can move this cursor within the text with the previous and the next methods. For example, if you've created a BreakIterator with getWordInstance, the cursor moves to the next word boundary in the text every time you invoke the next method. The cursor-movement methods return an integer indicating the position of the boundary. This position is the index of the character in the text string that would follow the boundary. Like string indexes, the boundaries are zero-based. The first boundary is at 0, and the last boundary is the length of the string. The following figure shows the word boundaries detected by the next and previous methods in a line of text:


You should use the BreakIterator class only with natural-language text. To tokenize a programming language, use the StreamTokenizer class.
The sections that follow give examples for each type of boundary analysis.

''Character Boundaries''

You need to locate character boundaries if your application allows the end user to highlight individual characters or to move a cursor through text one character at a time. To create a BreakIterator that locates character boundaries, you invoke the getCharacterInstance method, as follows:
BreakIterator characterIterator =
BreakIterator.getCharacterInstance(currentLocale);
This type of BreakIterator detects boundaries between user characters, not just Unicode characters.
A user character may be composed of more than one Unicode character. For example, the user character ü can be composed by combining the Unicode characters \u0075 (u) and \u0308 (combining diaeresis). This isn't the best example, however, because the character ü may also be represented by the single Unicode character \u00fc. We'll draw on the Arabic language for a more realistic example.
In Arabic, the word for house contains three user characters, but it is composed of the following six Unicode characters:
String house = "\u0628" + "\u064e" + "\u064a" +
"\u0652" + "\u067a" + "\u064f";
The Unicode characters at positions 1, 3, and 5 in the house string are diacritics. Arabic requires diacritics because they can alter the meanings of words. The diacritics in the example are nonspacing characters, since they appear above the base characters. In an Arabic word processor you cannot move the cursor on the screen once for every Unicode character in the string. Instead you must move it once for every user character, which may be composed of more than one Unicode character. Therefore you must use a BreakIterator to scan the user characters in the string.
The program creates a BreakIterator for Arabic character boundaries and passes it, along with the String object created previously, to a method named listPositions:
BreakIterator arCharIterator =
BreakIterator.getCharacterInstance(new Locale ("ar","SA"));

listPositions (house, arCharIterator);
The listPositions method uses a BreakIterator to locate the character boundaries in the string. It assigns the target string to the BreakIterator with the setText method, retrieves the first character boundary with the first method, and then invokes the next method until the constant BreakIterator.DONE is returned. The code for this routine is as follows:
static void listPositions(String target, BreakIterator iterator) {
iterator.setText(target);
int boundary = iterator.first();

while (boundary != BreakIterator.DONE) {
System.out.println (boundary);
boundary = iterator.next();
}
}
The listPositions method prints out the following boundary positions for the user characters in the string house. Note that the positions of the diacritics (1, 3, 5) are not listed:
0
2
4
6
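A variant of listPositions that returns the boundaries instead of printing them is easier to reuse and to check. The sketch below uses the hypothetical class CharacterBoundaryDemo and creates the Arabic locale with Locale.forLanguageTag rather than the Locale constructor used above. Besides the house string, it checks "u" followed by U+0308 COMBINING DIAERESIS, which forms a single user character.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical variant of listPositions that collects the boundaries.
final class CharacterBoundaryDemo {

    static List<Integer> characterBoundaries(String target, Locale locale) {
        BreakIterator iterator = BreakIterator.getCharacterInstance(locale);
        iterator.setText(target);
        List<Integer> boundaries = new ArrayList<>();
        for (int b = iterator.first(); b != BreakIterator.DONE; b = iterator.next()) {
            boundaries.add(b);
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // The Arabic word for house: three user characters, six chars.
        String house = "\u0628\u064e\u064a\u0652\u067a\u064f";
        System.out.println(characterBoundaries(house, Locale.forLanguageTag("ar-SA")));
        // "u" plus a combining diaeresis is one user character.
        System.out.println(characterBoundaries("u\u0308", Locale.US));
    }
}
```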

''Word Boundaries''

You invoke the getWordInstance method to instantiate a BreakIterator that detects word boundaries:
BreakIterator wordIterator =
BreakIterator.getWordInstance(currentLocale);
You'll want to create such a BreakIterator when your application needs to perform operations on individual words. These operations might be common word-processing functions, such as selecting, cutting, pasting, and copying. Or, your application may search for words, and it must be able to distinguish entire words from simple strings.
When a BreakIterator analyzes word boundaries, it differentiates between words and characters that are not part of words. These characters, which include spaces, tabs, punctuation marks, and most symbols, have word boundaries on both sides.
The program creates the BreakIterator and then calls the markBoundaries method:
Locale currentLocale = new Locale ("en","US");

BreakIterator wordIterator =
BreakIterator.getWordInstance(currentLocale);

String someText = "She stopped. " +
"She said, \"Hello there,\" and then went on.";

markBoundaries(someText, wordIterator);
The markBoundaries method is defined in BreakIteratorDemo.java. This method marks boundaries by printing carets (^) beneath the target string. In the code that follows, notice the while loop where markBoundaries scans the string by calling the next method:
static void markBoundaries(String target, BreakIterator iterator) {

StringBuffer markers = new StringBuffer();
markers.setLength(target.length() + 1);
for (int k = 0; k < markers.length(); k++) {
markers.setCharAt(k,' ');
}

iterator.setText(target);
int boundary = iterator.first();

while (boundary != BreakIterator.DONE) {
markers.setCharAt(boundary,'^');
boundary = iterator.next();
}

System.out.println(target);
System.out.println(markers);
}
The output of the markBoundaries method follows. Note where the carets (^) occur in relation to the punctuation marks and spaces:
She stopped. She said, "Hello there," and then went on.
^  ^^      ^^^  ^^   ^^^^    ^^    ^^^^  ^^   ^^   ^^ ^^
The BreakIterator class makes it easy to select words from within text. You don't have to write your own routines to handle the punctuation rules of various languages; the BreakIterator class does this for you.
The extractWords method in the following example extracts and prints words for a given string. Note that this method uses Character.isLetterOrDigit to avoid printing "words" that contain space characters.
static void extractWords(String target, BreakIterator wordIterator) {

wordIterator.setText(target);
int start = wordIterator.first();
int end = wordIterator.next();

while (end != BreakIterator.DONE) {
String word = target.substring(start,end);
if (Character.isLetterOrDigit(word.charAt(0))) {
System.out.println(word);
}
start = end;
end = wordIterator.next();
}
}
The BreakIteratorDemo program invokes extractWords, passing it the same target string used in the previous example. The extractWords method prints out the following list of words:
She
stopped
She
said
Hello
there
and
then
went
on
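The same logic can return the words as a list, which makes it easier to feed them to a search or a spell checker. In this sketch the class WordExtractDemo and the List-returning signature are hypothetical; the filter tests the first code point with Character.isLetterOrDigit, as extractWords does with the first char.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical variant of extractWords that collects words into a list.
final class WordExtractDemo {

    static List<String> extractWords(String target, Locale locale) {
        BreakIterator wordIterator = BreakIterator.getWordInstance(locale);
        wordIterator.setText(target);
        List<String> words = new ArrayList<>();
        int start = wordIterator.first();
        for (int end = wordIterator.next(); end != BreakIterator.DONE;
                start = end, end = wordIterator.next()) {
            String word = target.substring(start, end);
            // Skip segments that consist of spaces or punctuation.
            if (Character.isLetterOrDigit(word.codePointAt(0))) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String someText = "She stopped. She said, \"Hello there,\" and then went on.";
        System.out.println(extractWords(someText, Locale.US));
    }
}
```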

''Sentence Boundaries''

You can use a BreakIterator to determine sentence boundaries. You start by creating a BreakIterator with the getSentenceInstance method:
BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(currentLocale);
To show the sentence boundaries, the program uses the markBoundaries method, which is discussed in the section Word Boundaries. The markBoundaries method prints carets (^) beneath a string to indicate boundary positions. Here are some examples:
She stopped. She said, "Hello there," and then went on.
^            ^                                         ^

He's vanished! What will we do? It's up to us.
^              ^                ^             ^

Please add 1.5 liters to the tank.
^                                 ^
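To split text into sentences rather than just mark the boundaries, the segments between consecutive boundaries can be collected, as in this sketch. SentenceSplitDemo and sentences are hypothetical names. Each segment returned by the iterator includes the whitespace that follows the sentence, so the sketch trims it; the "1.5" example stays in one piece because a period followed by a digit does not end a sentence.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits text into sentences with a sentence BreakIterator.
final class SentenceSplitDemo {

    static List<String> sentences(String text, Locale locale) {
        BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(locale);
        sentenceIterator.setText(text);
        List<String> result = new ArrayList<>();
        int start = sentenceIterator.first();
        for (int end = sentenceIterator.next(); end != BreakIterator.DONE;
                start = end, end = sentenceIterator.next()) {
            result.add(text.substring(start, end).trim());  // drop trailing spaces
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sentences("He's vanished! What will we do? It's up to us.", Locale.US));
        System.out.println(sentences("Please add 1.5 liters to the tank.", Locale.US));
    }
}
```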
''Line Boundaries''

Applications that format text or that perform line wrapping must locate potential line breaks. You can find these line breaks, or boundaries, with a BreakIterator that has been created with the getLineInstance method:
BreakIterator lineIterator =
BreakIterator.getLineInstance(currentLocale);
This BreakIterator determines the positions in a string where text can break to continue on the next line. The positions detected by the BreakIterator are potential line breaks. The actual line breaks displayed on the screen may not be the same.
The two examples that follow use the markBoundaries method of BreakIteratorDemo.java to show the line boundaries detected by a BreakIterator. The markBoundaries method indicates line boundaries by printing carets (^) beneath the target string.
According to a BreakIterator, a line boundary occurs after the termination of a sequence of whitespace characters (space, tab, new line). In the following example, note that you can break the line at any of the boundaries detected:
She stopped. She said, "Hello there," and then went on.
^   ^        ^   ^     ^      ^       ^   ^    ^    ^  ^
Potential line breaks also occur immediately after a hyphen:
There are twenty-four hours in a day.
^     ^   ^      ^    ^     ^  ^ ^   ^
The next example breaks a long string of text into fixed-length lines with a method called formatLines. This method uses a BreakIterator to locate the potential line breaks. The formatLines method is short, simple, and, thanks to the BreakIterator, locale-independent. Here is the source code:
static void formatLines(String target, int maxLength,
                        Locale currentLocale) {

    // A line-instance BreakIterator finds the positions where a line may wrap
    BreakIterator boundary =
        BreakIterator.getLineInstance(currentLocale);
    boundary.setText(target);
    int start = boundary.first();
    int end = boundary.next();
    int lineLength = 0;

    while (end != BreakIterator.DONE) {
        // Each segment is a word plus any whitespace that follows it
        String word = target.substring(start, end);
        lineLength = lineLength + word.length();
        if (lineLength >= maxLength) {
            // The segment would overflow the line, so start a new line with it
            System.out.println();
            lineLength = word.length();
        }
        System.out.print(word);
        start = end;
        end = boundary.next();
    }
}
The BreakIteratorDemo program invokes the formatLines method as follows:
String moreText = "She said, \"Hello there,\" and then " +
"went on down the street. When she stopped " +
"to look at the fur coats in a shop window, " +
"her dog growled. \"Sorry Jake,\" she said. " +
" \"I didn't know you would take it personally.\"";

formatLines(moreText, 30, currentLocale);
The output from this call to formatLines is:
She said, "Hello there," and
then went on down the
street. When she stopped to
look at the fur coats in a
shop window, her dog
growled. "Sorry Jake," she
said. "I didn't know you
would take it personally."
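Because formatLines prints directly to System.out, its result is hard to check automatically. The hypothetical wrapLines method below is a sketch that applies the same wrapping rule but collects the lines in a list. One deliberate difference is the current.length() > 0 guard: formatLines prints an empty line first when the very first segment alone reaches maxLength, while this sketch does not.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical variant of formatLines that returns lines instead of printing them
class LineWrapper {

    static List<String> wrapLines(String target, int maxLength, Locale locale) {
        BreakIterator boundary = BreakIterator.getLineInstance(locale);
        boundary.setText(target);
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE;
                start = end, end = boundary.next()) {
            String word = target.substring(start, end);
            // Same test as formatLines, but never emit an empty first line
            if (current.length() > 0 && current.length() + word.length() >= maxLength) {
                lines.add(current.toString());
                current.setLength(0);
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString()); // last, possibly short, line
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "She said, \"Hello there,\" and then went on down the street.";
        for (String line : wrapLines(text, 30, Locale.US)) {
            System.out.println(line);
        }
    }
}
```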

''Converting Non-Unicode Text (Specific to Java)''

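In the Java programming language, char values represent Unicode characters as 16-bit UTF-16 code units. Unicode is a character encoding standard that supports the world's major languages; characters outside the 16-bit Basic Multilingual Plane are represented in Java by a pair of char values (a surrogate pair).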
Few text editors currently support Unicode text entry. The text editor we used to write this section's code examples supports only ASCII characters, which are limited to 7 bits. To indicate Unicode characters that cannot be represented in ASCII, such as ö, we used the \uXXXX escape sequence. Each X in the escape sequence is a hexadecimal digit. The following example shows how to indicate the ö character with an escape sequence:
String str = "\u00F6";
char c = '\u00F6';
Character letter = Character.valueOf('\u00F6');
A variety of character encodings are used by systems around the world, and many of them are not Unicode encodings. Because your program expects characters in Unicode, the text data it gets from the system must be converted into Unicode, and vice versa. Data in text files is automatically converted to Unicode when its encoding matches the default file encoding of the Java Virtual Machine. You can identify the default file encoding by creating an OutputStreamWriter that uses it and asking for its encoding name (getEncoding returns the historical name of the encoding rather than its canonical name):
OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream());
System.out.println(out.getEncoding());
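The same information is also available from java.nio.charset.Charset. The sketch below, with a hypothetical class name, prints both names: getEncoding reports the historical name (for example UTF8), whereas Charset.name() reports the canonical name (for example UTF-8). Since JDK 18 the default charset is UTF-8 on all platforms unless it is overridden, so the conversion concerns below matter most for data produced by other systems or by older runtimes.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical helper comparing historical and canonical charset names
class DefaultEncoding {

    // Name reported by OutputStreamWriter.getEncoding (historical, e.g. "UTF8")
    static String historicalName(Charset charset) {
        OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream(), charset);
        return out.getEncoding();
    }

    public static void main(String[] args) {
        Charset defaultCharset = Charset.defaultCharset();
        System.out.println("historical name: " + historicalName(defaultCharset));
        System.out.println("canonical name:  " + defaultCharset.name());
    }
}
```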
If the default file encoding differs from the encoding of the text data you want to process, then you must perform the conversion yourself. You might need to do this when processing text from another country or computing platform.

''Byte Encodings and Strings''

If a byte array contains non-Unicode text, you can convert the text to Unicode with one of the String constructor methods. Conversely, you can convert a String object into a byte array of non-Unicode characters with the String.getBytes method. When invoking either of these methods, you specify the encoding identifier as one of the parameters.
The example that follows converts characters between UTF-8 and Unicode. UTF-8 is a transmission format for Unicode that is safe for UNIX file systems.
The full source code for the example is in the file StringConverter.java.
The StringConverter program starts by creating a String containing Unicode characters:
String original = "A" + "\u00ea" + "\u00f1" + "\u00fc" + "C";
When printed, the String named original appears as:
AêñüC
To convert the String object to UTF-8, invoke the getBytes method and specify the appropriate encoding identifier as a parameter. The getBytes method returns an array of bytes in UTF-8 format. To create a String object from an array of non-Unicode bytes, invoke the String constructor with the encoding parameter. The code that makes these calls is enclosed in a try block, in case the specified encoding is unsupported:
try {
    byte[] utf8Bytes = original.getBytes("UTF8");
    // No argument: encodes with the platform's default encoding
    byte[] defaultBytes = original.getBytes();

    // Decode the UTF-8 bytes back into a Unicode String
    String roundTrip = new String(utf8Bytes, "UTF8");
    System.out.println("roundTrip = " + roundTrip);
    System.out.println();
    printBytes(utf8Bytes, "utf8Bytes");
    System.out.println();
    printBytes(defaultBytes, "defaultBytes");
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
The StringConverter program prints out the values in the utf8Bytes and defaultBytes arrays to demonstrate an important point: the length of the converted text might not be the same as the length of the source text. In UTF-8, a character occupies one to four bytes; here A and C take one byte each, while ê, ñ, and ü take two bytes each.
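The length difference is easy to check directly. The sketch below, with a hypothetical class name, encodes the same string with a few standard charsets. It uses java.nio.charset.StandardCharsets (Java 7 and later), whose Charset-based overloads of getBytes and the String constructor do not throw the checked UnsupportedEncodingException that the string-named overloads can throw.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for comparing encoded lengths
class EncodedLengths {

    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String original = "A\u00ea\u00f1\u00fcC"; // AêñüC, 5 chars
        System.out.println("chars:      " + original.length());                                    // 5
        System.out.println("UTF-8:      " + encodedLength(original, StandardCharsets.UTF_8));      // 8
        System.out.println("ISO-8859-1: " + encodedLength(original, StandardCharsets.ISO_8859_1)); // 5
        System.out.println("UTF-16BE:   " + encodedLength(original, StandardCharsets.UTF_16BE));   // 10

        // Decoding with the matching charset restores the original string
        String roundTrip = new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(original)); // true
    }
}
```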
The printBytes method displays the byte arrays by invoking the byteToHex method, which is defined in the source file, UnicodeFormatter.java. Here is the printBytes method:
public static void printBytes(byte[] array, String name) {
    for (int k = 0; k < array.length; k++) {
        System.out.println(name + "[" + k + "] = " + "0x" +
            UnicodeFormatter.byteToHex(array[k]));
    }
}
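UnicodeFormatter.java is not reproduced in this section. A helper with the same purpose could look like the sketch below; it is an assumption about what byteToHex does (format one byte as two lowercase hex digits, matching output such as 0xc3), not the original source, and the class name is hypothetical.

```java
// Hypothetical stand-in for UnicodeFormatter.byteToHex
class HexFormatter {

    static String byteToHex(byte b) {
        // Mask with 0xff so negative bytes such as (byte) 0xc3 print as c3, not ffffffc3
        return String.format("%02x", b & 0xff);
    }

    public static void main(String[] args) {
        byte[] bytes = { 0x41, (byte) 0xc3, (byte) 0xaa };
        for (int k = 0; k < bytes.length; k++) {
            System.out.println("bytes[" + k + "] = 0x" + byteToHex(bytes[k]));
        }
    }
}
```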
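The output of the printBytes method follows. Note that only the first and last bytes, the A and C characters, are the same in both arrays. The defaultBytes values shown here come from a platform whose default encoding is Latin-1 compatible (such as ISO-8859-1 or windows-1252); on a platform whose default encoding is UTF-8, which includes JDK 18 and later, the two arrays would be identical: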
utf8Bytes[0] = 0x41
utf8Bytes[1] = 0xc3
utf8Bytes[2] = 0xaa
utf8Bytes[3] = 0xc3
utf8Bytes[4] = 0xb1
utf8Bytes[5] = 0xc3
utf8Bytes[6] = 0xbc
utf8Bytes[7] = 0x43
defaultBytes[0] = 0x41
defaultBytes[1] = 0xea
defaultBytes[2] = 0xf1
defaultBytes[3] = 0xfc
defaultBytes[4] = 0x43

''Character and Byte Streams''

The StreamConverter program converts a sequence of Unicode characters from a String object into a FileOutputStream of bytes encoded in UTF-8. The method that performs the conversion is called writeOutput:
static void writeOutput(String str) {

    // try-with-resources closes the writer and the file stream
    // even if write throws an exception
    try (FileOutputStream fos = new FileOutputStream("test.txt");
         Writer out = new OutputStreamWriter(fos, "UTF8")) {
        out.write(str);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
The readInput method reads the bytes encoded in UTF-8 from the file created by the writeOutput method. An InputStreamReader object converts the bytes from UTF-8 into Unicode and returns the result in a String. The readInput method is as follows:
static String readInput() {

    StringBuilder buffer = new StringBuilder();
    // The InputStreamReader decodes the UTF-8 bytes into Unicode chars
    try (FileInputStream fis = new FileInputStream("test.txt");
         InputStreamReader isr = new InputStreamReader(fis, "UTF8");
         Reader in = new BufferedReader(isr)) {
        int ch;
        while ((ch = in.read()) > -1) {
            buffer.append((char) ch);
        }
        return buffer.toString();
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
The main method of the StreamConverter program invokes the writeOutput method to create a file of bytes encoded in UTF-8. The readInput method reads the same file, converting the bytes back into Unicode. Here is the source code for the main method:
public static void main(String[] args) {

    String jaString = "\u65e5\u672c\u8a9e\u6587\u5b57\u5217";

    writeOutput(jaString);
    String inputString = readInput();
    String displayString = jaString + " " + inputString;
    new ShowString(displayString, "Conversion Demo");
}
The original string (jaString) should be identical to the newly created string (inputString). To show that the two strings are the same, the program concatenates them and displays them with a ShowString object. The ShowString class displays a string with the Graphics.drawString method; its source code is in ShowString.java. When the StreamConverter program instantiates ShowString, a window displays the concatenated string, and the repetition of the characters verifies that the two strings are identical.
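The same write-then-read round trip can be exercised without touching the file system by swapping the file streams for in-memory byte arrays. The sketch below is an illustrative variant, not the StreamConverter source; the class and method names are hypothetical, and I/O errors are rethrown as UncheckedIOException to keep the signatures simple.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory counterpart of writeOutput/readInput
class InMemoryConverter {

    // Encodes the string as UTF-8 bytes through a Writer
    static byte[] encode(String str) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write(str);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the writer was flushed when it closed
    }

    // Decodes UTF-8 bytes back into a String through a Reader
    static String decode(byte[] data) {
        StringBuilder buffer = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] chunk = new char[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String jaString = "\u65e5\u672c\u8a9e\u6587\u5b57\u5217";
        byte[] utf8 = encode(jaString);
        System.out.println(utf8.length + " bytes"); // 18: each of these characters takes 3 bytes
        System.out.println(decode(utf8).equals(jaString)); // true
    }
}
```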

''Exception Handling''

Although exceptions may be logged in the language of the server's operating system, the messages shown to users should still be displayed in the language of the user's choice. This is not much of a concern if the message is generic. For instance (specific to Java), the message shown to the user on a RemoteException is identified by the key rmi.error, and that key can map to a generic message in the resource bundle. The problem starts when the message has to be specific or requires replacement values. There are two possible solutions to this problem, neither of which is ideal.
Here is the first approach: if you want to keep internationalization in the web tier, the specific exceptions thrown on the server side should encapsulate the resource bundle keys and some (if not all) of the replacement values. The key and the replacement values can be exposed through getter methods on the exception class. This approach makes the server-side code dependent on the web tier's resource bundle. It also requires programmatic exception handling, since you have to pass the appropriate replacement values to the ActionError.
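A minimal sketch of the first approach, assuming a hypothetical LocalizedException that carries a resource bundle key and its replacement values. To keep the example self-contained, the bundle is an in-code ListResourceBundle (the account.locked key is invented for illustration) and the web tier formats the message with java.text.MessageFormat; in a Struts application the key and values would instead be passed to an ActionError.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical server-side exception exposing a bundle key and replacement values
class LocalizedException extends Exception {
    private final String key;
    private final Object[] values;

    LocalizedException(String key, Object... values) {
        super(key); // the logged message stays language-neutral
        this.key = key;
        this.values = values.clone();
    }

    String getKey() { return key; }
    Object[] getValues() { return values.clone(); }
}

// Hypothetical web-tier bundle; a real application would load one per locale
class Messages extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            { "rmi.error", "The server is unavailable. Please try again later." },
            { "account.locked", "Account {0} is locked." }
        };
    }
}

class WebTier {
    // Resolves the exception's key in the user's bundle and fills in the values
    static String render(LocalizedException e, ResourceBundle bundle, Locale userLocale) {
        MessageFormat format = new MessageFormat(bundle.getString(e.getKey()), userLocale);
        return format.format(e.getValues());
    }

    public static void main(String[] args) {
        LocalizedException e = new LocalizedException("account.locked", "jdoe");
        System.out.println(render(e, new Messages(), Locale.US)); // Account jdoe is locked.
    }
}
```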
The second approach is to send the user's Locale as one of the arguments to the server side and let the server side generate the entire message. This removes the server's dependency on the web tier code, but requires the Locale to be sent as an argument on every method call to the server.
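The second approach moves formatting to the server. In the sketch below, a hypothetical server method receives the caller's Locale and builds the complete message, so locale-sensitive parts such as numbers follow the user's conventions. The method name and message pattern are assumptions for illustration; a real server would also choose a translated pattern for that Locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical server-side service that receives the user's Locale on every call
class TransferService {

    static String describeLimitExceeded(String account, double limit, Locale userLocale) {
        // {1,number} formats the limit with the separators of userLocale
        MessageFormat format = new MessageFormat(
                "Transfer from {0} exceeds the limit of {1,number}.", userLocale);
        return format.format(new Object[] { account, limit });
    }

    public static void main(String[] args) {
        System.out.println(describeLimitExceeded("jdoe", 1234.5, Locale.US));      // ...limit of 1,234.5.
        System.out.println(describeLimitExceeded("jdoe", 1234.5, Locale.GERMANY)); // ...limit of 1.234,5.
    }
}
```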



'''''Testing Guidelines'''''

''Functional testing''

One advantage of a well-designed and well-developed globalized application is that full functional testing is unnecessary for every localized version. Since all language versions use the same set of programs (the Single Executable 1), there is good reason to assume that a localized version works exactly the same way as the source language product in its business functions. Therefore, only the source language version must undergo all functional test cases to ensure basic functional competence.

Functional testing is also called black-box testing: the code flow is not examined, and only the inputs and outputs are verified.
The application is treated as a black box and subjected to different kinds of inputs, and the resulting outputs are checked.

Functional testing is usually preceded by integration testing.




''Translation testing''

Translation testing has two major purposes:
1. To check translation accuracy and contextual pertinence
2. To improve translation quality
Localization service providers in all supported locales join the testing team to test the Web site translation in their respective languages.
During the Localization phase, if the source localization pack is in XML format, it is difficult for translators to grasp the meaning of individual words or phrases without knowing and understanding their context. During translation testing, testers review all translated Web pages to pick out words and phrases that might have been inappropriately translated.

For example, consider a web application for hotel room bookings. In this context, a room type of “Single” means that the room is for one person only.

However, if a translator hired to translate the text “Single” into another language does not know the context, he might translate it as “not married”.

Such translation defects should be discovered during the translation testing.

This has to be done by a person who is a language expert and who also understands the context and functionality of the web application.


''Globalization feature testing''

The objective of globalization feature testing is to ensure that the application provides globalization features correctly. For this reason, globalization feature testing should be performed on all localized versions. Common concerns include:
_ Whether the user interface language conforms to the locale selected by users
_ Whether the character set is handled correctly, especially multi-byte character sets such as those for Chinese, Japanese, and Korean
_ Whether locale-sensitive information is displayed correctly, including date and time format, name and address format, number and currency format, and dictionary sorting
_ Whether bi-directional data display is adequately supported, especially for Arabic and Hebrew
Local testers should check the following points according to their own cultural conventions:
_ Whether all field names are displayed in the correct language.
_ Whether the date field conforms to the conventional date format for the current user locale.
For example, if the date format is month-day-year for the en_US locale, the same field on the Web page for a Chinese, Japanese, or Korean user should be year-month-day. Testers should verify this feature.

_ Whether the format for a person's name (including honorific) follows the cultural convention of the current user locale.
For example, for an American user the order is honorific, first name, middle name, and last name, while for a Japanese user it should be last name, first name, and honorific (and the middle name field should not be displayed at all, since Japanese names do not have middle names).

_ Whether the default value for the Country/Region selection is the country/region corresponding to the current user locale.
_ Whether lists of country/region and state names are sorted in dictionary order according to the cultural convention of the current user locale.

_ Whether the order of the postal address fields follows the corresponding cultural convention, and whether the address format changes accordingly when the Current Address country/region is changed.

_ Whether bi-directional data is displayed correctly when the user locale is ar_EG or iw_IL.

This list covers only a few examples; globalization testing is not limited to these checks.
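Two of these checks, locale-specific date order and dictionary sorting, can be spot-checked programmatically before pages go to local testers. The sketch below assumes the JDK's default locale data (CLDR-based for dates since Java 9, plus the bundled collation rules); exact patterns vary between JDK versions, so the hypothetical class only illustrates which APIs to compare, not the strings a particular page must show.

```java
import java.text.Collator;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical test helpers for globalization checks
class GlobalizationChecks {

    // Formats a date in the short style conventional for the locale
    static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).withLocale(locale).format(date);
    }

    // Sorts names in the dictionary order of the locale rather than by char code
    static List<String> sortedFor(List<String> names, Locale locale) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2024, 1, 2);
        System.out.println(shortDate(date, Locale.US));    // month first
        System.out.println(shortDate(date, Locale.JAPAN)); // year first

        List<String> names = Arrays.asList("Zebra", "\u00d6l", "Apfel"); // \u00d6l is "Öl"
        System.out.println(sortedFor(names, Locale.ENGLISH));                  // Ö sorts with O
        System.out.println(sortedFor(names, Locale.forLanguageTag("sv-SE")));  // Ö sorts after Z
    }
}
```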


''Browser testing''

Browser testing involves tasks users execute within a browser. Since people in different locales can have different browser preferences, testing multilingual applications covers more browser activities than testing monolingual ones. The following are some of the more common concerns for browser testing:

_ Browser-dependent user operations. For example, what will happen if the Back, Forward, or Reset buttons are selected during the transaction cycle?

_ Cookie-related activities, such as what will happen if the user enables or disables the cookie settings.

_ Vendor-specific performance. For example, whether the application works correctly under both Internet Explorer and Netscape Navigator.

_ Artwork-related problems. For example, different fonts might display different glyphs for the same code point, so the Web page should be designed so that content in all supported languages is displayed neatly and professionally.

''Usability testing''

Usability testing evaluates how user-friendly an application is. From a globalization perspective, testing should evaluate whether customers in various locales can interact freely with a multilingual application and thus make full use of it.
It is not easy to form a technical testing team, because ideally such a team should include end users who speak the native languages of the various locales. There are two alternatives: formal review and extensive survey.

1. Formal review
Good candidates for a formal review are professional translators, technical engineers, globalization consultants, and actual customers. By introducing pertinent suggestions, such a review can help to enhance the technical and cultural correctness of a multilingual application.

2. Extensive Survey
An extensive survey can be made among companies that will market, sell, and/or support the localized version of a multilingual application, or through online surveys aimed at the end users of a Web application.

Usability covers several aspects. For example:

a. Tab order of the fields on the web page. A tab order that works well on a page for the en_US locale might not work for the ar_EG locale, where text reads from right to left, which can make the page difficult to use.
b. Sorting logic in drop-down lists and list items should be checked based on the user locale.
c. Text justification should also be checked.
d. Numbers and currencies should be displayed in the locale-specific way.
e. Sentence, word, and character breaking should be done correctly.

These are only a few of the checks that usability testing can include.
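Item d can be checked with java.text.NumberFormat, which applies each locale's grouping separator, decimal separator, and currency symbol placement. The amount and class name below are illustrative; some details of the output (for example, the kind of space before the euro sign) depend on the JDK's locale data.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper for checking locale-specific currency display
class CurrencyDisplay {

    static String asCurrency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        double amount = 1234.5;
        System.out.println(asCurrency(amount, Locale.US));      // $1,234.50
        System.out.println(asCurrency(amount, Locale.GERMANY)); // 1.234,50 followed by a space and the euro sign
        System.out.println(asCurrency(amount, Locale.JAPAN));   // yen, no fraction digits
    }
}
```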

Latest revision as of 15:28, 27 December 2023

Meta-Wiki

Welcome to Meta-Wiki, the global community site for the Wikimedia Foundation's projects and related projects, from coordination and documentation to planning and analysis.

Other meta-focused wikis such as Wikimedia Outreach are specialized projects that have their roots in Meta-Wiki. Related discussions also take place on Wikimedia mailing lists (particularly wikimedia-l, with its low-traffic equivalent WikimediaAnnounce), IRC channels on Libera, individual wikis of Wikimedia affiliates, and other places.

Current events

May 2024

May 30: Community Affairs Committee: Live call #2 on the Procedure for Sibling Project Lifecycle at 16:00 UTC
May 23: Community Affairs Committee: Live call #1 on the Procedure for Sibling Project Lifecycle at 02:00 UTC
May 13–June 23: Community Affairs Committee: Call for feedback on the proposed Procedure for Sibling Project Lifecycle
May 10–May 12: ESEAP Conference in Kota Kinabalu, Malaysia
May 8–May 29: 2024 Board election: Call for candidates
May 8–June 12: 2024 Board election: Call for questions for candidates
May 8–June 3: Call for new members of the Conference Fund Committee
May 3–May 5: Wikimedia Hackathon 2024 in Tallinn, Estonia
May 2–May 5: WikiNusantara in Bogor, Indonesia
May 2–May 4: Global Wiki Advocacy Meet-up in Santiago, Chile
April 25–May 9: UCoC Coordinating Committee election: Voting period (information for voters / list of all candidates)

April 2024

April 30: Community Resilience and Sustainability conversation hour with Maggie Dennis at 18:00 UTC
April 19–April 21: Wikimedia Summit 2024 in Berlin, Germany
April 2–April 30: Movement Charter: Wikimedia communities review of the Movement Charter full draft (talk page discussions / regional conversations)


Community and communication
Wikimedia Foundation, Meta-Wiki, and its sister projects
The Wikimedia Foundation is the overarching non-profit foundation that owns the Wikimedia servers along with the domain names, logos and trademarks of all Wikimedia projects and MediaWiki. Meta-Wiki is the coordination wiki for the various Wikimedia wikis.