English is one of the easiest languages for computers to represent, because its words have few diacritical marks and no special letters such as the Spanish "ñ".
But many other languages use special characters that the computer does not always represent correctly.
Source: Daniel Lew's Coding Thoughts
The example above shows how Vietnamese text is rendered on Android. You don't have to know Vietnamese to notice that something went wrong: some characters carry more than one accent, some have a horizontal line across the top, and so on.
One solution to this problem is Unicode normalization, which replaces "equivalent sequences of characters so that any two texts that are equivalent will be reduced to the same sequence of code points, called the normalization form or normal form of the original text." (Wikipedia - Unicode equivalence)
Unicode defines two notions of equivalence between code point sequences: canonical equivalence and compatibility. The first covers code point sequences that have the same appearance and meaning when printed or displayed, while the second covers sequences that have the same meaning in some contexts but may look different.
For each of these notions, the text can be fully composed, where a single code point replaces multiple code points wherever possible, or fully decomposed, where single code points are split into multiple ones.
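The difference between the two notions can be sketched in a few lines of Java. This is only an illustration, not part of the original example: the class name EquivalenceDemo and both helper methods are hypothetical, and the only library call used is java.text.Normalizer.normalize.

```java
import java.text.Normalizer;

public class EquivalenceDemo {
    // Hypothetical helper: true when both strings are equal after canonical decomposition (NFD).
    static boolean canonicallyEquivalent(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFD)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFD));
    }

    // Hypothetical helper: true when both strings are equal after compatibility decomposition (NFKD).
    static boolean compatibilityEquivalent(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFKD)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFKD));
    }

    public static void main(String[] args) {
        String composed = "\u00F1";     // "ñ" as a single code point
        String decomposed = "n\u0303";  // "n" followed by COMBINING TILDE

        // A plain comparison sees two different code point sequences.
        System.out.println(composed.equals(decomposed));                 // false
        System.out.println(canonicallyEquivalent(composed, decomposed)); // true

        String ligature = "\uFB01";     // LATIN SMALL LIGATURE FI
        // The ligature looks different from "fi", so it is only compatibility equivalent.
        System.out.println(canonicallyEquivalent(ligature, "fi"));       // false
        System.out.println(compatibilityEquivalent(ligature, "fi"));     // true
    }
}
```

The composed "ñ" has length 1 and the decomposed one has length 2, which is the fully composed versus fully decomposed distinction described above.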
So, there are four Unicode normalization forms:
- NFD: Normalization Form Canonical Decomposition
- NFC: Normalization Form Canonical Composition
- NFKD: Normalization Form Compatibility Decomposition
- NFKC: Normalization Form Compatibility Composition
For more details, see http://unicode.org/reports/tr15/ and the Wikipedia article on Unicode equivalence.
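To see what the four forms do to the same input, the following sketch prints the code points produced by each one. It uses java.text.Normalizer, the class covered in the Android section; the class name NormalizationForms, the codePoints helper, and the sample string are assumptions made for illustration only.

```java
import java.text.Normalizer;

public class NormalizationForms {
    // Hypothetical helper: renders each code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input: Vietnamese "ệ" (U+1EC7) followed by the ligature "ﬁ" (U+FB01).
        String input = "\u1EC7\uFB01";
        for (Normalizer.Form form : Normalizer.Form.values()) {
            String normalized = Normalizer.normalize(input, form);
            System.out.println(form + ": " + codePoints(normalized));
        }
    }
}
```

For this input, NFD yields U+0065 U+0323 U+0302 U+FB01 and NFC yields U+1EC7 U+FB01, while NFKD yields U+0065 U+0323 U+0302 U+0066 U+0069 and NFKC yields U+1EC7 U+0066 U+0069. Only the compatibility forms replace the ligature with the plain letters "f" and "i".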
2. Normalization in Android
In Android, text is defined in two ways:
- Statically, in an XML file under "res/values/", where each text has the format: <string name="app_name">SampleAppName</string>
- Dynamically, because the text may change at run time or is unknown before the app runs, for example when it comes from a server.
For the second case, Android works the same way as Java and uses the class java.text.Normalizer. This class provides two static methods:
- boolean isNormalized(CharSequence src, Normalizer.Form form)
  - This method checks whether a char sequence is already normalized to a specific normalization form.
- String normalize(CharSequence src, Normalizer.Form form)
  - This method normalizes a char sequence to a specific normalization form.
Here is an example of usage:
String textNormalized = Normalizer.normalize(text, Normalizer.Form.NFD);
Source2: Daniel Lew's Coding Thoughts
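The two methods can also be combined. The sketch below is a minimal example under stated assumptions: the class name ServerTextNormalizer, the toNfc helper, and the sample strings are hypothetical, and the choice of NFC as the target form is an assumption rather than something the example above prescribes. It normalizes dynamic text only when needed, so that it can be compared reliably with text stored locally.

```java
import java.text.Normalizer;

public class ServerTextNormalizer {
    // Hypothetical helper: returns the text in NFC, skipping the work when it is already normalized.
    static String toNfc(CharSequence text) {
        if (Normalizer.isNormalized(text, Normalizer.Form.NFC)) {
            return text.toString();
        }
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Assumed payload: "Việt" sent with "ệ" decomposed into e + dot below + circumflex.
        String fromServer = "Vie\u0323\u0302t";
        // The same word stored locally with the precomposed "ệ" (U+1EC7).
        String local = "Vi\u1EC7t";

        System.out.println(fromServer.equals(local));        // false
        System.out.println(toNfc(fromServer).equals(local)); // true
    }
}
```

Normalizing both sides to the same form before comparing or storing them avoids mismatches between strings that look identical on screen.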
3. Normalization in iOS
In Objective-C, normalization is provided by the class NSString.
The methods are:
- - (NSString *)decomposedStringWithCanonicalMapping
  - Equivalent to NFD in Java/Android
- - (NSString *)decomposedStringWithCompatibilityMapping
  - Equivalent to NFKD in Java/Android
- - (NSString *)precomposedStringWithCanonicalMapping
  - Equivalent to NFC in Java/Android
- - (NSString *)precomposedStringWithCompatibilityMapping
  - Equivalent to NFKC in Java/Android
Here is an example of the usage:
NSString *textNormalized = [text decomposedStringWithCanonicalMapping];