What is transliteration and why is it important for address verification?

Transliteration, not translation

Transliteration is not translation. Translation is the process of converting text in one language into text in another language while keeping the underlying meaning.

Transliteration is simply the process of converting words or characters from one writing system to another without carrying over the underlying meaning. The sounds are reproduced phonetically in another script; the meaning is not.

For example, if you go to a sushi restaurant, you might see はまち on a menu. It is transliterated as Hamachi. If it were translated, it would be called Japanese amberjack or yellowtail. Likewise, うなぎ is transliterated as Unagi, but translated as eel.

Why is transliteration so important for address verification?

Imagine that your database holds customer addresses from all over the world, saved in their original languages. If you can’t read Russian, Arabic or Chinese, for example, you can’t tell where those customers are located. You’d like to see the addresses spelled out in one script you can read, in this instance the Latin alphabet used for English, so you can do further analysis. That’s assuming that your CRM software is even capable of handling different languages. If it isn’t, then you simply won’t be able to service those customers.

Address verification software like ours helps you do that. It parses, standardises and verifies addresses from around the world, then transliterates them so they can be read. However, we don’t guarantee you’ll be able to pronounce them. Here are some examples of transliterated addresses:

Cyrillic: Беловежская Улица 39, Можайский, Москва, 121353

Latin Transliteration: Belovezskaja Ulica 39, Mozaiskii, Moskva, 121353

Simplified Chinese {HANS}: 天津市天津市河北区长张屯村6

Latin Transliteration: 6 ChangZhangTunCun, HeBeiQu, TianJinShi

Hellenic {GREK}: Ανατολικης Θρακης 9, 156 69 Παπαγος

Latin Transliteration: Anatolikis Thrakis 9, 156 69 Papagos

Our verification technology can transliterate between native character sets and Latin across all our core verification and capture products. When the output script is set to Latin, we transliterate to English wherever possible.

First, our software performs field-based transliteration using a compiled set of words for commonly used address field values in each language. (It may also do translation in some cases.) If the field lookup finds no match, the software tries to map the text character by character between the native and Latin character sets. If that doesn’t work either, the native word is kept unchanged.
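
The three-stage fallback described above can be sketched in a few lines of Python. This is only an illustration, not the product's actual implementation: the names FIELD_WORDS, CHAR_MAP, transliterate_word and transliterate_address are hypothetical, and the tiny lookup tables stand in for the much larger compiled word lists and character maps a real system would need.

```python
import re

# Hypothetical sample data (assumption): known address words per language.
FIELD_WORDS = {
    "Улица": "Ulica",
    "Москва": "Moskva",
}

# Hypothetical, deliberately incomplete Cyrillic-to-Latin character map.
CHAR_MAP = {"а": "a", "в": "v", "к": "k", "м": "m", "о": "o", "с": "s"}


def transliterate_word(word):
    """Apply the three stages to a single token."""
    # Stage 1: field-based lookup in the compiled word list.
    if word in FIELD_WORDS:
        return FIELD_WORDS[word]

    # Stage 2: character mapping, used only if every letter is covered.
    mapped = []
    for ch in word:
        latin = CHAR_MAP.get(ch.lower())
        if latin is not None:
            mapped.append(latin.capitalize() if ch.isupper() else latin)
        elif ch.isalpha():
            # Stage 3: an unmapped letter means we keep the native word.
            return word
        else:
            # Digits, spaces and punctuation pass through unchanged.
            mapped.append(ch)
    return "".join(mapped)


def transliterate_address(address):
    """Split into words and separators, then convert each token."""
    tokens = re.split(r"(\W+)", address)
    return "".join(transliterate_word(token) for token in tokens)


print(transliterate_address("Улица 39, Москва"))
```

In this sketch a word that is only partly covered by the character map is left entirely in its native script rather than being half converted, which mirrors the final fallback step in the paragraph above.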