Explaining and Inventing Your Own Unicode Jargon - Part 1


I love the i18n/UTF-8 process as much as anyone, but you have to admit that all those flying letters and number combinations can be a little overwhelming to the newcomer. So I think a primer is needed.

There are some real glossaries out there, such as the Unicode Glossary, the Penn State i18n glossary and the IBM Glossary of Unicode Terms, but you really do learn more when you create your own material. So with that in mind, I present:

Encoding in the World of Star Trek

I would like to believe that someday we will contact other civilizations (through some sort of encoded communication), and at that point we will need to expand our encodings and create new ones (and, of course, new jargon).

Jargon of Process

Three current terms for the field of wrangling non-English text are i18n for "internationalization" and g11n for "globalization" (both refer to making content and systems usable by people working in any script), plus the related l10n for "localization" (adapting content from one region for use in another, e.g. a Japanese product sold in the United States).

These terms share the same structure: start with the first letter, end with the last letter, and put the number of letters in between. Thus internationalization (20 letters total, 18 between the first "i" and the final "n") becomes i18n.
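
The rule is mechanical enough to fit in a few lines of code. Here is a minimal Python sketch; the function name numeronym is my own label, and it assumes a single word made only of letters (no spaces or hyphens).

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as: first letter + number of letters in between + last letter.

    Assumes `word` is a single word of letters only. Words of three letters
    or fewer are returned unchanged, since abbreviating them saves nothing.
    """
    if len(word) <= 3:
        return word
    # Lowercase both ends to match the usual spelling (i18n, not I18N).
    return f"{word[0].lower()}{len(word) - 2}{word[-1].lower()}"


for term in ["internationalization", "globalization", "localization"]:
    print(term, "->", numeronym(term))
```

Note that the rule is lossy: any two words with the same first letter, last letter and length collide, so numeronym("interplanetarization") also returns "i18n".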

You can apply this to any term, such as "Romanization" and "transliteration" (see the list below). In the future we will also need terms that reflect the fact that we are working with planets, not just nations. So maybe we will have:

  • galaxification (g12n) - even greater than g11n
  • interplanetarization (i18n) - also 20 letters, so it collides with i18n
  • astrointernationalization (a23n) - the biggest of them all
  • Romanization (r10n) - I made this up
  • transliteration (t13n) - this does exist, but is not frequently seen

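FYI - Romanization (r10n) is the process of writing a language in the Roman (Western/Latin) alphabet. Transliteration (t13n) is the more general process of converting text from one script to another, and is often used in the Romanization sense. Japanese Rōmaji is an example of Romanization.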
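
Real transliteration schemes such as Hepburn have many context rules, but the core idea is a lookup from one script's characters to another's. The Python sketch below shows only that core idea, one character at a time; the table is a tiny hypothetical subset, not a complete or authoritative Hepburn mapping.

```python
# Hypothetical toy kana-to-Rōmaji table (Hepburn-style spellings). It covers a
# handful of hiragana only and ignores digraphs (e.g. きょ -> kyo), long vowels
# and the small っ, all of which a real scheme must handle.
KANA_TO_ROMAJI = {
    "さ": "sa", "し": "shi", "す": "su",
    "く": "ku", "ら": "ra", "と": "to",
    "き": "ki", "ね": "ne", "こ": "ko",
}


def romanize(text: str) -> str:
    """Transliterate kana character by character; unknown characters pass through."""
    return "".join(KANA_TO_ROMAJI.get(ch, ch) for ch in text)


print(romanize("すし"))    # sushi
print(romanize("さくら"))  # sakura
print(romanize("ねこ"))    # neko
```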

Local Government Standards

Before the days of Unicode, each region established its own encoding standard for its own language(s). The most famous may be ASCII (American Standard Code for Information Interchange), whose naming pattern also gave us VISCII (Vietnamese), ISCII (Indian scripts) and ArmSCII (Armenian).
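
ASCII defines only 128 characters (code points 0-127), which is why languages with other letters needed their own tables. A quick way to see the gap is Python's built-in "ascii" codec; the helper name fits_in_ascii is my own.

```python
def fits_in_ascii(text: str) -> bool:
    """True if every character is one of ASCII's 128 code points (0-127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# English fits; Vietnamese and Armenian letters do not.
for sample in ["Hello", "Việt Nam", "Հայերեն"]:
    print(f"{sample!r}: fits in ASCII = {fits_in_ascii(sample)}")
```

Python 3.7 and later also offer str.isascii() for the same check.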

Another pattern is to name the encoding standard after the governmental standards body and the number of the encoding scheme (usually a sequential number). This is how we arrive at TIS-620 (Thailand, Thai Industrial Standard 620), GB 2312 (China) and ELOT 928 (Greece/Ellas). Standards bodies also lent their names to Shift-JIS (Japan, a combination of JIS X 0201 and JIS X 0208) and ANSI (U.S., American National Standards Institute).
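
Several of these national standards survive as codecs in Python, which makes it easy to see that each one handles its own script and little else. The sketch below is an illustration under a few assumptions: the sample words are my own choices, and it relies on ELOT 928 being equivalent to ISO 8859-7 (Python's "iso8859_7").

```python
# Sample words paired with the Python codec name for each national standard.
SAMPLES = {
    "tis_620": "ไทย",       # Thai (TIS-620)
    "gb2312": "中文",        # Simplified Chinese (GB 2312)
    "iso8859_7": "Ελλάδα",  # Greek (ELOT 928 / ISO 8859-7)
    "shift_jis": "日本語",   # Japanese (Shift-JIS)
}

for codec, text in SAMPLES.items():
    data = text.encode(codec)
    # Each standard round-trips text written in its own script.
    print(f"{codec:>10}: {text} -> {data.hex(' ')} -> {data.decode(codec)}")


def can_encode(text: str, codec: str) -> bool:
    """True if every character of `text` exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# A national standard usually cannot represent other scripts:
print(can_encode("ไทย", "gb2312"))  # False, GB 2312 has no Thai letters
```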

Finally, if the local government doesn't move as quickly as needed, a corporation will invent its own standard on the fly. In the U.S. we got both Windows-1252 (Win-1252) and MacRoman this way. In Taiwan, they got Big5 (a Traditional Chinese encoding named for the five corporations that agreed on it).
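
Competing corporate standards are where the classic garbled-text ("mojibake") problem comes from: the same letter gets different byte values, so reading one vendor's bytes with the other vendor's table produces the wrong characters. A minimal Python illustration using the standard "cp1252", "mac_roman" and "big5" codecs:

```python
text = "café"

win = text.encode("cp1252")     # Windows-1252: é is byte 0xE9
mac = text.encode("mac_roman")  # MacRoman:     é is byte 0x8E
print(win, mac)                 # b'caf\xe9' b'caf\x8e'

# Reading Windows bytes as MacRoman (or vice versa) garbles the accent.
print(win.decode("mac_roman"))  # cafÈ
print(mac.decode("cp1252"))     # cafŽ

# Big5 is a multibyte encoding: each of these Chinese characters takes 2 bytes.
print(len("中文".encode("big5")))  # 4
```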

Future Local Planetary Encoding Standards

In the future, I will assume that each Star Trek planet has its own version of Unicode, each with its own encoding designation. Can you Star Trek fans guess where these are from?

  • KLISCII or TLHLSCII (depending on linguistic accuracy)
  • RIS-105
  • VSAUS-210A (because this planet uses hex numbers)
  • FMSS-13B1 (in duodecimal numbers because you can quickly divide by 3)
  • TUTF-32 (future name for an existing standard)

Since I will be talking about cross-planetary standardization next time, I will add these potential encodings:

  • ACS34 - Andorian Communication Standard #34
  • TelSCII - Tellarite Standard Code for Information Interchange
  • OTLC-10 - Orion Technology Limited Code #10
  • SuperSix - As agreed upon by six major Orion Trading Houses
  • BNTCXS - Betazed Non-Telepathic Communication Exchange Standard

And finally, the answers:

  • KLISCII - Klingon Language Institute Standard Code for Information Exchange or
    TLHLSCII - tlhIngan Hol Language Institute Standard Code for Information Exchange
  • RIS-105 - Romulan Imperial Standard #105
  • VSAUS-210A - Vulcan Science Academy Unified Standard #210A
  • FMSS-13B1 - Ferengi Mercantile Society Standard #13B1
  • TUTF-32 - Terran Unicode (32 bit)
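
For anyone wondering what those standard numbers would be back on Earth, Python's built-in int() accepts a base argument. This assumes, as the guessing list says, that the Vulcan number is plain hexadecimal and the Ferengi number plain duodecimal.

```python
vulcan = int("210A", 16)   # hexadecimal: 2*4096 + 1*256 + 0*16 + 10
ferengi = int("13B1", 12)  # duodecimal, digits 0-9, A (=10), B (=11): 1*1728 + 3*144 + 11*12 + 1
print(vulcan, ferengi)     # 8458 2293
```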

Final challenge - what encoding would you invent for the Cardassians?

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

