Explaining UTF-8


The UTF-8 encoding is not a straight encoding of Unicode code points, but rather a "compromise character encoding." It allows files containing only ASCII characters to stay the same size as ASCII (one byte per character), while still being able to include any Unicode code point, using two to four bytes for characters outside the ASCII range.
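To see the size trade-off concretely, here is a short Python 3.9+ sketch that compares how many code points a string has with how many bytes its UTF-8 form takes. It relies only on the built-in `str.encode` method; the helper name `utf8_report` and the sample strings are my own illustrative choices.

```python
def utf8_report(text: str) -> tuple[int, int, str]:
    """Return (number of code points, number of UTF-8 bytes, hex dump) for a string."""
    encoded = text.encode("utf-8")
    return len(text), len(encoded), encoded.hex(" ")

# Sample strings: plain ASCII, ASCII plus an accented letter (U+00E9),
# the euro sign (U+20AC), and an emoji outside the Basic Multilingual Plane (U+1F600).
samples = ["cafe", "caf\u00e9", "\u20ac", "\U0001F600"]

for sample in samples:
    points, size, dump = utf8_report(sample)
    print(f"{sample!r}: {points} code point(s), {size} byte(s): {dump}")
```

The ASCII-only "cafe" stays at 4 bytes, while "café" grows to 5 because the é needs two bytes (C3 A9). The euro sign takes three bytes and the emoji takes four.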

If this sounds a bit confusing, you may want to try this Game Dev article on UTF-8. It's still under review, but it does step through parts of the conversion from a Unicode code point to its UTF-8 representation.
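As a rough sketch of that conversion (my own illustration, not code taken from the Game Dev article), the bit-packing can be written out by hand in Python. The function name `encode_utf8` is hypothetical. It chooses one to four bytes based on the size of the code point, puts a length marker in the first byte (0, 110, 1110 or 11110), and fills each continuation byte with the pattern 10xxxxxx.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand (illustrative sketch)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate code points are reserved for UTF-16 and are not valid in UTF-8.
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

# Check the hand-built bytes against Python's own UTF-8 encoder.
for cp in [0x41, 0xE9, 0x20AC, 0x1F600]:
    assert encode_utf8(cp) == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {encode_utf8(cp).hex(' ')}")
```

For example, é (U+00E9) has the 11-bit value 00011 101001, which becomes 110 00011 and 10 101001, or C3 A9 in hexadecimal.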

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.


The standard commenting utility has been disabled due to heavy spam. If you have a comment, please feel free to drop me a line at ejp10@psu.edu.
