(X)HTML Markup: March 2009 Archives

Entity Code or Raw Text in HTML?


If you need to enter a non-English character like /ə/ (schwa) on a Web page, you actually have two choices: you can use an entity code like &#601; or you can use an input utility to enter the raw text (i.e. ə). You can see how the code looks below:

Bahama = /bəhamə/

Bahama = <b>/b&#601;ham&#601;/</b>

Bahama = <b>/bəhamə/</b>
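One quick way to confirm that the two markup lines above are equivalent is to decode the entity version and compare. This is a minimal Python sketch using the standard library's `html.unescape`. The variable names are illustrative only.

```python
import html

entity_markup = "<b>/b&#601;ham&#601;/</b>"
raw_markup = "<b>/bəhamə/</b>"

# html.unescape replaces character references such as &#601; with the
# characters they stand for (here U+0259, the schwa). Tags are left alone.
decoded = html.unescape(entity_markup)

print(decoded == raw_markup)  # True
print(ord("ə"))               # 601, the number used in the entity code
```

The number inside `&#601;` is simply the Unicode code point of the character, written in decimal.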

Which is Better?

Entity codes were a solution from a time before browsers could reliably recognize true UTF-8 text, or before a server or Web tool could "serve" it up properly. In the year 2000, they were by far the safest solution.

These days, though, the tide is shifting, so I would recommend avoiding entity codes unless your tech is not quite up to speed. For instance, this blog (served by Movable Type) hardly ever uses an entity code: I type or copy Unicode text and out it goes. The same is true for Facebook, Twitter and most Web 1.0 sites hosted at Penn State.

The advantage of NOT using entity codes is that it is easier to port content between file formats, including RSS and other XML formats. RSS, unlike HTML, does not recognize HTML's named entity codes, and numeric ones can end up escaped a second time along the way, so an entity code such as &#601; may be displayed as... &#601; (not schwa). Only a raw ə is reliably displayed as ə. The same is true if you want to include Unicode on your Facebook profile page.
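The difference shows up clearly with an XML parser, which is what RSS tools are built on. Below is a minimal sketch with Python's `xml.etree.ElementTree`. The `<title>` snippets are hypothetical feed fragments, and `title_text` is an illustrative helper, not part of any feed library. Strictly speaking, XML itself understands numeric references like `&#601;`. The trouble comes from HTML's named entities, which XML does not define, and from tools that escape the ampersand again.

```python
import xml.etree.ElementTree as ET

def title_text(fragment):
    """Parse a one-element XML fragment and return its text, or the error."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError as err:
        return f"ParseError: {err}"

# A numeric reference is legal XML and decodes to the schwa.
print(title_text("<title>b&#601;ham&#601;</title>"))
# If a tool escapes the ampersand a second time, the code stays visible.
print(title_text("<title>b&amp;#601;ham&amp;#601;</title>"))
# A named HTML entity is not defined in XML, so parsing fails outright.
print(title_text("<title>caf&eacute;</title>"))
# Raw UTF-8 text needs no decoding step at all.
print(title_text("<title>bəhamə</title>"))
```

In every case, the raw-text version comes through unchanged. That is the portability advantage described above.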

The other advantage is easier debugging and proofreading. Which would you rather spell check: Русский or &#x0420;&#x0443;&#x0441;&#x0441;&#x043A;&#x0438;&#x0439;? (Both spell the Russian word for "Russian.")
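If you inherit a page full of entity codes, you can turn them back into readable text for proofreading before you spell check. This is a small Python sketch using `html.unescape`. The variable names are illustrative.

```python
import html

escaped = "&#x0420;&#x0443;&#x0441;&#x0441;&#x043A;&#x0438;&#x0439;"

# Hexadecimal references (&#x...;) decode the same way decimal ones do.
readable = html.unescape(escaped)
print(readable)  # Русский
```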

However, there are cases where you need to use entity codes just to be safe. Often the problem is that you are using a server that can't deliver UTF-8 encoded text for whatever reason. Unfortunately, one of these has been our course management system; fortunately, its WYSIWYG editor converts non-English text to entity codes for you.
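The conversion such an editor performs can be approximated in Python with the built-in `xmlcharrefreplace` error handler. This handler replaces every character that the target encoding cannot represent with a decimal character reference. The code below sketches the general technique. It is an assumption, not a description of how the course management system actually does it, and `to_entity_codes` is a hypothetical helper name.

```python
def to_entity_codes(text):
    """Replace every non-ASCII character with a decimal character reference."""
    # Encoding to ASCII with xmlcharrefreplace turns "ə" into b"&#601;";
    # plain ASCII letters pass through untouched.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_entity_codes("bəhamə"))   # b&#601;ham&#601;
print(to_entity_codes("Русский"))  # &#1056;&#1091;&#1089;&#1089;&#1082;&#1080;&#1081;
```

The decimal references produced here are equivalent to the hexadecimal ones in the Русский example. For instance, &#1056; and &#x0420; are both Р.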

If you are working with a static page, there really should be no roadblock at this point... so long as your page has the correct UTF-8 meta tag. Where this isn't working, the likely cause is an underconfigured Apache setup that labels pages with a different character set.
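You can reproduce what a misconfigured server does to raw text. The page is saved as UTF-8, but the server labels it with another character set, and the browser then decodes the bytes accordingly. The minimal Python sketch below uses Windows-1252 as the wrong character set, which is an assumed example. `misread` is a hypothetical helper.

```python
def misread(text, saved_as="utf-8", served_as="cp1252"):
    """Encode text as the page was saved, then decode it as the server claims."""
    return text.encode(saved_as).decode(served_as)

# In UTF-8 the schwa takes two bytes (0xC9 0x99). Read as Windows-1252,
# those two bytes turn into two unrelated characters.
print("ə".encode("utf-8"))            # b'\xc9\x99'
print(misread("ə"))                   # É™
print(misread("ə", served_as="utf-8"))  # ə
```

When the server's header and the page agree on UTF-8, the same bytes decode back to ə. In Apache, the `AddDefaultCharset UTF-8` directive is one common way to set that header.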

Ironically, though, I seem to see more underconfigured Apache issues than I used to... One step forward, one step back?


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.


