Recently in (X)HTML Markup Category
Problem: The content management system I'm using takes any URL it recognizes and changes it into a link. That's normally good EXCEPT if you want to create a fake URL as an example.
Solution: Replace the slash with its numeric entity code (&#47;). Voilà - the system can't find the slashes anymore, so it leaves the URL alone.
By the way, the entity code hack looks like this:
http:&#47;&#47;www....
Even ASCII characters sometimes need an entity code.
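If you have a lot of example URLs to protect, the replacement is easy to script. Here is a minimal Python sketch; the function name and the example.com address are my own placeholders, not anything the content management system provides.

```python
def escape_slashes(url: str) -> str:
    """Swap every slash for its numeric entity code so an auto-linker no longer sees a URL."""
    return url.replace("/", "&#47;")


# example.com is a placeholder address used only for illustration
example = escape_slashes("http://www.example.com/page")
print(example)  # http:&#47;&#47;www.example.com&#47;page
```

A browser still renders the result as an ordinary-looking URL, because it decodes &#47; back to a slash at display time.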
* Note: This entry was published elsewhere in 2006.
The Penn State server delivers UTF-8 Unicode pages. Dreamweaver creates Unicode pages. They appear fine in all my browsers without the entity code translation. So I should be able to include Unicode characters in server-side includes - right? Not exactly. Hidden UTF-8 characters seem to interfere.
Any .inc file must be encoded as ASCII and include only ASCII characters. Otherwise you will get an error that the file "cannot be processed". I suspect the culprits are some hidden Unicode control characters that the server doesn't recognize. If you want to include a Unicode character (like the £ symbol), you have to use an entity code like &#163; (all characters in the entity code are ASCII). If you enter raw Unicode, then users will see a question mark, even if the character is actually available in that font.
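One way to get an ASCII-only include file without hand-typing entity codes is to let a script do the conversion. This is a rough Python sketch, assuming the text starts out as an ordinary Unicode string; the function name and sample text are made up for the example.

```python
def to_ascii_entities(text: str) -> str:
    """Turn every non-ASCII character into a numeric entity code; ASCII passes through unchanged."""
    # "xmlcharrefreplace" is a standard Python error handler that emits &#NNN; references
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


snippet = "Price: £20"  # hypothetical include-file content
ascii_snippet = to_ascii_entities(snippet)
print(ascii_snippet)  # Price: &#163;20
```

The output contains nothing outside ASCII, so it can be saved in an ASCII-encoded file.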
As for CSS stylesheets, there are no issues technically prohibiting .css files from being UTF-8, but I found out a few years ago that if I placed CSS in UTF-8 files, then attributes would mysteriously fail to apply even though the syntax was correct. Again it was probably a hidden UTF-8 character that was interfering. It's little glitches like these that make Unicode development still an entertaining adventure even in 2007.
What are "hidden" UTF-8 control characters? These are code points which don't represent a character but signify text formatting elements like right-to-left text vs. left-to-right text or which kind of line break you are using. ASCII has control characters only in positions #0-31 and #127 (and most software programs recognize them), but Unicode includes additional control characters that older programs don't recognize. The problem arises when these newer control characters are included in a file.
By the way, if you cut and paste from a UTF-8 file and see strange behavior in a software package, sometimes backspacing through a "space" will eliminate an unrecognized control character and fix the problem.
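If backspacing feels like guesswork, a short script can point at the invisible characters directly. The Python sketch below flags anything in the Unicode "control" (Cc) or "format" (Cf) categories other than ordinary tabs and line breaks. The sample string is invented: I put a byte order mark (U+FEFF) and a left-to-right mark (U+200E) in it as plausible suspects, but I don't know that these were the characters causing the problems described above.

```python
import unicodedata

# Tabs and line breaks are control characters too, but nearly all software handles them
ALLOWED = {"\n", "\r", "\t"}


def find_hidden_characters(text):
    """Return (index, code point, name) for each invisible control or format character."""
    hits = []
    for index, ch in enumerate(text):
        if ch in ALLOWED:
            continue
        if unicodedata.category(ch) in ("Cc", "Cf"):
            name = unicodedata.name(ch, "<unnamed control>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits


# Invented sample: a CSS rule with a hidden character at each end
sample = "\ufeffbody { color: red; }\u200e"
hits = find_hidden_characters(sample)
for hit in hits:
    print(hit)
```

Any position it reports is a candidate for deletion before you paste the text into an older program.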
Superscripts in HTML
Both HTML and XHTML include the SUP tag for superscripts and the SUB tag for subscripts. Yet the Unicode specification also includes specific slots for individual superscript/subscript characters. For example the phrase “two to the fourth power” could be encoded as:
- 2<sup>4</sup> (SUP tag) = 2⁴
- 2&#8308; (numeric entity code) = 2⁴
- 2⁴ (raw Unicode data) = 2⁴
What’s the difference and which should you use? If you’re displaying static Web pages, there’s probably minimal difference. Although the entity code &#8308; takes up less file space than the SUP tag does, the SUP tag works across most browsers/fonts and can be styled.
The raw data method is the most correct, but also the most prone to cross-platform difficulties. For one thing, you MUST have the UTF-8 encoding header meta tag included or the display will be broken. Another issue is that some browsers (e.g. Mac/Firefox) include extra space around superscript entities or shrink the characters to unreadable sizes. If you’re working with XML though, then you may need to enter superscript/subscripts as raw data.
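To see how the three encodings of “two to the fourth power” relate, here is a small Python sketch. It confirms that the numeric entity code decodes to the same code point as the raw character (U+2074, SUPERSCRIPT FOUR) and compares how many bytes each form takes in a UTF-8 file. The variable names are just for illustration.

```python
import html
import unicodedata

sup_tag = "2<sup>4</sup>"      # markup version
entity = "2&#8308;"            # numeric entity code version
raw = "2\u2074"                # raw Unicode version (U+2074)

# The entity code and the raw character are the same thing once a browser decodes them
print(html.unescape(entity) == raw)   # True
print(unicodedata.name(raw[1]))       # SUPERSCRIPT FOUR

# Size of each form in bytes when saved as UTF-8
sizes = [len(s.encode("utf-8")) for s in (sup_tag, entity, raw)]
print(sizes)                          # [13, 8, 4]
```

The byte counts are tiny either way, which is why the choice usually comes down to browser and font behavior rather than file size.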
XML and Flash
On one project we had to feed data for College Algebra exercises into a Flash quiz application. The XML spec didn’t recognize numeric entity codes or the SUP/SUB tag, so we had to enter the superscripts as Unicode characters. The good news is that if you can create a UTF-8 text file and insert the symbols, it will import into Flash (at least in Flash 8). For math, your best bet is usually to use the Windows Character Map utility and insert the symbols into a Notepad text file or use the Macintosh Character Palette with a TextEdit text file. The Penn State Unicode and XML page explains how to create UTF-8 encoded XML files.
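If you would rather generate the file than type symbols in by hand, a script can write the UTF-8 XML for you. This Python sketch is only an illustration: the quiz and question element names are invented, not the format our Flash application used, and it writes to a temporary folder.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical structure; the real quiz format is not shown here
quiz = ET.Element("quiz")
question = ET.SubElement(quiz, "question")
question.text = "Simplify x\u00b2 \u00b7 x\u00b3"  # x², a middle dot, x³ as raw characters

path = os.path.join(tempfile.mkdtemp(), "quiz.xml")
# With encoding="utf-8" the superscripts are written as raw UTF-8, not as entity codes
ET.ElementTree(quiz).write(path, encoding="utf-8", xml_declaration=True)

with open(path, encoding="utf-8") as handle:
    written_text = handle.read()
print(written_text)
```

The encoding argument matters: left at its default, ElementTree writes ASCII and falls back to numeric entity codes for anything outside that range.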
Reason for Unicode Character Points
Ultimately, the reason why Unicode has positions for these characters isn’t to help Flash developers, but because the superscripts/subscripts do add content to a text string. If you’re exchanging raw data files, you may need to know whether a character is superscript or subscript, so it has to be encoded within Unicode. Hence, we have superscript/subscript characters:
- Math Super/Subscripts
- Phonetics (organized by letter)
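You can see that the superscript information really lives in the code point by asking the Unicode database about it. In this Python sketch, the decomposition of U+2074 is tagged as a superscript form of the digit 4, and a compatibility normalization (NFKC) throws that information away, leaving a plain "24".

```python
import unicodedata

superscript_four = "\u2074"

# The Unicode database records that this character is a superscript form of "4" (U+0034)
print(unicodedata.decomposition(superscript_four))   # <super> 0034

# Compatibility normalization strips the superscript, changing what the string says
flattened = unicodedata.normalize("NFKC", "2" + superscript_four)
print(flattened)                                     # 24
```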