<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Got Unicode?</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/" />
    <link rel="self" type="application/atom+xml" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/atom.xml" />
    <id>tag:www.personal.psu.edu,2008-01-24:/ejp10/blogs/gotunicode//516</id>
    <updated>2009-11-12T16:19:10Z</updated>
    <subtitle>Elizabeth Pyatt&apos;s Unicode tips, resources and war stories.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.24-en</generator>

<entry>
    <title>Hexadecimal to Decimal in FileMaker 7+ (Revised)</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/11/hexadecimal-to-decimal-in-file.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.150760</id>

    <published>2009-11-10T22:20:04Z</published>
    <updated>2009-11-12T16:19:10Z</updated>

    <summary>I&apos;m updating my FileMaker Unicode database to reflect the changes in the recent versions of Unicode. As part of the database, I like to have the decimal version of the code point handy as well as the actual hexadecimal...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Software and Unicode" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>I'm updating my FileMaker Unicode database to reflect the changes in the recent versions of Unicode. As part of the database, I like to have the decimal version of the code point handy as well as the actual hexadecimal version (it's good for debugging purposes).</p>

<p>Now the default version does not appear to have hex-to-decimal conversion built in (not even in FileMaker 10), so here's my (updated) solution.</p>

<ol>
<li>In the main table corresponding to the list of code points, I created a field for the Hexadecimal Unicode code point value. I'll call this <b>HexValue</b> for now. It must be a <b>Text</b> field. You can create a Decimal field <b>(Calculated)</b>, but you won't be able to fill in the formula yet. </li>
<li>Then I created a second table to store the correspondence between a hex digit (0-F) and its decimal value (0-15). The <b>HexValue</b> field is Text, but the <b>DecValue</b> field is a <b>Number.</b> See the sample table below (some values skipped).
<table cellspacing="0" class="chart">
<tr>
<th scope="col">HexValue (Text)</th>
<th scope="col">DecValue (Number)</th>
</tr>
<tr><td>0</td><td>0</td></tr>
<tr><td>1</td><td>1</td></tr>
<tr><td>2</td><td>2</td></tr>
<tr><td>3</td><td>3</td></tr>
<tr><td>4...9 (1 row each)</td>
   <td>4...9</td>
</tr>
<tr><td>A</td><td>10</td></tr>
<tr><td>B</td><td>11</td></tr>
<tr><td>C...E (1 row each)</td><td>12...14</td></tr>
<tr><td>F</td><td>15</td>
</tr>
</table>
</li>
<li><p>To do all the conversions, you need to extract the text value of each position in the code point. So, I created fields corresponding to the value for each place in the hex code point as shown in the list below. I'll explain the formulas below.</p>

<p><b>Note:</b> In case you're wondering, the names of the places are semi-inspired by Roman numerals and algebra.</p>

<ul>
<li>Rightmost digit, units (n): <b>nhex = Right(HexValue;1)</b></li>
<li>Penultimate digit (t): <b>thex = Left(Right(HexValue;2);1)</b></li>
<li>Antepenultimate digit (c): <b>chex = Left(Right(HexValue;3);1)</b></li>
<li>4th from right (m): <b>mhex = Left(Right(HexValue;4);1)</b></li>
<li>5th from right (d): <b>dhex = If(Length(HexValue)>4;Left(Right(HexValue;5);1);"0")</b></li>
<li>6th from right (x): <b>xhex = If(Length(HexValue)>5;Left(Right(HexValue;6);1);"0")</b></li>
</ul>

<p>The challenge for modern Unicode is that code points now come in variable lengths (4-6), so if you count from the left you can't always know you are at the appropriate digit. That means you have to count from the right, but there's no simple formula for picking the 2nd digit from the right. My solution is to take a rightmost chunk, then count in from the left. So to get the 3rd hex digit from the right, I take the rightmost 3 digits, then find the leftmost digit in that chunk (hence the embedded Left(Right()) formulas).</p>

<p>I also have to check to see if the length is greater than 4. When the length is only 4, the missing higher digits are filled in with the value 0; otherwise you do a string extraction. Hence the formulas for dhex and xhex use conditional logic. Hopefully though, if Unicode adds more digits, these formulas will continue to work (unlike <a href="http://www.personal.psu.edu/ejp10/blogs/tlt/2007/06/hexadecimal-to-decimal-in-file-1.html">my original attempt which only assumed 4 digits</a> in the code point).</p>

</li>
<li>To convert each extracted digit to its decimal version, I need to set up some Relationships between tables so that each extracted digit can look up the decimal equivalent. For each of the intermediate digit fields above, I created a link to an instance of the Hexadecimal Lookup table (there are 4 instances total). It's important to make sure each instance has a name you can remember later; mine mention which digit I am working on.  See the Relationships diagram below.<br>
<img alt="HexRelationships.png" src="http://www.personal.psu.edu/ejp10/blogs/tlt/2007/06/22/HexRelationships.png" width="500" height="279" />
</li>
<li>Now we can finally get that decimal value! If you haven't already, create a <b>DecimalValue</b> field and make it <b>Calculated.</b></li>
<li>Here's my calculation. I'll explain what the parts mean below<br />
<b>HexLookup N::DecValue + 16*HexLookup T::DecValue + 16^2* HexLookup C::DecValue + 16^3*HexLookup M::DecValue + 16^4*HexLookup D::DecValue+16^5*HexLookup X::DecValue </b>
<ul>
<li>"HexLookup N::DecValue" means give me the equivalent value from the decimal value column based on the hex value in the "HexLookup N" (units digit) table instance.</li>
<li>"HexLookup T::DecValue" does a lookup for the tens place. I multiply the value by 16 and add it to the ones value. Remember the hex #FF (F=15) means 15*16+15 = 255.</li>
<li>I look up the hundreds place decimal value and multiply it by 16^2 (256), then the thousands place decimal and multiply it by 16^3 (4096). The two highest places work the same way with 16^4 (65536) and 16^5 (1048576).</li>
<li>I add up the results of each converted decimal digit times its appropriate power of 16. The calculation is complete.</li>
</ul>
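<p>If you want to sanity-check the FileMaker results outside the database, the same logic can be sketched in a few lines of Python. This is only an illustration, not part of the FileMaker solution: the names <code>HEX_LOOKUP</code> and <code>hex_to_decimal</code> are my own hypothetical stand-ins for the lookup table and the calculated field.</p>

```python
# Stand-in for the lookup table: hex digit (text) -> decimal value (number).
HEX_LOOKUP = {digit: value for value, digit in enumerate("0123456789ABCDEF")}


def hex_to_decimal(hex_value):
    """Convert a hex code point string such as "13000" to its decimal value."""
    total = 0
    # Walk the digits from the right, the way the nhex, thex, chex... fields do.
    for position, digit in enumerate(reversed(hex_value.upper())):
        # Each digit's decimal value times its power of 16.
        total += HEX_LOOKUP[digit] * 16 ** position
    return total


print(hex_to_decimal("FF"))     # 255
print(hex_to_decimal("00AA"))   # 170
print(hex_to_decimal("13000"))  # 77824
```

<p>Because the loop counts from the right, it does not care whether the code point has 4, 5 or 6 digits, which is the same reason the FileMaker formulas count from the right.</p>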
</li>
</ol>]]>
        
    </content>
</entry>

<entry>
    <title>What did that font switch to in FileMaker (Mac)?</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/11/what-did-that-font-switch-to-i.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.150575</id>

    <published>2009-11-10T15:59:12Z</published>
    <updated>2009-11-10T16:19:34Z</updated>

    <summary> Prelude about the Problem In terms of handling non-English characters, apps come in two types (at least on the Mac). There are apps which switch fonts behind the scenes without telling you, and those which don&apos;t...but then you have...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Macintosh" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[
<h3>Prelude about the Problem</h3>
<p>In terms of handling non-English characters, apps come in two types (at least on the Mac). There are apps which switch fonts behind the scenes without telling you, and those which don't...but then you have to guess which font to use.</p>

<p>To take a concrete example, if I switch from the English keyboard to Japanese input in FileMaker, the font will automatically switch to one of the Japanese fonts. In theory, once I switch back to English, I should return to the original font (except when I don't...we'll get to that). The same principle applies in most apps including TextEdit, FileMaker and so forth. In contrast, if I switch to Japanese input in Adobe Photoshop, I also have to change fonts.</p>

<p>In theory, the automatic font switching sounds nice, except when 1) the font doesn't change back after typing the exotic character (this happens a lot in phonetic transcription and elsewhere) or 2) you're trying to figure out if font X actually has that glyph (or whether it's the illusion of font switching in action). With the Adobe products, the manual font switching means you know exactly which font you are using at all times, which is important in desktop publishing.</p>

<h3>FileMaker</h3>

<p>For instance...I uploaded a version of the UCD Unicode files into FileMaker so I would have a searchable reference locally. An additional function is that I can display glyphs in different fonts for comparison. I have most of the mega fonts selected, but few fonts have everything, so I know there are gaps. </p>

<p>However, because FileMaker switches fonts behind the scenes, I can't always be sure if font X actually has that glyph. If I see a bunch of boxes with identical glyphs, I can suspect an unannounced font switch...but to what?</p>

<h3>Solution</h3>
<p>The best solution now is to copy and paste the text into TextEdit then open up the font formatting palette (Command+T), and see what it says. Kind of dorky, but still more information than I had.</p>

<p>For the record, I understand why FileMaker is set up this way. For most purposes, you don't want your data entry operators to fidget with fonts. However, you can get inconsistent results if you are not careful. For instance, once I do switch to Japanese, I get the Japanese font, but if I return to English...I still get the Japanese font. I know Japanese fonts contain Latin characters, but the formatting is almost always NOT the one I intended.</p>

<p>It would be nice if FileMaker and the other apps (including Microsoft Office) could return you to your original English font formatting after your exotic sidetrip to the higher code points of Unicode.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Glyph DuJour: Romance Ordinal ª and º</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/11/glyph-dujour-romance-ordinal-a.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.148575</id>

    <published>2009-11-04T18:51:01Z</published>
    <updated>2009-11-04T19:13:34Z</updated>

    <summary>What these are The superscript a/o (sometimes underlined) are abbreviations for ordinal numbers used in Spanish, Italian and Portuguese similar to English -th (as in &quot;4th, 5th, 6th..&quot;). The use of &quot;o&quot; vs &quot;a&quot; depends on the gender of the...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Glyph Du Jour" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<h3>What these are</h3>
<p>The superscript <cite>a/o</cite> (sometimes underlined) are abbreviations for ordinal numbers used in Spanish, Italian and Portuguese similar to English <cite>-th</cite> (as in "4th, 5th, 6th.."). The use of "o" vs "a" depends on the gender of the noun. For instance, the "1st American woman" would be <cite>1ª americana</cite> in Spanish and the "1st American man" would be <cite>1º americano</cite>. The 5th American woman and man would be <cite>5ª americana/5º americano.</cite></p>


<h3>The Codes</h3>
<p>I got a request for putting codes for these on the Penn State Web Computing with Accents Web site in various locations, so I thought I would summarize the codes here.</p>


<table cellspacing="0" class="chart">
  <tr>
    <th scope="col">&nbsp;</th>
    <th scope="col">Feminine Ordinal (ª)</th>
    <th scope="col">Masculine Ordinal (º)</th>
  </tr>
  <tr>
    <td><b>Unicode Code Point</b></td>
    <td align="center">U+00AA (170)</td>
    <td align="center">U+00BA (186)</td>
  </tr>
  <tr>
    <td><b>Windows Alt Code</b></td>
    <td align="center">ALT+0170</td>
    <td align="center">ALT+0186</td>
  </tr>
  <tr>
    <td><b>Mac Option Code</b></td>
    <td align="center">Option+9</td>
    <td align="center">Option+0</td>
  </tr>
  <tr>
    <td><b>HTML Entity Code</b></td>
    <td align="center">&amp;ordf;</td>
    <td align="center">&amp;ordm;</td>
  </tr>
</table>


<h3>But Wait There's More</h3>
<p>But in the land of Unicode, there's always more to know...such as that in Spanish <cite>primero</cite> '1st.masc' (or '1º') may be shortened to <cite>primer</cite>, which can be abbreviated as '1<sup>er</sup>'...or that you may write <cite>octavo</cite> 'eighth.masc' as 8º or 8.º or possibly 8<sup>vo</sup>...although Google tends to have more instances of 8º. </p>

<p>What's important though is that only º and ª have their own code points in Unicode. For English <sup>-th, -nd, -rd</sup> or Spanish <sup>-vo,-er</sup> you have to rely on the old fashioned SUP (superscript) tag or its equivalent in CSS.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Ancient Egyptian &amp; Other Additions in Unicode 5.2</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/10/ancient-egyptian-other-additio.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.144919</id>

    <published>2009-10-26T19:26:15Z</published>
    <updated>2009-10-26T19:47:01Z</updated>

    <summary>The latest Unicode Standard, Version 5.2, was released at the beginning of October, 2009. A lot is added in each version, but I confess that the most noteworthy for me was that an Egyptian Hieroglyphic block (U+13000 to U+1342E) was added....</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>The latest <a href="http://www.unicode.org/versions/Unicode5.2.0/">Unicode Standard, Version 5.2</a>, was released at the beginning of October, 2009. A lot is added in each version, but I confess that the most noteworthy for me was that an <a href="http://www.unicode.org/charts/PDF/Unicode-5.2/U52-13000.pdf">Egyptian Hieroglyphic block</a> (U+13000 to U+1342E) was added. It was certainly the largest block added at 1071 code points.</p>

<p>Additional code points included blocks for Avestan, Old South Arabian, Samaritan, Imperial Aramaic, Inscriptional Parthian and Old Turkic. In addition, supporting characters were added for the Coptic, Devanagari (esp. Vedic support), Hangul (Old Korean), Phoenician and other ancient script blocks.</p>

<p>In South and Southeast Asia, support was added for Javanese, Tai Tham, Lisu, Kaithi, Meitei Mayek, Myanmar (new points), New Tai Lue (new points) and others. In other regions, a new Canadian Aboriginal Syllabics Extended block was created with 80 additional code points. Some African scripts were also encoded including the Bamum script and Rumi numerals. Additions were also made to various math and symbol blocks.</p>

<p>For a complete list of changes, see the information on the <a href="http://">DerivedAge.txt file</a> (scroll to end) and <a href="http://www.unicode.org/charts/PDF/Unicode-5.2/">Revised Unicode 5.2 charts.</a> In terms of support, there may be freeware (or commercial) fonts available, but time will be needed to develop the input utilities and then for these glyphs to be incorporated into major operating systems.</p>

<p>Until then...there's always Unicode 6.0.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Emoji at Unicode 33</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/10/emoji-at-unicode-33.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.143360</id>

    <published>2009-10-22T19:12:42Z</published>
    <updated>2009-11-04T19:21:23Z</updated>

    <summary>Defining Emoji There were lots of interesting sessions at last week&apos;s Unicode conference, but the one that I think non-experts can relate to the most was the one about Emoji or those little tiny icons popular in Japanese e-mail messages....</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<h3>Defining Emoji</h3>
<p>There were lots of interesting sessions at last week's Unicode conference, but the one that I think non-experts can relate to the most was the one about Emoji or those little tiny <a href="http://talkingpyjamas.wordpress.com/2009/09/26/free-emoji-on-the-iphone-iconboard/">icons popular in Japanese e-mail messages</a>. </p>

<p>A rough translation of <cite lang="ja">emoji</cite> might be <cite>emoticon</cite>, but the range of images goes way beyond smiley faces to include weather symbols, hearts, beer steins, sports icons, high heels, fast food, astrological signs, warnings, hand gestures and bikinis. </p>

<h3>Why Unicode?</h3>
<p>It's good to catalog and standardize any symbol set, but in this case economic necessity is driving this campaign. Specifically, Google and Apple (and its iPhone) want to expand further into the Japanese market.</p>

<p>According to our presenters, the three major Japanese cell phone carriers all support emoji, and these images are popular with most adults (even the ones over 30). It's an important enough feature that iPhone (and iChat), Gmail and even Twitter support emoji.</p>

<p>But really it would be good <a href="http://www.unicode.org/%7Escherer/emoji4unicode/snapshot/utc.html">to support one encoded set of emoji,</a> not a hack of three emoji encodings from the Japanese cell phone carriers...hence the need for a unified encoding which combines those items already encoded (e.g. zodiac symbols) with symbols not currently in Unicode.</p>

<h3>Remaining Issues</h3>

<p>Because no Unicode script block is free of quirks, I document the issues overheard at the conference and on the Web. Namely:</p>

<ol>
	<li><p><b>Color</b> - Real emoji have colors (really bright ones), but the spec is in black and white. This makes sense because the rest of Unicode is also in black and white. Plus you will have more options to add the colors you want! </p></li>
	<li><p><b>5-Digit Code Points</b> - Or more technically, the new glyphs will be assigned a number above U+FFFF (i.e. not in the BMP or Plane 0). Not surprisingly, many mobile devices are limited to U+FFFF and below. The committee's comment was that they expected that mobile developers would learn to overcome this restriction...because they really are running out of room in the U+0000-FFFF range. That may be good news for anyone wanting to transmit the ancient scripts over cell phones. You never know when you need to access a Mycenaean Greek text away from the office or when the next Linear B revival may happen.</p></li>
      <li><p><b>There's a <del>Jailbreak</del> App for that</b> - When researching this article I encountered articles about tricks for enabling emoji on non-Japanese iPhones, not all of which were legit. For a while, Apple was <a href="http://gizmodo.com/5196709/iphone-emoji-apps-back-in-app-store-someone-probably-rejoices">discouraging use of emoji outside of Japan</a> so it was hiding the emoji.  Fortunately, there is a legal way to enable emoji now (both <a href="http://justanotheriphoneblog.com/wordpress/iphone-tips/how-to-enable-emoji-for-free-without-jailbreaking">a trick</a> and an app).</p></li>
</ol>

<p>So there you have it - thanks to the great folks at Google and Apple, we will all be able to standardize the addition of cute icons in our online communication...or at least we will have a documented explanation of what they were for future generations. Trust me, in about 500 years, we will need it.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Unicode 33 Presentation Files</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/10/unicode-33-presentation-files.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.138389</id>

    <published>2009-10-13T14:46:37Z</published>
    <updated>2009-10-11T21:52:08Z</updated>

    <summary> MacKeyboardTutorialPyatt.2.pdf (Wednesday) KeyboardLayoutFiles2.zip (Wednesday) Unicode33PyattLogic.ppt (Friday)...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Macintosh" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<ul>
	<li><a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/10/11/MacKeyboardTutorialPyatt.2.pdf">MacKeyboardTutorialPyatt.2.pdf</a> (Wednesday)</li>
<li><a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/10/11/KeyboardLayoutFiles2.zip">KeyboardLayoutFiles2.zip</a> (Wednesday)</li>
<li><a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/10/11/Unicode33PyattLogic.ppt">Unicode33PyattLogic.ppt</a> (Friday)</li>
</ul>]]>
        
    </content>
</entry>

<entry>
    <title>Announced i18n Enhancements for Mac Snow Leopard (10.6)</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/08/announced-i18n-enhancements-fo.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.108220</id>

    <published>2009-08-27T19:38:37Z</published>
    <updated>2009-08-27T20:01:40Z</updated>

    <summary>New operating systems often mean new i18n toys to play with and even though the upgrade from Apple 10.5 (Leopard) to 10.6 (Snow Leopard) is not supposed to be full of new features, there are, in fact, new features scheduled...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Macintosh" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="psuets" label="psuets" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>New operating systems often mean new i18n toys to play with and even though the upgrade from Apple 10.5 (Leopard) to 10.6 (Snow Leopard) is not supposed to be full of new features, there are, in fact, new features scheduled for the upgrade.</p>

<p>According to the <a href="http://www.apple.com/macosx/refinements/enhancements-refinements.html#systemwide">Apple Snow Leopard Enhancement page</a>, 10.6 will include:</p>
<ul>
	<li>Redesign of Pinyin Chinese input with faster speed and enhanced dictionary </li>
	<li>Improvements to handwritten Chinese input</li>
	<li>New Asian fonts - Heiti SC, Heiti TC, Hiragino Sans GB. </li>
	<li>New generic monospace font Menlo to be used in applications such as Terminal </li>
	<li>Enhanced RTL support including split cursor option to show text direction in documents with bidirectional text</li>
       <li>General Text substitution (e.g. (c) to ©) across applications. Could be handy for a lot of situations when you need to enter an unusual symbol. This already exists in Microsoft Office (Mac/PC).</li>
</ul>

<p>But I almost missed the big one - the <b>International</b> pane in the <b>System Preferences</b> has been redesigned and will now be the <b>Language and Text</b> pane, presumably with more features. There may be other enhancements in the works that are too minor to be announced (or at least too minor for most people), but there may be more things to find out. </p>

<p>How will they work? Alas, no details from Apple yet. I guess we won't know until we know....</p>]]>
        
    </content>
</entry>

<entry>
    <title>Enter Plane 1 (Phoenician/Linear B...) on Mac Unicode Hex Keyboard</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/08/enter-plane-1-phonecianlinear.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.104118</id>

    <published>2009-08-21T16:32:07Z</published>
    <updated>2009-08-21T17:34:14Z</updated>

    <summary>A useful utility on the Mac is the Unicode Hex keyboard which allows you to press Option plus any four digit Unicode code to get that character. For instance, if you need to enter the rarely seen archaic Roman numeral...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Ancient Scripts" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Macintosh" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>A useful utility on the Mac is the <a href="http://tlt.its.psu.edu/suggestions/international/keyboards/mackey.html#unihex">Unicode Hex keyboard</a> which allows you to press Option plus any four digit Unicode code to get that character.</p>

<p> For instance, if you need to enter the rarely seen <a href="http://en.wikipedia.org/wiki/Roman_numerals#Unicode">archaic Roman numeral symbol for 5,000 (<b style="font-size:18px">ↁ</b>)</a>, you could look up its Unicode character number (<b>U+2181</b>), then activate this keyboard then type <b>Option+2181</b> and generate the code (assuming the correct font is loaded).</p>

<p>But a lot of ancient scripts are in Plane 1, meaning they have Unicode values with five digits (i.e. U+10000 or higher). In the Unicode world, adding the fifth digit means that some processes go slightly awry, and the Unicode Hex keyboard is one of them. Suppose I want to input the Phoenician character Alf (Aleph) (<b style="font-size:18px">𐤀</b> or an A on its side), which is U+10900. If I enter Option+10900 on the Unicode Hex keyboard, I will not get Alf, but ႐ (U+1090) instead. </p>

<p><b>Note:</b> Characters U+0000 to U+FFFF are in <cite>Plane 0</cite> or the <cite>BMP (Basic Multilingual Plane)</cite>. A lot of systems are set up to deal with the BMP only, and need special support for codes beyond U+FFFF. The four-digit restriction corresponds to 16 bits, which is a constraint in older systems. If you're not a programmer, let's just say it's a long story and leave it at that.</p>

<p>It turns out that the Unicode Hex keyboard has a four-digit limit. To get around it, you can break U+10900 into two 16-bit (i.e. 4-digit) sequences, also known as a <a href="http://en.wikipedia.org/wiki/UTF-16/UCS-2#Encoding_of_characters_outside_the_BMP">UTF-16 Surrogate Pair</a>. For <b>U+10900</b>, the surrogate pair is <b>D802+DD00</b>. So in the Unicode Hex utility, you can now do this.</p>
<ol>
	<li>Hold down the <b>Option</b> key.</li>
	<li>Type <code>D802+DD00</code>, where the <b>+</b> means type the Plus sign.</li>
       <li>Release the <b>Option</b> key.</li>
</ol>

<p>I bet you're asking - how did she get from  <b>U+10900</b> to <b>D802+DD00</b>? <a href="http://en.wikipedia.org/wiki/UTF-16/UCS-2#Encoding_of_characters_outside_the_BMP">There is an algorithm,</a> but in this case I got it by opening the <a href="http://tlt.its.psu.edu/suggestions/international/keyboards/charpalosx.html#other">Character Palette</a>, finding the character I wanted and mousing over it. When you do that, the Unicode code point appears along with its surrogate pair in parentheses.</p>
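<p>For the curious, the UTF-16 algorithm can be sketched in a few lines of Python. This is only an illustration (<code>surrogate_pair</code> is a hypothetical helper name of my own, not part of any Mac utility): subtract 0x10000 from the code point, split what is left into a high and a low 10-bit half, then add the high half to D800 and the low half to DC00.</p>

```python
def surrogate_pair(code_point):
    """Return the UTF-16 (high, low) surrogates for a code point beyond the BMP."""
    if code_point not in range(0x10000, 0x110000):
        raise ValueError("only U+10000 to U+10FFFF are encoded as surrogate pairs")
    # Dividing by 0x400 (2^10) splits the 20-bit offset into two 10-bit halves.
    high_half, low_half = divmod(code_point - 0x10000, 0x400)
    return 0xD800 + high_half, 0xDC00 + low_half


high, low = surrogate_pair(0x10900)
print(f"{high:04X}+{low:04X}")  # D802+DD00
```

<p>The result is printed in the same "pair joined by a plus sign" form that you type on the Unicode Hex keyboard.</p>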

<p>Of course, you could also directly <b>Insert</b> the character with the palette, but actually there are times when the <b>Insert</b> doesn't quite work (at some points in the careers of my laptops, I have corrupted my Character Palette so badly, it refused to play with me anymore). </p>

<p>Although this utility seems a little limited at the moment, if there's one thing I have learned, it is that no Unicode trick has ever gone to waste.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Korean Script Heads to Indonesia</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/08/korean-script-heads-to-indones.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.101729</id>

    <published>2009-08-11T13:27:06Z</published>
    <updated>2009-08-11T14:26:25Z</updated>

    <summary>The biggest sensation in Unicode land these days is that the Korean script Hangul (or Hangeul/Han&apos;gŭl depending on your transliteration preferences) has been adopted by the speakers of Cia-Cia in the nation of Indonesia. This will be the first time...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="CJK" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>The biggest sensation in Unicode land these days is that the <a href="http://en.wikipedia.org/wiki/Hangul">Korean script Hangul</a> (or <i>Hangeul/Han'gŭl</i> depending on your transliteration preferences) has been adopted by the <a href="http://en.wikipedia.org/wiki/Cia-Cia_language">speakers of Cia-Cia</a> in the nation of Indonesia. This will be the first time any language other than Korean has adopted Hangul as its writing system, so it is a cultural triumph for them.</p>

<p>What's interesting is how this decision happened. The <a href="http://thejakartaglobe.com/home/southeast-sulawesi-tribe-using-korean-alphabet-to-preserve-native-tongue/322636">standard press releases</a> are not giving much information and even <a href="http://languagelog.ldc.upenn.edu/nll/?p=1641">the linguistic community is a little perplexed</a>. It's actually more interesting if the Wikipedia report that Cia-Cia was formerly written in the Arabic script (specifically the <a href="http://en.wikipedia.org/wiki/Jawi_script">Jawi variant in Indonesia</a>) is accurate. According to <a href="http://www.ethnologue.org/show_language.asp?code=ci">Ethnologue,  </a>the population is still mostly Islamic, so there shouldn't be a religious reason to switch.</p>

<p>So what about it? First, let's discuss the switch from Arabic. Actually a lot of Muslim communities including speakers of Hausa, Swahili, Malay and Turkish have switched from Arabic to the Latin alphabet. Malaysia and Indonesia are two countries following this trend, although the Jawi/Arabic script is still used in some religious and cultural contexts. There may be a variety of reasons for this including European colonial policy or the perception that the Latin alphabet is easier to learn and enhances literacy (Turkish). A move to the Latin alphabet may also represent a move towards a secular government (as in the case of Turkey).</p>

<p> It should also be mentioned that the Arabic script must be modified heavily when it is used for non-Semitic languages if all the sounds are to be represented. If you look at the <a href="http://www.omniglot.com/writing/malay.htm">Omniglot Jawi chart</a> for example, you will see that many consonants have the same shape but with different patterns of dots to indicate the differences. This also happens in the Latin alphabet (e.g. n vs. ñ in Spanish), but if Jawi also includes the different letter shapes depending on word position as Arabic does, then the script becomes more complex.</p>

<p>Cia Cia is unique though in switching to something other than the Latin alphabet. One reader commented that this may be due to the fact that in South and Southeast Asia, a language gains social status by having its own script. In Indonesia, <a href="http://en.wikipedia.org/wiki/Balinese_language#Balinese_script">Balinese</a>, <a href="http://en.wikipedia.org/wiki/Javanese_language#Javanese_script">Javanese</a> and <a href="http://en.wikipedia.org/wiki/Sundanese_script">Sundanese</a> have their own historic scripts. Although these scripts may not be used on an everyday basis, they do show that there is a cultural tradition having nothing to do with the West.</p>

<p>In theory, Cia Cia could adopt one of these scripts or one from India (e.g. Devanagari), which would probably be a good fit, but none would probably be perceived as being unique in Indonesia. On the other hand...no one else in Indonesia is using Hangul. It is very unique. Fortunately, Hangul is probably a good fit. Although the forms are somewhat angular like Chinese writing, the underlying principles are actually very similar to those used in India and Southeast Asia (with some differences of course).</p>

<p>There's another benefit to Hangul over scripts like Javanese and Balinese and that's enhanced Unicode support. Korea has been fortunate enough to have the economic and political influence for developers to develop functional encoding schemes, fonts and input utilities for Hangul. Many Southeast Asian scripts are still catching up Unicode-wise.</p>

<p>Whether this is the reason Cia Cia switched to Hangul or not, I wish them the best of luck. I think there are lots of people now invested in the success of this project.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Accessibility and Unicode</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/08/accessibility-and-unicode.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.101054</id>

    <published>2009-08-05T19:38:50Z</published>
    <updated>2009-08-05T21:48:52Z</updated>

    <summary>Here at Penn State my duties include being an accessibility guru as well as being a Unicode guru, and not too surprisingly, Unicode can enhance accessibility in some situations. And not just in the abstract &quot;standards enhance accessibility&quot; but more...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Encoding Theory" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="psuets" label="psuets" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>Here at Penn State my duties include being an accessibility guru as well as being a Unicode guru, and not too surprisingly, Unicode can enhance accessibility in some situations. And not just in the abstract "standards enhance accessibility" but more concretely as in:</p>

<h3>It's An Encoded Character, Not a Font Trick</h3>

<p>We all know that relying on fonts to display characters (e.g. the use of the Symbol font for Greek characters) is a Bad, Bad Idea, but it's even worse for a screen reader. Consider the expression <cite>θ = 2π</cite>. In the old Symbol font days, this might have been coded as:</p>

<blockquote>
<code>&lt;p&gt; &lt;font face="Symbol"&gt;q = 2p&lt;/font&gt;&lt;/p&gt;</code>
</blockquote>

<p>And guess what the screen reader would read - <b>Q equals 2 P.</b> Since the screen reader is essentially "font blind", the underlying text is what is read. Hence the Unicode-correct code below is preferred:</p>

<blockquote>

<p><code>&lt;p&gt; θ = 2π&lt;/p&gt;</code></p>
<blockquote>-OR-</blockquote>
<p><code>&lt;p&gt; &amp;theta; = 2&amp;pi;&lt;/p&gt;</code></p>

</blockquote>
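
<p>If the named entities aren't an option, numeric character references should behave the same way. This is only a quick sketch, assuming the standard code points U+03B8 (theta) and U+03C0 (pi); the hexadecimal and decimal forms below point to the same two characters, so a screen reader still receives real Greek letters:</p>

<blockquote>

<p><code>&lt;p&gt; &amp;#x3B8; = 2&amp;#x3C0;&lt;/p&gt;</code></p>
<blockquote>-OR-</blockquote>
<p><code>&lt;p&gt; &amp;#952; = 2&amp;#960;&lt;/p&gt;</code></p>

</blockquote>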

<p> If you think about it, the screen reader is a good tool for conceptualizing how characters (and their variants) may function semantically in different contexts.</p>

<p>I should mention that a screen reader can get confused by a Unicode character if it can't recognize it, but that's more of a dictionary problem than a Unicode problem. For Jaws, it is possible to <a href="/ejp10/blogs/gotunicode/2008/09/getting-jaws-61-to-recognize-e.html">install .sbl pronunciation files</a> to increase the character repertoire, especially for math and science.</p>

<h3>It's Text, Not An Image</h3>
<p>Perhaps the biggest advantage for Unicode though is that it allows characters that used to be embedded in images to be just plain text. For instance you could embed the following equation for the volume of a sphere:</p>

<h4>Text</h4>
<p align="center" style="font-size: x-large; font-style: italic; font-family: 'Times New Roman', serif">V = 4/3πr³</p>

<h4>Image</h4>
<p align="center"><img alt="V = four thirds pi R cubed" src="/ejp10/blogs/gotunicode/2009/08/05/AreaSphere.png" width="109" height="33" /></p>

<p>Consider what happens though if a low-vision reader (or a middle-aged reader with decrepit eyesight) needs to zoom in on the text. As you will see in the screen capture below, the image will pixelate while the text remains crisp.</p>
<h4>Zoomed Text vs Zoomed Image</h4>
<p align="center"><img alt="Enlarged formula. Text is crisper than image" src="/ejp10/blogs/gotunicode/2009/08/05/SpherePixel.png" width="349" height="190" /></p>

<p>When you combine Unicode with creative CSS, you can see the possibilities for replacing images, including buttons, with text. Not only is this more accessible, but it also results in smaller file sizes and is easier to edit.</p>
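
<p>As a rough sketch of what that could look like, here is a "next page" button built from a Unicode arrow (U+2192) and inline CSS instead of an image. The file name <code>page2.html</code> and the particular colors, border and padding are placeholders I made up for illustration:</p>

<blockquote>
<code>&lt;a href="page2.html" style="display: inline-block; padding: 0.3em 0.8em; border: 1px solid #000; background: #eee; color: #000; text-decoration: none"&gt;Next &amp;#x2192;&lt;/a&gt;</code>
</blockquote>

<p>Because the label is plain text, it should stay crisp when zoomed, and a screen reader has the word "Next" to read even if it skips the arrow.</p>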

<h3>Hearing Impaired Users</h3>
<p>Unicode is actually important for these users because they need to read text captions or transcripts for video and audio. Once you get beyond basic English (e.g. Spanish subtitles)...well you know Unicode will be important.</p>

<h3>Motion Impaired Users</h3>
<p>For these users, the issue probably isn't so much reading text as being able to input it - which is the job of developers of operating systems and software. For motion impaired users, a good generalization is that keyboard access is better than using the mouse, which requires a little more hand control. In the past I've commented on the usability of various input devices, but since most do rely on keystrokes, there are really no major complaints here.</p>

<p>One audience I didn't touch on is users with color deficient vision, but except possibly for the Aztec script (which isn't even in Unicode yet)...it's not too much of an issue.</p>


]]>
        
    </content>
</entry>

<entry>
    <title>Looking Forward to the Unicode 33 Conference</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/07/looking-forward-to-the-unicode.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.99279</id>

    <published>2009-07-24T19:37:00Z</published>
    <updated>2009-07-23T19:53:33Z</updated>

    <summary>This October I will be traveling to the Unicode 33 Conference in San Jose to present a tutorial and a paper. The organizers asked if I could feature it on my blog, so here is my honest &quot;plug&quot; for the...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>This October I will be traveling to the <a href="http://www.unicodeconference.org/">Unicode 33 Conference in San Jose</a> to present a tutorial and a paper. The organizers asked if I could feature it on my blog, so here is my honest "plug" for the conference.</p>

<p>I've already been once to Unicode 31, so I can honestly say it really was <b>one of the best conference experiences of my career.</b> They generally have an excellent range of programs for all levels and interests and this year is no exception. </p>

<p>If all you know is the term "Unicode" or "foreign language tech stuff", there are beginner pre-conference tutorials which assume you know little to nothing including an introduction to writing systems in general as well as a "grand tour of Unicode." If you know more, you can find tutorials and sessions aimed at die-hard programmers, font designers and project managers. </p>

<p>There will <a href="http://www.unicodeconference.org/conference-at-a-glance.htm">also be sessions</a> on Unicode in Web 2.0 tools like Joomla and Twitter, several sessions focusing on Right-to-Left languages, math and East Asian languages (including Japanese cell phone emoji). </p>

<p>In addition to all the sessions, the major players will be there including presenters from the W3C, Adobe, Microsoft, Google, IBM and Yahoo as well as people from other companies and universities. This is also a gathering spot for members of the Unicode committee, so who knows who you may end up talking to in the hall. </p>

<p>If you are seriously interested in Unicode, then this is a great opportunity to learn both the basics and the cutting edge materials. As I said in the title, I am really looking forward to this.</p>]]>
        
    </content>
</entry>

<entry>
    <title>New Features in Windows 7</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/07/new-features-in-windows-7.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.99269</id>

    <published>2009-07-23T19:26:09Z</published>
    <updated>2009-07-23T19:54:35Z</updated>

    <summary>The Windows 7 blog from Microsoft recently posted information on new &quot;global&quot; features for the next version of Windows (which would be Windows 7). Most of the focus in terms of fonts is on languages from South Asia (India in...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Windows" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="psuets" label="psuets" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>The Windows 7 blog from Microsoft recently posted information <a href="http://blogs.msdn.com/e7/archive/2009/07/07/engineering-windows-7-for-a-global-market.aspx">on new "global" features for the next version of Windows</a> (which would be Windows 7). Most of the focus in terms of fonts is on languages from South Asia (India in particular) and Southeast Asia (including Lao and Khmer fonts). </p>

<p>Microsoft is also announcing enhancements for displaying Arabic script characters and a new Font control panel (which now includes a font preview). Microsoft is also releasing pilots of new localization features.</p>

<p>I suspect a lot of people either skipped Vista or returned to Windows XP, so it is worth mentioning that <a href="http://tlt.its.psu.edu/suggestions/international/keyboards/winkeyvista.html#available">Vista had already included substantial additions</a> in its font and locale repertoire including Ethiopic support and several Indian languages. </p>

<p>I've heard a few good things about Windows 7 through the grapevine, so I am crossing my fingers. I will be anxious to test it in the Penn State environment.</p>]]>
        
    </content>
</entry>

<entry>
    <title>iPhone Support for Romanian (Sorin Sbarnea)</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/07/iphone-support-for-romanian-so.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.98806</id>

    <published>2009-07-21T16:49:16Z</published>
    <updated>2009-08-04T16:57:44Z</updated>

    <summary>This is sort of a guest column entry. A while ago, I wrote about iPhone 3 Unicode support, and a reader Sorin Sbarnea sent me the following as a comment. Sorin Sbarnea I was interested in adding a comment to...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Macintosh" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>This is sort of a guest column entry. A while ago, I wrote about <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/06/iphone-30-unicode-support-stil.html">iPhone 3 Unicode support</a>, and a reader Sorin Sbarnea sent me the following as a comment.</p>
<hr />
<h3>Sorin Sbarnea</h3>
<p>I was interested in adding a comment to your article from
<a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/06/iphone-30-unicode-support-stil.html">http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/06/iphone-30-unicode-support-stil.html</a>
but I was not able to find any way of adding a comment on the page.</p>

<p>IPhone 3.0 does have full support for all Romanian characters,
previous versions did not. In order to be able to enter special
characters you need to use a Romanian keyboard. Due to ergonomics they
decided to not include all character on one keyboard because it would
be too hard to use. Also I want to say that the drag model is genuine
and very practical - after few days you'll see that is much smarter
and more precise than the old click model.</p>

<p>I remember that I submitted 3 bugs on Apple for Romanian support, they
solved one in 2.1 and the rest in 3.0 - so all you have to do is to
complain to them ;)</p>

<p>Just to give you an example: in the first implementation of the
Romanian keyboard the Ă character was the last on the right of the
list of accented A characters after characters that are not used in
Romanian - I complained to them explaining the reasons and now they
solved it. I don't know if I was the cause or the only one complaining
but they solved it anyway. Also - I wasn't able to find *any* error in
Romanian translation of the iPhone - this is something very good -
let's say I wasn't expecting this level of quality for Romanian
translation.</p>
<hr />
<h3>Comments</h3>
<p>The lesson here is that Apple does listen...if you know where to send input. Still not convinced about dragging on the iPhone in general (except for solitaire), but I can be stubborn. I'm glad Sorin is a satisfied Apple customer.</p>

<p>As to the question about submitting comments - I disabled mine because about 99% of them were offers for land in Florida or pharmaceuticals I cannot use. For now, please feel free to contact me at <a href="mailto:ejp10@psu.edu">ejp10@psu.edu</a>. If I hear an outcry for commenting, I may change my mind.</p>
        
    </content>
</entry>

<entry>
    <title>How Unicode Mattered in Iran</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/07/how-unicode-mattered-in-iran.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.97099</id>

    <published>2009-07-02T17:05:35Z</published>
    <updated>2009-07-02T20:28:46Z</updated>

    <summary>As protesters expressed their anger with the Iranian presidential electoral process, the world marvelled at how Twitter, Facebook and other Internet outlets were being used by Iranians to communicate with each other even as the government was sending out...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
    <category term="psuets" label="psuets" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>As protesters expressed their anger with the Iranian presidential electoral process, the world marvelled at how <a href="http://www.nytimes.com/2009/06/16/world/middleeast/16media.html?_r=2&scp=1&sq=iran%20twitter&st=cse">Twitter, Facebook and other Internet outlets</a> were being used by Iranians to communicate with each other even as the government was sending out forces to suppress the riots. The U.S. State Department even requested that <a href="http://blog.seattlepi.com/thebigblog/archives/171381.asp">Twitter reschedule a fix</a> so as not to interfere with the daylight hours of Iran.</p>

<p>Heady stuff for technologies we normally associate with the most insipid of Internet messaging ("OMG - The Orioles lost again?!?"). I'm glad Facebook and Twitter were there, but I suspect that some of the most important messages were in Persian (Farsi) and were made possible by another less glamorous technology - Unicode. Both <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/03/igbo-in-facebook-it-can-be-don.html">Facebook</a> and <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/08/the-twitter-unicode-test-a.html">Twitter</a> have had underlying Unicode support from the beginning, so assuming your system had the right fonts, you could communicate in any language from Persian to Igbo and then some.</p>

<p>Although I am normally a symbol geek in my love for Unicode (goes well with my lifelong obsession with fonts, foreign language and exotic characters), at times like these I realize that Unicode is an important tool to the dream of the Internet enabling anyone, anywhere to speak out and be heard. If you are not a symbol geek, but wonder why Unicode is important...I think bloggers and Tweeters in Iran, China and everywhere can show you the answer. Unicode makes it possible for everyone to be heard...even if you haven't had the chance to learn English.</p>

<h3>Postscript: English Digital Divide</h3>
<p>In some countries there is a real digital divide based on language - that is, those who have learned a major language such as English, French, Spanish, Chinese or Arabic are able to use the Internet while others who only know a relatively under-supported language have little to zero access.</p>

<p>For instance, I asked a scholar at a Sri Lankan university how they computed in the Sinhala script, and his answer was that all computing was assumed to be in English (partly because Sri Lanka used to be the British colony Ceylon).  I was a little startled, but it makes sense. Until recently, I suspect that only a few people or institutions in the upper economic tiers could have afforded computers and they were likely already educated in English. Since English support is built in, it might seem a waste to work in support for a "local" script. Still, I think a lot of people and organizations understand the importance of Unicode in increasing access (and preserving local languages) and are working to provide low-cost utilities for these communities.</p>]]>
        
    </content>
</entry>

<entry>
    <title>iPhone 3.0 Unicode Support (Finding the ŵ)</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/06/iphone-30-unicode-support-stil.html" />
    <id>tag:www.personal.psu.edu,2009:/ejp10/blogs/gotunicode//516.96178</id>

    <published>2009-06-29T19:34:22Z</published>
    <updated>2009-06-26T21:47:35Z</updated>

    <summary>This week I upgraded my iPhone (actually iPod Touch) software to version 3.0, and although I noted the copy/paste and enhanced landscape display, of course I zoned in on the note saying there was increased character support. Hmmm. As a...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Macintosh" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>This week I upgraded my iPhone (actually iPod Touch) software to version 3.0, and although I noted the copy/paste and enhanced landscape display, of course I zoned in on the note saying there was increased character support. Hmmm.</p>

<p>As a warning, I have to admit that I'm a little behind the times in mobile computing, so bear with me if I repeat something you already know. Still, I'm not seeing this information all in one place, so it may be a good overview (at least for me).</p>

<p>The good news is that there does appear to be more character support, but the feature is still too well-hidden (I really had to work hard to find Welsh support). The iPhone also fails my test for general Unicode readiness because I am not yet able to enter phonetic characters like /ŋ,ɛ,ʃ/ (if nothing else that would kill the iPhone as a remote data entry device). However, I doubt the iPhone is really alone in that area.</p>

<p>So if you are wondering what I am talking about, let me discuss in context:</p>

<h3>Baseline Support</h3>

<p>Unicode data and display for major languages is generally supported. If Safari can display your Unicode Webpage, it will appear correctly on your iPhone...assuming that the built-in fonts support the character. Further, if you have entered/purchased an exotic title in iTunes, it will appear correctly in your synched iTunes list on the iPhone.</p>

<h3>Entering Accents</h3>

<p>The next challenge is entering some exotic characters into e-mail or a notes application. If you are dealing with Roman characters, iPhone does have some support, but not as much as I would like. The easiest non-English characters to find are foreign currency symbols like £ (pound), ¥ (yen) and € (euro). You typically access these by clicking the symbol set (often right after the numerals).</p>

<p>While I was able to figure that out, I admit to being stumped as to how to enter accented letters such as Spanish ñ or French è. Fortunately a quick Google search turned up some help sites including this blog entry from Pixelcoma. As you can see, the trick is to hold down a base key such as <b>N</b> or <b>E</b> to see the options for accented characters. </p>

<p>The trick though is that you have to <b>drag your finger across to the right character.</b> You can't hold and double tap as I tried to do. Oops.</p>

<p>As stated earlier, there are more options in the palette than in earlier versions. For instance, the Pixelcoma A options show A,À,Á,Ä,Æ,Ã,Å,Ą which already covers lots of Western and Central European languages, but Version 3 does add Ā (macron) which is good for Japanese Romaji, Hawaiian, Maori and Latin with long marks (I know there are Latin users out there). I assume that there are other important additions at the other base letters.</p>

<p>However, there are still apparent gaps such as Welsh accented W and Icelandic þ,ð/Ð as well as Romanian Ă, Turkish Ğ,Ş and İ, Latvian Ņ and other really exotic accented letters. It turns out that many of these are actually available through additional keyboard options installed on the iPhone. It still can feel like these languages are "second class" in comparison to Spanish, French and German (at least Polish, Czech and Hungarian have been "mainstreamed" which is a plus).</p>

<p>Before I leave this section though, I do have a comment for future developers:</p>

<p>Future developers - if you want to wow your audience with global accent support, you may want to start here at the Wikipedia Latin palette.</p>
<p><img alt="WikipediaLatinPal.png" src="/ejp10/blogs/gotunicode/2009/06/26/WikipediaLatinPal.png" width="440" height="404" style="border: 1px solid #000" /></p>

<p>That way we can avoid the agonizing incremental addition of accented letters as individual user communities step forward. Why not be comprehensive at the start - like the <a href="http://tlt.its.psu.edu/suggestions/international/accents/codemacext.html">Apple U.S. Extended keyboard</a> (which is one of the major reasons I still love Apple)?</p>

<p>As much as I kvetch though, I don't think the iPhone is worse than any other U.S. mobile device. A forum post for Blackberry mentions holding down a vowel and moving a trackball. ¡Qué divertido!</p>

<h3>Other Keyboards</h3>
<p>As mentioned previously, if your character is not available in the accent palette, you may need to activate additional keyboards (just like on a laptop or desktop). On the iPhone, you access these by clicking the <b>Settings</b> app, then going to <b>General Settings</b> then <b>International</b>. A number of keyboards for languages like Chinese, Japanese, Russian, Hebrew, Arabic as well as Icelandic, Turkish, Latvian are available (still no Welsh, unless it's hiding under the U.K. keyboard (yes it is!)).</p>

<p>This adds a globe icon (like the one below) to the usual iPhone keyboard and allows you to switch between keyboard modes. I just switched to the U.K. keyboard and behold, I <b>found the ŵ</b> under the W key (but now the ¥ key is missing).</p>

<p><img alt="GlobeIcon.png" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2009/06/26/GlobeIcon.png" width="386" height="77"  /><br /><b>Icon for International Keyboards on iPhone</b></p>



<h3>What I Really Want...</h3>

<p>Actually it's not necessarily more accented letters as I hold down a key. My thumb is shuddering at the potential pain of dragging or trackballing additional accents on top of the other precision maneuvers required for English texting. I actually want several things.</p>

<p>First, slightly better keyboard designs. The iPhone Google keyboard has the right idea when it makes the @ sign and .com extension basic keys. We already have options for switching on canned keyboards, but what if we had options for customizable keyboards? Maybe one with a "symbol" dock into which you drag the characters or phrases you need from a master slot (this way Americans learning Welsh CAN have their accented W's). Maybe you can reshuffle as well (like killing the \ key if you only synch with a Mac).</p>

<p>But I have to confess that I really want to be able to plug my iPhone into a keyboard. The touch interface is fine for short, small tasks on the run (like looking up movie times or weather by zip code), but still not so great for longer data entry or note-taking tasks. I know it's Palm Pilot, but I am at a stage where I would like to ditch the laptop for short meetings and only carry a mobile device and take notes. I note that <a href="http://www.theiphoneblog.com/2009/03/25/external-iphone-keyboard-hack-100-jailbreak-free/">there are hacks out there already</a>...despite the <a href="http://www.iphonehacks.com/2007/07/iphonetipstrick.html">useful shortcuts provided.</a> That should be a sign for Apple and other makers of mobile devices that the need is out there (bummer dudes).</p>

<p>It goes without saying that if true Mac keyboard integration comes, it should come with support for the U.S. Extended and other keyboard variations Apple and the user community have concocted (Windows users can use the <a href="/ejp10/blogs/gotunicode/2009/02/windows-us-international-keybo.html">U.S. International keyboard for the Mac</a>).</p>

<p>A final wish though is <b>better documentation.</b> The Unicode support for iPhone is decent, but it's quite a chore tracking it all down through numerous user blogs and guessing. I know Apple relies somewhat on its "intuitive" interface to help users through, but, for whatever reason, Unicode support is rarely intuitive. You just have to know where things are. I'm glad there's a user community out there but from the lack of documentation (especially in comparison to Microsoft) it seems like Apple doesn't care about these issues (when I think they really do).</p>

<p>Microsoft has various <a href="http://msdn.microsoft.com/en-us/goglobal/bb688110.aspx">Globalization sites</a> (in English), so why can't Apple (or at least one I can find)? Is it because we're in the U.S.? To me, it's a little condescending to assume that just because I live in the U.S. I will rarely need to enter non-English text. In fact, I type something "non-English" nearly every day.</p>

]]>
        
    </content>
</entry>

</feed>
