<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Got Unicode?</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/" />
    <link rel="self" type="application/atom+xml" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/atom.xml" />
    <id>tag:www.personal.psu.edu,2008-01-24:/ejp10/blogs/gotunicode//516</id>
    <updated>2008-08-29T20:42:11Z</updated>
    <subtitle>Elizabeth Pyatt&apos;s Unicode tips, resources and war stories.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.2-en</generator>

<entry>
    <title>The Twitter Unicode Test: A</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/08/the-twitter-unicode-test-a.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.16182</id>

    <published>2008-08-29T20:27:46Z</published>
    <updated>2008-08-29T20:42:11Z</updated>

    <summary>Just for kicks, I decided to run Twitter through some Unicode tests, and I give it an A. For the record, I pretty much knew from Twittervision it supported a lot of encodings, but I threw a few more exotic...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Tool Tests" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>Just for kicks, I decided to run Twitter through some Unicode tests, and I give it an A. For the record, I pretty much knew from <a href="http://twittervision.com/">Twittervision</a> it supported a lot of encodings, but I threw a few more exotic tests...just to see.</p>


<p>The first was <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2007/09/the-ipa-unicode-friendliness-t.html">my standard phonetic character test</a>...from a Mac. As far as I'm concerned, you have to pass this to be a serious global Unicode contender. I also threw in a long vowel (ā) and the one Hebrew word I can type (שבלת), or "shibboleth," to confirm right-to-left support.</p>

<p>What impressed me, oddly, was the support for entity codes like &amp;eacute; (é) and &amp;#x0909; (&#x0909;). Twitter can accept <b>either</b> raw é or &amp;eacute;, which it converts to é. This differs from other modern tools like Facebook or XML <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/03/igbo-in-facebook-it-can-be-don.html">which can only accept raw Unicode input</a> (entity codes like &amp;eacute; break).</p>

<p>Accepting either format is probably a pain to program, but very nice for the user. Having to remember when to enter entity codes and when to enter raw Unicode is confusing, but still an all-too-common reality. I appreciate Twitter for making the transition a little easier... even if it's only for 140 characters.</p>
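<p>To illustrate what accepting either format involves, here is a minimal Python sketch, assuming input is decoded with the standard library's html.unescape. It is not Twitter's actual code, and the normalize_input name is hypothetical.</p>

```python
import html


def normalize_input(text):
    """Accept raw Unicode or HTML entity codes and return plain Unicode text.

    html.unescape() decodes named (&eacute;), decimal (&#233;) and
    hexadecimal (&#x0909;) references; raw characters pass through unchanged.
    """
    return html.unescape(text)


print(normalize_input("caf&eacute;"))  # café
print(normalize_input("café"))         # café
print(normalize_input("&#x0909;"))     # उ (U+0909 DEVANAGARI LETTER U)
```

<p>One design caveat: once entities are decoded, escaped markup characters become real angle brackets, so a real service would need to escape them again before displaying the text.</p>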

<p align="center">
<img alt="Screen capture of Twitter messages with Unicode Characters in test messages" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/08/29/TwitterUnicode.png" width="641" height="276" /></p>]]>
        
    </content>
</entry>

<entry>
    <title>Hebrew Computing Listserv</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/08/hebrew-computing-listserv.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.15217</id>

    <published>2008-08-23T12:44:56Z</published>
    <updated>2008-08-21T13:22:01Z</updated>

    <summary>If you are working with Hebrew, a helpful list may be the Hebrew Computing User Group on Yahoo. You have to join the list to see the messages, but they do cover a wide range of topics. For other resources,...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Hebrew" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>If you are working with Hebrew, a helpful list may be the <a href="http://groups.yahoo.com/group/hebrewcomputing/">Hebrew Computing User Group on Yahoo.</a> You have to join the list to see the messages, but they do cover a wide range of topics.</p>

<p>For other resources, you can check the <a href="http://tlt.its.psu.edu/suggestions/international/bylanguage/hebrew.html">Penn State Hebrew Computing Information Page</a> (which, by pure coincidence, I edit).</p>]]>
        
    </content>
</entry>

<entry>
    <title>Chinese Olympic Pictograms</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/08/chinese-olympic-pictograms.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.15218</id>

    <published>2008-08-21T13:22:17Z</published>
    <updated>2008-08-21T13:31:50Z</updated>

    <summary>One of the more interesting &quot;color&quot; pieces on the U.S. Olympics coverage on NBC was a piece on how the icons for the different sports were inspired by early Chinese pictograms, which were the precursors of the modern Chinese characters....</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>One of the more interesting "color" pieces on the U.S. Olympics coverage on NBC was a piece on how the <a href="http://en.beijing2008.cn/63/32/column212033263.shtml">icons for the different sports</a> were inspired by <a href="http://www.ocf.berkeley.edu/~wwu/chinese/handout.html">early Chinese pictograms,</a> which were the precursors of the modern Chinese characters.</p>

<p>You can read a bit more <a href="http://english.peopledaily.com.cn/200608/07/eng20060807_290917.html">about the design process</a> in this article from the People's Daily Online. </p> 

<p>The use of <a href="http://en.beijing2008.cn/63/32/column212033263.shtml">ancient art for modern Olympic pictograms</a> is not new (see the entries from Athens, Sydney, Lillehammer and Salt Lake City), but I think this was the first time it made it to television.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Yale Chinese Support Site</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/08/yale-chinese-support-site.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.15216</id>

    <published>2008-08-21T12:31:11Z</published>
    <updated>2008-08-21T12:37:55Z</updated>

    <summary>Despite some of my previous entries, it&apos;s a fact that I really know very little about Chinese writing (I think I can recognize the characters 1,2,3). But if I really had to figure out what was going on the first...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="CJK" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Secret Unicode Link" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>Despite some of my previous entries, it's a fact that I really know very little about Chinese writing (I think I can recognize the characters for 1, 2, 3). But if I really had to figure out what was going on, the first place I would probably go is <a href="http://www.yale.edu/chinesemac/">Yale Chinese Mac</a>, which started back in the Mac Classic days.</p>

<p>Ironically though, the site is no longer just Chinese on a Mac, but includes information on Chinese on Windows, Chinese on Palm Pilot, encodings, free fonts and more. Many mysteries can be resolved here. If only I could find one of these for every script!</p>

<p><b>URL:</b> <a href="http://www.yale.edu/chinesemac/">http://www.yale.edu/chinesemac/</a></p>]]>
        
    </content>
</entry>

<entry>
    <title>Math Magic Equation Editor &amp; Unicode Fonts</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/08/math-magic-equation-editor-uni.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.14770</id>

    <published>2008-08-08T19:12:16Z</published>
    <updated>2008-08-08T20:11:37Z</updated>

    <summary>One challenge for math is laying out the actual equations like this integral below. The tool of choice for many in the math/science industry is the equation editor which allows you to insert text and symbols into different &quot;layouts&quot;...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>One challenge for math is laying out the actual equations like this integral below.</p>

<img alt="Integral of C sub v ( T ) d T from T to T sub ref" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/08/08/Area%3DIntegralSerif.mmf.gif" width="94" height="51" align="center"/>

<p>The tool of choice for many in the math/science industry is the equation editor, which allows you to insert text and symbols into different "layouts" (e.g. an integral, fraction, matrix, etc.). See the image at the bottom. It's a lot quicker than Illustrator. An equation editor can also usually export the output in different graphics formats, and some can export LaTeX and MathML (Ooooh!). I chose <a href="http://www.mathmagic.com/">Math Magic</a> primarily because it works on a Mac as well as Windows, but it's similar to other tools I have seen, including the one bundled with Microsoft Office.</p>

<p>The one quirk arose because I had previously developed methods to insert Unicode symbols via the Character Palette or a custom math symbol keyboard. You might also need to use non-Math Magic character insertion if you are using an especially exotic character (this happened to me once).</p>

<p>However, when I tried the Character Palette in MathMagic, the result was the square box of death, meaning the character did not "exist." Fortunately...I realized that it was a font issue. As soon as I switched to a dedicated math Unicode font like <a href="http://users.teilar.gr/~g1951d/">Unicode Symbols</a>, all was well. But now I wonder about the default font.</p>

<p>The quirky fonts are not a problem if you're exporting an image or working with text, but if it's MathML it could be problematic (but maybe I'm being paranoid). In any case, I sense a future MathML test coming.</p>

<h3>Typical Equation Editor Interface</h3>
<img alt="Math Magic Interface. Tool bar includes templates with squares where text can be inserted" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/08/08/MathMagicInterface.png" width="505" height="285" class="mt-image-center" align="center">

<h3>Postscript: The MathML Test</h3>

<p>The good news was that I was able to export a MathML file and get the result to work in another HTML page. I should note that the &lt;?xml...?&gt; declaration does not specify UTF-8 encoding. In theory, this shouldn't be a problem, since XML parsers assume UTF-8 when there is no encoding declaration or byte order mark, but I might add encoding="UTF-8" to make sure nothing weird is happening. The file also includes a custom &lt;annotation encoding="MathMagic"&gt; tag which is filled with vendor-generated code. I'm not sure what this does, but I will probably leave it in...just in case.</p>]]>
        
    </content>
</entry>

<entry>
    <title>A New German Unicode Letter - Capital S Sharp</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/07/a-new-german-unicode-letter-ca.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.13892</id>

    <published>2008-07-18T13:33:43Z</published>
    <updated>2008-08-08T21:01:34Z</updated>

    <summary>A relatively &quot;hot&quot; new addition to Unicode 5.1 is LATIN CAPITAL LETTER SHARP S (a capital form of the Sharp S, ß) for German. I thought I&apos;d write about this because it covers both policy and an important Unicode concept of casing. About...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>A relatively "hot" new addition to Unicode 5.1 is LATIN CAPITAL LETTER SHARP S (a capital form of the Sharp S, ß) for German. I thought I'd write about this because it covers both policy and an important Unicode concept: <cite>casing.</cite></p>

<h3>About Sharp S (ß)</h3> 

<p>Many of you may already know about <a href="http://german.about.com/library/weekly/aa092898.htm">lowercase Sharp S (ß)</a>, which is used in German spelling as a replacement for "ss". For instance, the German word <cite>gross</cite> 'large' could also be spelled as <cite>groß</cite> and <cite lang="de">Strasse</cite> 'street' can be spelled as <cite>Straße.</cite> The form itself is <a href="http://en.wikipedia.org/wiki/%C3%9F">an old manuscript convention</a> that was incorporated into modern typography.</p>

<p>So far so good, but what it means from a computing perspective is that any program working with German text has to know that <cite>gross</cite> and <cite>groß</cite> are essentially the same word, just with slightly different spellings. If you're looking in a library database, for instance, you would want to see both sets of results. On an interesting side note, I entered <cite>groß</cite> and pulled up the English Wikipedia page on the "gross" unit of measure as the first result - correct, but weird.</p>
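<p>A minimal Python sketch of that kind of matching, assuming a search index that compares case-folded keys. Python's str.casefold() applies Unicode full case folding, which maps ß to "ss"; the search_key helper is hypothetical.</p>

```python
def search_key(word):
    """Hypothetical helper: build a caseless search key.

    str.casefold() applies Unicode full case folding, so "ß" folds to "ss"
    and both spellings of a German word produce the same key.
    """
    return word.casefold()


print(search_key("groß") == search_key("gross"))      # True
print(search_key("Straße") == search_key("STRASSE"))  # True
```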

<h3>But not capital ß</h3>
<p>In official German spelling convention, there is NO CAPITAL SHARP S. First, no German word starts with "SS", so no word could ever begin with <b>ß</b> anyway. But even if a word is in all caps or small caps, the convention is to convert every <b>ß</b> to <b>SS</b> - thus groß should be GROSS in all caps.</p>

<p>Makes sense...except that people in Germany DO use capital Sharp S on some signs, gravestones and business names (similar to "Nite-Quil" instead of "Night-Quill"). The <a href="http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2888.pdf">2004 Proposal on Encoding Capital S Sharp (PDF)</a> contains a variety of photographs of Capital S Sharp in use. You can see one of these on <a href="http://en.wikipedia.org/wiki/Capital_%C3%9F">Wikipedia (Capital ß page)</a>. In other words, Unicode ultimately has to bow to social usage.</p>


<p>So finally we have the official Unicode announcement...</p>

<h3>Official Unicode 5.1 Announcement</h3>
<b>U+1E9E LATIN CAPITAL LETTER SHARP S</b>

<blockquote>In particular, capital sharp s is intended for typographical representations of signage and uppercase titles, and other environments where users require the sharp s to be preserved in uppercase. Overall, such usage is rare. In contrast, standard German orthography uses the string "SS" as uppercase mapping for small sharp s. Thus, with the default Unicode casing operations, capital sharp s will lowercase to small sharp s, but not the reverse: small sharp s uppercases to "SS". In those instances where the reverse casing operation is needed, a tailored operation would be required.</blockquote>
<p align = "right"><a href="http://unicode.org/versions/Unicode5.1.0/">http://unicode.org/versions/Unicode5.1.0/</a></p>
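<p>The asymmetry described in the announcement can be seen with Python's default string casing, assuming a Python 3 build whose Unicode database includes U+1E9E (Unicode 5.1 or later). The upper_keep_sharp_s function is a hypothetical example of the "tailored operation" the announcement mentions, not an official algorithm.</p>

```python
SMALL_SHARP_S = "\u00DF"    # ß  LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1E9E"  # ẞ  LATIN CAPITAL LETTER SHARP S (Unicode 5.1)

# Default Unicode casing: small sharp s uppercases to "SS"...
print(SMALL_SHARP_S.upper())    # SS
# ...while capital sharp s lowercases to small sharp s
print(CAPITAL_SHARP_S.lower())  # ß


def upper_keep_sharp_s(text):
    """Hypothetical tailored uppercasing that keeps sharp s as U+1E9E.

    U+1E9E has no uppercase mapping of its own, so upper() leaves it alone.
    """
    return text.replace(SMALL_SHARP_S, CAPITAL_SHARP_S).upper()


print("Der Große Duden".upper())              # DER GROSSE DUDEN
print(upper_keep_sharp_s("Der Große Duden"))  # DER GROẞE DUDEN
```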

<p>There's a very nice write up of the issue at <a href="http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2888.pdf">http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2888.pdf (PDF)</a></p>

<h3>Now What?</h3>

<p>First, fonts will have to be developed that include a capital ß variant. This may or may not be on your system yet. Here's a quick test. It wasn't looking good, even though I am on a Leopard Mac.</p>

<table class="chart" cellpadding="3" cellspacing="0"> 
<tr>
    <th scope="col">Character Name</th>
    <th scope="col">Unicode Number</th>
    <th scope="col">Character</th>
</tr>
<tr>
   <td>LATIN SMALL LETTER SHARP S</td>
   <td>U+00DF</td>
   <td>&#x00DF;</td>
</tr>
<tr>
   <td>LATIN CAPITAL LETTER SHARP S</td>
   <td>U+1E9E</td>
   <td>&#x1E9E;</td>
</tr>
</table>

<p>Next comes the "casing" question. Casing is the set of equivalences which match capital and lowercase letters as "the same" even though they are really two Unicode code points. For instance, <b>capital A</b> is U+0041 (ASCII 65) while <b>lowercase a</b> is U+0061 (ASCII 97). When you search Google and most databases, both <b>A</b> and <b>a</b> are treated the same (yet are kept distinct enough so that you can switch between <b>A</b> and <b>a</b> in your word processor). Note that English casing also conflates <b>Á, Å, À, Ä</b> as just A.</p>

<p><b>Update from 8 Aug:</b> Technically this probably isn't "casing", but the principle is the same - you conflate certain variants as "one" character.</p>
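<p>That kind of conflation is usually called accent folding. A minimal Python sketch, assuming a search tool that decomposes text with Unicode NFD and then drops the combining marks (the fold_accents name is hypothetical):</p>

```python
import unicodedata


def fold_accents(text):
    """Hypothetical helper: strip diacritics for accent-insensitive matching."""
    # NFD splits a precomposed letter such as "Á" into "A" + U+0301 COMBINING ACUTE ACCENT
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except nonspacing combining marks (general category "Mn")
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(fold_accents("ÁÅÀÄ"))  # AAAA
print(fold_accents("groß"))  # groß (ß has no decomposition, so it is untouched)
```

<p>Note that this also strips German umlauts (ö becomes o), which a German-aware search may not want, so real tools would likely tailor this step by language.</p>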

<p>As stated before, official German spelling does not recognize capital ß, but not surprisingly, there was a discussion in the Unicode list just this week on whether this too will change over time. I'll be staying tuned.</p>



<h3>A Linguistic Closing Thought</h3>
<p>Normally linguists talk about seeing a sound change or a grammar change in progress, but this appears to be a spelling change in progress. The <a href="http://en.wikipedia.org/wiki/%C3%9F#Capital_.C3.9F">Wikipedia Capital ß</a> page claims that legal documents often use capital sharp S in all-caps names in order to avoid ambiguity (e.g. the defendant Hans Straßer or HANS STRAßER). And apparently the most notorious use of capital ß is the title page of Der Große Duden (The Great Duden dictionary), which was rendered as DER GROßE DUDEN. Clearly the capital sharp S was destined for permanent encoding.</p>

]]>
        
    </content>
</entry>

<entry>
    <title>List of Old Church Slavonic Fonts</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/05/list-of-old-church-slavonic-fo.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.10896</id>

    <published>2008-05-28T14:05:58Z</published>
    <updated>2008-08-21T12:42:39Z</updated>

    <summary>AATSEEL (American Association of Teachers of Slavic and East European Languages) has just posted a set of links to &quot;Medieval Slavic Fonts&quot; for Old Church Slavonic, Glagolitic and Blackletter. See http://www.aatseel.org/medieval_slavic_font for more information List includes Unicode fonts and older...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Cyrillic &amp; Eastern Europe" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Secret Unicode Link" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>AATSEEL (American Association of Teachers of Slavic and East European Languages) has just posted a set of links to "Medieval Slavic Fonts" for Old Church Slavonic, Glagolitic and Blackletter.</p>

<p>See <a href="http://www.aatseel.org/medieval_slavic_font">http://www.aatseel.org/medieval_slavic_font</a> for more information.</p>

<p>The list includes Unicode fonts and older non-Unicode fonts.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Working with Doublestruck P &amp; Q (ℙ &amp; ℚ)</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/05/working-with-doublestruck-p-q.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.10042</id>

    <published>2008-05-09T22:16:53Z</published>
    <updated>2008-05-09T22:48:20Z</updated>

    <summary>As I&apos;ve been reporting in recent entries, I&apos;ve been working with a symbolic logic course which has been using various exotic symbols including double struck P (ℙ). Since every Unicode point seems to have its own story, I thought I...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>As I've been reporting in recent entries, I've been working with <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/03/micrsoft-word-logic-inserting.html">a symbolic logic course</a> which has been using various exotic symbols <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/05/glyph-du-jour-doublestruck-p.html">including double struck P (ℙ).</a> Since every Unicode point seems to have its own story, I thought I would report some of the interesting challenges for this character.</p>

<h3>Finding It</h3>
<p>When you are discussing a topic with lots of different symbols, you soon realize that in terms of Unicode, they will come from multiple blocks. For instance, double struck P is from the <b>Letterlike Symbols</b> block (starts at U+2100), while other math symbols may be in the <b>Arrows</b> block, the <b>Number Forms</b> block, the <b>Mathematical Operators</b> block or possibly the <b>Dingbats</b> block. You can see from the <a href="http://www.unicode.org/charts/symbols.html">Unicode Org Symbols and Punctuation Chart</a> just how many blocks are involved.</p>

<p>A user doesn't normally have to know a character's Unicode code point, but because many insertion tools such as the Windows Character Map or the Mac Character Palette are organized primarily by block, you do need a rough idea of how blocks work.</p>
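<p>For the curious, here is a small Python sketch that reports a symbol's Unicode name and block. Python's unicodedata module supplies names but not block names, so the block ranges below are a short, hand-copied subset of the Unicode charts and the block_of helper is hypothetical.</p>

```python
import unicodedata

# Hand-copied subset of block ranges from the Unicode charts (not exhaustive)
BLOCKS = [
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x2150, 0x218F, "Number Forms"),
    (0x2190, 0x21FF, "Arrows"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2700, 0x27BF, "Dingbats"),
]


def block_of(ch):
    """Hypothetical helper: return the block name for ch from the table above."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if cp in range(start, end + 1):
            return name
    return "unknown (not in this short table)"


# Double-struck P, one third, right arrow, logical or, check mark
for ch in "ℙ⅓→∨✓":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "-", block_of(ch))
```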

<h3>Rarity</h3>
<p>Fonts with a robust set of math symbols are still pretty rare, and sometimes the letterlike symbols are even rarer. At one point I had ℙ (P) pulling from one font and ℚ (Q) from another...interesting. Below are some fonts I know include doublestruck letters like ℙ and ℚ.</p>

<ul>
      <li><b style="color:#006">Windows/Mac Leopard</b> - <b>Arial Unicode MS</b></li>
      <li><b style="color:#006">Macintosh OS X</b> - <b>Apple Symbol</b>, <b>Hiragino Mincho Pro W3</b> (Japanese), <b>Hiragino Mincho Pro W6</b> (Japanese), <b>Lucida Sans</b></li>
      <li><a href="http://users.teilar.gr/~g1951d/">Unicode Symbols</a></li>
      <li><a href="http://dartcanada.tripod.com/Objets/Old/hh/hindhist.html#hu">Hindsight Unicode </a></li>
      <li><a href="http://everywitchway.net/linguistics/fonts/chrysuni.html">Chrysanthi (Chryʃsanþi)</a></li>
    </ul>

<h3>Formatting Issues</h3>
<p>Normally I try to avoid font and size specifications, but double struck P is an interesting counterexample. One challenge is that because the legs are hollowed out, it has a much lighter visual appearance than a normal P. My base text is 12 px on the Web, but for the double struck P, I decided to bump up the size to about 16 px (specified in a standards-compliant way as 1.3 em, or 15.6 px on a 12 px base).</p>

<p>The other issue was selecting font faces. I wanted one with thick double legs - if you look at the font chart below from my Mac, you'll see that some fonts had some very skinny legs. </p>

<p align="center">
<img alt="Double Struck P in multiple fonts as seen on Mac Character Palette" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/05/09/PDoubleStruck.png" width="793" height="294" />
</p>

<p>I also prefer the serif fonts in this case since I personally believe serifs help inexperienced users in reading unfamiliar scripts (in this case undergraduate college students). For this course, I'll probably point students to some freeware fonts I like.</p>
]]>
        
    </content>
</entry>

<entry>
    <title>Glyph Du Jour: Doublestruck P (ℙ)</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/05/glyph-du-jour-doublestruck-p.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.10041</id>

    <published>2008-05-09T21:47:16Z</published>
    <updated>2008-05-09T22:10:34Z</updated>

    <summary>Math symbols can stretch the boundaries of Unicode display technology, but not as much as some other related blocks like Letterlike Symbols, the home of such symbols as ℙ (double struck P, see image below), ℚ (double struck Q), and...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>Math symbols can stretch the boundaries of Unicode display technology, but not as much as some other related blocks like <b>Letterlike Symbols</b>, the home of such symbols as ℙ (double struck P, see image below), ℚ (double struck Q), and even the pharmacy prescription symbol (℞).</p>

<p align="center">
<span class="mt-enclosure mt-enclosure-image"><img alt="GlyphSampleDSP.png" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/05/09/GlyphSample.png" width="464" height="520" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;"/></span>
</p>

<p>Double struck letters in particular are used in different branches of mathematics to represent, for instance, the set of all real numbers (double struck R) or, in symbolic logic, <b>any atomic proposition</b>. See the table below for different double struck letters and their Unicode values. See the Penn State Math Symbol chart for other <a href="http://tlt.its.psu.edu/suggestions/international/bylanguage/mathchart.html#let">common letterlike symbols of math.</a></p>
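<p>The numeric entities in the table below are simply each character's code point written in decimal and in hexadecimal. A minimal Python sketch that derives them (the numeric_entities helper is hypothetical):</p>

```python
def numeric_entities(ch):
    """Hypothetical helper: return (decimal, hex) numeric character references for ch."""
    cp = ord(ch)  # the Unicode code point as an integer
    return f"&#{cp};", f"&#x{cp:04X};"


for ch in "ℝℂℕℙℚℤ":
    dec, hexa = numeric_entities(ch)
    print(ch, dec, hexa)
```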

<table class="chart" cellpadding="3" cellspacing="0">
<tr>
    <th class="midblue" scope="col">Character Name</th>

    <th scope="col">Character</th>
    <th scope="col">Entity</th>
    <th scope="col">Num<br>
      Entity</th>
    <th scope="col">Hex <br>
      Entity </th>

  </tr>
 
  <tr>
    <td>DOUBLE-STRUCK REAL NUMBER   (Double&nbsp;R) </td>
    <td class="mightyglyph">ℝ</td>
    <td class="optioncode tealdark">--</td>
    <td>&amp;#8477;</td>

    <td class="navy">&amp;#x211D;</td>
  </tr>
  <tr>
    <td>COMPLEX NUMBERS   (Double&nbsp;C) </td>
    <td class="mightyglyph">ℂ</td>
    <td class="optioncode tealdark">--</td>

    <td>&amp;#8450;</td>
    <td class="navy">&amp;#x2102;</td>
  </tr>
  <tr>
    <td>NATURAL NUMBERS (Double&nbsp;N) </td>
    <td class="mightyglyph">ℕ</td>

    <td class="optioncode tealdark">--</td>
    <td>&amp;#8469;</td>
    <td class="navy">&amp;#x2115;</td>
  </tr>
  <tr>
    <td>PRIME NUMBERS (Double&nbsp;P) </td>

    <td class="mightyglyph">ℙ</td>
    <td class="optioncode tealdark">--</td>
    <td>&amp;#8473;</td>
    <td class="navy">&amp;#x2119;</td>
  </tr>
  <tr>
    <td>RATIONAL NUMBERS (Double&nbsp;Q) </td>

    <td class="mightyglyph">ℚ</td>
    <td class="optioncode tealdark">--</td>
    <td>&amp;#8474;</td>
    <td class="navy">&amp;#x211A;</td>
  </tr>
  <tr>
    <td>INTEGERS (Double&nbsp;Z) </td>

    <td class="mightyglyph">ℤ</td>
    <td class="optioncode tealdark">--</td>
    <td>&amp;#8484;</td>
    <td class="navy">&amp;#x2124;</td>
  </tr>
</table>]]>
        
    </content>
</entry>

<entry>
    <title>Arial Unicode on OS X (Leopard)</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/05/arial-unicode-on-os-x-leopard.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.9511</id>

    <published>2008-05-01T21:02:04Z</published>
    <updated>2008-05-01T21:21:45Z</updated>

    <summary> I was able to upgrade to Leopard recently on my Mac which means I&apos;m able to manipulate a working version of Arial Unicode MS for the Mac...yeah. Web Display My blog actually switched to Arial Unicode because of the...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Macintosh" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tool Tests" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[ <p>I was able to upgrade to Leopard recently on my Mac which means I'm able to manipulate a working version of Arial Unicode MS for the Mac...yeah.</p>

<h3>Web Display</h3>
<p>My blog actually switched to Arial Unicode because of the way I had coded the CSS. It was very legible, but the x-height seemed smaller in comparison to the Apple Lucida Grande - so I reordered the priority. I will have to see if I can download Lucida Grande onto Windows via the Windows Safari download.</p>

<h3>Back to the Logic Symbols in Word</h3>
<p>Most of my recent Unicode adventures have been about <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/03/micrsoft-word-logic-inserting.html">inserting logic symbols like (∨,∧,⊃) into Word </a> (and later Excel).  My main struggle has been that if I insert them from the Character Palette, the font switches to Symbol... which is OK until I start typing English. At that point I will stop outputting the English alphabet and σταρτ ουτπυτιν τηε γρεεκ αλπηαβετ. Greek is great...unless you're typing English text. I was using the left arrow key quite a bit.</p>

<p>Now that Microsoft has developed a working version of Arial Unicode MS, I can input the symbols without switching over to Greek. The only gotcha is that I have to shift old logic symbols out of their pre-Arial Unicode fonts (thank goodness for keyboard shortcuts). What I'm hoping is that I can bypass the big font switch in Windows Word too.</p>

<p>So I'm happy to say that we're adding another small step towards Unicode compatibility. Finally I can have logic symbols in a non-Greek, non-Japanese, non-Chinese font!</p>]]>
        
    </content>
</entry>

<entry>
    <title>Language Codes: Dialect vs. Macrolanguage</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/04/language-codes-dialect-vs-macr.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.9114</id>

    <published>2008-04-25T18:44:17Z</published>
    <updated>2008-04-25T19:00:20Z</updated>

    <summary>A while ago, I was writing about the difficulty of defining some language tags like Cantonese because even though it&apos;s called a dialect, it&apos;s really a separate language. The SIL group is using a new term I think should become...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Language Codes" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>A while ago, I was writing about the difficulty of defining <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2007/05/picking-the-right-cantonese-la.html">some language tags like Cantonese</a> because even though it's called a <cite>dialect</cite>, it's really a separate language.</p>

<p>The SIL group is using a new term I think should become more common - the <cite>macrolanguage.</cite> A <a href="http://www.sil.org/ISO639-3/scope.asp#M">macrolanguage</a> is basically a set of related languages that share a common "identity" even though speakers can't normally understand each other.</p>

<p>Macrolanguages arise when a language spreads to different regions and changes, but the cultural or political unity remains. <a href="http://www.sil.org/iso639-3/macrolanguages.asp">Other macrolanguages</a> include Arabic, Cree, Hmong, Quechua (as spoken in the Incan Empire), and Norwegian. I suspect you could throw in some other candidates like German and Italian (and we'd have more if the Roman Empire had made it to the 21st century).</p>

<p>In any case, the ISO 639-3 language tag standard has a set of <a href="http://www.sil.org/ISO639-3/macrolanguages.asp">macrolanguage mappings</a>, which show how individual related languages map to a macrolanguage, so that either Mandarin Chinese (cmn) or Cantonese (yue) can also be tagged as Chinese (zh or zho).</p>
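<p>To make the mapping idea concrete, here is a minimal Python sketch. The table is a hypothetical excerpt with just a few rows (Chinese, Arabic and Norwegian), not the full SIL mapping file, and the function name <code>broader_tags</code> is invented for illustration.</p>

```python
# Hypothetical excerpt in the spirit of the SIL macrolanguage mappings:
# individual language code mapped to the code of its macrolanguage.
MACROLANGUAGE_OF = {
    "cmn": "zho",  # Mandarin Chinese is a member of Chinese
    "yue": "zho",  # Cantonese is a member of Chinese
    "arz": "ara",  # Egyptian Arabic is a member of Arabic
    "nob": "nor",  # Norwegian Bokmål is a member of Norwegian
    "nno": "nor",  # Norwegian Nynorsk is a member of Norwegian
}

# Two-letter ISO 639-1 equivalents of the three-letter macrolanguage codes
TWO_LETTER = {"zho": "zh", "ara": "ar", "nor": "no"}


def broader_tags(code):
    """Return the macrolanguage codes (3-letter, then 2-letter) that also cover `code`."""
    macro = MACROLANGUAGE_OF.get(code)
    if macro is None:
        return []  # not listed as a member of any macrolanguage in this excerpt
    tags = [macro]
    if macro in TWO_LETTER:
        tags.append(TWO_LETTER[macro])
    return tags


print(broader_tags("yue"))
print(broader_tags("cmn"))
```

<p>Either Cantonese or Mandarin comes back with the same broader tags, which is exactly the "can also be called Chinese" relationship described above.</p>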


<p>I really hope this term takes hold, because I think it will simplify other discussions about language tags. After all, just this year a language technology guru claimed that English had no "true dialects." I think he meant to say that English hasn't reached macrolanguage status yet.</p>
]]>
        
    </content>
</entry>

<entry>
    <title>What&apos;s New in Unicode 5.1?</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/04/whats-new-in-unicode-51.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.8988</id>

    <published>2008-04-23T19:35:55Z</published>
    <updated>2008-04-23T20:01:22Z</updated>

    <summary> Unicode version 5.1 was recently released, and includes some new code blocks as well as new specifications. As with all new versions of Unicode there will be a time lag until the new items can be incorporated into fonts...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Ancient Scripts" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Arabic Script" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Encoding Theory" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="South Asian" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p><a href="http://www.unicode.org/versions/Unicode5.1.0/">Unicode version 5.1</a> was recently released and includes some new code blocks as well as new specifications. As with every new version of Unicode, there will be a time lag before the new items are incorporated into fonts and utilities, but here is a partial list of new items.</p> 

<p>If you're interested in the new characters, the best place to view them is at <a href="http://www.unicode.org/charts/">http://www.unicode.org/charts/</a>.</p>
<h3>New Plane 0 Scripts</h3>
<ul>
       <li>Cham (Cambodia/Vietnam)</li>
       <li>Kayah Li (Thailand/Myanmar)</li>
       <li>Lepcha (India)</li>
        <li>Ol Chiki/Santali (India)</li>
       <li>Rejang (Indonesia)</li>
        <li>Saurashtra (India)</li>
	<li>Sundanese (Indonesia)</li>
        <li>Vai (Liberia)</li>
</ul>

<h3>Script Extensions</h3>
<p>These blocks add characters to previously encoded scripts.</p>
<ul>
	<li>Cyrillic Extended-A</li>
        <li>Cyrillic Extended-B</li>
        <li>Arabic - characters for mathematics, four Qur'anic characters, and additional characters for various languages</li>
        <li>Indic - Malayalam and Tamil character sequences, Devanagari chandra a, <br />
       Sanskrit sounds in Gurmukhi, Oriya and Telugu</li>
       <li>Latin - characters for minority languages and capital German sharp S (rare)</li>
       <li>Math Symbols</li>
        <li>Medievalist Punctuation - for research</li>
        <li>Myanmar Additions</li>
</ul>

<h3>New Plane 1 Ancient Scripts and Miscellaneous Symbols</h3>
<ul>
        <li>Carian (Anatolia/Turkey)</li>
        <li>Lycian (Anatolia/Turkey)</li>
        <li>Lydian (Anatolia/Turkey)</li>
	<li>Phaistos Disc (Crete)</li>
        <li>Domino Tile Symbols</li>
        <li>Mahjong Tile Symbols</li>
</ul>
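<p>If you want to check whether your own tools know about these characters, Python's <code>unicodedata</code> module can look up character names, and the plane is simply the code point divided by 0x10000. This is a small sketch; the sample characters are my own picks from the blocks above, and the names printed depend on the Unicode version bundled with your Python.</p>

```python
import unicodedata

# A few characters from blocks listed above as new in Unicode 5.1
samples = [
    "\u1E9E",      # capital German sharp S (Latin addition, Plane 0)
    "\uA500",      # first character of the Vai block (Plane 0)
    "\U000101D0",  # first character of the Phaistos Disc block (Plane 1)
    "\U0001F000",  # first character of the Mahjong Tiles block (Plane 1)
]


def plane(ch):
    """Unicode plane of a character: 0 is the BMP, 1 the Supplementary Multilingual Plane."""
    return ord(ch) // 0x10000


for ch in samples:
    # unicodedata.name() raises ValueError for unnamed code points unless a default is given
    print(f"U+{ord(ch):04X}  plane {plane(ch)}  {unicodedata.name(ch, '(no name in this Python)')}")

# The new capital sharp S lowercases to the familiar small sharp s
print("\u1E9E".lower())
```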
]]>
        
    </content>
</entry>

<entry>
    <title>Microsoft Word ∧ Logic: Inserting the Right Code Point</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/03/micrsoft-word-logic-inserting.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.6956</id>

    <published>2008-03-27T18:09:22Z</published>
    <updated>2008-03-27T18:58:03Z</updated>

    <summary> The Insert Symbol Tool in Word As I said last entry, I&apos;m working on a symbolic logic course and am learning new quirks for dealing with Unicode logic symbols...and one of them apparently is the Microsoft Word Insert...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Macintosh" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tool Tests" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Windows" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[ <h3>The Insert Symbol Tool in Word</h3>
<p>As I said in my last entry, I'm working on a symbolic logic course and am learning new quirks for dealing with Unicode logic symbols...and one of them apparently is the Microsoft Word Insert Symbol tool (found under <b>Insert » Symbol</b> in most versions of Word).</p>

<p>Like the <a href="http://tlt.its.psu.edu/suggestions/international/accents/charmap.html">Windows Character Map</a> and <a href="http://tlt.its.psu.edu/suggestions/international/keyboards/charpalosx.html">Mac Character Palette</a>, the Insert Symbol tool lets you insert single characters into a document so you can change "P implies Q" to the logical formulation P ⊃ Q or P → Q depending on your symbolism (and likewise turn "P and Q" into P &amp; Q or P ∧ Q).</p>

<p>But...unlike the Windows Character Map/Mac Character Palette, the Insert Symbol tool can take you on a little detour out of standard Unicode and into the Private Use Area, the block where vendors such as Microsoft can define their own characters. For instance, when I tried to insert the character ∩ (intersection) into a document, I noticed that the Insert Symbol palette gave a code point of <b>U+F0C7</b> instead of the expected <b>U+2229</b>. The U+F0xx code is the giveaway: it falls inside the Private Use Area (U+E000 through U+F8FF).</p>

<span class="mt-enclosure mt-enclosure-image"><img alt="InsertMathSymbolMac.png" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/03/27/InsertMathSymbolMac.png" class="mt-image-center" style="margin: 0pt auto 20px; text-align: center; display: block;" height="293" width="551" /></span>
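<p>You can also spot Private Use code points programmatically. In this minimal Python sketch (the helper name <code>is_private_use</code> is made up), Unicode's general category "Co" marks Private Use characters, which is how U+F0C7 differs from the standard INTERSECTION character at U+2229.</p>

```python
import unicodedata


def is_private_use(ch):
    """True if `ch` is a Private Use code point (general category "Co")."""
    return unicodedata.category(ch) == "Co"


# U+F0C7 is what Insert Symbol produced; U+2229 is the standard code point
for cp in (0xF0C7, 0x2229):
    ch = chr(cp)
    # Private Use code points have no standard name, so supply a default
    print(f"U+{cp:04X}", is_private_use(ch), unicodedata.name(ch, "(no standard name)"))
```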


<p>First I should say that there is a rationale for this. You'll notice that the font in the graphic is set to "<b>Symbol</b>", an older pre-Unicode font that was used to insert lots of special mathematical symbols. The Private Use set-up undoubtedly prevents a lot of older documents from breaking.</p>

<h3>So What?</h3>
<p>If you only ever work in Word, the Insert Symbol tool may still work fine for you. But these days, more and more documents are actually destined for the Web or some other format...and not all tools recognize the Microsoft Private Use codes.</p>

<p>I first noticed that the logic symbols weren't standard Unicode when some of them did not "convert" well to HTML in <a href="https://blogs.psu.edu/ejp10/blogs/gotunicode/2008/03/course-genie-and-unicode-a.html">Course Genie</a> but mysteriously turned into characters like "(". The symbols I had inserted the standard way converted properly, but the ones inserted with the Word Insert Symbol tool did not. Ugh.</p>
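<p>If you inherit a document already full of these codes, one option is to map them back to standard Unicode before converting to HTML. The sketch below assumes the common convention that a Windows symbol font's byte N appears as U+F000 + N (which matches U+F0C7 above); the handful of target characters are my assumption from the Symbol font layout, and the names are hypothetical, so verify them before use.</p>

```python
# Hypothetical, partial cleanup table. Symbol-encoded fonts on Windows are
# conventionally reached at U+F000 plus the old font byte (0xC7 becomes U+F0C7).
# The targets are assumed from the Symbol font layout; check them against
# your own documents before relying on them.
SYMBOL_PUA_TO_UNICODE = {
    0xF0C7: "\u2229",  # INTERSECTION
    0xF0C8: "\u222A",  # UNION
    0xF0C9: "\u2283",  # SUPERSET OF (the horseshoe in P ⊃ Q)
    0xF0AE: "\u2192",  # RIGHTWARDS ARROW
    0xF0D9: "\u2227",  # LOGICAL AND
    0xF0DA: "\u2228",  # LOGICAL OR
}


def fix_symbol_pua(text):
    """Replace known Symbol-font Private Use code points with standard Unicode."""
    # str.translate accepts a dict mapping code point integers to replacement strings;
    # characters not in the table are left untouched
    return text.translate(SYMBOL_PUA_TO_UNICODE)


print(fix_symbol_pua("P \uF0C7 Q"))
```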

<p>Using proper Unicode instead of an older format does have a real-world impact.</p>

<h3>Summary</h3>
<p>To avoid the Private Use codes in new Word documents, always insert symbols with the <a href="http://tlt.its.psu.edu/suggestions/international/accents/charmap.html">Windows Character Map</a> or the <a href="http://tlt.its.psu.edu/suggestions/international/keyboards/charpalosx.html">Mac Character Palette</a>. On Windows, you may need to switch the font to Arial Unicode MS.</p>

<p>Or if you're especially insane, you can develop your own logic symbol keyboard utility.</p>
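<p>For the record, a "utility" doesn't have to be a full keyboard layout. Here is a hypothetical text-expansion sketch in Python, where backslash shortcuts (my own invention, loosely LaTeX-style) are replaced with the standard Unicode logic code points instead of Private Use ones.</p>

```python
import re

# Hypothetical shortcut table: what you type, and the standard code point it becomes
SHORTCUTS = {
    "\\implies": "\u2192",  # RIGHTWARDS ARROW
    "\\exists": "\u2203",   # THERE EXISTS
    "\\forall": "\u2200",   # FOR ALL
    "\\and": "\u2227",      # LOGICAL AND
    "\\or": "\u2228",       # LOGICAL OR
}

# Try longer shortcuts first so one shortcut never swallows the start of another
_PATTERN = re.compile("|".join(re.escape(s) for s in sorted(SHORTCUTS, key=len, reverse=True)))


def expand_shortcuts(text):
    """Replace every shortcut in `text` with its Unicode logic symbol."""
    return _PATTERN.sub(lambda m: SHORTCUTS[m.group(0)], text)


print(expand_shortcuts("\\forall x (Px \\implies Qx)"))
```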


]]>
        
    </content>
</entry>

<entry>
    <title>Course Genie and Unicode: A–</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/03/course-genie-and-unicode-a.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.6503</id>

    <published>2008-03-21T12:47:18Z</published>
    <updated>2008-03-21T14:09:57Z</updated>

    <summary>Since my day job is online course developer, I get to work with a lot of academic tools, including my newest tool the Course Genie (or Wimba Create) Word plugin. This is a tool which takes a Word file &quot;injected&quot;...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Tool Tests" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>Since my day job is developing online courses, I get to work with a lot of academic tools, including my newest tool <a href="http://www.wimba.com/products/wimbacreate/">the Course Genie (or Wimba Create) Word plugin.</a> </p>

<p>This tool takes a long Word manuscript "injected" with the right styles and converts it into a set of well-formed HTML documents, complete with a table of contents page and page navigation. Even if you don't insert any self-test quizzes, this is a major time saver. But...can it do Unicode?</p>

<p>For once, this is a real issue, since the course I'm working on is symbolic logic and uses plenty of specialized symbols like ∪, ∩, ∃x, ∀x and so forth. So far I've been pleasantly surprised to discover that the Course Genie planners did think ahead and implemented decent Unicode strategies.</p>

<p>The good news is that if your instructor (aka subject matter expert) hands you a Word file including these symbols, you may not have to do much other than make sure that the symbols were inserted from the Character Map and not from an old custom font. By default, Course Genie converts these to numeric codes...or, if you select a special UTF-8 theme, it even includes the UTF-8 meta tag.</p>
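<p>The numeric-code approach is easy to picture with a short Python sketch (an illustration of the general technique, not Course Genie's actual code; the function name is hypothetical): the built-in "xmlcharrefreplace" error handler turns each non-ASCII character into a decimal numeric reference, so ∩ becomes the reference for code point 8745.</p>

```python
def to_numeric_references(text):
    """Turn every non-ASCII character into a decimal numeric character reference."""
    # The "xmlcharrefreplace" error handler substitutes a reference for each
    # character that ASCII cannot encode, so the result is pure ASCII.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_references("\u2200x (Px \u2192 Qx)"))
```

<p>Because the output is plain ASCII, it displays correctly whether or not the page declares UTF-8.</p>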

<p>For most modern browsers this is sufficient. The only gotcha is that it sets all text (even the symbols) in Verdana, and IE 5/6 acts a little strange when fonts for special characters are pre-specified for Arial Unicode.</p>

<p>The other complaint is that most theme settings insert the ISO-8859-1 (Latin-1) encoding meta tag instead of UTF-8...EVEN THOUGH the base XML file is UTF-8. Unless you know to select a UTF-8 theme, you won't get the UTF-8 meta tag. Not only does this make me nervous on principle, but it means that you have to be extra careful if you ever edit the files in another program like Dreamweaver.</p>
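<p>Before editing exported files in another program, it may be worth checking which encoding each page actually declares. This rough Python sketch (the function names are hypothetical, and a regular expression is only a heuristic compared with a real HTML parser) pulls out the first charset value it finds and flags anything that isn't UTF-8.</p>

```python
import re

# Matches the charset value in either the older http-equiv meta tag
# ("text/html; charset=ISO-8859-1") or the shorter meta charset attribute.
_CHARSET = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def declared_charset(html_text):
    """Return the first declared charset in lowercase, or None if there is none."""
    match = _CHARSET.search(html_text)
    return match.group(1).lower() if match else None


def needs_attention(html_text):
    """Flag pages that do not declare UTF-8 (e.g. before editing them in another tool)."""
    return declared_charset(html_text) != "utf-8"
```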

<!--
<p>My other quirk is that I'm not sure how it handles Unicode generated in the Mac version of Word. Course Genie only works in the Windows version of Word, but I am a Mac person, so I edit some docs on the Mac side. Yet some of the codes are not converting correctly (this will need further investigation). </p>

<p>Since Course Genie is a plugin only for Windows Word, theoretically this shouldn't be an issue...unless you have an instructor who hands you a manuscript edited in Mac Word...believe me, it can happen.</p> -->]]>
        
    </content>
</entry>

<entry>
    <title>Igbo in Facebook - It Can Be Done (But Numeric Code Breaks)</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/03/igbo-in-facebook-it-can-be-don.html" />
    <id>tag:www.personal.psu.edu,2008:/ejp10/blogs/gotunicode//516.5845</id>

    <published>2008-03-06T19:34:23Z</published>
    <updated>2008-03-06T19:53:04Z</updated>

    <summary>How does Facebook handle accents? Pretty well actually - but you can&apos;t use the numeric code. Instead you have to directly insert the character either by typing it in an Igbo Keyboard or via the Windows Character Map or Mac...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tool Tests" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>How does Facebook handle accents? Pretty well, actually, but you can't use the numeric code. Instead you have to insert the character directly, either by typing it with an <a href="http://tlt.its.psu.edu/suggestions/international/bylanguage/igbo.html">Igbo Keyboard</a> or via the <a href="http://tlt.its.psu.edu/suggestions/international/accents/charmap.html">Windows Character Map</a> or <a href="http://tlt.its.psu.edu/suggestions/international/keyboards/charpalosx.ht">Mac Character Palette</a>.</p>
<p>In the Web 1.0 days, the safest way to display accented letters was with numeric entity codes. For instance, if I wanted to display <b>Ụwa,</b> I might write <code>&amp;#7908;wa</code> within the HTML document. The codes were safer because they would work even if a developer forgot to include the UTF-8 meta tag.</p>
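<p>A short Python sketch shows both halves of that claim. The number in the code is just the code point in decimal (U+1EE4 is 7908), and the second part shows what can go wrong with raw UTF-8 bytes if the page is read as Latin-1 because the meta tag is missing; Latin-1 is my assumed fallback here, since the actual fallback depends on the browser.</p>

```python
word = "\u1EE4wa"  # "Ụwa"; U+1EE4 is LATIN CAPITAL LETTER U WITH DOT BELOW

# The number inside the numeric code is simply the code point in decimal
code_point = ord(word[0])
print(code_point, hex(code_point))

# Raw UTF-8 uses three bytes for Ụ; a browser guessing Latin-1 shows each byte
# as a separate character (mojibake)
utf8_bytes = word.encode("utf-8")
print(utf8_bytes.decode("latin-1"))

# The numeric-code version is pure ASCII, so it reads the same either way
reference = word.encode("ascii", "xmlcharrefreplace")
print(reference.decode("latin-1") == reference.decode("utf-8"))
```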
<p>In a Web-based form, the rules may differ depending on how the developer configured the service. In some forms, you MUST enter the numeric code (often because the UTF-8 tag is missing). In other cases you CANNOT use the numeric code; this is true when you are entering data into a text field that does not go through any HTML formatting, so the code is displayed literally. As long as the output has the UTF-8 meta tag (and Facebook does), you can skip the numeric code (i.e. enter a "raw" accented letter) and still be OK.</p>
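<p>Here is a rough Python simulation of the "CANNOT" case, assuming a service that escapes plain-text input before putting it on the page (an assumption about typical behavior, not Facebook's actual code). The ampersand gets escaped, so visitors see the code itself rather than the letter.</p>

```python
import html

# What a user types when spelling out the numeric code by hand (pure ASCII text)
typed = "\u1EE4wa".encode("ascii", "xmlcharrefreplace").decode("ascii")

# A field treated as plain text is escaped before it is placed in the page,
# which turns the leading ampersand into its own entity...
stored = html.escape(typed)

# ...so after the browser decodes the page once, the visitor sees the code
# itself, not the accented letter.
displayed = html.unescape(stored)
print(displayed == typed)

# What the author actually wanted to show: the raw letter
print(html.unescape(typed))
```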
<p>How can you tell? Unfortunately, you have to test each application one by one. As I've commented before, applications which truly expect to support a global audience are generally UTF-8 ready, and you can skip the numeric code. This includes Facebook, Movable Type, iTunes, Google Maps, Twitter and so forth.</p>
<p>Being able to skip the numeric code is a positive sign (why memorize numbers when you can type?), but as with all change, there will be some old habits to break.</p>]]>
        
    </content>
</entry>

</feed>
