<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Got Unicode?</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/" />
    <link rel="self" type="application/atom+xml" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/atom.xml" />
    <id>tag:www.personal.psu.edu,2008-01-24:/ejp10/blogs/gotunicode//516</id>
    <updated>2013-05-17T19:02:20Z</updated>
    <subtitle>Elizabeth Pyatt&apos;s Unicode tips, resources and war stories.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.38</generator>

<entry>
    <title>Nunavut Official Languages Act and Canadian Aboriginal Syllabics</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2013/05/nunavut-offical-languages-act.html" />
    <id>tag:www.personal.psu.edu,2013:/ejp10/blogs/gotunicode//516.726737</id>

    <published>2013-05-17T18:50:34Z</published>
    <updated>2013-05-17T19:02:20Z</updated>

    <summary>A script that may not be well-known to U.S. citizens is the Canadian Aboriginal Syllabic script, which is a syllabary used to write certain indigenous languages including the Inuit languages spoken in the Nunavut territory of Canada. This script is about...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Aboriginal Syllabics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="By Script" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>A script that may not be well-known to U.S. citizens is the <a href="http://en.wikipedia.org/wiki/Canadian_Aboriginal_syllabics">Canadian Aboriginal Syllabic script</a>, which is a syllabary used to write certain indigenous languages including the Inuit languages spoken in the Nunavut territory of Canada. </p>

<p>This script is about to appear in <a href="http://languagemagazine.com/?page_id=6373">many more documents and signs</a> because Nunavut's Official Languages Act is coming into force, making the Inuit languages official languages alongside English and French.</p>

<p>In addition to the languages of Nunavut, this script is <a href="http://www.omniglot.com/writing/ucas.htm">used in Canada</a> to write a number of indigenous languages including Ojibwe, Blackfoot, Cree and others. In contrast, most indigenous languages in the U.S. are written in the Latin alphabet with the notable <a href="http://www.omniglot.com/writing/cherokee.htm">exception of Cherokee.</a></p>

<p>I'm curious if indigenous communities in the U.S. would consider adopting this script to further differentiate themselves from the U.S. If that happens, the Nunavut law should ensure that proper Unicode support is available.</p>]]>
        
    </content>
</entry>

<entry>
    <title>MathML Test on MovableType</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/10/mathml-test.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.654325</id>

    <published>2012-10-03T17:52:03Z</published>
    <updated>2012-10-03T19:22:15Z</updated>

    <summary>If you&apos;re on Firefox 4+, Safari 5+ or Internet Explorer 9 with MathPlayer 3, the text below will be a MathML representation of Planck&apos;s Law. If you want to replicate this you have to: Paste the XML in the...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>If you're on Firefox 4+, Safari 5+ or Internet Explorer 9 with <a href="http://www.dessci.com/en/products/mathplayer/download.htm">MathPlayer 3</a>, the text below will be a <a href="http://www.personal.psu.edu/ejp10/blogs/tlt/tutorials/mathml.html">MathML representation</a> of <a href="http://en.wikipedia.org/wiki/Planck%27s_law">Planck's Law.</a></p>

<p>If you want to replicate this you have to:</p>

<ol>
	<li>Paste the XML in the HTML code (i.e. NOT the WYSIWYG editor) </li>
        <li>Make sure that the first line of the XML includes a link to the MathML namespace as follows:<br />
<code>&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;</code></li>
       <li>I also like to use CSS to bump the font size - those super/subscripts can get very tiny.</li>
</ol>

<p>All I can say is - Wow. I wish all my CMS systems played this well with MathML.</p>



<div style="text-align:center; font-size:2.5em !important">
<math xmlns="http://www.w3.org/1998/Math/MathML">
    <semantics>
      <mstyle  mathvariant="normal">
        <msub>
          <mi mathvariant="italic">E</mi>
          <mrow>
            <mo mathvariant="italic">&#x3bb;</mo>
            <mi mathvariant="italic">b</mi>
          </mrow>
        </msub>
        <mo>&#x3d;</mo>
        <mfrac>
          <mrow>
            <mn>2</mn>
            <mo mathvariant="italic">&#x3c0;</mo>
            <msup>
              <mi mathvariant="italic">ℎc</mi>
              <mn>2</mn>
            </msup>
          </mrow>
          <mrow>
            <msup>
              <mo mathvariant="italic">&#x3bb;</mo>
              <mn>5</mn>
            </msup>
            <mfenced close=")" open="(">
              <mrow>
                <msup>
                  <mi mathvariant="italic">e</mi>
                  <mfrac>
                    <mi mathvariant="italic">ℎc</mi>
                    <mrow>
                      <mo mathvariant="italic">&#x3bb;</mo>
                      <msub>
                        <mi mathvariant="italic">k</mi>
                        <mi mathvariant="italic">b</mi>
                      </msub>
                      <mi mathvariant="italic">T</mi>
                    </mrow>
                  </mfrac>
                </msup>
                <mo>&#x2212;</mo>
                <mn>1</mn>
              </mrow>
            </mfenced>
          </mrow>
        </mfrac>
      </mstyle>
      
    </semantics>
  </math>

</div>

<p>P.S. If you send me comments on this blog, please note that this text may possibly depart from standard academic English. Linguists can do that, especially in a blog.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Blackletter Gone Bild Wild</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/08/blackletter-gone-bild-wild.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.628670</id>

    <published>2012-08-09T20:21:14Z</published>
    <updated>2012-08-09T20:40:49Z</updated>

    <summary>The &quot;Tweed&quot; column from the Chronicle of Higher Education had an amusing story of a Blackletter glyph variant glitch on the new University of Idaho diplomas (specifically &quot;Congrabulations on Your Grabuation!&quot;) As with many U.S. diplomas, the university name was...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Humor" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>The "Tweed" column from the <cite>Chronicle of Higher Education</cite> had an amusing story of a <a href="http://">Blackletter glyph variant glitch</a> on the new University of Idaho diplomas  (specifically <a href="http://chronicle.com/blogs/tweed/congrabulations-on-your-grabuation/30047">"Congrabulations on Your Grabuation!"</a>)</p>

<p>As with many U.S. diplomas, the university name was rendered in a <a href="http://en.wikipedia.org/wiki/Blackletter">Blackletter (aka "Old English" or "Gothic")</a> calligraphic style font. This font, though, had a particularly <a href="http://chronicle.com/blogs/tweed/congrabulations-on-your-grabuation/30047/diplomas">high flourish on the lower case "v"</a>, high enough that recipients wondered if the printers had written a "b" instead of a "v" (and who wants a diploma from the Unibersity of Idaho?).</p>


<p>According to the Chronicle, the administration reassured them that it was an archaic "v", but this case does highlight the legibility issues of some older manuscript fonts and the need to balance historical font authenticity with modern needs.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Good List of Arabic Encodings </title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/05/good-list-of-arabic-encodings.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.598572</id>

    <published>2012-05-31T15:05:48Z</published>
    <updated>2012-05-31T15:31:55Z</updated>

    <summary>The Arabic computing industry has worked with a number of encoding schemes since the 1960s. The History of Arabic on Computers page lists a number of historic encodings from NCR-64 to ASMO 708 and Windows 1256. My favorite might be...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Arabic Script" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="By Script" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>The Arabic computing industry has worked with a number of encoding schemes since the 1960s. The <a href="http://baheyeldin.com/arabization/">History of Arabic on Computers</a> page lists a number of historic encodings from NCR-64 to ASMO 708 and Windows 1256.</p>

<p>My favorite might be an early 7-bit set which replaced the lower case English letters with Arabic letters (but kept the capital letters). As the article notes, this worked because "Some printers were not even capable of printing lower case English letters."</p>

<p>It's a good thing we've moved beyond that.</p>]]>
        
    </content>
</entry>

<entry>
    <title>JAWS 13 and Phonetic Symbols</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/04/jaws-13-and-phonetic-symbols.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.528938</id>

    <published>2012-04-20T19:59:40Z</published>
    <updated>2012-04-20T20:59:26Z</updated>

    <summary>As a linguist, I work with lots of exotic symbols, but only a small percentage of them are recognized by the standard U.S. version of JAWS. If you work with phonetic symbols like /ə, ʃ, ʒ, ɰ/ you will need to tweak your...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="South Asian" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tool Tests" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Windows" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>As a linguist, I work with lots of exotic symbols, but only a small percentage of them are recognized by the standard U.S. version of JAWS. If you work with phonetic symbols like /ə, ʃ, ʒ, ɰ/ you will need to tweak your pronunciation files.</p>

<p>I wrote about this in an <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2008/09/getting-jaws-61-to-recognize-e.html">earlier post on JAWS 6</a>, but today I was able to document and implement, so I thought I would share the procedure.</p>

<p>The fix I am using will expand the <b>symbol set</b> within JAWS so that a character like /ə/ will be read as "schwa" (but not as its phonetic value of "uh"). Ideally, it would be nice to have a word pronunciation engine so that phonetic pronunciation values are emulated, but let's take this one problem at a time.</p>

<h3>SBL Files</h3>
<p>JAWS includes a set of symbol or .sbl files which match punctuation and symbol characters with a "word" (e.g., ? = "question mark"). The key is to add each new character and its reading to your working files.</p>

<p>Luckily, there is a <a href="http://www.ruf.rice.edu/~reng/IPA_SBL.txt">phonetic symbol .sbl file</a> from Robert Englebretson. There's also a <a href="http://www.carrolltech.org/pub/math.txt">math symbol .sbl file </a>from Carroll Tech.</p>

<h3>Add Characters to Symbol File</h3>

<p>This procedure assumes that JAWS is using the Eloquence engine, in which case the key file to change is <b>eloq.sbl.</b> You will also need to have <b>an Admin account</b> to implement the changes.</p>

<p><b>Note:</b> SBL files can be opened in any text editor such as Notepad.</p>

<ol>
	<li>Open or download <a href="http://www.ruf.rice.edu/~reng/IPA_SBL.txt" target="_blank">phonetic symbol .sbl file (New Window)</a></li>
       <li>Find the location of your eloq.sbl file. Mine was in the following path on my C hard drive:<br />
<code><b>C:\Users\All Users\Freedom Scientific\Jaws\13.0\Settings\enu\eloq.sbl</b></code></li>
       <li>Make a (second) copy of this file and rename as <b>eloqOld.sbl</b>. This is your backup in case something goes wrong.</li>
       <li>Make a third copy and rename it as <b>eloqNew.sbl</b>. This is a temporary file to edit since you may not be able to directly edit eloq.sbl.</li>
       <li>Open <b>eloqNew.sbl</b> in a text editor such as Notepad. This file contains pronunciation values for multiple languages. Scroll to the language you normally use (e.g. "[American English]").</li>
        <li>Scroll to the end of the symbol list for that language.</li>
        <li>Copy and paste the list of symbols from one of the other .sbl files <b>immediately after the final line in the list</b>. Each symbol will be on a single line and have the format <code>U+0001=character name</code>.<br />
<b>Note:</b> Don't worry if the format does not match the rest of the symbol list.</li>
      <li>Repeat the last step for each language you want to support. You can translate character names as needed for each language. Save and close file.</li>
      <li>Exit JAWS if it is open.</li>
      <li>Delete <b>eloq.sbl.</b> You may be asked for an admin password at this point.</li>
     <li>Rename <b>eloqNew.sbl</b> as <b>eloq.sbl.</b></li>
    <li>Restart JAWS and test on a page such as <a href="http://tlt.its.psu.edu/suggestions/international/bylanguage/ipachart.html#a">IPA Characters based on Letter A with Numeric Codes</a></li>
     
</ol>


<h3>Look Up Additional Codes</h3>

<p>Each line in the SBL file has this format:</p>

<div class="example"><code>
U+Codepoint=Character Name (no quotes)
</code></div>

<p>For instance, if I wanted to expand the repertoire of currency symbols to include the new <a href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2010/08/new-rupee-symbol-may-now-be-u2.html">rupee symbol of India</a> (&#x20B9;), I would add the following to my .sbl file:</p>

<div class="example"><code>
U+20B9=Rupee symbol of India
</code></div>
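
<p>If you have many characters to add, the lines can be generated rather than typed by hand. The following is only a minimal sketch in Python 3: the function name <code>sbl_line</code> is hypothetical, and the only thing assumed about the file format is the <code>U+Codepoint=Character Name</code> shape described in this post. When no reading is supplied, it falls back to the official Unicode character name, which you may want to shorten.</p>

```python
import unicodedata

def sbl_line(char, reading=None):
    """Build one symbol-file line in the U+XXXX=reading shape from this post."""
    # Fall back to the official Unicode character name when no reading is given.
    if reading is None:
        reading = unicodedata.name(char).lower()
    return f"U+{ord(char):04X}={reading}"

# The rupee example from this post, with its custom reading.
print(sbl_line("\u20b9", "Rupee symbol of India"))

# A few phonetic symbols (schwa, esh, ezh), using their Unicode names.
for symbol in "\u0259\u0283\u0292":
    print(sbl_line(symbol))
```

<p>The printed lines can then be pasted into the symbol list as in the procedure above. Test the result in JAWS before relying on it, since this sketch has not been checked against every JAWS version.</p>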

<p>A list of Unicode charts with code points is available at <a href="http://www.unicode.org/charts/">http://www.unicode.org/charts</a></p>
]]>
        
    </content>
</entry>

<entry>
    <title>Testing Some MP3 Sites with Halfaxa Titles</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/03/testing-some-mp3-sites-with-ha.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.515099</id>

    <published>2012-03-24T18:47:25Z</published>
    <updated>2012-03-21T19:21:05Z</updated>

    <summary>Unicode is such an esoteric subject, you sometimes wonder who&apos;s seeing the possibilities. One artist who does appreciate them is Canadian electronic musician Grimes, whose album Halfaxa contains song titles such as &quot;ΔΔΔΔRasikΔΔΔΔ&quot;, &quot;Sagrad Прекрасный&quot;, &quot;† River †&quot;, along with the...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>Unicode is such an esoteric subject, you sometimes wonder who's seeing the possibilities. One artist who does appreciate them is Canadian electronic musician Grimes, whose album Halfaxa contains song titles such as "ΔΔΔΔRasikΔΔΔΔ", "Sagrad Прекрасный", "† River †", along with the charmingly titled "World♡Princess" and the mathematically complex "≈Ω≈ω≈ω≈ω≈ω≈ω≈ω≈ω≈"  (Αlmost Omega?)</p>

<p>That makes this album a great test case to check out how well your MP3 or streaming service does with Unicode. As you can see below, <b>iTunes</b> and <b>Rhapsody</b> do well, but for some reason <a href="http://www.amazon.com/Halfaxa/dp/B0045OHUNM/ref=sr_1_1?ie=UTF8&qid=1332357018&sr=8-1" rel="nofollow">Amazon</a> is giving me the Unicode question mark of death (my guess is that it's because the page specifies Verdana, which doesn't have all the characters).</p>

<p>I haven't tested every music site, but you get the idea...</p>

<h3>iTunes Halfaxa List</h3>
<img alt="Halfaxa album list on iTunes with correct symbols" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/03/21/Halfaxa.png" width="616" height="514" style="border" />

<h3>Rhapsody Halfaxa List</h3>
<img alt="Halfaxa album list on Rhapsody with correct symbols" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/03/21/HalfaxaRhapsody.png" width="431" height="507" style="border" />

<h3>Amazon Halfaxa List (Verdana Type)</h3>
<img alt="Halfaxa album list on Amazon with ?? for symbols" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/03/21/AmazonHalfaxa.png" width="546" height="445" class="mt-image-none" style="border" />]]>
        
    </content>
</entry>

<entry>
    <title>Converting Numeric Entity Codes Back to Text</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/03/numeric-entity-codes-back-to-t.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.510956</id>

    <published>2012-03-15T19:54:18Z</published>
    <updated>2012-03-08T21:10:06Z</updated>

    <summary>I got a technical question recently which I thought to share. Not so long ago in the history of Web Development, the safest way to display non-Western text was the use of numeric entity codes. For instance, one course management...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="(X)HTML Markup" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>I got a technical question recently which I thought to share.</p>
<p>Not so long ago in the history of Web Development, the safest way to display non-Western text was the use of numeric entity codes. For instance, one course management system would convert Cyrillic text like <b>Україна</b> (Ukraine) to a series of numeric codes like:<br /></p>
<div class="example">
<p><code>&amp;#x0423;&amp;#x043A;&amp;#x0440;&amp;#x0430;&amp;#x0457;&amp;#x043D;&amp;#x0430;</code></p>
</div>


<p>This is fine for single words and small phrases, but it's bad for an entire page...especially if you want to edit it.</p>

<p>Fortunately, there is a quasi-fix for this if you need to replace numeric codes with real text. That is:</p>

<ol>
	<li>Open your page in a browser which does render the entity codes as the correct text.</li>
       <li>Copy displayed text and paste it in another file. It will be rendered as text.</li>
      <li>Put the text back into your HTML source.</li>
</ol>
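
<p>If you are comfortable with a little scripting, the same conversion can also be sketched in a few lines of Python 3 using the standard library's <code>html.unescape</code>. This is a minimal sketch, not a polished tool: the function name <code>entities_to_text</code> is hypothetical, and the sample string is built in the script itself to imitate what the course management system produced.</p>

```python
import html

def entities_to_text(source):
    """Replace numeric (and named) character references with real characters."""
    return html.unescape(source)

# Imitate the course management system: every non-ASCII character
# becomes a numeric character reference, leaving an ASCII-only string.
original = "Україна"
encoded = original.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(encoded)

# Convert the references back into readable text.
decoded = entities_to_text(encoded)
print(decoded)
```

<p>One caution: <code>html.unescape</code> also converts the named entities for the ampersand and the less-than sign, so it is safer to run it on passages of text than on an entire page of markup.</p>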

<p>It's a little tedious, but since I couldn't quickly find a better tool for this, it is a decent stop gap. At least you won't have to re-type everything...</p>]]>
        
    </content>
</entry>

<entry>
    <title>Unicode and WCAG 2.0 (Accessibility)</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/03/unicode-and-wcag-20-accessibil.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.510952</id>

    <published>2012-03-08T20:31:38Z</published>
    <updated>2012-03-08T20:52:12Z</updated>

    <summary>Unicode is incorporated into multiple standards such as RSS (newsfeeds) and MathML. Unicode is also incorporated into the newest WCAG 2.0 (Web Content Accessibility Guidelines) standard in some interesting ways. Text not Image One guideline in particular of...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="(X)HTML Markup" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>Unicode is incorporated into multiple standards such as RSS (newsfeeds) and MathML. Unicode is also incorporated into the newest <a href="http://www.w3.org/TR/WCAG20/">WCAG 2.0 (Web Content Accessibility Guidelines)</a> standard in some interesting ways.</p>

<h3>Text not Image</h3>

<p>One guideline in particular of interest is Guideline 1.4.5:</p>
<blockquote><b>WCAG Guideline 1.4.5 Images of Text:</b> If the technologies being used can achieve the visual presentation, text is used to convey information rather than images of text except for the following:</blockquote>

<p>In other words, it is generally better to use CSS plus actual text to present textual information, even when it is stylized. Unicode is especially important here for characters beyond ASCII or Latin-1.</p>

<p>There are two reasons for this guideline. First, if a screen reader has text available, the developer does not need to include any additional information such as an image ALT attribute. Second, text tends to be more flexible across devices. In particular, it can be zoomed without becoming pixelated (appearing jagged at large sizes) and it can have its format changed without information loss (say flipping from black text on white to white text on black, a format preferred by some users).</p>

<h3>Right-to-Left Mark</h3>
<p>A second relevant guideline is:</p>
<blockquote><b>WCAG Guideline 1.3.2:</b> When the sequence in which content is presented affects its meaning, a correct reading sequence can be programmatically determined. (Level A)</blockquote>
<p>An important concept for RTL (right-to-left) languages is ensuring that text remains in logical order so that characters are in their correct linear order, even if they are presented "backwards" from the more common LTR order. WCAG therefore also recommends logical order and mentions the Unicode RLM (right-to-left mark) and LRM (left-to-right mark) characters.</p>
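
<p>To make "logical order" a little more concrete, here is a minimal Python 3 sketch that looks up the two marks and a short Hebrew word in the Unicode character database. The helper name <code>describe</code> and the choice of sample word are mine, purely for illustration. The word is stored with its first-read letter first, regardless of how it is displayed.</p>

```python
import unicodedata

LRM = "\u200e"  # left-to-right mark
RLM = "\u200f"  # right-to-left mark

def describe(ch):
    """Return the code point, Unicode name and bidirectional class of a character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))

for mark in (LRM, RLM):
    print(describe(mark))

# A Hebrew word stored in logical order: the first letter read (shin)
# is the first character in memory, even though it displays on the right.
word = "\u05e9\u05dc\u05d5\u05dd"
for ch in word:
    print(describe(ch))
```

<p>The marks themselves are invisible. They only carry a direction (class "L" or "R") that helps the display algorithm place neighboring neutral characters such as punctuation.</p>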

<h3>Language Tags</h3>
<p>A final i18n technology mandate of the WCAG 2.0 is the use of language tags. </p>
<blockquote><b>WCAG Guideline 3.1.1:</b> Language of Page: The default human language of each Web page can be programmatically determined.</blockquote>

<p>In other words, use language tags to identify page language. This is especially important for screen readers which need to switch pronunciation engines between languages.</p>

<p>There have been several debates about the utility of WCAG 2.0, but I can rest assured that at least the needs of multilingual users have been considered.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Math+HTML 5 in 3 Browsers</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/03/mathhtml-5-in-3-browsers.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.510944</id>

    <published>2012-03-08T20:27:11Z</published>
    <updated>2012-03-08T20:31:30Z</updated>

    <summary>As you can see I haven&apos;t been posting here regularly. It&apos;s because I&apos;ve been tied up with a11y (accessibility) including MathML. However, I am happy to report that I was able to create an HTML5+MathML file that works in Internet...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>As you can see I haven't been posting here regularly. It's because I've been tied  up with a11y (accessibility) including MathML. </p>

<p>However, I am happy to report that I was able to create an <a href="http://www.personal.psu.edu/ejp10/blogs/tlt/2012/03/mathml-2012-update.html">HTML5+MathML file</a> that works in Internet Explorer AND Firefox/Safari (with some Unicode thrown in). </p>

<p>As a reward, I think I will write a Unicode post today.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Unicode 6.1 Additions</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/02/unicode-61-additions.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.501350</id>

    <published>2012-02-10T18:43:10Z</published>
    <updated>2012-02-10T20:03:41Z</updated>

    <summary>The Unicode standard was just updated to version 6.1, and that means new blocks and characters. New Blocks Blocks added include Miao (a script developed for Hmong/Miao languages), Meroitic Hieroglyphs &amp; Meroitic Cursive (adaptations of Egyptian hieroglyphs from ancient Meroë in...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Ancient Scripts" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Arabic Script" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="South Asian" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>The Unicode standard was just updated to <a href="http://www.unicode.org/versions/Unicode6.1.0/">version 6.1</a>, and that means new blocks and characters.</p>

<h3>New Blocks</h3>

<p>Blocks added include <b>Miao</b> (a script developed for Hmong/Miao languages), <b>Meroitic Hieroglyphs &amp; Meroitic Cursive</b> (adaptations of Egyptian hieroglyphs from ancient Meroë in what is now northern Sudan) and multiple scripts from India (Sora Sompeng, Chakma, Sharada, Takri).</p>

<p>Two new blocks for the Arabic script were also added: Arabic Mathematical Alphabetic Symbols and Arabic Extended-A. Extensions for the Sundanese and Meetei Mayek scripts were also added.</p>


<h3>New Characters</h3>
<p>The Unicode Consortium has an index of which <a href="http://www.unicode.org/charts/PDF/Unicode-6.1/">new characters</a> have been added to different scripts.</p>
]]>
        
    </content>
</entry>

<entry>
    <title>Unicode 3Play - Yammer, Google Earth, iBooks Author</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2012/01/unicode-3play---yammer-google.html" />
    <id>tag:www.personal.psu.edu,2012:/ejp10/blogs/gotunicode//516.493741</id>

    <published>2012-01-27T20:52:53Z</published>
    <updated>2012-01-27T21:53:09Z</updated>

    <summary>I was so excited to get some time to test new tools that I tested three for basic Unicode support. My updates: Yammer - Pass Yammer is a service similar to Twitter but with more tools suitable to a corporate...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Tool Tests" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>I was so excited to get some time to test new tools that I tested three for basic Unicode support. My updates:</p>

<h3>Yammer - Pass</h3>
<p>Yammer is a service similar to Twitter but with more tools suitable to a corporate environment. I posted some text with obscure phonetic characters and some Devanagari, and results were generally good.</p>

<p>This was done on a Mac via the Web site interface and via the desktop client. It seemed low fuss enough that I suspect support is good in most configurations. Note that third party clients are always an unknown. For instance, although Twitter also has excellent Unicode support, some of the third party viewers were pretty bad.</p>

<h3>iBooks Author - Pass</h3>
<p>Most apps from Apple have good Unicode support and this is no different. My only concern here is font control. It looks like you can define new styles based on pre-existing formatted text, but can't really edit existing ones. </p>

<p>One non-Unicode gripe is that some styles had small caps and I was not able to disable that. It may not be a show stopper in most docs, but not all scripts include small caps (or even distinguish capital/lower case). </p>

<p>I gather that the <a href="http://alanquatermain.me/post/16179111286/ibooks-author-vs-epub-author">format generated is a form of XML (per Alan Quatermain)</a> with HTML features and CSS...but the CSS is hard to directly edit. Whenever you leave the Western alphabet with few controls over font presentation, it's time to be nervous.</p>

<h3>Google Earth - Pass, but slightly Temperamental</h3>
<p>For the record, I am in love with Google Earth as a teaching tool. However, entering data was tricky.</p>

<p>The keyboard methods seem to work fine to enter text for items such as new locations, and so forth. However I had problems with using the Character Viewer (OS X 10.7). I would double click the symbol and nothing would happen ;(. Then again, it could be the new Character Viewer although it seems to be OK with Yammer.</p>


<p>In any case, this could be an issue if a user is trying to use a cute emoji symbol. Cut and paste from another document did appear to work.</p>


]]>
        
    </content>
</entry>

<entry>
    <title>Understanding the Character Viewer in OS X 10.7, Lion</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2011/12/understanding-the-character-vi.html" />
    <id>tag:www.personal.psu.edu,2011:/ejp10/blogs/gotunicode//516.470476</id>

    <published>2011-12-02T22:00:40Z</published>
    <updated>2011-12-02T22:06:16Z</updated>

    <summary>For Unicode fans, one of the bigger and more useful changes in the Mac 10.7 Lion operating system is the updated Character Viewer. Despite improvements though, it&apos;s different enough that I think some documentation is worth posting. How to Access...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Macintosh" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>For Unicode fans, one of the bigger and more useful changes in the Mac 10.7 Lion operating system is the updated Character Viewer. Despite improvements though, it's different enough that I think some documentation is worth posting.</p>
<h3>How to Access</h3>
<p>As with previous versions Character Viewer is activated in the <b>Language &amp; Text</b> section of the <b>System Preferences</b> menu. See <a href="http://tlt.its.psu.edu/suggestions/international/keyboards/charpalosx.html">http://tlt.its.psu.edu/suggestions/international/keyboards/charpalosx.html</a> for details.</p>
<h3>How to Find Characters</h3>
<p>By default, the Viewer only gives options for Symbols (e.g. &quot;Arrows, Punctuation, Currency Symbols, Emoji&quot;). I actually like how the symbols have been organized into semantic groups rather than by numeric block. However, the list does not include all the scripts I need to access.</p>
<p>Fear not though - the other blocks are available. To view other blocks, click the Gear icon, then select <b>Customize List.</b> This opens a pop-up window which provides a list of all available symbol lists, including <b>Unicode</b>, which is the entire list organized by Unicode block.</p>

<p><img alt="Character Viewer with Customize options open and multiple scripts checked" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2011/12/02/CharViewLion.png" width="659" height="579"  /></p>

<h3>How to Insert</h3>
<p>The basic mechanism is to highlight the character you want to insert. In previous versions of this tool, there was an Insert button, but this has disappeared.</p>
<p>In this version, you need to do the following.</p>
<ol>
  <li>Place your cursor at an appropriate insertion point in your document.</li>
  <li>Open the <b>Character Viewer.</b></li>
  <li>Find and highlight your character.</li>
  <li><strong>Double click the character.</strong> It will be inserted into your document.<br />
  <b>Note: </b>You can also drag and drop characters into your document. </li>
</ol>
<h3>Favorites</h3>
<p>A feature that I am now using is the <b>Favorites</b> list, a place to list commonly used characters to insert. This puts everything in one list which you can order as needed.</p>
<p>You can add <b>Favorites</b> by selecting a character and either clicking the <b>Add to Favorites</b> button or dragging it into the <b>Favorites</b> list.</p>
<h3>Conclusion</h3>
<p>Although I was a little confused by the new Lion Character Viewer, I actually think it was a good overhaul. It suits the average user who needs access to common symbols and emoji, but allows us more dedicated users access to what we need. </p>
<p>A final improvement worth noting is that the <b>Font Variation</b> feature is much more stable than in previous versions. This is perfect for times when you need to debug a weird character/font combination.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Free ErlerDingbats Unicode Font for 2700 Block</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2011/11/free-erlerdingbats-unicode-fon.html" />
    <id>tag:www.personal.psu.edu,2011:/ejp10/blogs/gotunicode//516.463024</id>

    <published>2011-11-21T16:08:38Z</published>
    <updated>2011-11-21T16:27:38Z</updated>

    <summary>If you&apos;ve ever wanted Unicode support for snowflakes, decorative arrows, crosses and stars, then you may be interested in the free Erler Dingbats font from the Font Shop. The fonts even ship with keyboard layouts to make data entry easier...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>If you've ever wanted Unicode support for snowflakes, decorative arrows, crosses and stars, then you may be interested in the <a href="http://www.ffdingbatsfont.com/">free Erler Dingbats font</a> from the Font Shop. The fonts even ship with keyboard layouts to make data entry easier.</p>

<p>The image below shows roughly the glyphs covered (generally in black and white). There are more characters covered in the for-fee font DD Dingbats 2.0, but even these provide some interesting possibilities in terms of documentation and even fancy bullet lists (especially if combined with font embedding).</p>

<img alt="Unicode Block U+2700-U+27BF" src="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2011/11/21/DBATBlock2700.png" width="346" height="254"  />]]>
        
    </content>
</entry>

<entry>
    <title>Got Double Hyphens from Word?</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2011/08/got-double-hypens-from-word.html" />
    <id>tag:www.personal.psu.edu,2011:/ejp10/blogs/gotunicode//516.414102</id>

    <published>2011-08-18T19:19:25Z</published>
    <updated>2011-08-18T19:49:19Z</updated>

    <summary>Unicode hasn&apos;t been part of my life enough recently, but it did emerge in a very unexpected way this week during a recent calendar upgrade. One of the conversion tasks was for us to add group e-mail addresses so...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Accents &amp; Punctuation" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Humor" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>Unicode hasn't been part of my life enough recently, but it did emerge in a very unexpected way this week during a recent calendar upgrade. </p>

<p>One of the conversion tasks was for us to add group e-mail addresses so we could share calendars with each other efficiently. But when I tried to copy and paste, I got a "not found" error. Here is one of these addresses (altered for security reasons):</p>

<p class="example">
umg-sc.foo.staff@fuyu.ucal.psu.edu
</p>

<p>Can you spot the problem? (HINT: Try cutting and pasting into a text file.)</p>

<p>Given up? The problem is the hyphen. In the right font, you will see that it's not just a hyphen (U+002D or ASCII #45), but actually the more elegant and slightly longer <b>en dash</b>, which is U+2013 (not in ASCII). As many of you know, many databases are still sensitive to such differences, so a hyphen is just not the same as an en dash. This means searching is a FAIL.</p>
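
<p>To make the difference concrete, here is a minimal Python sketch. It is only an illustration, not something from the calendar system itself, and the example.edu addresses are made up. It shows that two strings which look alike on screen are different sequences of code points, so an exact-match lookup fails.</p>

```python
# Hypothetical addresses for illustration only.
typed = "umg-sc.staff@example.edu"         # plain hyphen, U+002D
pasted = "umg\u2013sc.staff@example.edu"   # en dash, U+2013, as Word might produce


def code_points(text):
    """Return the code point of every character as a U+XXXX label."""
    return ["U+%04X" % ord(ch) for ch in text]


# The strings have the same length but are not equal.
print(typed == pasted)           # False
print(code_points(typed)[3])     # U+002D
print(code_points(pasted)[3])    # U+2013
```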

<p>How did the en dash get in there if it's outside of ASCII? My guess is that it's a result of an auto-correct feature from Word which makes some formatting tweaks to enhance visual appeal. One is to change plain hyphens into a slightly longer en dash (more favored by typographers).</p>

<p> Another common change is to convert plain straight quotes (" at U+0022 or ASCII #34) to "Smart Quotes" like (&ldquo; at U+201C) and (&rdquo; at U+201D). 
Copying HTML code attributes from Word can be similarly dangerous since HTML recognizes plain quotes, but NOT fancy double quotes. Most of the time, the change does nothing, but when it comes to interacting with some systems, the reformatting makes a difference in a very annoying way.</p>
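
<p>If you need to clean up text that has already been "improved" this way, one possible approach is a small translation table. The sketch below is an assumption on my part rather than a complete solution: the table covers only a few common dashes and quotes, and the function name <code>to_ascii_punctuation</code> is invented for this example.</p>

```python
# Map common word-processor punctuation back to plain ASCII.
# This table is an illustrative assumption, not a complete list.
ASCII_PUNCTUATION = str.maketrans({
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201C": '"',   # left double quote
    "\u201D": '"',   # right double quote
})


def to_ascii_punctuation(text):
    """Replace typographic dashes and quotes with their ASCII look-alikes."""
    return text.translate(ASCII_PUNCTUATION)


print(to_ascii_punctuation("class=\u201Cexample\u201D"))   # class="example"
```

<p>This kind of cleanup only makes sense for things like addresses, identifiers and code. In ordinary prose the typographic marks are usually what you want.</p>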

<p>How to catch it? In some cases, you can change the font, but many fonts make the hyphen and en dash appear identical (Arggh!). Which leaves the old standby (test, test, test) plus some Unicode awareness (which is increasing among programmers). </p> 
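
<p>When the font won't help, a script can. As a rough sketch (assuming Python 3.7 or later; the function name <code>find_non_ascii</code> and the sample address are hypothetical), the following reports every character outside ASCII along with its position and official Unicode name.</p>

```python
import unicodedata


def find_non_ascii(text):
    """List (position, code point label, Unicode name) for each non-ASCII character."""
    findings = []
    for position, ch in enumerate(text):
        if not ch.isascii():
            label = "U+%04X" % ord(ch)
            findings.append((position, label, unicodedata.name(ch, "UNKNOWN")))
    return findings


# Hypothetical address containing an en dash instead of a hyphen.
for finding in find_non_ascii("umg\u2013sc.staff@example.edu"):
    print(finding)   # (3, 'U+2013', 'EN DASH')
```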




]]>
        
    </content>
</entry>

<entry>
    <title>&quot;Coming Soon to Unicode&quot; Pipeline Table</title>
    <link rel="alternate" type="text/html" href="http://www.personal.psu.edu/ejp10/blogs/gotunicode/2011/08/coming-soon-to-unicode-pipelin.html" />
    <id>tag:www.personal.psu.edu,2011:/ejp10/blogs/gotunicode//516.413375</id>

    <published>2011-08-12T20:29:08Z</published>
    <updated>2011-08-12T20:29:39Z</updated>

    <summary>The Unicode Consortium announced that they had created a Unicode &quot;Pipeline Table&quot; page of characters scheduled for future versions of Unicode. The table is organized by projected UCS code point number, but they are in various stages of the proposal...</summary>
    <author>
        <name>ELIZABETH J PYATT</name>
        
    </author>
    
        <category term="Secret Unicode Link" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.personal.psu.edu/ejp10/blogs/gotunicode/">
        <![CDATA[<p>The Unicode Consortium announced that they had created a <a href="http://unicode.org/alloc/Pipeline.html">Unicode "Pipeline Table"</a> page of characters scheduled for future versions of Unicode. </p>

<p>The table is organized by projected UCS code point number, but the characters are in various stages of the proposal process. Although dates of acceptance to a particular stage are posted, the target future version is not listed. Although many specifications look complete, the Unicode Consortium does warn that they are subject to change.</p>

<p>If you are interested in entire script blocks (particularly ancient and lesser-known Indian scripts) coming to Unicode, you can go to the <a href="http://www.unicode.org/pending/pending.html">Proposed New Script</a> page. The caveat that "things are subject to change" also applies here.</p>]]>
        
    </content>
</entry>

</feed>
