ELIZABETH J PYATT: November 2009 Archives

Hexadecimal to Decimal in FileMaker 7+ (Revised)

I'm updating my FileMaker Unicode database to reflect the changes in recent versions of Unicode. As part of the database, I like to keep the decimal version of each code point handy alongside the hexadecimal version (it's good for debugging).

FileMaker does not appear to have hex-to-decimal conversion built in (not even in FileMaker 10), so here's my (updated) solution.

  1. In the main table corresponding to the list of code points, I created a field for the Hexadecimal Unicode code point value. I'll call this HexValue for now. It must be a Text field. You can create a Decimal field (Calculated), but you won't be able to fill in the formula yet.
  2. Then I created a second table to store the correspondence between a hex digit (0-F) and its decimal value (0-15). The HexValue field is Text, but the DecValue field is a Number. See the sample table below (some values skipped).
    HexValue (Text)      DecValue (Number)
    0                    0
    1                    1
    2                    2
    3                    3
    4...9 (1 row each)   4...9
    A                    10
    B                    11
    C...E (1 row each)   12...14
    F                    15
  3. To do all the conversions, you need to extract the text value of each position in the code point. So, I created fields corresponding to the value for each place in the hex code point as shown in the list below. I'll explain the formulas below.

    Note: In case you're wondering, the names of the places are loosely inspired by Roman numerals and algebra.

    • Rightmost digit, units (n): nhex = Right(HexValue;1)
    • Penultimate digit (t): thex = Left(Right(HexValue;2);1)
    • Antepenultimate digit (c): chex = Left(Right(HexValue;3);1)
    • 4th from right (m): mhex = Left(Right(HexValue;4);1)
    • 5th from right (d): dhex = If(Length(HexValue)>4;Left(Right(HexValue;5);1);"0")
    • 6th from right (x): xhex = If(Length(HexValue)>5;Left(Right(HexValue;6);1);"0")

    The challenge with modern Unicode is that code points now come in variable lengths (4-6 hex digits), so if you count from the left, you can't always know you are at the appropriate digit. That means you have to count from the right, but there's no simple formula for picking the 2nd digit from the right. My solution is to take a rightmost chunk and then count in from the left. So to get the 3rd hex digit from the right, I take the rightmost 3 digits, then find the leftmost digit in that chunk (hence the nested Left(Right()) formulas). For example, for 1F600, Right(HexValue;3) gives "600", and the leftmost digit of that chunk is "6".

    I also have to check whether the length is greater than 4. If a code point is too short to have a 5th or 6th digit, that place is filled in with "0"; otherwise the formula extracts the digit from the string. Hence the formulas for dhex and xhex use conditional logic. Hopefully, the same pattern can be extended if Unicode ever adds more digits (unlike my original attempt, which assumed only 4 digits in the code point).

  4. To convert each extracted digit to its decimal value, I need to set up some relationships between tables so that each extracted digit can look up its decimal equivalent. For each of the intermediate digit fields above, I created a link to an instance of the Hexadecimal Lookup table (there are 6 instances total, one per digit), matching the digit field (such as nhex) to the HexValue field in that instance. It's important to give each instance a name you can remember later; mine mention which digit I am working on (for example, HexLookup N for the units digit). See the Relationships diagram below.
    [Image: HexRelationships.png, the Relationships diagram]
  5. Now we can finally get that decimal value! If you haven't already, create a DecimalValue field and make it Calculated.
  6. Here's my calculation. I'll explain what the parts mean below.
    HexLookup N::DecValue + 16*HexLookup T::DecValue + 16^2*HexLookup C::DecValue + 16^3*HexLookup M::DecValue + 16^4*HexLookup D::DecValue + 16^5*HexLookup X::DecValue
    • "HexLookup N::DecValue" returns DecValue (the decimal value) from the HexLookup N table instance, which is the decimal equivalent of the units digit (n).
    • "HexLookup T::DecValue" looks up the second digit from the right (the "tens" place, although in hex it is really the sixteens place). I multiply that value by 16 and add it to the units value. Remember that hex FF (F=15) means 15*16+15 = 255.
    • Likewise, I look up the "hundreds" place (c) and multiply it by 16^2 (256), then the "thousands" place (m) and multiply it by 16^3 (4096). The d and x places are multiplied by 16^4 (65536) and 16^5 (1048576).
    • Adding up each converted digit times its power of 16 gives the decimal value. For example, 1F600 becomes 1*65536 + 15*4096 + 6*256 + 0*16 + 0 = 128512. The calculation is complete.
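Outside FileMaker, the same steps can be sketched in Python to check the logic. This is a minimal sketch and not the FileMaker database itself. The dictionary HEX_LOOKUP stands in for the Hexadecimal Lookup table. The hypothetical helper left_right() mimics Left(Right(text;count);1). Like the FileMaker formulas, the sketch assumes code points are written with at least 4 hex digits.

```python
# Minimal sketch of the FileMaker hex-to-decimal approach (not FileMaker code).
# HEX_LOOKUP plays the role of the Hexadecimal Lookup table: hex digit -> decimal.
HEX_LOOKUP = {digit: value for value, digit in enumerate("0123456789ABCDEF")}

# Place names from the post, ordered from the rightmost digit leftward.
PLACES = ["n", "t", "c", "m", "d", "x"]


def left_right(text: str, count: int) -> str:
    """Mimic FileMaker's Left(Right(text; count); 1).

    Like Right(), text[-count:] returns the whole string when count is larger
    than its length, which is why places 5 and 6 need a length check.
    """
    return text[-count:][:1]


def extract_digits(unicode_hex: str) -> dict:
    """Return the hex digit in each place, padding missing 5th/6th places with "0"."""
    if len(unicode_hex) < 4:
        # Assumption: code points use at least 4 digits (e.g. 00E9), as in the post.
        raise ValueError("expected at least 4 hex digits")
    digits = {}
    for position, place in enumerate(PLACES, start=1):
        # Places 1-4 always exist; places 5 and 6 mirror the If(Length(...)) checks.
        if position <= 4 or len(unicode_hex) >= position:
            digits[place] = left_right(unicode_hex, position)
        else:
            digits[place] = "0"
    return digits


def hex_to_decimal(unicode_hex: str) -> int:
    """Sum each looked-up digit times its power of 16, like the DecimalValue calculation."""
    digits = extract_digits(unicode_hex.upper())
    return sum(16 ** power * HEX_LOOKUP[digits[place]]
               for power, place in enumerate(PLACES))


for code_point in ["00E9", "1F600", "10FFFF"]:
    value = hex_to_decimal(code_point)
    assert value == int(code_point, 16)  # cross-check with Python's built-in parser
    print(code_point, value)
```

The cross-check against Python's built-in int(text, 16) is just a convenient way to test the lookup-table idea. In FileMaker, the relationships and the DecimalValue calculation do the actual work.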

What did that font switch to in FileMaker (Mac)?

Prelude about the Problem

In terms of handling non-English characters, apps come in two types (at least on the Mac). There are apps which switch fonts behind the scenes without telling you, and those which don't...but then you have to guess which font to use.

To take a concrete example, if I switch from the English keyboard to Japanese input in FileMaker, the font automatically switches to one of the Japanese fonts. In theory, once I switch back to English, I should return to the original font (except when I don't...we'll get to that). The same principle applies in most other apps, including TextEdit. In contrast, if I switch to Japanese input in Adobe Photoshop, I also have to change fonts myself.

Automatic font switching sounds nice in theory, except when 1) the font doesn't change back after typing the exotic character (this happens a lot in phonetic transcription and elsewhere) or 2) you're trying to figure out whether font X actually has that glyph (or whether it's the illusion of font switching in action). With the Adobe products, manual font switching means you know exactly which font you are using at all times, which is important in desktop publishing.

FileMaker

For instance, I uploaded a version of the UCD (Unicode Character Database) files into FileMaker so I would have a searchable reference locally. It also lets me display glyphs in different fonts for comparison. I have most of the mega fonts selected, but few fonts have everything, so I know there are gaps.

However, because FileMaker switches fonts behind the scenes, I can't always be sure if font X actually has that glyph. If I see a bunch of boxes with identical glyphs, I can suspect an unannounced font switch...but to what?

Solution

The best solution for now is to copy and paste the text into TextEdit, then open the font formatting palette (Command+T) and see what it says. Kind of dorky, but still more information than I had.

For the record, I understand why FileMaker is set up this way. For most purposes, you don't want your data entry operators to fidget with fonts. However, you can get inconsistent results if you are not careful. For instance, once I do switch to Japanese, I get the Japanese font, but if I return to English...I still get the Japanese font. I know Japanese fonts contain Latin characters, but the formatting is almost always NOT the one I intended.

It would be nice if FileMaker and the other apps (including Microsoft Office) could return you to your original English font formatting after your exotic side trip to the higher code points of Unicode.

Glyph DuJour: Romance Ordinal ª and º

What these are

The superscript a and o (sometimes underlined) are used to abbreviate ordinal numbers in Spanish, Italian and Portuguese, much like English -th (as in "4th, 5th, 6th..."). The choice of "o" vs. "a" depends on the gender of the noun. For instance, "1st American woman" would be 1ª americana in Spanish, and "1st American man" would be 1º americano. The 5th American woman and man would be 5ª americana/5º americano.
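The gender agreement rule can be shown with a tiny Python sketch. The function name ordinal_abbreviation() is hypothetical. The sketch covers only the basic ª/º forms, not variants such as 1er or 8.º.

```python
FEMININE_ORDINAL = "\u00aa"   # ª  FEMININE ORDINAL INDICATOR
MASCULINE_ORDINAL = "\u00ba"  # º  MASCULINE ORDINAL INDICATOR


def ordinal_abbreviation(number: int, feminine: bool) -> str:
    """Abbreviate an ordinal with the indicator matching the noun's gender."""
    indicator = FEMININE_ORDINAL if feminine else MASCULINE_ORDINAL
    return f"{number}{indicator}"


print(ordinal_abbreviation(1, feminine=True), "americana")   # 1ª americana
print(ordinal_abbreviation(5, feminine=False), "americano")  # 5º americano
```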

The Codes

I got a request to add the codes for these characters in various locations on the Penn State Computing with Accents Web site, so I thought I would summarize them here.

                      Feminine Ordinal (ª)   Masculine Ordinal (º)
Unicode Code Point    U+00AA (170)           U+00BA (186)
Windows Alt Code      ALT+0170               ALT+0186
Mac Option Code       Option+9               Option+0
HTML Entity Code      &ordf; or &#170;       &ordm; or &#186;
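Most values in this table can be double-checked with Python's standard library. In this minimal sketch, unicodedata gives the official character name and html.entities gives the named HTML entity. Encoding to cp1252 shows the byte behind the Windows Alt code, assuming a Western (Windows-1252) code page. The helper ordinal_codes() is hypothetical, and the Mac Option codes can't be checked this way.

```python
import html
import unicodedata
from html.entities import codepoint2name


def ordinal_codes(char: str) -> dict:
    """Collect the Unicode, Windows Alt and HTML codes for one character."""
    code_point = ord(char)
    return {
        "unicode": f"U+{code_point:04X} ({code_point})",
        "name": unicodedata.name(char),
        # Assumption: Alt+0nnn uses the Windows-1252 (Western) code page.
        "windows_alt": f"ALT+{char.encode('cp1252')[0]:04d}",
        "html_named": f"&{codepoint2name[code_point]};",
        "html_numeric": f"&#{code_point};",
    }


for char in ("\u00aa", "\u00ba"):
    codes = ordinal_codes(char)
    # Round-trip: both entity forms should decode back to the same character.
    assert html.unescape(codes["html_named"]) == char
    assert html.unescape(codes["html_numeric"]) == char
    print(char, codes)
```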

But Wait There's More

But in the land of Unicode, there's always more to know...such as the fact that in Spanish, primero '1st.masc' (1º) may be shortened to primer, which can be abbreviated as 1er...or that you may write octavo 'eighth.masc' as 8º or 8.º or possibly 8vo...although Google tends to have more instances of 8º.

What's important, though, is that only º and ª have their own code points in Unicode. For English -st, -nd, -rd, -th or Spanish -vo, -er, you have to rely on the old-fashioned SUP (superscript) tag or its equivalent in CSS.
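For suffixes without their own code point, the markup can be generated. Here is a minimal Python sketch in which both function names are hypothetical. english_ordinal_html() applies the usual English rules, where 11th through 13th take -th. superscript_html() simply wraps any other suffix, such as Spanish -er or -vo, in SUP tags.

```python
def superscript_html(number: int, suffix: str) -> str:
    """Wrap an ordinal suffix in an HTML <sup> tag, e.g. 8<sup>vo</sup>."""
    return f"{number}<sup>{suffix}</sup>"


def english_ordinal_html(number: int) -> str:
    """Pick -st, -nd, -rd or -th, then superscript it."""
    if 11 <= number % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th, 111th ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")
    return superscript_html(number, suffix)


print(english_ordinal_html(4))    # 4<sup>th</sup>
print(english_ordinal_html(22))   # 22<sup>nd</sup>
print(superscript_html(1, "er"))  # 1<sup>er</sup>
```

The same suffix could instead be raised with the CSS property vertical-align: super. The SUP version is shown because it needs no stylesheet.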

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
