A List Apart Accent Folding Article (with Review)


Accent folding (or, as A List Apart writes it, "Áçčềñṭ-Ḟøłðǐṅg") means programming your search algorithms so that a user who types "cafe" gets results for entries containing either "cafe" or the more Francophonically correct "café". Author Carlos Bueno also notes its importance in auto-complete, so that an English-speaking user can type something like "Lo" and pull up "López" from the system.
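To make the idea concrete, here is a minimal Python sketch of my own (it is not code from Bueno's article). It assumes the simplest possible strategy: decompose each character with Unicode NFD normalization, drop the combining marks, and compare the folded strings. The function names `fold_accents` and `autocomplete` are hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return a search key with combining accents and case differences removed."""
    # NFD splits a precomposed letter such as "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() merges upper and lower case, as in ordinary case folding.
    return stripped.casefold()


def autocomplete(prefix: str, entries: list[str]) -> list[str]:
    """Return the entries whose folded form starts with the folded prefix."""
    folded_prefix = fold_accents(prefix)
    return [entry for entry in entries if fold_accents(entry).startswith(folded_prefix)]


print(fold_accents("café"))                           # cafe
print(autocomplete("Lo", ["López", "Lopez", "Lars"]))  # ['López', 'Lopez']
```

One caveat with this shortcut: letters such as ø, ł and ð have no Unicode decomposition, so they pass through unchanged and would need an explicit mapping.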

As with case folding (treating capital "A" and lowercase "a" as the 'same' letter), the strategy is to list the variants of each letter and assign them all to one "archiletter". This is generally recommended for English because the writing system has essentially no accents of its own, so English monolinguals really do tend to get flummoxed by typing accent codes. If the audience is not English-speaking, the question can be trickier (really, you don't want to merge ñ and n in Spanish), and the answer may depend on whether the user needs to search data from an English-language source or from a source where accents are routinely dropped (e.g. chat/e-mail).
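The archiletter approach can be sketched as an explicit lookup table. The Python below is my own illustration under stated assumptions: the variant lists are deliberately tiny and hypothetical (a real table would be far longer), and the `keep_distinct` parameter is a made-up way to express the per-audience choice, such as leaving ñ alone for Spanish users.

```python
import unicodedata

# Hypothetical, deliberately small variant lists: archiletter -> accented variants.
ARCHILETTERS = {
    "a": "àáâãäå",
    "c": "çč",
    "e": "èéêë",
    "n": "ñ",
    "o": "òóôõöø",
    "u": "ùúûü",
}


def build_fold_table(keep_distinct: str = "") -> dict[int, str]:
    """Build a str.translate() table, skipping letters the audience treats as distinct."""
    keep = unicodedata.normalize("NFC", keep_distinct)
    table = {}
    for archiletter, variants in ARCHILETTERS.items():
        for variant in unicodedata.normalize("NFC", variants):
            if variant not in keep:
                table[ord(variant)] = archiletter
    return table


def fold(text: str, table: dict[int, str]) -> str:
    """Lowercase the text (case folding), then map each variant to its archiletter."""
    return unicodedata.normalize("NFC", text).lower().translate(table)


ENGLISH_TABLE = build_fold_table()                    # merge everything
SPANISH_TABLE = build_fold_table(keep_distinct="ñ")   # ñ stays a separate letter

print(fold("Año", ENGLISH_TABLE))    # ano
print(fold("Año", SPANISH_TABLE))    # año
print(fold("López", SPANISH_TABLE))  # lopez
```

Unlike stripping combining marks wholesale, an explicit table lets each audience get its own merging rules, at the cost of maintaining the lists by hand.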

The Accent Folding article by Carlos Bueno is a general introduction, but it does include snippets of code and links to additional samples and to information about pre-existing code libraries. He also points out the various pitfalls (e.g. should French thé 'tea' also pull up English "the"? It will depend on the audience and the database).

Another concept that Bueno brings up is that some languages, like German, have conversion schemes in place such that ü becomes "ue", not just "u" (e.g. Tübingen ~ Tuebingen). On the other hand, if your text does NOT come from a source that follows that convention, you may be dealing with plain "u" as well (e.g. Müller ~ Muller). Bueno then mentions the transliteration issue and points out that there are usually multiple transliteration schemes (the German ü > ue or u is a simple example). Sensibly, he puts that issue to the side.
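One way to cope with both German conventions is to generate several search keys per word and match on any of them. This is a sketch of my own, not Bueno's code: the helper names `search_keys` and `matches` are hypothetical, and the entries for ä, ö and ß are my assumption by analogy with the ü example above.

```python
import unicodedata

# Conventional German respellings (ü is from the text; ä, ö, ß added by analogy).
GERMAN_EXPANDED = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
# Plain accent dropping, as seen in sources that ignore the convention.
GERMAN_STRIPPED = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}


def search_keys(word: str) -> set[str]:
    """Return the original, expanded ("ue") and stripped ("u") spellings of a word."""
    lowered = unicodedata.normalize("NFC", word).lower()
    expanded = "".join(GERMAN_EXPANDED.get(ch, ch) for ch in lowered)
    stripped = "".join(GERMAN_STRIPPED.get(ch, ch) for ch in lowered)
    return {lowered, expanded, stripped}


def matches(query: str, entry: str) -> bool:
    """True if the query and the entry share at least one search key."""
    return bool(search_keys(query) & search_keys(entry))


print(sorted(search_keys("Tübingen")))  # ['tubingen', 'tuebingen', 'tübingen']
print(matches("Mueller", "Müller"))     # True
print(matches("Muller", "Müller"))      # True
```

Indexing every key trades a somewhat larger index for not having to guess which scheme the data source used.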

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

