Language Codes: April 2008 Archives

Language Codes: Dialect vs. Macrolanguage


A while ago, I wrote about the difficulty of defining language tags for varieties like Cantonese: even though it's called a dialect, it's really a separate language.

The SIL group is using a new term I think should become more common: the macrolanguage. A macrolanguage is basically a set of related languages that share a common "identity" even though speakers of the different varieties can't normally understand each other.

Macrolanguages happen when a language spreads to different regions and changes, but the cultural or political unity remains. Besides Chinese, macrolanguages include Arabic, Cree, Hmong, Quechua (as spoken in the Incan Empire), and Norwegian. I suspect that you could throw in some other candidates like German and Italian (we'd have more if the Roman Empire had made it to the 21st century).

In any case, the ISO 639-3 language tag standard has a set of macrolanguage mappings which show how different related languages can map to each other, so that either Mandarin Chinese (cmn) or Cantonese (yue) can also be called Chinese (zh or zho).
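To make the mapping idea concrete, here is a rough sketch of how a lookup might work. The table is a tiny hand-picked excerpt rather than the full ISO 639-3 mapping file, and the names MACROLANGUAGE_OF, TWO_LETTER, acceptable_tags, and same_macrolanguage are my own inventions for illustration, not part of any standard or library.

```python
# Illustrative excerpt only: the real ISO 639-3 macrolanguage table is much larger.
# Maps an individual language code to its macrolanguage code.
MACROLANGUAGE_OF = {
    "cmn": "zho",  # Mandarin Chinese -> Chinese
    "yue": "zho",  # Cantonese -> Chinese
    "nob": "nor",  # Norwegian Bokmal -> Norwegian
    "nno": "nor",  # Norwegian Nynorsk -> Norwegian
}

# Two-letter ISO 639-1 equivalents of the macrolanguage codes above.
TWO_LETTER = {"zho": "zh", "nor": "no"}


def acceptable_tags(code):
    """Return the code itself, followed by any macrolanguage codes that cover it."""
    tags = [code]
    macro = MACROLANGUAGE_OF.get(code)
    if macro is not None:
        tags.append(macro)
        if macro in TWO_LETTER:
            tags.append(TWO_LETTER[macro])
    return tags


def same_macrolanguage(a, b):
    """True when both codes are listed under the same macrolanguage in the table."""
    macro_a = MACROLANGUAGE_OF.get(a)
    return macro_a is not None and macro_a == MACROLANGUAGE_OF.get(b)


print(acceptable_tags("yue"))
print(same_macrolanguage("cmn", "yue"))
```

With a table like this, text tagged yue and text tagged cmn can both be found by a search for zho or zh, while the specific codes still keep Cantonese and Mandarin apart when that matters.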

I really hope this term takes hold, because I think it will simplify other discussions about language tags. After all, it was just this year that a language technology guru claimed that English had no "true dialects." I think he meant to say that English hasn't reached macrolanguage status yet.


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage for a profile.


