GUS Frequently Asked Questions

  1. What syllabaries are covered by GUS?
  2. As GUS is at heart a mapping of code points, code points from a common character set are a prerequisite. Presently, only those syllabaries that are part of the Unicode 3.1 standard and the ISO-10646 Basic Multilingual Plane (BMP) are covered. We may venture out of the BMP as additional syllabaries are encoded and as volunteers become available.

    Covered (with Unicode range):

      Ethiopic                                      U+1200 - U+1357
      Cherokee                                      U+13A0 - U+13F4
      Carrier (Dakelh), Cree (Nehiyaw), Inuktitut   U+1400 - U+167F
      Hiragana                                      U+3041 - U+3094
      Katakana                                      U+30A1 - U+32FE
      Yi                                            U+A000 - U+A48C

    Not Covered:

      Ahom, Bakri Sapalo, Balinese, Batak, Bengali, Brahmi, Bugis, Buhid,
      Burmese, Byblos, Celtiberian, Cham, Cypriot, Dehong (Tai Nua),
      Devanagari, Eskaya, Gujarati, Gurmukhi (Punjabi), Hangul, Hanunóo,
      Iberian, Inuktitut, Javanese, Kannada, Kharosthi, Khmer, La Mojarra,
      Lao, Lepcha, Limbu, Linear A, Linear B, Lontara / Makasar, Malayalam,
      Manchu, Maya, Meroitic, Ndjuká, Newari, Old Persian, Oriya,
      Pahawh Hmong, Phags-pa, Redjang / Kaganga, Sinhala, Sorang Sompeng,
      Tagalog, Tagbanwa, Tai Dam, Tai Lue, Tamil, Telugu, Thai, Tibetan,
      Tocharian, Vai, Varang Kshiti
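
    Because coverage is defined by code point ranges, checking whether a character falls under GUS can be sketched as a simple range lookup. The Python below is illustrative only: the function name covered_script and the range labels are hypothetical, and the ranges are copied from the "Covered" list in this answer rather than from any published GUS data file.

```python
# Ranges copied from the "Covered" list in this FAQ; the labels are only
# illustrative names for each range.
COVERED_RANGES = [
    (0x1200, 0x1357, "Ethiopic"),
    (0x13A0, 0x13F4, "Cherokee"),
    (0x1400, 0x167F, "Carrier / Cree / Inuktitut"),
    (0x3041, 0x3094, "Hiragana"),
    (0x30A1, 0x32FE, "Katakana"),
    (0xA000, 0xA48C, "Yi"),
]


def covered_script(ch):
    """Return the label of the covered range containing ch, or None."""
    cp = ord(ch)
    for first, last, label in COVERED_RANGES:
        if first <= cp <= last:
            return label
    return None
```

    With these ranges, covered_script("\u13A0") yields "Cherokee", while a Hangul syllable such as "\uAC00" yields None.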

  3. Why are syllabaries like Hangul or Malayalam not covered?
  4. These syllabaries ("alphasyllabaries") rely on diacritic marks, each encoded as a separate character code, to compose a syllable from a base consonant. Because the base letter and the diacritic are two separate character codes, the text processing issue that arises for syllabaries not employing diacritics (where a single character code represents each syllable) does not arise for them.

    Stated another way, the GUS project addresses all Unicode code points that represent at least a "CV" symbol (and at most a "CVCT" symbol) where the "V" component cannot be modified by other symbols as a means to form the base of a new syllable.

    Syllabaries (CV, CVC, and CVT syllabaries):

      Carrier (Dakelh), Celtiberian, Cherokee, Cree (Nehiyaw), Ethiopic,
      Hiragana, Inuktitut, Katakana, Linear A, Linear B,
      Old Persian Cuneiform, Ndjuká, Meroitic, Byblos, Eskaya, La Mojarra,
      Maya, Vai, Yi

    Alpha Syllabaries (C(V)+D syllabaries):

      Ahom, Bakri Sapalo, Balinese, Batak, Bengali, Brahmi, Bugis, Buhid,
      Burmese, Byblos, Cham, Cypriot, Dehong (Tai Nua), Devanagari, Eskaya,
      Gujarati, Gurmukhi (Punjabi), Hangul, Hanunóo, Iberian, Inuktitut,
      Javanese, Kannada, Kharosthi, Khmer, La Mojarra, Lao, Lepcha, Limbu,
      Linear A, Linear B, Lontara / Makasar, Malayalam, Manchu, Maya,
      Meroitic, Newari, Old Persian Cuneiform, Oriya, Pahawh Hmong,
      Phags-pa, Redjang / Kaganga, Sinhala, Sorang Sompeng, Tagalog,
      Tagbanwa, Tai Dam, Tai Lue, Tamil, Telugu, Tibetan, Thai, Tocharian,
      Vai, Varang Kshiti

  5. What is a syllabic "family"?
  6. A syllabic family is a set of symbols derived from a common phoneme or glyph. Usually this means all syllables with a common initial consonant. The term "series" is interchangeable with "family".

  7. What is a syllabic "form"?
  8. Some syllabaries (Ethiopic, for example) make use of the notion that each syllable represents a single "form" (or family member) in a syllabic series. In Roman script the best analogy would be the distinction between "B" and "b" as two forms of the same letter. A syllabary may be thought of as a script where each letter has a separate case corresponding to each vowel in the language using it. So in an English syllabary we would have "cases" for at least "Ba", "Be", "Bi", "Bo", "Bu". Of course, the "cases" now become more important to writing than they were before, as they each carry an implicit vowel.

    These cases, which we hereafter refer to as "forms" or "orders", may have proper names. Rather than derive a unified naming convention for the syllabic orders, we have opted to enumerate them beginning from 1.
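
    As one illustration of numbering forms from 1, the Ethiopic block of Unicode lays out its syllable series in rows of eight consecutive code points, so a series index and a form number can be derived arithmetically. This is a sketch under that layout assumption only: the function name ethiopic_family_form is hypothetical, GUS itself may assign forms from a table rather than by arithmetic, and the code does not check whether a given position in a row is actually assigned.

```python
ETHIOPIC_FIRST = 0x1200  # first code point of the Ethiopic range listed above
ETHIOPIC_LAST = 0x1357   # last code point of that range
ROW_WIDTH = 8            # code points reserved per series in the block layout


def ethiopic_family_form(ch):
    """Return (family_index, form) for an Ethiopic syllable.

    family_index counts rows from 0; form is numbered from 1,
    following the enumeration convention described above.
    """
    cp = ord(ch)
    if not ETHIOPIC_FIRST <= cp <= ETHIOPIC_LAST:
        raise ValueError("not in the covered Ethiopic range")
    family, position = divmod(cp - ETHIOPIC_FIRST, ROW_WIDTH)
    return family, position + 1
```

    Under this assumption U+1208 is form 1 of the second series (index 1), and U+120A is form 3 of the same series.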

  9. What about locales?
  10. In some cases more than one language will make use of a given syllabary, and each may do so in a unique way. When different consonant or vowel associations occur for a given symbol, the consequence for GUS is a competing mapping of the symbol. Which, then, do we decide is the "correct" one?

    It is intended that the GUS table be applied in localised software, and so the GUS table will have to be localised. It may be broken into components based on script and language; the specifics are yet to be worked out by project participants. The normative form of GUS will make use of traditional rules to the extent possible, hopefully avoiding bias towards present-day languages.

    While vowel assignments may change between languages, it is not expected that form ordering will change accordingly. The present treatment of forms assumes that they are locale independent and thus fixed. Note that this also means that forms and vowels become independent from one another.

  11. How are consonant clusters treated?
  12. Some syllables may represent CCVCC type patterns. The project team will need to take care not to be misled by English transcription when mapping these syllables. When a single IPA symbol is not available for a CC pattern, two IPA consonant symbols will be used as a single position on the "C" axis.
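
    One way to picture a cluster occupying a single position on the "C" axis is to key each position by a tuple of IPA consonant symbols. The Python below is a sketch under that assumption: the names C_AXIS and c_position are hypothetical, and the entries are placeholders rather than GUS data.

```python
# Each position on the "C" axis is a tuple of IPA consonant symbols.
# A lone consonant is a 1-tuple; a cluster with no single IPA symbol
# still takes exactly one position, as a 2-tuple.
C_AXIS = [
    ("t",),
    ("s",),
    ("t", "ɬ"),  # placeholder cluster, e.g. a "tl" onset
]


def c_position(consonants):
    """Return the index on the C axis for a sequence of IPA consonants."""
    return C_AXIS.index(tuple(consonants))
```

    Here c_position(["t"]) and c_position(["t", "ɬ"]) return different indices (0 and 2), so the cluster is treated as its own consonant series rather than as two consecutive consonants.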