As GUS is at heart a mapping of code points, code points from a common character set are a prerequisite. Presently only those syllabaries that are a part of the Unicode 3.1 standard and the ISO-10646 Basic Multilingual Plane (BMP) are covered. We may venture out of the BMP as additional syllabaries are encoded and as volunteers become available.
Covered | Code points | Not Covered |
---|---|---|
Ethiopic | U+1200 - U+1357 | Ahom, Bakri Sapalo, Balinese, Batak, Bengali, Brahmi, Bugis, Buhid, Burmese, Byblos, Celtiberian, Cham, Cypriot, Dehong (Tai Nua), Devanagari, Eskaya, Gujarati, Gurmukhi (Punjabi), Hangul, Hanunóo, Iberian, Inuktitut, Javanese, Kannada, Kharosthi, Khmer, La Mojarra, Lao, Lepcha, Limbu, Linear A, Linear B, Lontara / Makasar, Malayalam, Manchu, Maya, Meroitic, Ndjuká, Newari, Old Persian, Oriya, Pahawh Hmong, Phags-pa, Redjang / Kaganga, Sinhala, Sorang Sompeng, Tagalog, Tagbanwa, Tai Dam, Tai Lue, Tamil, Telugu, Thai, Tibetan, Tocharian, Vai, Varang Kshiti |
Cherokee | U+13A0 - U+13F4 | |
Carrier (Dakelh), Cree (Nehiyaw), Inuktitut | U+1400 - U+167F | |
Hiragana | U+3041 - U+3094 | |
Katakana | U+30A1 - U+32FE | |
Yi | U+A000 - U+A48C | |
The syllabaries listed as not covered ("alphasyllabaries") rely on diacritic marks, each a separate character code, to compose a syllable from a base consonant. Because the base letter and the diacritic are two separate character codes, the text processing issue that arises for syllabaries without diacritics (where a single character code represents each syllable) does not arise for them.
Stated another way, the GUS project addresses all Unicode code points that represent at least a "CV" symbol (to a maximum extent of a "CVCT" symbol) where the "V" component cannot be modified by other symbols as a means to form the base of a new syllable.
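The covered code point ranges listed in the Covered / Not Covered table earlier in this section can be expressed as a simple lookup. The following is a minimal sketch in Python and is not part of GUS itself: the names `COVERED_RANGES` and `covered_script` are hypothetical, and the range bounds are copied from the table exactly as given.

```python
# Hypothetical sketch: range bounds are copied from the "Covered" table.
COVERED_RANGES = [
    ("Ethiopic", 0x1200, 0x1357),
    ("Cherokee", 0x13A0, 0x13F4),
    ("Carrier (Dakelh) / Cree (Nehiyaw) / Inuktitut", 0x1400, 0x167F),
    ("Hiragana", 0x3041, 0x3094),
    ("Katakana", 0x30A1, 0x32FE),
    ("Yi", 0xA000, 0xA48C),
]


def covered_script(ch):
    """Return the covered script name for a single character, or None."""
    cp = ord(ch)
    for name, first, last in COVERED_RANGES:
        if first <= cp <= last:
            return name
    return None
```

Note that the ranges are coarse spans: the Katakana range as given also crosses code points that belong to other blocks, so a real implementation would need a finer-grained table than this range check.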
Class | Type | Scripts |
---|---|---|
Syllabaries | CV Syllabaries | Carrier (Dakelh), Celtiberian, Cherokee, Cree (Nehiyaw), Ethiopic, Hiragana, Inuktitut, Katakana, Linear A, Linear B, Old Persian Cuneiform, Ndjuká, Meroitic, Byblos |
Syllabaries | CVC Syllabaries | Eskaya, La Mojarra, Maya, Vai |
Syllabaries | CVT Syllabaries | Yi |
Alpha Syllabaries | C(V)+D Syllabaries | Ahom, Bakri Sapalo, Balinese, Batak, Bengali, Brahmi, Bugis, Buhid, Burmese, Byblos, Cham, Cypriot, Dehong (Tai Nua), Devanagari, Eskaya, Gujarati, Gurmukhi (Punjabi), Hangul, Hanunóo, Iberian, Inuktitut, Javanese, Kannada, Kharosthi, Khmer, La Mojarra, Lao, Lepcha, Limbu, Linear A, Linear B, Lontara / Makasar, Malayalam, Manchu, Maya, Meroitic, Newari, Old Persian Cuneiform, Oriya, Pahawh Hmong, Phags-pa, Redjang / Kaganga, Sinhala, Sorang Sompeng, Tagalog, Tagbanwa, Tai Dam, Tai Lue, Tamil, Telugu, Tibetan, Thai, Tocharian, Vai, Varang Kshiti |
A syllabic family is a set of symbols derived from a common phoneme or glyph. Usually this means all syllables with a common initial consonant. The term "series" is interchangeable with "family".
Some syllabaries (Ethiopic for example) make use of the notion that each syllable represents a single "form" (or family member) in a syllabic series. In Roman script the best analogy would be the distinction between "B" and "b" as two forms of the same letter. A syllabary may be thought of as a script where each letter has a new case that corresponds to each vowel in the language using it. So in an English syllabary we would have "cases" for at least "Ba", "Be", "Bi", "Bo", "Bu". Of course the "cases" now become more important to writing than they were before, as they each carry an implicit vowel.
These cases, which we hereafter refer to as "forms" or "orders", may have proper names. Rather than derive a unified naming convention for the syllabic orders, we have opted to enumerate them beginning from 1.
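To illustrate numbering forms from 1, the sketch below derives a family index and a form number for an Ethiopic syllable from its code point alone. It assumes the regular layout of the Unicode Ethiopic block, in which each consonant family starts on a multiple of eight from U+1200. The function name `family_and_form` is hypothetical, and the arithmetic is an approximation rather than the GUS table itself: labialized families and unassigned positions do not follow this pattern.

```python
ETHIOPIC_BASE = 0x1200   # first code point of the Unicode Ethiopic block
ETHIOPIC_LAST = 0x1357   # upper bound of the covered range given earlier
ROW_WIDTH = 8            # assumption: one family per row of eight code points


def family_and_form(ch):
    """Return (family_index, form_number) for an Ethiopic syllable.

    family_index counts rows from 0; form_number is enumerated from 1.
    """
    cp = ord(ch)
    if not ETHIOPIC_BASE <= cp <= ETHIOPIC_LAST:
        raise ValueError("not in the covered Ethiopic range")
    offset = cp - ETHIOPIC_BASE
    return offset // ROW_WIDTH, offset % ROW_WIDTH + 1
```

Under these assumptions, U+1208 (ETHIOPIC SYLLABLE LA) yields family 1, form 1, and U+120A (ETHIOPIC SYLLABLE LI) yields family 1, form 3.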
In some cases more than one language will make use of a given syllabary, and each may do so in a unique way. When different consonant or vowel associations occur for a given symbol, the consequence for GUS is a competing mapping of the symbol. Which, then, do we decide is the "correct" one?
It is intended that the GUS table be applied in localised software, and so the GUS table will have to be localised. It may be broken into components based on script and language; the specifics are yet to be worked out by project participants. The normative form of GUS will make use of traditional rules to the extent possible, hopefully avoiding bias towards present day languages.
While vowel assignments may change between languages, it is not expected that form ordering will change accordingly. The present treatment of forms assumes that they are locale independent and thus fixed. Note that this also means that forms and vowels become independent from one another.
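The separation of fixed form numbers from locale-specific vowels could be modelled as two independent lookups. This is only a sketch under stated assumptions: the locale tags `xx` and `yy` and their vowel values are invented placeholders, not real GUS data, and `vowel_for` is a hypothetical helper.

```python
# Form numbers are fixed and locale independent (enumerated from 1).
FORMS = (1, 2, 3, 4, 5, 6, 7)

# Placeholder data only: "xx" and "yy" are made-up locales with made-up vowels.
VOWELS_BY_LOCALE = {
    "xx": {1: "a", 2: "u", 3: "i", 4: "aa", 5: "e", 6: "", 7: "o"},
    "yy": {1: "e", 2: "u", 3: "i", 4: "a", 5: "ie", 6: "", 7: "o"},
}


def vowel_for(form, locale):
    """Look up the vowel that a locale assigns to a fixed form number."""
    if form not in FORMS:
        raise ValueError(f"unknown form: {form}")
    return VOWELS_BY_LOCALE[locale][form]
```

Because the form number is the key, swapping the locale changes only the vowel that is returned and never the ordering of the forms.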
Some syllables may represent CCVCC type patterns. The project team will need to take care not to be misled by English transcription when mapping these syllables. When a single IPA symbol is not available for a CC pattern, two IPA consonant symbols will be used as a single position on the "C" axis.
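One way to hold a two-symbol cluster in a single position on the "C" axis is to key the axis by strings rather than by single characters. The sketch below assumes this representation; the axis entries and the name `c_axis_position` are illustrative placeholders, not entries from the GUS table.

```python
# Illustrative "C" axis: each entry occupies one position, even when it
# holds two IPA consonant symbols.
C_AXIS = ["b", "k", "t", "kw", "ts"]


def c_axis_position(consonant):
    """Return the 0-based position of a consonant or cluster on the axis."""
    return C_AXIS.index(consonant)
```

Here the cluster "kw" takes one position of its own, distinct from the positions of "k" alone.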