The Balinese Language in Computer Processing

By Ida Bagus Adi Sudewa
v0.4 - March 15th, 2003

 


Change Control Sheet

Ver   Remarks                                            Initiator  Date
v0.3  The first version sent to Unicode experts          Dewa       March 10th, 2003
v0.4  • Now includes guidelines on horizontal font       Dewa       March 15th, 2003
        size of syllables
      • No more glyph alternatives
      • Glyphs are now converted to bitmap graphics

 
Prerequisites

This article is part of the Balinese Computing Project, a project to computerize the Balinese language and script.

For readers who are not familiar with the Balinese script, please consult the article 'The Balinese Alphabet'. For readers who want to know more about the historical and present use of the Balinese script, 'The Contemporary Use of The Balinese Script' is a good starting point. Both articles can be downloaded from this website (http://www.babadbali.com/aksarabali).

 

Previous Works on Balinese Script Computerization

A number of projects have been carried out to computerize the Balinese script. Some of them are listed here, with short descriptions.

Ida Bagus Made Jaya Martha developed a Balinese text editor in 1991 as the final project for his Ir. (engineer) degree.

As a charitable effort to preserve indigenous traditional Balinese manuscripts, I Gusti Made Mantra and his team from PT IBM Indonesia carried out a project that successfully transferred the contents of palm-leaf manuscripts to computer images. [There is no information on whether any "character encodings" or "computer fonts" were produced here.]

A foreigner, J Glavy, has created a Balinese computer font, "JG Aksara Bali", using TrueType technology. No keyboard input method was created, so the glyphs have to be inserted by other means. In MS Word, one can use the command Insert -> Symbol to insert them. Downloadable from: www.geocities.com/jglavy/asian.html

I Made Suatjana has so far contributed the most to the computerization of the Balinese script. He started with a font for ChiWriter™ in the late eighties, followed by the TrueType font "Bali Simbar" for Windows platforms in the mid-nineties. He also took a step forward by developing a Balinese keyboard input for Microsoft Word. All the Balinese-script text in this article was typed using his fonts and his keyboard input template. Downloadable from this website (http://www.babadbali.com/aksarabali/balisimbar.htm)

 

Character Encoding

So far no character encoding has been defined for the Balinese script. The team in the Balinese Computing Project is preparing one. The work, now in progress, will be proposed for inclusion in the Unicode standard. Most probably the encoding will use the virama model that is common for Brahmic-derived scripts.

More information about Unicode: http://www.unicode.org

 

Presentation Issues

4 Levels of Glyphs

The possible glyphs for each layer are described below. Here we use the convention A1-0-B1-B2 for the layer names:

    A1 Some vowel signs and sound killers

    0 All syllables, some sound killers, gempelan

    B1 gantungan, some vowel signs

    B2 Second-level gantungan; some vowel signs, which here attach to the gantungan

Different Horizontal Size of Syllable
The Latin alphabet also has letters of different widths: 'W' is usually considered the widest and 'i' the narrowest. In the Balinese script, the syllables nya, na rambat, and ga gora are the widest, while ca, na, ra, da, wa, and some other syllables are the narrowest.

There are four horizontal sizes of syllable.

RA, PA, PA KAPAL
NA, CA, DA MADU, etc.
HA, KA, TA, etc.
NA RAMBAT, NYA, GA GORA

 

 

The differing syllable sizes raise several issues for presenting the Balinese script on the computer:

  • Some vowel signs must be centered on the syllable
  • Some gantungan also have to be centered below the syllable
  • GUWUNG has to be stretched according to the size of the base syllable.

ULU, ULU SARI, PEPET, ULU CANDRA, ULU RICEM, CECEK, and SURANG must all be centered above the base syllable, according to the horizontal size of the syllable.

There can also be six combinations of ULU/PEPET/ULU SARI with CECEK/SURANG. When one occurs, the combination must also be centered.


 

Most gantungan also have to be centered.



GUWUNG must be stretched to follow the size of the base syllable.
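The centering and stretching rules above can be sketched numerically. This is only an illustration: the width values, the `UNIT` constant, and the function names are assumptions, not measurements taken from any actual Balinese font.

```python
from itertools import product

# Hypothetical size classes 1..4, one representative syllable per class,
# following the four horizontal sizes listed above.
WIDTH_CLASS = {"RA": 1, "NA": 2, "HA": 3, "NYA": 4}
UNIT = 250  # assumed width of one size step, in arbitrary font units


def base_width(syllable):
    """Width of a base syllable, derived from its size class."""
    return WIDTH_CLASS[syllable] * UNIT


def centered_x(syllable, mark_width):
    """Left offset that centers a mark of `mark_width` on the base syllable."""
    return (base_width(syllable) - mark_width) / 2


def guwung_width(syllable):
    """GUWUNG is stretched to span the whole base syllable."""
    return base_width(syllable)


# The six combinations of ULU/PEPET/ULU SARI with CECEK/SURANG; each pair
# would be centered as a single unit.
COMBINATIONS = list(product(["ULU", "PEPET", "ULU SARI"], ["CECEK", "SURANG"]))
```

With these assumed numbers, a mark 400 units wide sits at offset 300 on NYA but at a negative offset on RA, which is why the positioning has to depend on the size class of the base.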


 

Glyph Reordering
Some vowel signs appear before the base syllable. For example, pa + e is written as taleng followed by the syllable pa:
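As a minimal sketch of this reordering step, assume each cluster arrives in logical order (base syllable first, then its signs) and that the sign names are plain strings. The set `PREPOSED_SIGNS` and the function name are hypothetical.

```python
# Signs that are drawn before their base syllable (assumed set).
PREPOSED_SIGNS = {"TALENG"}


def reorder_cluster(cluster):
    """Convert one cluster from logical order to visual (glyph) order.

    `cluster` is a list such as ["PA", "TALENG"]: the base syllable first,
    followed by its signs in the order they were typed.
    """
    base, signs = cluster[0], cluster[1:]
    before = [s for s in signs if s in PREPOSED_SIGNS]
    after = [s for s in signs if s not in PREPOSED_SIGNS]
    return before + [base] + after


print(reorder_cluster(["PA", "TALENG"]))  # ['TALENG', 'PA']
```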



Split Vowel Signs, Split Gantungan / Gempelan
Some vowel signs and one gantungan/gempelan consist of two glyphs. The rendering engine should be able to split the glyphs, do the appropriate reordering, and position the glyphs correctly relative to the syllable.

Handling split vowels together with reordering is difficult.
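The following sketch shows one way splitting and reordering could be combined. It assumes a hypothetical split table in which the logical sign "TALENG TEDONG" decomposes into a part drawn before the base and a part drawn after it; the table contents and all names are illustrative only.

```python
# Assumed split table: logical sign -> (glyph drawn before, glyph drawn after).
SPLIT_SIGNS = {"TALENG TEDONG": ("TALENG", "TEDONG")}
# Unsplit signs that are drawn before the base (assumed set).
PREPOSED = {"TALENG"}


def shape_cluster(base, signs):
    """Return the visual glyph order for one base syllable and its signs."""
    before, after = [], []
    for sign in signs:
        if sign in SPLIT_SIGNS:
            left, right = SPLIT_SIGNS[sign]
            before.append(left)   # first half moves in front of the base
            after.append(right)   # second half stays behind everything else
        elif sign in PREPOSED:
            before.append(sign)
        else:
            after.append(sign)
    return before + [base] + after


print(shape_cluster("PA", ["GANTUNGAN LA", "TALENG TEDONG"]))
# ['TALENG', 'PA', 'GANTUNGAN LA', 'TEDONG']
```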


Ligatures
There are more than 35 ligatures in the Balinese script. The main problem with these ligatures is that the original sequences are not necessarily next to each other in the character encoding.

One example is the word ‘plong’. The ligature is formed between PA and TEDONG, but PA and TEDONG are not adjacent in the encoded text: there is LA between them. Glyph reordering must therefore be performed before the ligature can be formed.

Please note that ligatures are an optional feature of the Balinese script and need not be implemented immediately.

plong
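A minimal sketch of forming a ligature from non-adjacent elements follows. It assumes a cluster is a list with the base first and its signs in encoded order; the ligature table, the glyph name "PA_TEDONG", and the function name are hypothetical.

```python
# Hypothetical ligature table: (base, sign) -> ligature glyph name.
LIGATURES = {("PA", "TEDONG"): "PA_TEDONG"}


def form_ligature(cluster):
    """Replace the base and a ligating sign by one glyph, even if other
    signs sit between them in the encoded sequence."""
    base, signs = cluster[0], cluster[1:]
    for i, sign in enumerate(signs):
        if (base, sign) in LIGATURES:
            rest = signs[:i] + signs[i + 1:]  # signs that stay separate
            return [LIGATURES[(base, sign)]] + rest
    return list(cluster)  # no ligature applies


print(form_ligature(["PA", "GANTUNGAN LA", "TEDONG"]))
# ['PA_TEDONG', 'GANTUNGAN LA']
```

Because the feature is optional, a renderer could skip this step entirely and still produce readable text.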


Vowel Sign Attachments
SUKU and SUKU ILUT must be correctly attached to either the base syllable or the appended form of a syllable. It is optional to have SUKU and SUKU ILUT seamlessly joined with the base syllable, but it is mandatory to have them correctly positioned under the final stroke of the base syllable. Recall that PA KAPAL is the only syllable whose final stroke does not go downward.



Another issue with SUKU and SUKU ILUT is that both are attached to the gantungan/gempelan, if one exists. Only SUKU and SUKU ILUT behave like this; other vowel signs are still attached to the base syllable.

  Bluluk

The Second Level Gantungan
Fortunately, the only gantungan that can act as second-level gantungan are gantungan ya, wa, and ra.

  Briag
  Bangkuang

Unfortunately, vowel signs can still be attached when a second-level gantungan ya or ra is present: suku and suku ilut are attached to the second-level form, while the other vowel signs are attached to the base syllable.

  Bangkrut
  Mencret
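The attachment rules for SUKU and SUKU ILUT, with or without a second-level gantungan, can be summarized in a small function. This is a sketch under the assumption that the appended forms of a cluster are available as an ordered list (first level, then second level); the names are hypothetical, and the sample clusters are illustrative rather than verified spellings.

```python
# Only these two signs follow the appended form; all others stay on the base.
FOLLOWS_APPENDED = {"SUKU", "SUKU ILUT"}


def attachment_target(base, appended, sign):
    """Return the element of the cluster that `sign` attaches to.

    `appended` lists the gantungan/gempelan of the cluster in order:
    first-level form first, second-level form (if any) last.
    """
    if sign in FOLLOWS_APPENDED and appended:
        return appended[-1]  # lowest appended form carries the sign
    return base


# One appended form: SUKU goes to the gantungan.
print(attachment_target("BA", ["GANTUNGAN LA"], "SUKU"))  # GANTUNGAN LA
# Two levels: SUKU goes to the second-level form, TALENG stays on the base.
print(attachment_target("NGA", ["GANTUNGAN KA", "GANTUNGAN RA"], "SUKU"))  # GANTUNGAN RA
print(attachment_target("NGA", ["GANTUNGAN KA", "GANTUNGAN RA"], "TALENG"))  # NGA
```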


Glyph Substitutions
Some syllables change form completely when a sign is attached to them.

  ra + e = re-repa
  la + e = le-lenga



Character Spacing
The width of the base syllable determines the width of one horizontal unit.

  Bangkuang


However, there are exceptions to this rule. If wa, ya, or ra with suku or suku ilut is appended to the base syllable, the next syllable is shifted away, leaving some space after it.

  Bangkrut
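The spacing rule and its exception can be expressed as an advance-width calculation. The sketch below assumes widths in arbitrary font units and an arbitrary extra gap of 100 units; the actual amount of extra space is a font design decision that this article does not specify.

```python
WIDE_APPENDED = {"GANTUNGAN WA", "GANTUNGAN YA", "GANTUNGAN RA"}
LOW_VOWELS = {"SUKU", "SUKU ILUT"}


def advance(base_width, appended, signs, extra=100):
    """Horizontal advance of one cluster.

    Normally the advance equals the base syllable width. When wa, ya, or ra
    is appended together with suku or suku ilut, `extra` is added so that
    the next syllable is shifted away.
    """
    needs_gap = (any(a in WIDE_APPENDED for a in appended)
                 and any(s in LOW_VOWELS for s in signs))
    return base_width + (extra if needs_gap else 0)


print(advance(500, ["GANTUNGAN KA", "GANTUNGAN WA"], []))        # 500
print(advance(500, ["GANTUNGAN KA", "GANTUNGAN RA"], ["SUKU"]))  # 600
```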


Anomaly: One of the Dasa Aksara
ULU CANDRA can be attached to a syllable to form holy letters. Examples are the letters ANG, UNG, and MANG.

ang ung mang


ULU CANDRA is never attached to a syllable that carries another sign above it, except in one case. One of the ten holy letters is SING:

sing


This is the only exception. The rendering engine should recognize “SA SAGA + ULU + ULU CANDRA” as a special case and render it accordingly.
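A renderer or validator could encode this rule and its single exception as a check like the one below. The set of above-syllable signs is taken from the list given earlier for centering; the function name and the data layout are assumptions.

```python
# Signs that are placed above the base syllable.
ABOVE_SIGNS = {"ULU", "ULU SARI", "PEPET", "ULU CANDRA",
               "ULU RICEM", "CECEK", "SURANG"}
# The only cluster in which ULU CANDRA shares the top layer with another sign.
SING = ("SA SAGA", frozenset({"ULU", "ULU CANDRA"}))


def ulu_candra_ok(base, signs):
    """Check only the ULU CANDRA rule; other rules are out of scope here."""
    above = [s for s in signs if s in ABOVE_SIGNS]
    if "ULU CANDRA" not in above or len(above) == 1:
        return True
    return len(above) == 2 and (base, frozenset(above)) == SING


print(ulu_candra_ok("SA SAGA", ["ULU", "ULU CANDRA"]))  # True
print(ulu_candra_ok("HA", ["ULU", "ULU CANDRA"]))       # False
```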

 

Rendering Engines


This section covers font technologies only briefly. Links and references are given for readers who want to know more about each technology.

Computer Fonts
In the early days of computing, the only script computers understood was Latin. The screen was divided into a grid of cells, each of which could hold one Latin character or punctuation mark. The size of the screen was fixed, and so was the number of pixels in one cell for each type of computer. One type of computer could display the Latin script in only one font. The font technology of that time is called the bitmap font. Characters could not be enlarged or reduced, nor made italic or bold.

Only with the advent of Macintosh computers did vector fonts start to emerge. Apple introduced its own invention, the TrueType font (TTF), which looks good on screen, while Adobe had the PostScript Type 1 font technology, which was used primarily for printing and publishing. These two font technologies still dominate the typography world today.

The differences between the two, subtle to most users, are the curve formula used (TTF uses quadratic curves while Type 1 uses cubic curves), the hinting calculation, and the file format. Currently TrueType is more popular, thanks to the OS market dominance of Microsoft, a TTF endorser. But the two technologies can be used side by side and do not exclude each other.

More about font technologies can be found in: http://www.truetype.demon.co.uk/tthist.htm

Font with Intelligence
When computers became more accessible to people who do not read the Latin alphabet, the need to represent more scripts on the computer became apparent. Chinese-Japanese-Korean (CJK or Han), Cyrillic, Arabic, Hebrew, the Indic family of scripts, and Thai were among the first to be adapted. These scripts have different characteristics from Latin. Arabic and Hebrew are written from right to left. Indic and Thai scripts are syllabic and have various signs attached before, after, below, and above the syllable. CJK characters are simpler to lay out, but the repertoire is huge. Arabic has many ligatures and glyph alternatives. A simple one-to-one mapping of character to glyph is no longer sufficient, and the assumptions of left-to-right text and no sign attachment no longer hold. A better font technology is needed.

The first to mention is the OpenType font technology. It extends either a TrueType or a Type 1 font with additional features, among them ligatures, kerning pairs, anchor attachment, and glyph substitution (one to many and many to one). Microsoft, again, is one of the companies behind this technology. Microsoft also provides an Application Programming Interface (API) called Uniscribe. As an API, Uniscribe is more complex but more powerful than OpenType. If a script is too complex to be described in OpenType, one should consider using Uniscribe.

Apple also has its own modern font technology, called AAT (Apple Advanced Typography). It is not as popular as OpenType and is currently supported only on the Macintosh platform.

The last one, Graphite, promises to support any complex script through the declarative syntax of its definition language. For example, OpenType doesn’t support the glyph reordering needed by South/Southeast Asian scripts, while Graphite does.

More information about OpenType: http://www.adobe.com/type/opentype/main.html

More information about AAT: http://developer.apple.com/fonts/TTRefMan/RM06/Chap6AATIntro.html

More information about Graphite: http://graphite.sil.org/

Font Editors
There are two leaders in the typography software market: Macromedia's Fontographer and FontLab. Macromedia no longer updates Fontographer, while FontLab is still enhanced every year, so it is clear who the true leader is now.
FontLab's demo can be downloaded for free from
http://www.fontlab.com/

Fontographer's web site is
http://www.macromedia.com/software/fontographer/

 

 
Input Method Issues

Please note that currently there are no proper input methods for the Balinese script on the computer. The concepts here are borrowed from similar South/Southeast Asian scripts.

Keyboard Layout
First, look at the character repertoire. There are fifty-four syllables, twelve vowel signs, ten independent vowels, ten numerals, five final consonants, and seven punctuation marks, making ninety-eight in total. There are ninety-two keyboard keys available for use. There are surely ways to fit ninety-eight characters onto ninety-two keys, but they still have to be designed.

A debate may arise over whether a gantungan or gempelan should be assigned a different key from its base syllable. If the same key can be assigned to a base syllable and its gantungan or gempelan, then there are enough keys to hold all the letters. The consequence is that the keyboard driver must be “smart” enough to determine, when the user presses a key, whether a base syllable is meant or not.

Cursor Positioning and Editing
The Balinese alphabet has four layers, while the cursor moves horizontally only. A good example to follow is the way Thai implements cursor positioning:

  • Left and right keys = skip one horizontal position.
  • Backspace key = delete the last sign, but the cursor doesn’t move; if there is only the syllable, delete the syllable and then move the cursor to the left.
  • Delete key = delete the syllable and all signs attached to it.
  • Character typing = when the user types a syllable, the cursor advances as usual; when the user types a sign, the cursor doesn’t move and the sign is displayed on the previous syllable. If the user wants to type a sign only, he/she has to press space and then the sign.
  • The input method should have a mechanism to validate text on the fly, preventing erroneous text from being produced.
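The editing rules in the list above can be modeled with a small buffer in which each horizontal position holds one syllable and its signs. This is a sketch of the behaviour only; the class and method names are hypothetical, and on-the-fly validation is left out.

```python
class EditBuffer:
    """Text as a list of clusters: [syllable, sign, sign, ...]."""

    def __init__(self):
        self.clusters = []
        self.cursor = 0  # number of clusters to the left of the cursor

    def type_syllable(self, syllable):
        # A syllable (or a space) opens a new cluster; the cursor advances.
        self.clusters.insert(self.cursor, [syllable])
        self.cursor += 1

    def type_sign(self, sign):
        # A sign joins the previous cluster; the cursor does not move.
        if self.cursor == 0:
            raise ValueError("a sign needs a preceding syllable or space")
        self.clusters[self.cursor - 1].append(sign)

    def backspace(self):
        if self.cursor == 0:
            return
        cluster = self.clusters[self.cursor - 1]
        if len(cluster) > 1:
            cluster.pop()                       # remove last sign only
        else:
            del self.clusters[self.cursor - 1]  # remove the bare syllable
            self.cursor -= 1

    def delete(self):
        # Remove the whole cluster to the right of the cursor.
        if self.cursor < len(self.clusters):
            del self.clusters[self.cursor]

    def left(self):
        self.cursor = max(0, self.cursor - 1)

    def right(self):
        self.cursor = min(len(self.clusters), self.cursor + 1)
```

Typing a sign on its own would then be `type_syllable("SPACE")` followed by `type_sign(...)`, matching the "press space and then the sign" rule.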

Input Method Technologies
Operating system vendors supply input methods for commonly used scripts. For less commonly used scripts, where a custom input method is required, the following tools should help:

  • Microsoft Word Template. The most common approach is to remap keyboard keys to the desired characters. It can be used only in the Microsoft Word word processor.
  • Keyman Keyboard Developer. It can be fully customized to process any input in any script. The license is free for users but not for developers. The official website is at: http://www.tavultesoft.com/keyman
  • The most powerful one is an API from Microsoft: Microsoft’s IME (Input Method Editor). One must be an expert programmer to develop with it, but once it is done, it blends seamlessly with the Windows operating system, as if it were built into Windows from the start.


Text Processing Algorithm


Searching
Searching is slightly more complicated, because two character sequences must be regarded as equivalent: a syllable ending with adeg-adeg, and the same syllable with another syllable appended to it.

‘dados’ is written as:

‘dados pangangge’ is written as:

When one searches for the word ‘dados’ in the above text, the searching algorithm must be able to find that word in the sequence ‘dados pangangge’.
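One possible way to implement this equivalence is sketched below. Since no character encoding exists yet, the text is modeled as a list of token names; the token names, the "GANTUNGAN "/"GEMPELAN " prefixes, and the simplified spelling of the sample phrase are all assumptions made for illustration.

```python
KILLER = "ADEG-ADEG"
APPENDED_PREFIXES = ("GANTUNGAN ", "GEMPELAN ")


def find_word(text, query):
    """Return the start indices where `query` occurs in `text`.

    If the query ends with the sound killer, a final consonant that
    carries an appended form in the text is accepted as well.
    """
    ends_killed = bool(query) and query[-1] == KILLER
    core = query[:-1] if ends_killed else query
    hits = []
    for i in range(len(text) - len(core) + 1):
        if text[i:i + len(core)] != core:
            continue
        if not ends_killed:
            hits.append(i)
            continue
        j = i + len(core)
        nxt = text[j] if j < len(text) else None
        if nxt == KILLER or (nxt is not None and nxt.startswith(APPENDED_PREFIXES)):
            hits.append(i)
    return hits


# Simplified, hypothetical token sequences (not verified spellings).
dados = ["DA", "DA", "TALENG TEDONG", "SA", "ADEG-ADEG"]
phrase = ["DA", "DA", "TALENG TEDONG", "SA", "GEMPELAN PA", "NGA", "CECEK", "GA", "TALENG"]
print(find_word(phrase, dados))  # [0]
```

If the final consonant is followed by a vowel sign instead, the function reports no match, because the consonant is then no longer "killed".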

Text Validation On-The-Fly
This should check for the condition where the user enters an invalid combination of keys. For example, the user should not press the key for the syllable HA followed by ULU and then by SUKU. The keyboard driver should respond to this by beeping to signal an error or, if beeping is not possible, by simply not sending the SUKU to the word processor.

In no case should a wrong character sequence be sent to the word processor or rendering engine.
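A keystroke filter along these lines might look as follows. The rule used here, at most one vowel sign per syllable, is an assumed simplification chosen to reproduce the HA + ULU + SUKU example; the real rule set would have to come from the orthography. The vowel-sign set is a small assumed subset, every other key is treated as a base syllable, and the names are hypothetical.

```python
# Small assumed subset of vowel signs; any other key counts as a base syllable.
VOWEL_SIGNS = {"ULU", "ULU SARI", "SUKU", "SUKU ILUT", "PEPET"}


class KeyFilter:
    """Accepts or rejects keys before they reach the word processor."""

    def __init__(self):
        self.accepted = []          # keys passed on so far
        self.has_base = False       # a base syllable is open
        self.has_vowel_sign = False # the open syllable already has a vowel sign

    def press(self, key):
        """Return True if the key is passed on, False if it is rejected (beep)."""
        if key in VOWEL_SIGNS:
            if not self.has_base or self.has_vowel_sign:
                return False
            self.has_vowel_sign = True
        else:
            self.has_base = True
            self.has_vowel_sign = False
        self.accepted.append(key)
        return True


f = KeyFilter()
print([f.press(k) for k in ["HA", "ULU", "SUKU"]])  # [True, True, False]
print(f.accepted)                                    # ['HA', 'ULU']
```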

Word Parsing and Spell Checking
Since there are no spaces to separate words, the word-parsing algorithm requires an external dictionary to work with. The algorithm should compare words in the source text with words in the dictionary. If more than one parse is possible, either complex context recognition must be employed or manual intervention is needed.

Example: the words ‘bang’ (bank) and ‘bangbang’ (hole) both exist in the dictionary. It is impossible for the algorithm to determine (without context) whether ‘bangbang’ means many banks (in Balinese, repetition makes a word plural) or a hole.

Spell checking can be used along with word parsing. If a long sequence of syllables, say six syllables, doesn’t exist in the dictionary, it should be marked as a spelling error. The process then starts validating again from the next syllable.
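The ambiguity in the ‘bangbang’ example can be made concrete with a small dictionary-based segmenter. This sketch works on romanized syllables and a two-entry toy dictionary; the function name, the data layout, and the use of six syllables as the maximum word length are assumptions.

```python
def segmentations(syllables, dictionary, max_len=6):
    """Return every way to split `syllables` into dictionary words.

    Words are tuples of syllables. An empty result means no parse exists,
    which a spell checker could report as an error.
    """
    if not syllables:
        return [[]]
    results = []
    for n in range(1, min(max_len, len(syllables)) + 1):
        word = tuple(syllables[:n])
        if word in dictionary:
            for rest in segmentations(syllables[n:], dictionary, max_len):
                results.append([word] + rest)
    return results


DICTIONARY = {("bang",), ("bang", "bang")}
parses = segmentations(["bang", "bang"], DICTIONARY)
print(len(parses))  # 2: 'bang' + 'bang', or the single word 'bangbang'
```

More than one parse signals that context recognition or manual intervention is required. The recursion is exponential in the worst case, so a real implementation would probably memoize or use dynamic programming.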

Line Break Algorithm
A process should determine where to place a special character that allows the word processor to insert a new line. The only rule of line breaking is that a syllable must not be separated from any of its signs.
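Under that single rule, break opportunities are simply the positions in front of each base syllable. The sketch below assumes the text is a list of token names and that signs can be recognized by name; the sign set, the prefixes, and the sample tokens are illustrative only.

```python
# Assumed sign names; a real implementation would use character properties.
SIGNS = {"CECEK", "SUKU", "ADEG-ADEG"}


def is_sign(token):
    """True for tokens that must stay with the preceding base syllable."""
    return token in SIGNS or token.startswith(("GANTUNGAN ", "GEMPELAN "))


def break_opportunities(tokens):
    """Indices i such that a line break may be inserted before tokens[i]."""
    return [i for i in range(1, len(tokens)) if not is_sign(tokens[i])]


tokens = ["BA", "CECEK", "KA", "GANTUNGAN RA", "SUKU", "TA", "ADEG-ADEG"]
print(break_opportunities(tokens))  # [2, 5]
```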

Collation and Sorting
There is no preexisting sorting method defined for Balinese. A sorting method will be proposed in this project, and most probably it will use the defined code point order.

 

Conclusion

Currently the native users of the Balinese script enjoy using the computerized font Bali Simbar. But there is an apparent need to move to a character-based encoding to enable more advanced text processing capabilities, like those other scripts in the world have already achieved.

The Balinese script has presentation issues similar to those of other South/Southeast Asian scripts. There are also issues that need to be decided regarding the keyboard input method. With current technologies such as OpenType, Graphite, Keyman, and Microsoft’s IME, it is definitely possible to computerize the Balinese script comprehensively.