The Balinese Language in Computer Processing
By Ida Bagus Adi Sudewa
v0.4 - March 15th, 2003
Change Control Sheet
Version | Changes | Author | Date
v0.3 | The first version sent to Unicode experts | Dewa | March 10th, 2003
v0.4 | Now includes guidelines on the horizontal size of syllables; no more glyph alternatives; glyphs are now converted to bitmap graphics | Dewa | March 15th, 2003
This article is part of the Balinese Computing Project,
a project to computerize the Balinese language and script.
Readers who are not familiar with the Balinese script should consult
the article `The Balinese Alphabet'. Readers who want to know more about
the historical and present use of the Balinese script will find `The
Contemporary Use of The Balinese Script' a good starting point.
Both articles can be downloaded from this website (http://www.babadbali.com/aksarabali).
A number of projects have been carried out to computerize the Balinese
script. Some of them are listed here, along with short descriptions.
- Ida Bagus Made Jaya Martha developed a Balinese text editor as the
  final project for his Ir. (engineer) degree in 1991.
- As a charity effort to preserve indigenous traditional Balinese
  manuscripts, I Gusti Made Mantra and his team from PT IBM Indonesia
  carried out a project that successfully transferred the contents of
  palm-leaf manuscripts to computer images. [There is no information
  on whether any "character encodings" or "computer fonts" were
  produced.]
- A foreigner, J Glavy, has created a Balinese computer font, "JG
  Aksara Bali", using TrueType technology. No keyboard input method
  was created for it, so the characters have to be inserted in other
  ways. In MS Word, one can use the command Insert -> Symbol to insert
  them. Downloadable from: www.geocities.com/jglavy/asian.html
- I Made Suatjana has so far contributed the most to the
  computerization of the Balinese script. He started with a font for
  ChiWriter(TM) in the late eighties, followed by the TrueType font
  "Bali Simbar" for Windows platforms in the mid-nineties. He also
  took a step forward by developing a Balinese keyboard input for
  Microsoft Word. All the Balinese-script text in this article was
  typed using his fonts and his keyboard input template. Downloadable
  from this website (http://www.babadbali.com/aksarabali/balisimbar.htm)
So far there is no character encoding defined for the Balinese
script. The team in the Balinese Computing Project is preparing a character
encoding for Balinese. The work, now in progress, will be proposed
for inclusion in the Unicode standard. Most probably the encoding
will use the virama model that is common for Brahmi-derived scripts.
More information about Unicode: http://www.unicode.org
4 Levels of Glyphs
The possible glyphs for each layer are described below. Here we use
the convention A1-0-B1-B2 for the layer names:
- A1: some vowel signs and sound killers
- 0: all syllables, some sound killers, gempelan
- B1: gantungan, some vowel signs
- B2: second-level gantungan; some vowel signs are also attached here,
  to the gantungan
Different Horizontal Size of Syllable
The Latin alphabet also has letters of different widths. `W' is usually
considered the widest one, while `i' is the narrowest. In the Balinese
script, the syllables nya, na rambat, and ga gora are the widest, and
ca, na, ra, da, wa, and some other syllables are the narrowest.
There are four horizontal sizes of syllable:
- RA, PA, PA KAPAL
- NA, CA, DA MADU, etc.
- HA, KA, TA, etc.
- NA RAMBAT, NYA, GA GORA
The different syllable sizes introduce issues for the presentation
of Balinese script on the computer:
- Some vowel signs must be centered against the syllable.
- Some gantungan also have to be centered below the syllable.
- GUWUNG has to be stretched according to the size of the base
  syllable.
ULU, ULU SARI, PEPET, ULU CANDRA, ULU RICEM, CECEK, and SURANG must all
be centered above the base syllable, according to the horizontal
size of the syllable.
There can also be six combinations of ULU/PEPET/ULU SARI with CECEK/SURANG.
When this happens, the combination must also be centered.
Most gantungan also have to be centered.
GUWUNG must be stretched to follow the size of the base syllable.
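The centering and stretching rules can be sketched as simple arithmetic on glyph widths. The sketch below is only an illustration: the function names and the widths (in arbitrary font units) are assumptions made for this example and do not come from any existing Balinese font.

```python
def center_offset(base_width: float, mark_width: float) -> float:
    """Horizontal offset that centers a sign over (or under) its base syllable."""
    return (base_width - mark_width) / 2.0


def stretch_factor(base_width: float, guwung_width: float) -> float:
    """Horizontal scale factor that makes GUWUNG as wide as the base syllable."""
    return base_width / guwung_width


# Hypothetical widths for one syllable of each of the four size classes.
BASE_WIDTH = {"RA": 400, "NA": 600, "HA": 800, "NYA": 1000}
ULU_WIDTH = 200      # hypothetical width of the ULU glyph
GUWUNG_WIDTH = 500   # hypothetical design width of the GUWUNG glyph

for name, width in BASE_WIDTH.items():
    print(name, center_offset(width, ULU_WIDTH), stretch_factor(width, GUWUNG_WIDTH))
```

With four size classes, a font could also store four pre-positioned variants of each sign instead of computing the offset at rendering time; the calculation above only shows what those variants would have to encode.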
Glyph Reordering
Some vowel signs appear before the base syllable. For example, pa
+ e is written as taleng followed by the syllable pa.
Split Vowel Signs, Split Gantungan / Gempelan
Some vowel signs and one gantungan/gempelan consist of two
glyphs. The rendering engine should be able to split the glyphs, do the
appropriate reordering, and position the glyphs correctly relative to
the syllable. Splitting a vowel sign and reordering its parts is
difficult to implement.
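A minimal sketch of splitting and reordering for a single base syllable with one vowel sign is given below. The glyph names are symbolic labels chosen for this example, not code points, and the two tables list only the signs needed for the illustration (TALENG as a sign drawn before the base, TALENG TEDONG as a sign that splits into TALENG and TEDONG).

```python
# Signs that consist of two glyphs, with their parts (illustrative subset).
SPLIT_SIGNS = {"TALENG_TEDONG": ("TALENG", "TEDONG")}
# Glyphs that are drawn before the base syllable (illustrative subset).
PREBASE_GLYPHS = {"TALENG"}


def to_glyphs(base, sign):
    """Return the visual glyph order for a base syllable plus one vowel sign."""
    parts = SPLIT_SIGNS.get(sign, (sign,))                 # split if needed
    pre = [p for p in parts if p in PREBASE_GLYPHS]        # goes before the base
    post = [p for p in parts if p not in PREBASE_GLYPHS]   # stays after the base
    return pre + [base] + post


print(to_glyphs("PA", "TALENG"))         # ['TALENG', 'PA']
print(to_glyphs("PA", "TALENG_TEDONG"))  # ['TALENG', 'PA', 'TEDONG']
print(to_glyphs("PA", "ULU"))            # ['PA', 'ULU']
```

A real rendering engine would also have to handle gantungan and several signs on one syllable; this sketch covers only the simplest case.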
Ligatures
There are more than 35 ligatures in the Balinese script. The main
problem with these ligatures is that the original sequences are not
necessarily next to each other in the character encoding.
One example is the word plong. The ligature is formed between
PA and TEDONG, but PA and TEDONG are not adjacent to each other:
there is LA between them. Glyph reordering has to be performed
before the ligature can be formed.
Please note that ligatures are an optional feature of the Balinese
script and need not be implemented immediately.
[Figure: plong]
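One possible way to form such a ligature is to look past the glyphs written below the base when searching for a ligature partner. The sketch below illustrates this with the plong example; the glyph names, the single ligature entry, and the set of below-base glyphs are assumptions made for this example only.

```python
# Hypothetical ligature table with a single entry.
LIGATURES = {("PA", "TEDONG"): "PA_TEDONG"}
# Glyphs written below the base, which may sit between the two partners.
BELOW_BASE = {"GANTUNGAN_LA"}


def form_ligatures(glyphs):
    """Replace ligature pairs, skipping over below-base glyphs between them."""
    out = []
    i = 0
    while i < len(glyphs):
        j = i + 1
        while j < len(glyphs) and glyphs[j] in BELOW_BASE:
            j += 1                                  # look past below-base glyphs
        if j < len(glyphs) and (glyphs[i], glyphs[j]) in LIGATURES:
            out.append(LIGATURES[(glyphs[i], glyphs[j])])
            out.extend(glyphs[i + 1:j])             # keep the skipped glyphs
            i = j + 1
        else:
            out.append(glyphs[i])
            i += 1
    return out


# plong after splitting and reordering: taleng, pa, gantungan la, tedong
print(form_ligatures(["TALENG", "PA", "GANTUNGAN_LA", "TEDONG"]))
# ['TALENG', 'PA_TEDONG', 'GANTUNGAN_LA']
```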
Vowel Sign Attachments
SUKU and SUKU ILUT must be correctly attached to either the parent
syllable or the appended form of a syllable. It is optional to have
SUKU and SUKU ILUT seamlessly joined with the base syllable, but it
is mandatory to have SUKU or SUKU ILUT correctly positioned under the
final stroke of the base syllable. Please recall that PA KAPAL is the
only syllable whose final stroke does not go down.
Another issue with SUKU and SUKU ILUT is that both of them are attached
to the gantungan/gempelan, if one exists. Only SUKU and SUKU ILUT
behave like this; other vowel signs are still attached to the base
syllable.
[Figure: bluluk]
The Second Level Gantungan
Fortunately, the only gantungan that can act as second-level gantungan
are gantungan ya, wa, and ra.
[Figure: briag]
[Figure: bangkuang]
Unfortunately, the second-level form of gantungan ya or ra
can still take vowel signs: suku and suku ilut are attached to the
second-level form, while the other vowel signs are attached to the
base syllable.
[Figure: bangkrut]
[Figure: mencret]
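The attachment rule for vowel signs can be summarized in a few lines of code. This is only a sketch of the rule as stated above; the function name and the glyph names are assumptions made for the example, and the two clusters are simplified illustrations of the syllables in bangkrut and mencret.

```python
# Signs that attach to the lowest appended form instead of the base syllable.
LOW_SIGNS = {"SUKU", "SUKU_ILUT"}


def attachment_target(base, gantungans, sign):
    """Return the glyph that a vowel sign attaches to.

    gantungans lists the appended forms from the first to the second level;
    it is empty when the base syllable has no gantungan or gempelan.
    """
    if sign in LOW_SIGNS and gantungans:
        return gantungans[-1]    # lowest appended form
    return base                  # all other signs stay on the base syllable


print(attachment_target("NGA", ["GANTUNGAN_KA", "GANTUNGAN_RA"], "SUKU"))   # GANTUNGAN_RA
print(attachment_target("NA", ["GANTUNGAN_CA", "GANTUNGAN_RA"], "TALENG"))  # NA
print(attachment_target("BA", [], "SUKU"))                                  # BA
```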
Glyph Substitutions
Some syllables change form completely when another sign is attached
to them:
- ra + e = re-repa
- la + e = le-lenga
Character Spacing
The width of the base syllable determines the width of one horizontal
unit.
[Figure: bangkuang]
However, there are exceptions to this rule. If the base syllable has
wa, ya, or ra appended with suku or suku ilut, the next syllable is
shifted away, leaving some space after the base syllable.
[Figure: bangkrut]
Anomaly: One of the Dasa Aksara
ULU CANDRA can be attached to a syllable to form holy letters.
Examples are the letters ANG, UNG, and MANG.
[Figure: ang, ung, mang]
ULU CANDRA is never attached to a syllable that carries another sign
above it, except in one case. One of the ten holy letters is SING:
[Figure: sing]
This is the only exception. The rendering engine should recognize
SA SAGA + ULU + ULU CANDRA as a special case and render it
accordingly.
This section covers font technologies only briefly. Links
and references are given for readers who want to know more about each
technology.
Computer Fonts
When the computer was in its early age, the only script it understood
was the Latin script. The computer screen was divided into a grid of
cells; each one could be occupied by one Latin character or punctuation
mark. The size of the screen was fixed; the number of pixels inside one
cell was also fixed for each type of computer. One type of computer could
display the Latin script in one font only. The font technology of that
time is called the bitmap font. Characters could not be enlarged
or reduced, nor made italic or bold.
Only with the advent of Macintosh computers did vector fonts
start to emerge. Apple introduced its own invention, the
TrueType font (TTF), which looks good on screen, while Adobe has the
PostScript Type 1 font technology that was primarily used for printing
and publishing. These two font technologies still dominate the
typography world at present.
The differences between the two, subtle to most users,
are the curve formula used (TTF uses quadratic while Type 1 uses cubic
curves), the hinting calculation, and the file format. Currently TrueType
is more popular, thanks to the OS market domination of Microsoft, a TTF
endorser. But the two technologies can be used at the same time and
do not exclude each other.
More about font technologies can be found in: http://www.truetype.demon.co.uk/tthist.htm
Font with Intelligence
When computers became more accessible to people who can't read the
Latin alphabet, the need to represent more scripts on the computer became
apparent. Chinese-Japanese-Korean (CJK or Han), Cyrillic, Arabic,
Hebrew, the Indian family of scripts, and Thai were among the first
to be adapted to computers. These scripts have different characteristics
compared to Latin. Arabic and Hebrew are written from right to left.
Indian and Thai scripts are syllabic and have various signs attached
before, after, below, and above the syllable. CJK characters are simpler
to lay out, but the repertoire is huge. Arabic has a lot of ligatures and
glyph alternatives. The simple one-to-one mapping of character to glyph
is no longer sufficient. The assumptions of left-to-right text and the
absence of sign attachment no longer hold. A better font technology is
needed.
The first to mention is the OpenType font technology.
It extends either a TrueType font or a Type 1 font with the features
supported by OpenType. Among those features are ligatures, kerning
pairs, anchor attachment, and glyph substitution (one to many and many
to one). Microsoft, again, is one of the companies behind this technology.
Microsoft also provides an Application Programming Interface (API) called
Uniscribe. As an API, Uniscribe is more complicated but more powerful
than OpenType. If a script is too complex to be described in
OpenType, one should consider using Uniscribe.
Apple also has its own modern font technology, called AAT (Apple Advanced
Typography). It is not as popular as OpenType and is currently supported
only on the Macintosh platform.
The last one, Graphite, brings the promise that it can support any
complex script, given the declarative syntax of its definition language.
For example, OpenType doesn't support the glyph reordering needed by
South/Southeast Asian scripts, while Graphite does.
More information about OpenType: http://www.adobe.com/type/opentype/main.html
More information about AAT: http://developer.apple.com/fonts/TTRefMan/RM06/Chap6AATIntro.html
More information about Graphite: http://graphite.sil.org/
Font Editors
There are two leaders in the typography software market: Macromedia's
Fontographer and FontLab. Macromedia no longer updates Fontographer,
while FontLab is still enhanced every year, so it is clear now who the
true leader is.
FontLab's demo can be downloaded for free from
http://www.fontlab.com/
Fontographer's web site is
http://www.macromedia.com/software/fontographer/
Please note that currently there are no proper input
methods for Balinese script on the computer. The concepts here are borrowed
from similar South/Southeast Asian scripts.
Keyboard Layout
First, look at the character repertoire. There are fifty-four syllables,
twelve vowel signs, ten independent vowels, ten numerals, five final
consonants, and seven punctuation marks, making ninety-eight in total.
There are ninety-two keys available for use. There are surely ways
to fit ninety-eight characters onto ninety-two keys, but they still have
to be designed.
A debate may arise over whether a gantungan or gempelan should
be assigned a different key from its base syllable or not. If we
assign the same key to a base syllable and its gantungan or gempelan,
then there are enough keys to hold all the letters. The consequence
is that the keyboard driver must be smart enough to identify,
when the user presses a key, whether a base syllable is meant or not.
Cursor Positioning and Editing
The Balinese script has four layers, while the cursor moves horizontally
only. A good example to follow is the way Thai implements cursor
positioning:
- Left and right keys = skip one horizontal position.
- Backspace key = delete the last sign without moving the cursor;
  if only the syllable is left, delete the syllable and then
  move the cursor to the left.
- Delete key = delete the syllable and all signs attached to it.
- Character typing = when the user types a syllable, the cursor advances
  as usual; when the user types a sign, the cursor doesn't move and the
  sign is displayed on the previous syllable. If the user wants to type
  a sign only, he/she has to press space and then the sign.
- The input method should have a mechanism to validate text on the
  fly, preventing erroneous text from being produced.
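The editing behaviour listed above can be sketched with a small model in which the text is a list of clusters, each holding one syllable followed by its signs. The class and method names are assumptions made for this example, and the sketch leaves out validation and the "space, then sign" case.

```python
class ClusterEditor:
    """Toy model of cursor movement and editing over syllable clusters."""

    def __init__(self):
        self.clusters = []   # each cluster: [syllable, sign, sign, ...]
        self.cursor = 0      # position between clusters

    def type_syllable(self, syllable):
        self.clusters.insert(self.cursor, [syllable])
        self.cursor += 1                      # cursor advances as usual

    def type_sign(self, sign):
        if self.cursor == 0:
            raise ValueError("no syllable to attach the sign to")
        self.clusters[self.cursor - 1].append(sign)   # cursor does not move

    def backspace(self):
        if self.cursor == 0:
            return
        cluster = self.clusters[self.cursor - 1]
        if len(cluster) > 1:
            cluster.pop()                     # delete the last sign only
        else:
            del self.clusters[self.cursor - 1]
            self.cursor -= 1                  # syllable gone, move left

    def delete(self):
        if self.cursor < len(self.clusters):
            del self.clusters[self.cursor]    # syllable and all its signs

    def left(self):
        self.cursor = max(0, self.cursor - 1)

    def right(self):
        self.cursor = min(len(self.clusters), self.cursor + 1)


editor = ClusterEditor()
editor.type_syllable("HA")
editor.type_sign("ULU")
editor.type_syllable("NA")
print(editor.clusters, editor.cursor)
```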
Input Method Technologies
Operating system vendors supply input methods for commonly used scripts.
For less commonly used scripts, when a custom input method is required,
the following tools should help:
- Microsoft Word template. The most common approach is to override
  keyboard keys with the desired characters. It can be used only
  in the Microsoft Word word processor.
- Keyman Keyboard Developer. It can be fully customized to process any
  input in any script. The license is free for users but not for
  developers. The official website is at: http://www.tavultesoft.com/keyman
- The most powerful one is an API from Microsoft: Microsoft's
  IME (Input Method Editor). One must be an expert in programming to
  develop with it, but once it is done, it blends seamlessly with the
  Windows operating system, as if it had been built into Windows from
  the start.
Searching
Searching is a little more complicated, because two character
sequences have to be regarded as equivalent: a syllable ended
with adeg-adeg, and that same syllable with another syllable appended
to it.
dados is written as:
[Figure: dados]
dados pangangge is written as:
[Figure: dados pangangge]
When one searches for the word dados in the
above text, the searching algorithm must be able to find that word in
the sequence dados pangangge.
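One way to obtain this equivalence is to normalize both the search word and the text before comparing them. The sketch below assumes a glyph-like encoding in which adeg-adeg and each gantungan are separate symbols; the token names and the simplified spellings of dados and dados pangangge are illustrations only, not a proposed encoding.

```python
def normalize(tokens):
    """Rewrite adeg-adeg and gantungan so that both mark a killed vowel."""
    out = []
    for tok in tokens:
        if tok == "ADEG_ADEG":
            out.append("KILL")
        elif tok.startswith("GANTUNGAN_"):
            out.append("KILL")                       # previous vowel is killed
            out.append(tok[len("GANTUNGAN_"):])      # followed by the syllable
        else:
            out.append(tok)
    return out


def find(needle, haystack):
    """Return the index of needle in the normalized haystack, or -1."""
    n, h = normalize(needle), normalize(haystack)
    for i in range(len(h) - len(n) + 1):
        if h[i:i + len(n)] == n:
            return i
    return -1


dados = ["DA", "DA", "TALENG_TEDONG", "SA", "ADEG_ADEG"]
text = ["DA", "DA", "TALENG_TEDONG", "SA", "GANTUNGAN_PA",
        "NGA", "CECEK", "GA", "TALENG"]
print(find(dados, text))  # 0
```

Under the virama model mentioned earlier in this article, the same code would probably be used for both cases, which would make this normalization step unnecessary.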
Text Validation On-The-Fly
Validation should check for the case where the user enters an invalid
combination of keys. For example, the user should not be able to press
the key for syllable HA followed by ULU and then SUKU. The keyboard
driver should respond to this by beeping to signal the error or, if it
is not possible to beep, by simply not sending the SUKU
to the word processor.
In no case should a wrong character sequence be
sent to the word processor or the rendering engine.
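A validator of this kind can be sketched as a small state machine that sees one key at a time. The rule used here (a sign needs a syllable before it, and a syllable takes at most one vowel sign) is a simplifying assumption that covers the HA + ULU + SUKU example; the real rules of the script are richer. The key names and the class are likewise hypothetical.

```python
SYLLABLES = {"HA", "NA", "CA", "RA", "KA"}       # illustrative subset
VOWEL_SIGNS = {"ULU", "SUKU", "TALENG", "PEPET"}  # illustrative subset


class KeyValidator:
    """Accept or reject each key press before it reaches the word processor."""

    def __init__(self):
        self.has_base = False    # a syllable has been typed
        self.has_vowel = False   # the current syllable already has a vowel sign

    def accept(self, key):
        if key in SYLLABLES:
            self.has_base, self.has_vowel = True, False
            return True
        if key in VOWEL_SIGNS:
            if not self.has_base or self.has_vowel:
                return False     # reject: beep, or drop the key
            self.has_vowel = True
            return True
        return False             # unknown key


validator = KeyValidator()
print([validator.accept(k) for k in ["HA", "ULU", "SUKU"]])  # [True, True, False]
```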
Word Parsing and Spell Checking
Since there are no spaces to separate words, the word-parsing algorithm
requires an external dictionary to work with. The algorithm should compare
the words in the source text to the words in the dictionary. If there is
more than one possible word, either complex context recognition must
be employed or manual intervention is needed.
Example: the words bang (bank) and bangbang
(hole) both exist in the dictionary. It is impossible for the algorithm
to determine (without context) whether bangbang is actually
many banks (in Balinese, repetition makes a word plural) or a hole.
Spell checking can be used along with word parsing.
If a long sequence of syllables, say six syllables, doesn't exist
in the dictionary, it should be marked as a spelling error. The process
then starts validating again from the next syllable.
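The ambiguity in the bangbang example can be made visible by listing every way a run of syllables can be split into dictionary words. The sketch below uses Latin transliterations as stand-ins for Balinese syllables and a two-word dictionary; the function name and the limit of six syllables per word are assumptions taken from the example above.

```python
def segmentations(syllables, dictionary, max_len=6):
    """Return every split of the syllable list into dictionary words."""
    if not syllables:
        return [[]]                       # one way to split nothing
    results = []
    for n in range(1, min(max_len, len(syllables)) + 1):
        word = "".join(syllables[:n])
        if word in dictionary:
            for rest in segmentations(syllables[n:], dictionary, max_len):
                results.append([word] + rest)
    return results


dictionary = {"bang", "bangbang"}
print(segmentations(["bang", "bang"], dictionary))
# [['bang', 'bang'], ['bangbang']]
```

More than one result means that context or manual intervention is needed; an empty result means that no dictionary word matches, which is the case a spell checker would flag.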
Line Break Algorithm
A process should analyze where to put the special character that allows
the word processor to insert a new line. The only rule of line breaking
is that a syllable must not be separated from any of its signs.
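This rule means that a line break is allowed only immediately before a base syllable. A minimal sketch, assuming symbolic token names and treating every token that is not a base syllable as attached to the syllable before it:

```python
def break_opportunities(tokens, syllables):
    """Return the indices before which a line break is allowed."""
    return [i for i, tok in enumerate(tokens) if i > 0 and tok in syllables]


SYLLABLES = {"HA", "NA", "KA"}                 # illustrative subset
text = ["HA", "ULU", "NA", "KA", "SUKU"]
print(break_opportunities(text, SYLLABLES))    # [2, 3]
```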
Collation and Sorting
There is no preexisting sorting method defined for Balinese. A sorting
method will be proposed in this project, and most probably it will use
the defined code point order.
Currently the native users of the Balinese script enjoy
using the computerized font Bali Simbar. But there is an apparent need
to move to a character-based encoding to enable more advanced text
processing capabilities, like those that other scripts in the world have
already achieved.
The Balinese script has presentation issues that are
similar to the issues existing in other South/Southeast Asian scripts.
There are also other issues that need to be decided for the keyboard
input method. With current technologies such as OpenType, Graphite,
Keyman, and Microsoft's IME, it is definitely possible to comprehensively
computerize the Balinese script.