Phonetic Help

Quick Tips on Phonetic Help


  • The Bengali characters might not appear unless your browser is configured for Bengali text. Make sure you have the proper fonts.
  • Read the keymap (and please report discrepancies).
  • You can't save the HTML, but you can cut and paste into Unicode-aware editors like Notepad, Word (?), Gedit and Yudit.
  • Proper rendering may or may not be available for your platform, but as mentioned above, cut-and-paste should work.

Character Map

The following table shows the character map used for the conversion, i.e., which Latin letters you need to enter to get the output you want. The column headed 'User map' is what you need. The 'Internal Map' column lists the codes used internally by the program, and is given here mainly to ease debugging. This is discussed in more detail below.

Vowels

Bengali Codepoint    User map      Internal Map   Comments
অ, অ-কার             a             a
আ, ◌া                aa, A         A
ই, ি◌                i             i
ঈ, ◌ী                ii, ee, I     I              use e_e for এএ
উ, ◌ু                u             u
ঊ, ◌ূ                uu, U         U
ঋ, ◌ৃ                RRi, Ri       R
এ, ে◌                e             e
ঐ, ৈ◌                ai            E              use a_i for অই
ও, ে◌া               o             o
ঔ, ে◌ৗ               au            O              use a_u for অউ

Consonants

Bengali Codepoint    User map      Internal Map   Comments
ক                    k             k
খ                    kh            kh
গ                    g             g
ঘ                    gh            gh
ঙ                    G, GN         G
চ                    ch            c
ছ                    chh           ch
জ                    j             j
ঝ                    jh            jh
ঞ                    J, JN         J
ট                    T             T
ঠ                    Th            Th
ড                    D             D
ঢ                    Dh            Dh
ণ                    N             N
ত                    t             t
থ                    th            th
দ                    d             d
ধ                    dh            dh
ন                    n             n
প                    p             p
ফ                    f, ph         ph
ব                    b, v          b
ভ                    bh            bh
ম                    m             m
য                    y             y
র                    r             r
ল                    l             l
শ                    sh            sh
ষ                    Sh, S         S
স                    s             s
হ                    h             h
ড়                    .D            X
ঢ়                    .Dh           Z
য়                    Y, .y         Y
০-৯                  0-9           0-9            not implemented yet

Miscellaneous

Bengali Codepoint    User map      Internal Map             Comments
◌ং                   .n, M         M
◌ঁ                   .N, C         C
◌ঃ                   :, H          H
।                    |             |                        daanri
                     _             nothing                  used to disambiguate strings like au, ee, etc.
                     #             zero-width non-joiner    needed for khando-ta and hasanta in the middle of a word

The keymap is modelled on, but not exactly equivalent to, the default Bengali input map that ships with Yudit. The main difference is that it does not need an explicit `a' to indicate the vowel sign corresponding to অ (a). This is close to what most people would normally use to write Bengali using Latin letters. The underscore character (_) may be used to denote an explicit hasanta at the end of a word.

There is usually no need for anything to indicate a `hasanta'. It is sometimes required in the middle of a word, when successive consonants must be displayed separately without forming a conjunct; use the `#' sign there to insert a ZERO WIDTH NON-JOINER (U+200C). যুক্তাক্ষরs (conjuncts) and vowel signs need no special treatment.
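To make the role of the `#' sign concrete, the sketch below builds the two codepoint sequences involved. It relies only on standard Unicode behaviour: a hasanta (U+09CD) between two consonants requests a conjunct, and a ZERO WIDTH NON-JOINER (U+200C) placed after the hasanta asks the renderer to keep the consonants separate with a visible hasanta. The helper name `codepoints` and the choice of ক and ত are purely illustrative, and this is not the page's own code.

```javascript
// Illustrative constants; the names are assumptions, the codepoints are standard Unicode.
const KA = "\u0995";       // ক
const TA = "\u09A4";       // ত
const HASANTA = "\u09CD";  // ্  (Bengali sign virama)
const ZWNJ = "\u200C";     // zero width non-joiner

// Consonant + hasanta + consonant: the renderer may form the conjunct.
const conjunct = KA + HASANTA + TA;

// Consonant + hasanta + ZWNJ + consonant: the consonants stay separate.
const separated = KA + HASANTA + ZWNJ + TA;

// Hypothetical helper: list the codepoints of a string as U+XXXX labels.
function codepoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codepoints(conjunct).join(" "));   // U+0995 U+09CD U+09A4
console.log(codepoints(separated).join(" "));  // U+0995 U+09CD U+200C U+09A4
```

Whether the two strings actually look different on screen depends on the font and the rendering engine, as noted elsewhere on this page.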

How the Code works

To keep things simple, the processing is done in two stages. The first stage converts the given input into an internal format. The `simplicity' comes from the fact that this format uses only a one- or two-letter code for each Bengali codepoint in the Unicode chart (and none for the more esoteric and rarely used points like ৠ) [Note: this is without the trailing vowel, if any]. In this first stage, the more intuitive and flexible user format is transformed into the internal format by simple string replacements.
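The first stage can be sketched roughly as follows. This is a minimal illustration and not the page's actual code: the names `USER_TO_INTERNAL`, `escapeRegExp` and `toInternal` are assumptions, the single-pass regular expression is one possible way to do the "simple string replacements", and the table holds only a handful of pairs as read off the character map above. Longer keys are tried first so that `chh` is not mistaken for `ch` followed by `h`.

```javascript
// Hypothetical subset of user-map -> internal-map pairs from the character map.
// Entries whose user and internal forms are identical (k, kh, g, ...) are
// omitted because they need no replacement.
const USER_TO_INTERNAL = {
  chh: "ch",
  ch: "c",
  aa: "A",
  ii: "I",
  ee: "I",
  ai: "E",
  au: "O",
  f: "ph",
  v: "b",
  ".D": "X",
  ".Dh": "Z",
  _: "", // the disambiguating underscore maps to nothing
};

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Longest keys first, so the alternation prefers "chh" over "ch" and ".Dh" over ".D".
const keys = Object.keys(USER_TO_INTERNAL).sort((a, b) => b.length - a.length);
const pattern = new RegExp(keys.map(escapeRegExp).join("|"), "g");

// Replace every user-map sequence with its internal code in a single pass.
function toInternal(input) {
  return input.replace(pattern, (match) => USER_TO_INTERNAL[match]);
}

console.log(toInternal("kaaj")); // kAj
console.log(toInternal("ai"));   // E
console.log(toInternal("a_i"));  // ai
```

Doing everything in one pass means the output of one replacement is never fed into another, which is why `a_i` ends up as the two separate codes `ai` while a typed `ai` becomes `E`.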

Future Changes in the Keymap

I have no objection to adding new representations as long as they can be implemented easily (that is, no ambiguity, no backwards incompatibility, and can be transformed into the internal coding by a straight string replacement). Just let me know.

The code is very simple JavaScript and can easily be modified to work with other keymaps. The code is GPL-ed, so you are free to modify it as you please, as long as you adhere to the GPL if and when you redistribute it. If you do, please also let me know; I would like to keep a copy here.

-------------------------------------------

What Is This Thing Anyway?

This page is essentially some JavaScript code that processes Latin (English) characters entered in a textarea field and transforms them into other characters which, when interpreted as UTF-8, represent Bengali characters. This is done according to a particular algorithm, explained briefly in the section above. Try playing around with it (enter some text that you think should make sense as Bengali, completing each word by hitting the Space bar). If you don't understand what's going on, go away, this page is not for you (... unless you see lots of boxes, in which case read on).
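The word-by-word behaviour described above (a word is converted once it is completed with the Space bar) could be structured along these lines. This is a sketch under assumptions, not the page's code: `convertCompletedWord` and its `convertWord` argument are hypothetical names, and the stand-in converter in the usage lines simply upper-cases text rather than producing Bengali.

```javascript
// Hypothetical helper: given the full text of the input field, convert only
// the word that was just completed by typing a space.
function convertCompletedWord(text, convertWord) {
  // Do nothing unless the most recent character is a space.
  if (!text.endsWith(" ")) return text;

  const body = text.slice(0, -1);            // text without the trailing space
  const start = body.lastIndexOf(" ") + 1;   // 0 when there is no earlier space
  const word = body.slice(start);
  if (word === "") return text;              // two spaces in a row: nothing to convert

  // Earlier words are left untouched; only the last word is converted.
  return body.slice(0, start) + convertWord(word) + " ";
}

// Stand-in converter for demonstration only.
const shout = (w) => w.toUpperCase();

console.log(convertCompletedWord("ami", shout));       // "ami" (word not finished yet)
console.log(convertCompletedWord("ami ", shout));      // "AMI "
console.log(convertCompletedWord("AMI tumi ", shout)); // "AMI TUMI "
```

In a browser, a function like this would typically be called from an input or key event handler on the textarea, with the real keymap conversion passed in as `convertWord`.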

What Do I Need to Get It to Work?

You need

  • a browser that supports JavaScript
  • a Bengali Unicode font

The term `JavaScript' is somewhat ill-defined, but (at least the latest versions of) Internet Explorer, Mozilla and Konqueror all have enough support to run this.

I have only tested this with TrueType/OpenType fonts, but other fonts should also work. If you don't have a Bengali font (or your browser is not configured properly to display Bengali), you could download my Likhan font, or get one from the Free Bangla Font Project or the Bengalinux page. The code assumes that the Likhan font will be used (via a stylesheet element), but this can easily be changed by removing or modifying the font-family: likhan line. [Sorry about this, but I saw no other way for Konqueror, and IE doesn't seem to recognize Likhan as a Bengali font automatically.]

If you don't know much about the status of support for Indic languages on various operating systems, you should be aware that proper display of Bengali needs the display software to support Unicode and OpenType layout features in fonts. This should be available in IE on up-to-date Windows installations, but not so much (yet) in web browsers on Linux. Other options are available there, though, for instance Yudit, or Gedit using recent versions of Pango and FreeType. Cut and paste from Mozilla into these editors should work.

What Use Will This Be?

This is obviously a very Bengali-specific tool, but it could easily be extended to other languages. My reason for writing it was to have a support tool for my Bengali Document Archive Project. Linux has nice editors for writing Bengali Unicode, but Windows doesn't (Yudit, for example, is difficult to set up, and Windows users are typically somewhat challenged :-P when it comes to setting up software). It is potentially also useful in other contexts where a simple way of coming up with a piece of Bengali Unicode text is required.

But This Is Already Doable, Why Bother?

Yes, this can already be done. For example, ITRANS can convert LaTeX-style markup to many formats, including UTF-8. It even has a Web interface. The advantage of this page is that it requires nothing more than a half-decent browser and a (any) Bengali Unicode font (rendering proper-looking text is another matter, though). And you don't need a web connection to use it, if you save the page locally.