Understanding Indian Multilingual Computing

After my colleague and respected friend Deepak Gulati implemented transliteration from Kannada to English and back for one of the projects, my interest in understanding the challenges grew beyond locale-specific Windows API coding and an early palindrome check for multibyte characters :).

I found a great resource via IITMAchraya that explains the same in the context of Indian languages.

In the present project I also gained new respect for the southern languages, which do not add to the confusion the way Hindi's Devanagari or English does. The phonetic base of these languages helps in granular, correct pronunciation and representation using a good script (of Brahmi origin).

Words are just C, V, CV, CCV or, in extreme cases, CCCV, where C stands for consonant and V for vowel. The challenge lies in the way the vowel is combined with the consonant to produce the unique syllable representing the sound. The Devanagari representations of words like राष्ट्रिय, आत्मविश्वास and विश्व are pretty difficult to get right in the head. We just remember them by rote and, as we did in childhood, make fun of people south of the Vindhyas for not getting matra, ling, etc. right.

I was a spelling Nazi (remember the Soup Nazi) and a supporter of Hindi as the national language. But lately, and I must say pretty lately, I have come to appreciate the Tamil language and its brethren: why there is no need for the people residing there to learn Hindi, why that is false patriotism, and why it stokes the feeling of "rule" by the north and of resistance. I have even heard that the killing of Ravan is seen as a sort of white north winning over the dark Dravidian.

With the help of the Nudi keyboard, which is phonetic in nature, I am finding it easier to learn and type spoken Kannada. The default keyboard is InScript, which is common across Indian languages but loses the nuances of each language.
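To make the C, V, CV, CCV, CCCV shapes mentioned above concrete, here is a minimal Python sketch that reads a Devanagari word code point by code point and reports the shape of each written syllable. The function name `syllable_shapes` is my own, and the rule it uses (a consonant carries an inherent vowel unless it is followed by a virama or a vowel sign) is a simplification that ignores anusvara, nukta and other signs, so treat it as an illustration rather than a proper syllabifier.

```python
# Rough syllable-shape finder for Devanagari text (illustrative sketch only).
# Code point ranges are taken from the Unicode Devanagari block.
VIRAMA = 0x094D                       # halant: suppresses the inherent vowel
CONSONANTS = range(0x0915, 0x093A)    # KA .. HA
INDEPENDENT_VOWELS = range(0x0904, 0x0915)
VOWEL_SIGNS = range(0x093E, 0x094D)   # matras: AA .. AU


def syllable_shapes(word):
    """Return a list such as ['CV', 'CCV'] describing each written syllable."""
    shapes, current = [], ""
    for i, ch in enumerate(word):
        cp = ord(ch)
        nxt = ord(word[i + 1]) if i + 1 < len(word) else None
        if cp in CONSONANTS:
            current += "C"
            # No virama or matra follows: the consonant keeps its inherent vowel.
            if nxt is None or not (nxt == VIRAMA or nxt in VOWEL_SIGNS):
                current += "V"
                shapes.append(current)
                current = ""
        elif cp in INDEPENDENT_VOWELS or cp in VOWEL_SIGNS:
            current += "V"
            shapes.append(current)
            current = ""
        # Virama and any other sign are skipped in this simplified view.
    if current:
        shapes.append(current)
    return shapes


if __name__ == "__main__":
    for w in ["विश्व", "आत्मविश्वास", "राष्ट्रिय"]:
        print(w, syllable_shapes(w))
```

Note that this describes how the word is written, not how it is spoken; Hindi usually drops the final inherent vowel in speech, which the sketch does not model.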

MS has taken steps over a number of years to support multiple languages, short of providing a local-language OS 🙂. Now I can appreciate that a little better, as it would be tough to get right and then to maintain.

There are four parts:

Entering (keyboard, mapping to QWERTY)

Displaying (font, glyph)

Storage (encoding: how many bytes to store a syllable)

Working with data (sorting, searching, frequency count, etc.)
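The storage and working-with-data parts can be illustrated with a short Python sketch. It assumes the text is held as Unicode (the post does not say which encoding the project actually used), and the helper name `storage_report` is hypothetical. It counts the code points and bytes needed for one CCCV syllable, then does a naive sort and a frequency count.

```python
from collections import Counter


def storage_report(text):
    """Count code points and encoded bytes for a piece of text."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),  # "-le" avoids the 2-byte BOM
    }


# The cluster from राष्ट्रिय: SSA + virama + TTA + virama + RA + vowel sign I
syllable = "\u0937\u094D\u091F\u094D\u0930\u093F"

if __name__ == "__main__":
    print(storage_report(syllable))

    words = ["विश्व", "आत्मविश्वास", "राष्ट्रिय"]
    # Plain sorted() orders by code point, which only approximates
    # dictionary order for a given language.
    print(sorted(words))

    # Frequency count over code points, not over whole syllables.
    print(Counter("".join(words)).most_common(3))
```

A single visible syllable here takes six code points, which is why searching, sorting and counting usually need to work on syllables rather than on individual code points or bytes.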

With this new information I have newfound respect for Sanskrit, which can pack so much information into such short shlokas/stotras.
