When is ISCII better than Unicode, and vice versa?

ISCII is great for storing names (of people, locations, etc.) that do not vary across languages. Consider a database of 10 million people's names that need to be picked up for reports in different languages. One ISCII row per name takes care of it, and the transliteration provided by the .NET encoding classes helps display the name in the various Indic languages (a similar effort can be applied to Unicode too, but without a lot of success). With Unicode encoding you will need to store a language-specific name for each language, multiplying the storage cost (though this can be useful if you are hell-bent on correcting names/matras to suit the local language).
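To see why one stored row can serve several scripts: the Unicode Indic blocks inherit their layout from ISCII, so corresponding letters sit at the same offset inside each 128-code-point block. The Python sketch below is not the .NET encoding classes mentioned above, just a rough illustration of offset-based transliteration. The names `shift_script` and `SCRIPT_BASE` are made up for this example, and leaving unmappable characters untouched is an assumption on my part.

```python
import unicodedata

# Start of each script's 128-code-point block in Unicode.
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def shift_script(text, source, target):
    """Naively transliterate by moving each character to the same
    offset in the target script's block (hypothetical helper)."""
    src, dst = SCRIPT_BASE[source], SCRIPT_BASE[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        if 0 <= offset < BLOCK_SIZE:
            candidate = chr(dst + offset)
            # Assumption: keep the original character when the target
            # script has no letter assigned at that position.
            if unicodedata.name(candidate, ""):
                ch = candidate
        out.append(ch)
    return "".join(out)


name = "\u0915\u092e\u0932"  # "Kamal" written in Devanagari
print(shift_script(name, "devanagari", "kannada"))
print(shift_script(name, "devanagari", "telugu"))
```

The fallback branch also hints at why the pure Unicode route is less successful: Tamil, for example, has no letter at the position of Devanagari KHA, so a real solution needs per-language rules rather than a plain offset.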

The storage saving of ISCII is offset by the need to convert to Unicode for display (IE is quite far ahead in displaying Unicode data with an appropriate font) and for capture (with the help of INSCRIPT, a local phonetic variant, or web-based entry).

Indexing/sorting: language-specific sorting can differ (Tamil is very different from other languages). A topic for another post altogether.

Some more Indic language resources

Telugu Keyboard layout - similar to Nudi.

Hindi to Urdu transliteration

Great resource to understand the Tamil/Telugu/Kannada scripts in one sitting.

Folks on Linux, head over to Raghu.
Mac fans, rush to Xenotype.

ISCII reference: http://varamozhi.sourceforge.net/iscii91.pdf
Great Kannada learning resource: http://www.cs.toronto.edu/~kulki/kannada

The Kannada resource above uses the old form of Kannada, which may differ a little from what appears in print media, as in anna, amma (the aA is implied in the pronunciation, but you will notice the script does not reflect it, which is a little difficult for a newbie to get their head around).

Understanding Indian Multilingual computing

After my colleague and respected friend Deepak Gulati implemented transliteration from Kannada to English and back for one of the projects, my interest in understanding the challenges grew beyond Windows API coding for the locale-specific world and an initial palindrome check for multibyte characters :).

I found a great resource via IITM (Acharya) explaining the same in the context of Indian languages.

In the present project I also gained new respect for the southern languages, which do not add to the confusion the way Hindi's Devanagari or English does. The phonetic base of these languages helps in granular, correct pronunciation and representation using a good script (of Brahmi origin).

Words are just C, V, CV, CCV or, in extreme cases, CCCV, where C stands for consonant and V for vowel. The challenge lies in the way the vowel is combined with the consonants to give the unique syllable representing the sound. The Devanagari representations of words like राष्ट्रिय, आत्मविश्वास and विश्व are pretty difficult to get right in the head; we just remember them by rote and, as we did in childhood, make fun of people south of the Vindhyas for not getting matra, ling, etc. right.

I was a spelling nazi (remember the Soup Nazi?) and a supporter of Hindi as the national language. But lately, and I must say pretty lately, I have come to appreciate the Tamil language and its brethren (why there is no need for the people residing there to learn Hindi, why that is false patriotism, why it stokes the feeling of “rule” by the north and of resistance; I have even heard that the killing of Ravan is seen as a sort of white north winning over the dark Dravidian). With the help of the Nudi keyboard, which is phonetic in nature, I am finding it easier to learn and type spoken Kannada. The default keyboard is Inscript, which is common to Indian languages but loses the nuances of each language.
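The C/V patterns mentioned above can be made concrete with a small Python sketch. It splits a Devanagari word into written syllables and labels each one as CV, CCV, CCCV and so on. The helper names (`kind`, `split_syllables`, `pattern`) are my own, the character ranges cover only the basic Devanagari letters, and a consonant without a matra is assumed to carry the inherent "a" vowel (spoken Hindi often drops it, which this sketch ignores).

```python
VIRAMA = "\u094D"  # halant: joins consonants into a cluster


def kind(ch):
    """Classify a Devanagari character (basic ranges only)."""
    cp = ord(ch)
    if 0x0915 <= cp <= 0x0939:
        return "C"  # consonant
    if 0x0905 <= cp <= 0x0914:
        return "V"  # independent vowel
    if 0x093E <= cp <= 0x094C:
        return "M"  # dependent vowel sign (matra)
    if ch == VIRAMA:
        return "H"
    return "X"      # anything else (anusvara, digits, ...)


def split_syllables(word):
    """Split a word into written syllables (aksharas)."""
    syllables, current = [], ""
    for ch in word:
        # A consonant or independent vowel starts a new syllable
        # unless the previous character was a virama.
        if kind(ch) in ("C", "V") and current and not current.endswith(VIRAMA):
            syllables.append(current)
            current = ""
        current += ch
    if current:
        syllables.append(current)
    return syllables


def pattern(syllable):
    """Describe a syllable as a string of C's, plus V if it has a vowel."""
    consonants = "C" * sum(1 for ch in syllable if kind(ch) == "C")
    # A trailing virama means a bare cluster with no vowel.
    if syllable.endswith(VIRAMA):
        return consonants
    return consonants + "V"  # matra, independent vowel, or inherent "a"


# The first and last example words from the paragraph above.
for word in ["\u0930\u093e\u0937\u094d\u091f\u094d\u0930\u093f\u092f",
             "\u0935\u093f\u0936\u094d\u0935"]:
    print(word, [pattern(s) for s in split_syllables(word)])
```

The first word comes out as CV, CCCV, CV: the middle syllable stacks three consonants under a single vowel sign, which is exactly the part that is hard to get right by hand.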

MS has taken steps over a number of years to support multiple languages, short of providing a local-language OS :). Now I can appreciate that a little better, as it would be tough to get right and then maintain.

There are four parts:

Entering (keyboard, mapping to QWERTY)

Displaying (font, glyph)

Storage (encoding: how many bytes to store a syllable)

Working with data (sorting, searching, frequency count, etc.)
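For the storage part, here is a rough Python sketch of how many bytes one syllable takes. The helper `storage_cost` is made up for this example. It only measures Unicode encodings, since Python ships no ISCII codec; ISCII, being an 8-bit code, would need roughly one byte per character, i.e. about the code point count.

```python
def storage_cost(text):
    """Return the size of `text` in code points and in bytes under
    two common Unicode encodings (hypothetical helper)."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),  # no byte-order mark
    }


# One CCCV syllable: SSA + virama + TTA + virama + RA + vowel sign I
syllable = "\u0937\u094d\u091f\u094d\u0930\u093f"
print(storage_cost(syllable))
```

A single visible syllable here is six code points, so it costs 18 bytes in UTF-8 and 12 in UTF-16, which is the kind of multiplier to keep in mind when sizing a names table.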

With this new information I have newfound respect for Sanskrit, which can pack so much information into such short shlokas/stotras.