Batteries included has strong resonance

Gems of Python

1. Requirement – find out immediately whether two files differ, then point out the differences.

Solution – use filecmp.cmp and then difflib.ndiff. This works well for small files.

a. filecmp.cmp(file1, file2, shallow=False)  # file1 and file2 are paths here

b.
import difflib

# here file1 and file2 are already-open file objects, not paths
x = file1.readlines(); file1.close()
z = file2.readlines(); file2.close()

for line in difflib.ndiff(x, z):
    if line.startswith("+") or line.startswith("-"):
        print(line)
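
Steps a and b can be combined into one helper. This is only a minimal sketch: the function name report_differences and the UTF-8 encoding are my assumptions, not part of the original script.

```python
import difflib
import filecmp


def report_differences(path_a, path_b):
    """Return the ndiff lines that differ between two text files."""
    # shallow=False compares file contents instead of trusting os.stat() data.
    if filecmp.cmp(path_a, path_b, shallow=False):
        return []
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        lines_a, lines_b = fa.readlines(), fb.readlines()
    # ndiff marks lines only in the first file with "-" and only in the second with "+".
    return [line for line in difflib.ndiff(lines_a, lines_b)
            if line.startswith(("+", "-"))]
```

Since both files are read fully into memory, this stays in "small file" territory, same as above.
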
2. Requirement – return multiple pieces of information from a function.

Solution – use return the intuitive way: separate the values with commas and Python packs them into a tuple.

return 1, numfiles, thequeue  # just magic.

so on the calling side

status, numfilesprocessed, listprocessedfiles = funcname(params)

does the work.
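
A self-contained version of the same idea, as a sketch. The function process_files and its ".txt" filter are made up for illustration; only the return-and-unpack pattern comes from the snippet above.

```python
def process_files(paths):
    """Hypothetical worker: returns (status, count, processed list)."""
    processed = [p for p in paths if p.endswith(".txt")]
    status = 1 if processed else 0
    # The commas build a single tuple; no explicit parentheses are needed.
    return status, len(processed), processed


# Unpacking on the calling side needs exactly as many names as returned values.
status, numfilesprocessed, listprocessedfiles = process_files(["a.txt", "b.mdb", "c.txt"])
```

If the number of names on the left does not match, Python raises a ValueError, so the "magic" is still checked.
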

3. Requirement – work over two ordered lists in step.

Solution – use zip.

for r, s in zip(vernaqueue, engqueue):
    if not filecmp.cmp(r, s, shallow=False):
        fcompare(r, s, exfile)
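
The loop above leans on my own fcompare helper, so here is a sketch that runs on its own. The name mismatched_pairs is an assumption for illustration; it just collects the pairs that the loop would hand over for detailed comparison.

```python
import filecmp


def mismatched_pairs(left_paths, right_paths):
    """Return the (left, right) path pairs whose contents differ."""
    # zip pairs items by position and stops at the end of the shorter list.
    return [(r, s) for r, s in zip(left_paths, right_paths)
            if not filecmp.cmp(r, s, shallow=False)]
```

One thing to watch: because zip stops at the shorter list, leftover files in the longer one are silently skipped, so it is worth checking that both lists have the same length first.
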

4. Requirement – get a list of lowercased full filenames of a particular kind in a simple way.

Solution – the sheer magic of list comprehensions, then move on to generators.

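import os

extlist  = [".mdb"]
location = "some directory"

# normalise the case of each name (map functionality); normcase lowercases on Windows only
filelist = (os.path.normcase(f) for f in os.listdir(location))

# keep the wanted extensions and build the full path (filter functionality)
filelist = (os.path.join(location, f) for f in filelist if os.path.splitext(f)[1] in extlist)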
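
The two generator expressions can be wrapped into a function, roughly like this. The name files_with_ext is hypothetical, and the extra .lower() on the extension is my addition so that the match also works on systems where os.path.normcase leaves the case alone.

```python
import os


def files_with_ext(location, extlist):
    """Lazily yield full paths under location whose extension is in extlist."""
    # First generator: normalise each name (lowercases on Windows only).
    names = (os.path.normcase(f) for f in os.listdir(location))
    # Second generator: filter by extension and join with the directory.
    return (os.path.join(location, f) for f in names
            if os.path.splitext(f)[1].lower() in extlist)
```

Nothing is joined or filtered until the result is iterated, for example with list(files_with_ext(location, [".mdb"])).
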

Languages – Urdu-Hindi relationship, Urdu – language of Pakistan

While working on something else, I got distracted by the need for “Urdu-Hindi” name translation. In the process I came across some important pointers (via sepiamutiny):

1. Robert King’s paper describing the relationship between Urdu and Hindi. Although I do not have Bob’s credentials, I vehemently oppose the views of the paper after a cursory reading, if his conclusions are based on a Lonely Planet list of words/sounds and on his first paper’s interpretation of Urdu as “graceful” and Hindi as “chunky”. What is graceful to you is gibberish to others. The claim of scholarship ends with such futile generalization. (Brutal, yeah, but there is no need for a newer classification like “dental” when most of these things have existed for long.)

2. Removing illusions about Urdu being the “uniting” factor of Pakistan and reconfirming Punjabi as the main language.

3. Role of Syed (founder of Aligarh university?)

4. The miserable English language and its inadequacies – straight from the source, Language Log.

Digression

Pakistan’s need to improve education and to channel growth and democracy to every stratum (same as any other democracy). A Mohajir’s view of the perceived reasons for the failed state.

Resolutions of Televangelists for 2008 (funny)

Malaysian government backing down from its demand of “saying only god is allah and should be referred to as such“. Well, believe/pray as you want but don’t force others, dude.

Chinese government imposing strict quality checks on the reincarnation of Tibetan Buddhist monks in an attempt to place its own rubber stamp.

When is ISCII better than UNICODE and vice versa

ISCII is great for storing names (people, locations, etc.) which do not vary across languages. Consider a database of 10 million names of people which need to be picked up in reports across different languages. A single ISCII row can take care of each name, and the transliteration provided by the .Net encoding classes helps display it in various Indic languages (a similar effort can be applied to Unicode too, but without a lot of success). With Unicode encoding you need to store a language-specific name for each language (this too could be useful if you are hell-bent on correcting names/matras to suit the local language), thus multiplying the storage cost.
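
To make the "one row, many scripts" idea concrete, here is a rough Python sketch. It is not the .Net route described above: it swaps in Unicode and relies on the assumption that the Unicode blocks for the major Indic scripts follow a parallel, ISCII-derived layout, so a name can be moved between scripts by shifting code points. The names BLOCK_START and shift_script are made up for illustration.

```python
# Start of each script's Unicode block; each block is 0x80 code points wide.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def shift_script(text, source, target):
    """Naively re-map characters from one Indic block to another."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            # Same position within the block, different script.
            out.append(chr(cp - src + dst))
        else:
            # Leave spaces, digits, Latin text and so on untouched.
            out.append(ch)
    return "".join(out)


# One stored Devanagari name, displayed in Kannada on demand.
name_in_kannada = shift_script("अम्मा", "devanagari", "kannada")
```

This is also where the "without a lot of success" part shows up: letters that the target script lacks land on unassigned code points, and shared punctuation and script-specific matra conventions need real handling that a plain shift does not give you.
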

The storage saving of ISCII is offset by the need to convert to Unicode for display (IE is quite ahead in displaying Unicode data with an appropriate font) and for capture (with the help of INSCRIPT, a local phonetic variant, or web-based entry).

Indexing/sorting – language-specific sorting can differ (Tamil is very different from other languages). A topic for another post altogether.

Some more Indic language resources

Telugu Keyboard layout - similar to Nudi.

Hindi to Urdu transliteration

Great resource to understand the Tamil/Telugu/Kannada scripts in one sitting.

Folks on Linux, head over to Raghu.
Mac fans, rush to Xenotype.

ISCII reference: http://varamozhi.sourceforge.net/iscii91.pdf
Great Kannada learning resource: http://www.cs.toronto.edu/~kulki/kannada

The Kannada resource above uses the old form of Kannada, which might be a little different from what is available in print media, like anna, amma (aA is implied in these pronunciations, but you will notice the script does not reflect it, which is a little difficult for a newbie to get their head around).