GIDS 2014 – Learning from Data – A busy SW professional’s guide to Machine learning

Talk slides can be found here: SlideShare, Speaker Deck [updated 25th Apr 2014]

For the last couple of years we have been helping customers analyze data to find insights that reports and KPIs do not normally surface. I must extend my gratitude here. First is Anand S, whose simple visual analytics helped me move forward. For years I have read Tufte and other books, but mostly from a usability angle. With D3.js, ggplot2, the wonderful pandas/scikit stack and the R tool set, it has become much simpler to quickly clean and analyze data. Thanks, Anand, for sharing your stories.

Another is my team-mate, Vinod, who encouraged me to share rather than try to be perfect, as he saw me struggle and learn over the last 3 years through various customer engagements. We realized that active learning, reinforcement learning, bramble forest, NDCG, the Gini coefficient and the humongous maths/algebra/statistics are all important. Algorithm choice is important. But we spent more time cleaning and organizing the data, and more time trying to understand (and getting frustrated about) why the data was not telling us something. Also thanks to Pinal for extending the invite and pushing.

Others are the tools and discussion lists: the Wise.io, BigML, SkyTree, R, Scikit, Pandas, NumPy, Weka and Vowpal Wabbit folks; the folks at Cloudera trying to integrate that English firm (Myrrix?); that database which does all the approximation; and all the small startups (import.io, mortardata, sumologic and everybody else).

Our own toolset in SQL Server was very dense and difficult to adopt: it required way too much ceremony.

It is important to create a simpler way to understand the path into the field and to demystify it, so that the rest of the travellers on this journey do not face the issues that we had. With that intention I am presenting at GIDS 2014 Bangalore.

I am as excited as ever because we get to meet a different audience, and the expectations are completely different.

Event Location: J. N. Tata Auditorium
National Science Symposium Complex (NSSC)
Sir C.V.Raman Avenue, Bangalore, India

The complete schedule is published here.

My talk is on 25th April 2014.

Time: 11:40-12:25

This session is targeted at folks who are curious about machine learning and want to get the gist by looking at examples rather than dry theory. It will be a crisp presentation which takes various datasets and uses a bunch of tools. The intention is to share a way to comprehend, at a high level, what is involved in machine learning. Since the ground is very vast, this session will focus on applied usage of machine learning, with demos using Excel, R, Scikit and others. You will walk out knowing what it means to create a model using simple algorithms and to evaluate that model. The idea is to simplify the topic and create enough interest that attendees can follow up on it on their own using their favourite tool.
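To give a flavour of what "create a model, then evaluate it" can mean, here is a minimal sketch in plain Python. It is not taken from the talk: the toy data, the labels and the nearest-centroid approach are all made up for illustration, and the function names (fit, predict, accuracy) are hypothetical helpers rather than any library's API.

```python
from statistics import mean

# Hypothetical toy data, invented for this sketch: (hours_of_practice, label).
train = [(1.0, "novice"), (2.0, "novice"), (3.0, "novice"),
         (7.0, "expert"), (8.0, "expert"), (9.0, "expert")]
# Held-out rows the model never sees while it is being fitted.
test = [(1.5, "novice"), (2.5, "novice"), (8.5, "expert"), (4.5, "expert")]


def fit(rows):
    """'Train' the model: compute the mean feature value (centroid) per label."""
    by_label = {}
    for x, label in rows:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    hits = sum(1 for x, label in rows if predict(model, x) == label)
    return hits / len(rows)


model = fit(train)
score = accuracy(model, test)
print(model)
print(f"hold-out accuracy: {score:.2f}")
```

The point of the split is that the score is measured on rows the model did not learn from. One of the held-out rows (4.5 hours, labelled "expert") sits closer to the "novice" centroid, so the model gets it wrong, which is exactly the kind of thing evaluation is meant to reveal.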

I am also hoping to bump into familiar friends (Pinal Dave, Amit Bahree, Sunil, Praveen, Balmukund) and, hopefully, Erick H and the whole Solr gang, Siva (qubole), Regunath (of Aadhaar fame) and Venkat.
