This is one of those posts where writing it up takes longer than showing the results. After presenting at GIDS, I got talking with a group of gentlemen who were having a simple argument about which cricket batsmen are similar.
I used http://import.io to pull the statistics for just the top 50 batsmen from cricinfo.com (thanks to both firms).
The data is available here.
The simple code is here.
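The linked code is not reproduced here. The post doesn't say whether the stats were rescaled before clustering. Assuming they were, here is a minimal Python sketch of that preprocessing step. It z-scores each stat column so that career runs, measured in thousands, do not swamp strike rate or sixes when Euclidean distances between batsmen are computed. The function names and the toy numbers are hypothetical, not real cricinfo figures.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Rescale every column to mean 0 and population standard deviation 1.

    A constant column carries no information for clustering, so it is mapped
    to zeros instead of dividing by zero.
    """
    scaled_cols = []
    for col in zip(*rows):
        mu, sd = mean(col), pstdev(col)
        scaled_cols.append([(x - mu) / sd if sd else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def distance_matrix(rows):
    """Pairwise Euclidean distances between rows (one row per batsman)."""
    return [[math.dist(a, b) for b in rows] for a in rows]


# Hypothetical rows: [career runs, strike rate, sixes]; not real statistics.
raw = [
    [10000, 50.0, 40],
    [12000, 80.0, 150],
    [11000, 65.0, 90],
]
scaled = zscore_columns(raw)
for row in distance_matrix(scaled):
    print([round(d, 2) for d in row])
```

Without the scaling step, the distance between any two rows would be driven almost entirely by the runs column.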
As the plot shows:
# Sachin is in a different league; Ponting, Kallis and Dravid are grouped together.
# Bradman, Hutton, Jayasuriya and Gayle are clustered together (I was expecting Garry Sobers and Richards there too).
# Richards, Hayden and Sehwag, though, are clubbed together.
# I expected Laxman to be with Dravid and Kallis, but he is with Inzy and Javed, which is definitely a surprise.
# That other reliable player, M. Yousuf, is with G. Greenidge and C. Lloyd: great company. Another surprising tidbit is that Pietersen is grouped with D. Gower, Boycott and Clarke.
# Yes, in the T20 grouping Maxwell comes together with Hayden and Symonds. But that is because we are considering everything, including strike rate, sixes, not-outs, etc. There is also a cricketer, Bosman, who resembles this group; I had never heard of him before. (The T20 data is cleaned up and available in the same place for playing around.)
# The groupings can change depending on the clustering method one chooses (Ward's method, for example), but I like this bias. 🙂
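The mention of Ward suggests hierarchical (agglomerative) clustering. What follows is a hedged sketch, not the author's code. This pure-Python function repeatedly merges the two closest clusters until `k` remain, updating cluster distances with the Lance-Williams recurrence. Swapping the `linkage` argument (single, complete, average or Ward) is exactly the kind of choice that can reshuffle groupings like the ones above. The function name and toy points are hypothetical.

```python
import math
from itertools import combinations

LINKAGES = ("single", "complete", "average", "ward")


def agglomerate(points, k, linkage="complete"):
    """Bottom-up hierarchical clustering that stops when k clusters remain.

    Returns the clusters as sorted lists of row indices. Distances between
    clusters are updated with the Lance-Williams recurrence; Ward works on
    squared Euclidean distances, the other linkages on plain Euclidean ones.
    """
    if linkage not in LINKAGES:
        raise ValueError(f"unknown linkage: {linkage}")
    clusters = {i: {i} for i in range(len(points))}
    dist = {}
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        dist[(i, j)] = d * d if linkage == "ward" else d

    def get(a, b):
        # Distances are stored once per unordered pair, smaller id first.
        return dist[(a, b) if a < b else (b, a)]

    next_id = len(points)
    while len(clusters) > max(k, 1):
        # Pick the closest pair of currently active clusters.
        i, j = min(combinations(sorted(clusters), 2), key=lambda p: get(*p))
        dij = get(i, j)
        ni, nj = len(clusters[i]), len(clusters[j])
        for m in clusters:
            if m in (i, j):
                continue
            nk = len(clusters[m])
            dki, dkj = get(m, i), get(m, j)
            if linkage == "single":
                d = min(dki, dkj)
            elif linkage == "complete":
                d = max(dki, dkj)
            elif linkage == "average":
                d = (ni * dki + nj * dkj) / (ni + nj)
            else:  # ward
                d = ((ni + nk) * dki + (nj + nk) * dkj - nk * dij) / (ni + nj + nk)
            dist[(m, next_id)] = d  # m < next_id always holds
        clusters[next_id] = clusters.pop(i) | clusters.pop(j)
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical 2-D points standing in for (scaled) batting stats.
toy = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5)]
for method in LINKAGES:
    print(method, agglomerate(toy, 2, method))  # each gives [[0, 1, 4], [2, 3]]
```

On well-separated toy data every linkage agrees. On real, overlapping stats the choice matters more. Single linkage tends to chain players together, while complete and Ward favour compact groups.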
A note about import.io: it just works, picks up the patterns in a page, and very nicely appends similar data.