Spark on Azure using Docker – works

For the past few weeks I have been trying out Docker, and I have found it a useful way to show why lightweight containers are needed for dev/test.

Although its workflow feels like git, Docker is really a set of nice extensions on and around LXC. LXC itself has an extremely simple CLI to use and run with (as a user, I remember being excited by Solaris containers a long time ago). Docker makes it much more powerful by adding versioning and reusability, imho.

I used it on Azure without issues. Ever since Andre mentioned Spark’s Docker-friendly release, trying it had been on my to-do list. The intent was to run the perf benchmark using the memetracker dataset; I will get to that on a full-fledged cluster one of these days.

Update (10 June 2014): Microsoft Open Technologies (MS Open Tech) has announced native support for Docker on Azure: http://msopentech.com/blog/2014/06/09/docker-on-microsoft-azure/

Everything mentioned in the repo worked without issues; I just cloned the Docker scripts directly. The only change was to the clone step, where I used the following command:

git clone http://github.com/amplab/docker-scripts.git

The challenge with any new data system is learning the basics: importing and exporting data, easy querying, monitoring, and finding root causes. That will require some work in a real project somewhere down the road. In between, I got distracted by Docker's use of Go.
 