Azure Linux THP

You should check whether your application is compatible with THP (transparent huge pages).

How do you find its status?

cat /sys/kernel/mm/transparent_hugepage/enabled
grep -i --color huge /proc/meminfo
sudo sysctl -a | grep hugepage

At present the first command reports that THP is enabled: the active value is the one in brackets, [always]. On RHEL/CentOS 6 the file lives under /sys/kernel/mm/redhat_transparent_hugepage/ instead. The other two commands are other ways to see huge page usage.
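The status file lists every mode and marks the active one with brackets, so a script has to pick out the bracketed word. A minimal sketch of that, assuming a POSIX shell; the function name thp_active_mode and its optional path argument are illustrative, not part of any tool:

```shell
#!/bin/sh
# Print the active THP mode, i.e. the bracketed word in a line
# such as "always madvise [never]".
# thp_active_mode is a hypothetical helper name. The optional first
# argument overrides the sysfs path (useful on RHEL 6 or for testing).
thp_active_mode() {
    thp_file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    # Keep only what sits between "[" and "]"; print nothing if no match.
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$thp_file"
}
```

Calling thp_active_mode with no argument reads the standard sysfs file; the same function works on the defrag file if its path is passed in.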

How do you modify it? 

1. Edit /etc/rc.local. THP is controlled through sysfs rather than /proc/sys, so /etc/sysctl.conf cannot change it. In rc.local add

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

2. Add transparent_hugepage=never to the kernel boot line in the /etc/grub.conf file.


Oracle – does not like THP.

Mongo – does not like THP and prefers 4k pages.

Cassandra – There was a thread on Twitter and on the Google group about THP. The suggestion appears to be to disable it.

Hadoop does not like THP.

Splunk does not like THP.

MySQL does not like THP.

Postgres does not like THP.

What does it do – here are the details. 


Azure Linux tip – swappiness

In general, folks disable swap for memory-bound processes on Linux instances (YMMV).

How to detect whether a swap file is present

1. grep -i --color swap /proc/meminfo

2. swapon -s

3. free -m

These will confirm that no swap is set up. If you check swappiness via cat /proc/sys/vm/swappiness, though, you will still see the default of 60 :). The question on your mind will be where it is swapping to.

What should you do? In general no swapping is a good thing, so setting swappiness to 0 is a good idea on a default installation. If you do require a swap file (which you will, if you care about the latest kernel changes), add one on the guest’s local disk (mostly sdb1 mounted on /mnt, or SSD); do not put it on Azure storage.

How to modify swappiness (for a web or file server): echo 5 | sudo tee /proc/sys/vm/swappiness or sudo sysctl vm.swappiness=5. To persist the setting through reboots, edit /etc/sysctl.conf, and make sure the swap file is added to fstab. No swapping is good for Lucene workloads (Solr/Elasticsearch) and databases (Cassandra/MongoDB/MySQL/Postgres etc.), but for stability on machines that constantly run at peak load it is good to have swap on local disk/SSD as a safety net.

How to allocate a swap file (usually you will do it on local disk; use df -ah to get the mount name). Allocate the swap file:

– sudo fallocate -l 4G /mnt/swapfile (ensure size is double the memory size)

– Ensure only root has access

– sudo chmod 600 /mnt/swapfile

– sudo mkswap /mnt/swapfile

– sudo swapon /mnt/swapfile

– verify with free -m

– add to fstab so the swap file is enabled after reboot

– sudo nano /etc/fstab and add the line: /mnt/swapfile none swap sw 0 0
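The "double the memory size" rule above can be computed rather than typed by hand. A small sketch, assuming a POSIX shell with awk; swapfile_size_mb is a hypothetical helper name, and the doubling factor is simply the rule of thumb from the steps above:

```shell
#!/bin/sh
# Print twice the machine's total memory, in MiB.
# MemTotal in /proc/meminfo is reported in kB, so dividing by 1024
# gives MiB. The optional first argument overrides the meminfo path.
swapfile_size_mb() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { printf "%d\n", ($2 * 2) / 1024 }' "$meminfo"
}
```

The result can be passed to the allocation step, for example sudo fallocate -l "$(swapfile_size_mb)M" /mnt/swapfile. On large-memory instances doubling may exceed the space on the local disk, so check df first.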

To switch off swapping completely: on Linux systems, you can disable swap temporarily by running sudo swapoff -a.

To disable it permanently, you will need to edit the /etc/fstab file and comment out any lines that contain the word swap.

To ensure the swappiness setting is applied after reboot

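# Set the value in /etc/sysctl.conf.
# "sudo echo ... >> file" fails because the redirect runs as the calling user, so use tee.
echo '' | sudo tee -a /etc/sysctl.conf
echo '#Set swappiness to 0 to avoid swapping' | sudo tee -a /etc/sysctl.conf
echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf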
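Appending to /etc/sysctl.conf works once, but running the same commands again leaves duplicate entries. A sketch of an idempotent alternative, assuming a POSIX shell; set_sysctl_conf is a hypothetical helper, not a standard command, and it has to run as root when pointed at the real file:

```shell
#!/bin/sh
# set_sysctl_conf KEY VALUE [FILE]
# Make sure FILE (default /etc/sysctl.conf) holds exactly one
# "KEY = VALUE" line: old assignments of KEY are dropped and the new
# one is appended. KEY is used as a regular expression, so the dot in
# "vm.swappiness" matches any character, which is loose but harmless here.
set_sysctl_conf() {
    key="$1"; value="$2"; conf="${3:-/etc/sysctl.conf}"
    _tmp="$(mktemp)" || return 1
    # Copy every line except existing assignments of KEY.
    grep -Ev "^[[:space:]]*${key}[[:space:]]*=" "$conf" > "$_tmp" 2>/dev/null
    printf '%s = %s\n' "$key" "$value" >> "$_tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$_tmp" > "$conf" && rm -f "$_tmp"
}
```

After set_sysctl_conf vm.swappiness 0, running sudo sysctl -p loads the file without a reboot.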

Why have swap at all if nobody likes swapping and it is not the 90s? For safety. From kernel version 3.5-rc1 onward, a swappiness of 0 will cause the OOM killer to kill the process instead of allowing swapping. While you are at all of this, also run df /dev/shm and see what you can do about it. Do you want to use it?

Ref –

  1. ElasticSearch – strongly recommends bootstrap.mlockall, with a suggestion to set swappiness to zero to switch swapping off and also to instruct the OOM killer not to kill it.
    1. When Otis says something – I just follow it.
  2. Solr
  3. Cassandra
  4. MySQL
  5. MongoDB
  6. Postgres – it is the same suggestion.
  7. Oracle

Azure throttling errors

Most cloud services provide elasticity, creating the illusion of unlimited resources. But many times hosted services need to push back on requests to provide good governance.

Azure does a good job of providing information about this throttling in various ways across services. One of the first services to do so was SQL Azure, which returned error codes to help the client retry. Slowly, all services are now providing information when they throttle. Depending on whether you access the native API or the REST endpoint, you get this information in different ways. I am hoping that comprehensive information from services and underlying resources like network, CPU and memory slowly starts percolating, as it already does for storage, so that clients and monitoring systems can manage workloads.

Azure DocumentDB provides a throttling error (HTTP error 429) and also the time after which to retry. It is definitely ahead of other services in providing this extra information.

Azure Storage, on the other hand, provides information to the native client so that it can back off and retry. It also pushes this information into metrics. A great paper exists which provides information about Azure transactions and capacity.

SQL Azure throttling – SQL Azure was one of the first services to provide throttling information, returning error codes for throttling caused by CRUD/memory operations (45168, 45169, 40615, 40550, 40549, 40551, 40554, 40552, 40553).

Azure Search throttling returns HTTP error 429/503 so that the client can take proper action.

Azure Scheduler returns HTTP status 503 when it gets busy and expects the client to retry.

Azure Queue and Service Bus Queue both send back 503, which REST clients can take advantage of.

BizTalk Services returns “Server is busy. Please try again”

Over a number of years we have always asked customers to exploit Azure, and one of the ways to do so is to work with hosted services and plan workloads by catching these kinds of errors. Some customers like SQL Azure’s throttling so much that they wish they had some of those soft/hard throttling errors in their on-premises databases.

Most Azure services do not charge when quotas are hit or throttling kicks in. The idea is for the client to back off and try again. I do hope the “monitoring” becomes better: in the case of BizTalk Services, for example, a client should be able to query the “busy-ness”, since it has to retry once the system becomes less busy. SQL Azure’s retry logic has been well codified and understood over the years.
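Backing off and trying again can be sketched generically. The function below is a minimal illustration in POSIX shell, not any Azure SDK's retry policy; the name retry_with_backoff, the doubling factor and the attempt limit are all assumptions made for the example:

```shell
#!/bin/sh
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling
# the wait between attempts. Returns 0 on success, 1 when giving up.
retry_with_backoff() {
    max_attempts="$1"; delay="$2"; shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

A REST call could be wrapped as retry_with_backoff 5 1 curl --fail --silent "$SERVICE_URL", where SERVICE_URL is a placeholder: curl --fail exits non-zero on HTTP errors such as 429 or 503, which triggers the next attempt. When a service returns an explicit retry interval, as DocumentDB does, honoring that value is preferable to a fixed doubling schedule.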

In case you wonder whether other public cloud services have throttling too: public cloud services are shared infrastructure and implement throttling for governance. It is simply exposed in different ways. DynamoDB, for example, has 400-series error codes, specifically LimitExceededException, ProvisionedThroughputExceededException and ThrottlingException. Almost every service has 400-series errors with throttling as a specific exception.


Who owns our past

This is a very old draft of a thought. Finally, after 10 years, it is time to publish a much smaller version.

As search engines dominate digitization, an old fear comes back to haunt me. What story will a future generation hear or read? Will they ever get the time and exposure to explore different points of view? Will one set of points of view dominate the narration?

I have always wondered how historians tell the story in detail by looking at artefacts and what is left over physically. Sometimes they also reference written material and interpret it to mean something. I have also wondered whether the person who wrote something wilfully changed the narrative (from the Bible to every other written word). What about people who did not document anything? How many documents honestly record everything that happened, without bias of any kind? What about peoples whose past was erased by a new religion or new people? In the digital world, search makes it frighteningly easy to obliterate your identity and information about a community (news/views/existence). Since the veracity of information is always suspect, we need a better alternative, or should at least ensure that national archives are physical and open up their versions of the narrative.

Over the last century newspapers have played an important role in disseminating information and helping people form opinions. They do tend to focus on the immediate and tactical. Long-term investigations and viewpoints come out in magazines, obscure books or, worse, academic journals. Nobody has time to read through everything. Will future generations have enough information, or will they get curated data?

It is in our interest to push search engines, social networks, aggregators and applications to share how they rank for relevance, and to disclose which links are not shown, withdrawn due to dispute, or withdrawn due to laws (government x).


Left Facebook, planning to leave LinkedIn too…just an infinitesimally small statistic

It is quite some time since I left FB. I keep connected with friends via phone calls or meeting in person. Other friends are checked on occasionally via my better half’s account. When FB announced the “experiment” results – that was it for me.

I am skeptical of news aggregators, search sites and now social networking sites. I
was personally appalled by folks pushing exaggerated versions of themselves from
their work life.

Lack of transparency
I do not know why certain news items appear in Google News and others never make
it. There is neither explanation nor transparency. Over time only one “viewpoint”
will dominate, depending on the ownership of these places. It
scares the hell out of me.

I do not get many job offers on LinkedIn anyway, and their Pulse is very skewed to
interests which do not align with mine. The flow of stories and updates also seems
to depend less on freshness and more on “linked to” etc. I do not have a reason to
follow these. I rarely have conversations here.

The only thing which works is Prismatic – I wish Twitter would buy them. They bring
in stuff I am interested in and allow me to explore at my pace, unlike Twitter,
where stuff is lost after that instant or a predefined window controlled by Twitter.
I am hoping for more context, more “permanency” rather than the immediate
day or two. I wish there was better search; right now a lot of stuff is locked into
individual networks. Twitter’s search has to improve. Please. Just don’t focus on the Twitter stream, there is a bigger world outside.

And yes, I am keeping my offline/online conversation tool, email, handy. I will also
scare people away by putting random stuff on the blog. 🙂


Apache Flink – SQL support?

Nowadays the challenge is that the moment you blink, something new has popped up in the data processing world – either as a full-fledged backend or as another middle layer in between.

Apache Flink promises to take advantage of a “declarative” model but at present has a Java/Scala API. One of the fastest ways for any of these intermediaries to succeed is to adopt something which has been around for decades. The most successful DSL of our times is SQL; the way Apache Drill is doing it, using Optiq, is a great first step. Hopefully the underlying stores’ decades of work do not go to waste.

It also ensures you do not accumulate karma against the demi-god DBA who does not look forward to learning new things anymore. He joined that world because there were few things to do: storage layout, profiler, queries, indexes, backups/restores, HA/DR... Kidding. Just kidding.



10 things I wished my datastore would do (updated: Is DocumentDB my savior?)

We generally use datastores to ingest data and try to make some meaning out of it by means of reports and analytics. Over the years we have had to make decisions about adopting different stores for “different” workloads.

The simplest is analysis, where we offload to pre-aggregated values with either columnar or distributed engines to scale out with the volume of data. We have also seen the rise of stores which store data in a form that is friendly for ranges of data. Then we have some which allow very fast lookups, maturing into doing aggregations on the fly. We have also seen the use of data-structure stores: the hash-table-inspired designs versus the ones which don sophisticated avatars (gossip, vector clocks, bloom filters, LSM trees).

That other store which pushed compute to storage is undergoing a massive transformation to adopt streaming and (hopefully) regular OLTP, apart from its regular data reservoir image. Then we have the framework-based plug-and-play systems doing all kinds of sophisticated streaming and other wizardry.

Many of the stores require extensive knowledge of their internals: how data is laid out, techniques for using the right data types, how data should be queried, issues of availability, and taking decisions which are generally “understandable” to the business stakeholders. When things go wrong, the tools range from just logging an error to showing the actual “path of execution” of the query. At present there is a lot of ceremony around capacity management and around how data changes are logged and pushed to another location. This much detail is a great “permanent job guarantee” but does not add a lot of value for the business in the long term.

2014-22nd Aug Update – DocumentDB seems to take away most of the pain –

  1. Take away my schema design issues as much as it can

What do I mean by it? Whether it is a traditional relational database or a new-generation NoSQL store, one has to think through either the ingestion pattern or the query pattern to design how entities are represented in the store. This is by nature a productivity killer and creates an impedance mismatch between how entities are stored and how the application represents them.

Update (2014-22nd Aug) – DocumentDB – I need to test with a good amount of data and query patterns, but it looks like, with auto-indexing and SSD, we are on our way here.

  2. Take away my index planning issues

This is another of those areas where a lot of heartburn takes place, as a lot of the innards of the store’s implementation are exposed. If this were done completely automagically it would be a great time-saver: just look at the queries and either create the required indexes or drop them. A lot of performance regressions are introduced as small changes start accumulating in the application and reach the database level.

Update (2014-22nd Aug) – DocumentDB does it automatically and has indexes on everything. It only requires me to drop what I do not need. Thank you.

  3. Make scale out/up easier

Again, this is exposed to the end application designer in terms of which entities should be sharded vertically or horizontally. This ties back to point 1 in terms of ingestion or query patterns. It makes or breaks the application in terms of performance and has an impact on the evolution of the application.

Update (2014-22nd Aug) – DocumentDB makes it a no-brainer again. Scale-out is done in CUs. I need to understand how the sharding is done.

  4. Make the “adoption” easier by using an existing declarative mechanism for interaction. Today one has to choose the store’s way rather than good old DDL/DML, which is at least 90% the same across systems. This induces fatigue for ISVs and larger enterprises who look at the cost of “migration back and forth”. Declarative mechanisms have a lullaby-like way of calming the mind, and we indulge in scale-up first, followed by scale-out (painful for the application).

Make sure the majority of the clients are on par with each other. We may not immediately need something for, say, Rust. But at least ensure PHP, Java, .NET native and derived languages have robust enough interfaces.

Make it easier to “extract” my data in case I need to move out. Yes I know this is the least likely option where resources will be spent. But it is super-essential and provides the trust for long term.

Lay out the roadmap in simple terms – where you are moving – so that I do not spend time on activities which will become part of the offering.

Lay out in simple terms where you have seen people having issues or making wrong choices, and share the workarounds. Transparency is the key. If the store is not a good place for doing the latest “x/y” work – share that and we will move on.

Update (2014-22nd Aug) – DocumentDB provides a SQL interface!

  5. Do not make choosing the hardware a career-limiting move. We all know stores like memory. But persistence is key for trust. SSD/HDD, CPU/core, virtualization impact – way too many moving choices to make. Make 70-90% of scenarios simple to decide. I can understand that some workloads require a lot of memory or only memory – but do not present a swarm of choices. Do not tie us down to specific brands of storage or networking which may not be around after a few years.

In the hosted world pricing has become crazier. Lay out in simple-to-understand terms how costing is done. In a way, licensing by cores/CPU was great because I did not have to think much: I pretty much over-provisioned, or did a performance test, and moved on.

Update (2014-22nd Aug) – DocumentDB again simplifies the discussion, it is SSD backed and pricing is very straightforward – requests – not reads, not writes or indexed collection.

  6. Resolve HA/DR in a reasonable manner. Provide a simple guide to understand the hosted vs host-your-own worlds. Share clearly how clients should connect and fail over. We understand distributed systems are hard, and if the store supports the distributed world – help us navigate the impact and choices in simple layman’s terms or in terms of something we are already aware of.

If there is an impact in terms of consistency – please let us know. Some of us care more about it than others. Eventual is great, but having to say that we are waiting for logs to get applied, so the reports are not yet “factual”, is still not something I am gung-ho about.

Update (2014-22nd Aug) – DocumentDB – looks like it is highly available within the local DC. I am assuming cross-DC DR is on the radar. DocumentDB shares the available consistency levels clearly.

  7. Share clearly how monitoring is done for the infrastructure in both the hosted and host-your-own cases. Share a template for “monitor these always” and “take these z actions” – a sort of literal rulebook, which again makes adoption easier.

Update (2014-22nd Aug) – DocumentDB provides out-of-the-box monitoring; I need to see the template or the 2 things to monitor – I am guessing latency per operation is one and size is another. I need to think through the scale-out unit. I am sure that as more people push, we will be in a better place.

  8. Share how data at rest and data in transport can be secured and audited in a simple fashion. For the last piece – even if only actions are tracked – we will have a simple life.

Update (2014-22nd Aug) – DocumentDB – looks like admin/user permissions are separate. Data storage is still the end developer’s responsibility.

  9. Share a simple guide for operations and day-to-day maintenance. This will be a life saver in terms of x things to look out for: do backups, do checks. This is how to do an HA/DR check or a performance issue drilldown – normally part of the datahead’s responsibility. Do we look out for unbalanced usage of the environment? Is there some resource which is getting squeezed? What should we do in those cases?

Update (2014-22nd Aug) – DocumentDB – looks like cases where you need older data, because a user deleted something inadvertently, are something users can push for.

Points 1-4 make adoption easier and the later ones help in continued use.
