This post was created after working with multiple customers requesting information about High Availability (HA) options on Azure for different databases. In general, high availability has to be thought through from the ground up for a system: power, gateway access, redundant internal network paths, and hardware failures (disk/CPU/machine) all have to be considered. Since Azure takes care of the utilities (power, network paths, etc.), we can focus on local HA within a datacenter and DR across datacenters to on-premises or another location.
Most folks are aware of, and have used, some kind of cluster-based service that provides local failover. For example, disk issues are handled by storage systems and their respective RAID configurations. Some cluster services also load-balance requests, while others redirect reads to secondaries; how they work internally is not the focus of this discussion. We are also going to assume that client-side XA transactions are generally not adopted, nor a great idea.
Traditionally, clustering technology and a good SAN were required to provide local HA for a database. Cloud platforms have leveled the field, obsoleting the requirement for expensive SANs and heavyweight cluster setups (some databases used to require that machine configurations be exactly identical).
Conceptually, most relational databases follow the picture above faithfully. There are various mechanisms to synchronize databases, and some databases additionally allow seamless failover to a secondary, read replicas, and client connection management. In a pure cloud setup within one datacenter, one synchronizes transactions between two local instances through a log push/pull mechanism, and ships the log to a secondary in another location for disaster recovery purposes. This is very similar to the on-premises setup of these databases.
With SQL Server, AlwaysOn on Azure is ready for use at present. Failover for clients is automatic, and the secondary takes over from the primary since the databases are in sync.
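As a sketch of what client-facing failover looks like, the availability group listener gives clients a single connection point. The names below (`AG1`, `AgListener`, the IP and database name) are illustrative, not from a real deployment:

```sql
-- Hypothetical names; run on the primary replica after the
-- availability group AG1 exists.
ALTER AVAILABILITY GROUP [AG1]
ADD LISTENER N'AgListener' (
    WITH IP ((N'10.0.0.20', N'255.255.255.0')),
    PORT = 1433
);
```

Clients then connect to the listener name rather than a specific node, e.g. `Server=tcp:AgListener,1433;Database=AppDb;MultiSubnetFailover=True;`, so a failover does not require a connection-string change.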
With Oracle, Active Data Guard (ADG), running in much the same way as SQL Server, is the preferred path at present. The SCAN feature (which is part of RAC) with managed failover is not present in ADG; for SQL Server folks, SCAN is a concept similar to the Listener in AlwaysOn. This does mean there is an impact on RTO. There is also a GoldenGate option for those requiring that kind of functionality (basically multi-master). The latest Oracle databases have a bunch of features to help with automatic client failover, but applications need to take those into account.
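Absent SCAN, clients can still get basic failover through the connect descriptor. A hypothetical `tnsnames.ora` entry (hosts and service name are illustrative) that tries the primary first and falls back to the standby might look like:

```
# Hypothetical entry; hosts, port, and service name are illustrative.
APPDB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = dbprimary)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = dbstandby)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = appdb_svc)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))
    )
  )
```

This is connect-time failover only; for in-flight work the application still needs the FAN/ONS-style features mentioned above.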
PostgreSQL too has choices for HA (I have not tested streaming replication). Admittedly it is a little low level in terms of precautions, e.g., requiring adjustment of log retention via wal_keep_segments for streaming. Edit: and no, the suggestion is not to go off and build a custom cluster solution around pgpool/pgbouncer, as clouds are intolerant of holding on to IP addresses. Pgpool is very expansive and solves too many problems at once, which means one has many places to look when there are issues. For .NET/SQL folks: think of Pgpool as roughly akin to SQL AlwaysOn (setting aside the Postgres-specific features like load balancing, replication, and parallel query); PgBouncer is just an out-of-process connection pooler.
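To make the "low level precautions" concrete, a minimal streaming-replication sketch (for the 9.x era this post assumes; hostnames and values are illustrative) touches the primary's `postgresql.conf` and the standby's `recovery.conf`:

```
# primary postgresql.conf (illustrative values)
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 64    # retain enough WAL for the standby to catch up

# standby recovery.conf (illustrative connection info)
standby_mode = 'on'
primary_conninfo = 'host=pgprimary port=5432 user=replicator'
```

Undersizing `wal_keep_segments` is the classic pitfall: if the standby falls behind further than the retained WAL, it must be rebuilt from a fresh base backup.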
MySQL has well-established HA/DR options, the simplest being a master/slave combination. But anything requiring a proprietary cluster service or hardware is not suggested for a cloud environment. MHA (from DeNA) is the closest to SQL Server AlwaysOn in terms of concepts. At present I have no experience with Tungsten, which also provides application failover. Another approach is corosync and pacemaker.
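The master/slave combination boils down to pointing the slave at the master's binary log. A sketch (host, user, and log coordinates are illustrative placeholders, not real values):

```sql
-- Run on the slave; coordinates come from SHOW MASTER STATUS on the master.
CHANGE MASTER TO
  MASTER_HOST = 'mysql-primary',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'replpass',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
```

Tools like MHA then sit on top of this, monitoring the master and promoting a slave when it fails.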
The concept of providing availability is the same across all the relational databases:
1. Create a virtual network to isolate the workload (and, more importantly, to retain an IP address in Azure).
2. Create an availability set of two database servers to ensure the pair does not go down at once for maintenance.
3. Create the primary server and choose an option for pushing/pulling data from the secondary (push requires the secondary to be up). I am glossing over setting up WSFC for SQL Server, the listener, Active Directory config, enabling archive mode on the primary, and taking a backup and applying it to the secondary.
4. Create additional read replicas if the database technology allows it.
5. Create a remote location for DR to on-premises, replicated asynchronously. Using another cloud location instead requires the cloud to provide inter-datacenter connectivity.
6. If a technology like the listener in SQL Server is supported, configure it to provide failover for the local setup.
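Steps 1 and 2 can be sketched with the classic cross-platform Azure CLI of the time. All names are illustrative, and the exact flags should be checked against your CLI version:

```
# Hypothetical names/flags; verify against your Azure CLI version.
azure network vnet create DbVnet --location "West US"

# Place both database VMs in the same availability set and vnet,
# so planned maintenance does not take them down together.
azure vm create db-node-1 <image> <admin-user> <admin-pass> \
    --availability-set DbAvSet --virtual-network-name DbVnet
azure vm create db-node-2 <image> <admin-user> <admin-pass> \
    --availability-set DbAvSet --virtual-network-name DbVnet
```

The database-level steps (3 through 6) then run inside these VMs exactly as they would on-premises.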
Monitoring of the setup is required to detect network partitions and workload issues on either of the machines.
For Azure, the availability set is the concept that ensures machines within it do not all go down for a planned maintenance job (by the Azure team). For Cloud 2, use Multi-AZ at a minimum for local HA to host the machines, and push out to another region for database DR. For Cloud 3, one again has to use the concept of zones to isolate "locally", plus replicate to another region for the database. (For truly moving an application a lot more has to be done, but we will restrict ourselves here to the database.)
Update: this post does not contain much material about application continuity; unfortunately it depends on the client driver and the intelligence built into it for identifying failure of the primary. Oracle has something called SCAN/FAN+ONS, the SQL Server client can also detect failure and retry for HA, and app servers like JBoss/WebLogic have multi-pool HA facilities.
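Whatever the driver, the core of client-side continuity is the same idea: keep a list of candidate hosts and fall through to the next when one is unreachable. A minimal, driver-agnostic sketch (all names here are hypothetical; real drivers wire this into their connection logic):

```python
# Minimal sketch of client-side failover: try each host in order,
# with a bounded number of attempts, and return the first that
# accepts a connection. The connect function is injected so any
# driver/database can plug in its own.

def connect_with_failover(hosts, connect, attempts_per_host=2):
    """Return (host, connection) for the first reachable host."""
    last_error = None
    for host in hosts:
        for _ in range(attempts_per_host):
            try:
                return host, connect(host)
            except ConnectionError as exc:
                last_error = exc  # remember the failure, try the next
    raise last_error or ConnectionError("no hosts configured")

# Illustrative stand-in for a real driver call: the "primary" is
# down, the "secondary" answers.
def fake_connect(host):
    if host == "primary":
        raise ConnectionError("primary unreachable")
    return "session-to-" + host

host, session = connect_with_failover(["primary", "secondary"], fake_connect)
# host is "secondary": the client failed over transparently
```

Real drivers add the harder parts this sketch omits: distinguishing the writable primary from read-only replicas, and replaying or safely abandoning in-flight transactions.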