Disaster Recovery is, in some way, on every organisation’s agenda. It is no surprise that Gartner have now produced a Magic Quadrant for Disaster Recovery as a Service (https://www.gartner.com/doc/3033519?ref=AnalystProfile&srcId=1-4554397745). What caught my attention was the inclusion of Acronis in that document. Who? This piqued my curiosity to see what this “DRaaS” thing actually delivers.
What is Disaster Recovery?
The correct answer is the foreboding “it depends”, but we can do better than that. For any business, a reasonable disaster definition is “a substantial event which materially affects the ability to continue trading”. This would likely include events that prevent a primary datacentre from operating – e.g. power failure, flood, terrorist action – and also non-physical interruptions – virus, authentication failure, out of storage, networking failure, phones don’t work etc.
This is very much a sliding scale starting at “boring, take a note”, through “very inconvenient, fix it now” up to “we’re doomed, engage headless chicken mode”. Whatever name you decide to call it, it means survival of a very bad thing. A thing so bad, that you couldn’t possibly continue to operate your business when it happens, without your ready-to-go, fully tested, bright pink, recovery plan.
DRaaS Vendors are not in your Industry Vertical
The vendors that sell DRaaS aren’t in your industry vertical. If they were, they would be competing for your clients’ business rather than selling you a DRaaS thing. This is a significant point. They aren’t experts in your business operations or processes, so they avoid getting tangled in your process detail. The essential hours of analysis time spent unravelling your processes, understanding your services and mapping them to a clean VM migration plan don’t improve their bottom line. Quite the opposite.
On a scale of 1 to 10, I’d rate a “virtualisation only” proposition using AWS or Rackspace that doesn’t call out Identity Management, IP addressing and naming services at about a 2 out of 10. The reasons for this are many. Consider the single point of truth (SPoT) for identity management and certificate services. While storage block and operating system level replication appears to lift and shift the bulk, it doesn’t know anything about the authoritative copy. Halfway through the failover, where does the live service exist? Does it exist? Are there ever two active at the same time? Does that make sense? Are business transactions well understood and ACID compliant at both primary and secondary (cloud) locations? Unfortunately, the majority of DRaaS vendors leave many more questions than they answer.
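The “where does the live service exist?” question can be made concrete during a failover rehearsal. The following is a minimal sketch, assuming you can record when each site started and stopped serving; the `ActiveWindow` type, the site labels and the minute-based timeline are hypothetical names for illustration, not part of any vendor’s tooling. It reports periods where two sites were active at once and periods where none was.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveWindow:
    site: str   # hypothetical site label, e.g. "primary" or "cloud"
    start: int  # minute (from the start of the exercise) the site began serving
    end: int    # minute the site stopped serving


def audit_failover(windows):
    """Return (overlaps, gaps) found in a list of ActiveWindow records.

    overlaps: (site_a, site_b, from_minute, to_minute) where both were live.
    gaps:     (from_minute, to_minute) where no site was live.
    """
    ordered = sorted(windows, key=lambda w: w.start)
    overlaps, gaps = [], []
    if not ordered:
        return overlaps, gaps

    latest = ordered[0]  # the window that ends last among those seen so far
    for cur in ordered[1:]:
        if cur.start < latest.end:
            # Two sites believed they were the live service at the same time.
            overlaps.append((latest.site, cur.site, cur.start, min(cur.end, latest.end)))
        elif cur.start > latest.end:
            # Nobody was serving: this counts against the RTO.
            gaps.append((latest.end, cur.start))
        if cur.end > latest.end:
            latest = cur
    return overlaps, gaps


if __name__ == "__main__":
    rehearsal = [ActiveWindow("primary", 0, 30), ActiveWindow("cloud", 25, 90)]
    print(audit_failover(rehearsal))
```

An overlap is the point at which questions about the authoritative copy stop being rhetorical; a gap is downtime that counts against the agreed RTO.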
A well organised IT department can get a positive ROI from the DRaaS vendor by asking the DRaaS sales consultant to stop talking and start providing SLAs and technical detail on the interfaces and APIs used to achieve their migrations. This will help the IT department understand what they can offer the business and what it will cost, while explaining clearly how it fits with the agreed RTO/RPO and the related operational processes.
Disaster Recovery starts with disaster definition
In Gartner’s defence, they do state that DRaaS was initially scoped to cover only Virtual Machines. I’m sceptical of the value in calling such a service “DR”. It is a long way short of any Disaster Recovery procedure I have helped build for a business, and I have a fair few under my belt. Make sure the scope and complexity of your Disaster Recovery journey is agreed up front, and change managed if the goalposts move. This will make the selection of technical tools and products clear and straightforward, as well as scoping the test plan and schedule. Too many organisations avoid the test bit due to complexity and downtime requirements. If your risk appetite lets you run without testing, you run a unique business. Arguably worse still is testing that is so badly scoped and designed that it doesn’t actually demonstrate a capability to operate the business.
Having a clear understanding of your IT Architecture and how it works is fundamental to getting the test plan right, so that the agreed services are up and available within the RTO, with no more data missing than the RPO allows. Yep – there will be data missing, such as those uncommitted SQL transactions, or the JMS message queues that hadn’t persisted and replicated, or the NTFS journal that hadn’t been flushed to vmdk in the big fat virtual machine running FT but not between sites. Your Information systems architecture design should have dealt with all of those, to have the business operational with a balanced accounts receivable/payable ledger and no reporting anomalies post-disaster.
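To show what a test has to demonstrate, here is a minimal sketch that scores one recovery drill against the agreed RTO and RPO. It assumes downtime is measured from the moment the disaster is declared to the moment the service is restored, and data loss from the last recoverable write to the moment of declaration; the `RecoveryTarget` and `DrillResult` names and the example times are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    rto: timedelta  # agreed maximum time to restore the service
    rpo: timedelta  # agreed maximum window of lost data


@dataclass(frozen=True)
class DrillResult:
    disaster_declared: datetime
    service_restored: datetime
    last_recoverable_write: datetime  # newest data present after recovery


def evaluate(target, result):
    """Compare the measured downtime and data loss with the agreed targets."""
    downtime = result.service_restored - result.disaster_declared
    data_loss = result.disaster_declared - result.last_recoverable_write
    return {
        "downtime": downtime,
        "data_loss": data_loss,
        "rto_met": downtime <= target.rto,
        "rpo_met": data_loss <= target.rpo,
    }


if __name__ == "__main__":
    target = RecoveryTarget(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
    drill = DrillResult(
        disaster_declared=datetime(2015, 6, 1, 9, 0),
        service_restored=datetime(2015, 6, 1, 12, 30),
        last_recoverable_write=datetime(2015, 6, 1, 8, 50),
    )
    print(evaluate(target, drill))
```

The arithmetic is trivial; the hard part is agreeing what “service restored” means for each business service, which is exactly the scoping exercise described above.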
The Architect’s Viewpoint
The business service is king, deftly executing business transactions which rely on many underlying technologies and databases which must work in careful orchestration to avoid loose ends and grumpy auditors following a disaster event. It is this reason alone that makes asking the IT department to build a recovery plan for a business on its own tantamount to forcing the proverbial square peg into the triangular hole. It’s a team effort that will only succeed with both the Business Analysts and IT Department pulling their weight in partnership.
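One way the two teams can capture that orchestration is to record, for each business service, what must be running before it can start, and derive the recovery order from that. The sketch below uses Python’s standard `graphlib.TopologicalSorter` (Python 3.9 or later); the catalogue contents and service names are invented for illustration and would come from your own business service catalogue.

```python
from graphlib import TopologicalSorter

# Hypothetical catalogue: each entry lists the services that must be
# available before the named service can be brought back.
catalogue = {
    "dns": [],
    "identity": ["dns"],
    "database": ["dns", "identity"],
    "message-queue": ["identity"],
    "order-processing": ["database", "message-queue"],
}


def recovery_order(dependencies):
    """Return the services in an order that respects every dependency.

    Raises graphlib.CycleError if the catalogue contains a circular
    dependency, which is worth finding before the disaster rather than during it.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, service in enumerate(recovery_order(catalogue), start=1):
        print(step, service)
```

A real plan would also weight the order by business priority; this sketch only guarantees that nothing is started before the things it depends on.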
Expecting too much from vendor-provided DRaaS in dealing with the complexity of a disaster event is unlikely to end well. Backing the right horse for the right kind of race, with clinical preparation and clear demarcation of who is doing what, underpinned by your prioritised business service catalogue, will put you well ahead of the pack in the race back to operational health post-disaster.