To avoid business impact from such failures, a disaster recovery (DR) solution is essential for a critical cloud. A DR solution provides business continuity when the primary site fails completely. In a disaster, the primary site's workload is switched to the replica site, which becomes the active site so that the business is not affected.
A DR solution focuses on switching to the replica site within a defined RTO/RPO (recovery time objective / recovery point objective), so that if such a disaster occurs, the customer knows the maximum time within which the system will come back online and the maximum window of recent data that may be lost.
NEC has done a PoC to try out a few DR solutions and check their feasibility and stability. The PoC was executed on single-node and multi-node environments, and the results of the DR scenarios were collected.
Market trend for DR solutions:
Although the concept of DR in the cloud is still nascent, many SMBs are beginning to opt for DR to guard their business applications.
It is an attractive alternative for companies that are strapped for IT resources and that have secondary infrastructure which is not effectively utilized.
Having DR sites in the cloud reduces data centre cost and space, IT resources, and IT infrastructure, leading to significant cost reduction.
Each leading Cloud Service Provider (AWS, Azure, OpenStack) now provides DR solutions, either on its own or via third-party software.
Multiple options for DR mechanism:
Option 1: Use Backup Images to recover at DR cloud
- Primary cloud creates incremental backup images and saves them to backup storage at predefined intervals
- Backup storage is expected to replicate backup images consistently between primary site and DR site
- RPO is determined by how frequently incremental backups are performed
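As a rough illustration of how the backup interval bounds the RPO in Option 1, the sketch below estimates a worst-case RPO. It assumes the worst case is one full backup interval plus the time the backup storage needs to replicate an image to the DR site. The function names and the `replication_lag` parameter are hypothetical and are not part of any backup product or of the PoC.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta,
                   replication_lag: timedelta = timedelta(0)) -> timedelta:
    """Estimate the largest window of data that could be lost.

    Assumption (not from the PoC): the disaster strikes just before the
    newest incremental backup finishes replicating to the DR site, so
    recovery has to fall back to the previous backup image.
    """
    if backup_interval <= timedelta(0):
        raise ValueError("backup_interval must be positive")
    if replication_lag < timedelta(0):
        raise ValueError("replication_lag must not be negative")
    return backup_interval + replication_lag


def meets_rpo_target(backup_interval: timedelta,
                     replication_lag: timedelta,
                     rpo_target: timedelta) -> bool:
    """Check whether a backup schedule can satisfy a required RPO."""
    return worst_case_rpo(backup_interval, replication_lag) <= rpo_target


if __name__ == "__main__":
    # Hourly incremental backups, ten minutes to replicate each image.
    rpo = worst_case_rpo(timedelta(hours=1), timedelta(minutes=10))
    print(f"Worst-case RPO: {rpo}")
```

Under these assumptions, shortening the backup interval is the main lever for reducing the RPO of a backup-based DR setup.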
Option 2: Use storage replication to replicate data continuously
- Storage replication is established between two storage volumes upfront
- Every write that happens at the primary site is also written to the storage volume at the DR site
- Primary and secondary volumes are always in sync
- During DR, the secondary volumes are made read/write, new VMs are created with the secondary volumes attached, and the VMs are started
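The replication and failover steps of Option 2 can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: volumes are modelled as dictionaries of blocks, replication is synchronous, and the class and function names (`ReplicatedVolume`, `DrVm`, `fail_over`) are hypothetical rather than taken from any storage backend or OpenStack API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplicatedVolume:
    """A primary volume mirrored to a secondary volume at the DR site."""
    name: str
    primary: Dict[int, bytes] = field(default_factory=dict)
    secondary: Dict[int, bytes] = field(default_factory=dict)
    secondary_writable: bool = False  # secondary is read-only until DR

    def write(self, block: int, data: bytes) -> None:
        # Every write at the primary site is also applied at the DR site,
        # which keeps the two copies in sync.
        self.primary[block] = data
        self.secondary[block] = data

    def in_sync(self) -> bool:
        return self.primary == self.secondary

    def promote_secondary(self) -> None:
        # During DR the secondary volume is made read/write.
        self.secondary_writable = True


@dataclass
class DrVm:
    """A VM created at the DR site on top of a secondary volume."""
    name: str
    disk: Dict[int, bytes]
    running: bool = False


def fail_over(volumes: List[ReplicatedVolume]) -> List[DrVm]:
    """Promote each secondary volume, create a VM on it, and start the VM."""
    vms = []
    for vol in volumes:
        vol.promote_secondary()
        vm = DrVm(name=f"dr-{vol.name}", disk=vol.secondary)
        vm.running = True
        vms.append(vm)
    return vms


if __name__ == "__main__":
    vol = ReplicatedVolume("db")
    vol.write(0, b"header")
    vol.write(1, b"rows")
    for vm in fail_over([vol]):
        print(vm.name, "running:", vm.running, "blocks:", len(vm.disk))
```

The model leaves out VM metadata such as flavor and network settings. A real failover would need that information from somewhere other than the replicated volume.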
Option 3: Combine the above two solutions into a hybrid solution
- Hybrid DR solution combines backup and replication to create a single solution.
- Replication copies data in volumes / the data backend from the Primary to the DR cloud.
- VM Metadata can be copied using Backup tooling
- All other metadata can be copied directly from Primary to DR cloud.
- Data Storage/Backend can be used to synchronize the Backup images from Primary to DR Cloud
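One way to read the hybrid approach is that a VM becomes recoverable at the DR cloud only once both of its ingredients have arrived: the volume data (via replication) and the VM metadata (via backup tooling). The sketch below expresses that check. The function name and the shape of the inputs are assumptions made for illustration and do not correspond to any specific tool.

```python
from typing import Dict, List, Set


def recoverable_vms(vm_metadata: Dict[str, List[str]],
                    replicated_volumes: Set[str]) -> List[str]:
    """Return the VMs that can be rebuilt at the DR cloud.

    vm_metadata maps a VM name to the IDs of the volumes it needs
    (hypothetical shape, standing in for metadata copied by backup tooling).
    replicated_volumes holds the volume IDs already present at the DR site.
    """
    return sorted(
        vm for vm, volume_ids in vm_metadata.items()
        if all(volume_id in replicated_volumes for volume_id in volume_ids)
    )


if __name__ == "__main__":
    metadata = {"web": ["vol-1"], "db": ["vol-2", "vol-3"]}
    at_dr_site = {"vol-1", "vol-2"}
    # "db" is held back because vol-3 has not been replicated yet.
    print(recoverable_vms(metadata, at_dr_site))
```

A VM whose metadata is missing never appears in the result, even if its volumes are replicated.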
Comparison of different DR approaches:
|Criteria|Backup DR|Replication DR|Hybrid DR|
|---|---|---|---|
|Dependency on Storage Backend|No dependency on Backend for Replication|Dependency on Backend for Replication|Dependency on Backend for Replication|
|Recovery Point Objective|RPO is large|RPO is small|RPO similar to Replication DR|
|Recovery|Completely recoverable without manual intervention|Needs manual intervention as VM metadata may not be recoverable|Completely recoverable if supporting scripts exist|
Of the three DR approaches above, this PoC tried backup-based DR using the Trilio data solution. Trilio is not open source, but we found it to be one of the most stable and reliable solutions for the backup approach to OpenStack cloud DR.
We tried it in single-site as well as multi-site environments:
It was tested with ~35 tests, covering both functional and system testing. Below is a summary of the outcome:
Overall Test Summary:
System Level stats:
Complete details of this PoC, including the tests and results, can be found at OpenStack_Cloud_DR_PoC.