OpenStack Cloud DR PoC

admin OpenStack

As applications migrate to the cloud, cloud systems have become business-critical. A minor system failure or a human error can have a major business impact (for example, 1, 2).

To avoid the business impact of such failures, a disaster recovery (DR) solution is essential for any critical cloud. A DR solution provides business continuity when the primary site fails completely. In a disaster situation, the primary site's workload is switched to the replica site, which then becomes the active site, so that the business is not affected.

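A DR solution focuses on switching to the replica site within a defined RTO (recovery time objective) and RPO (recovery point objective), so if a disaster occurs, the customer knows the maximum time it will take for the system to come back online and the maximum amount of recent data that may be lost.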
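As a minimal sketch of how RTO and RPO can be checked after a failover, the example below compares the measured data-loss window and downtime against agreed targets. The class name, function name, and all timestamps are hypothetical and chosen only for illustration; they are not taken from the PoC.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrObjectives:
    """Hypothetical container for the targets agreed with the customer."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable downtime


def evaluate_failover(objectives, last_recovery_point, disaster_at, restored_at):
    """Return (data_loss, downtime, rpo_met, rto_met) for one failover."""
    data_loss = disaster_at - last_recovery_point  # writes after this point are gone
    downtime = restored_at - disaster_at           # time until the replica site is active
    return data_loss, downtime, data_loss <= objectives.rpo, downtime <= objectives.rto


# Illustrative values only.
objectives = DrObjectives(rpo=timedelta(hours=1), rto=timedelta(hours=4))
disaster = datetime(2018, 3, 1, 12, 0)
data_loss, downtime, rpo_met, rto_met = evaluate_failover(
    objectives,
    last_recovery_point=disaster - timedelta(minutes=45),
    disaster_at=disaster,
    restored_at=disaster + timedelta(hours=5),
)
print(data_loss, downtime, rpo_met, rto_met)
```

In this made-up scenario the RPO is met (45 minutes of data lost against a 1-hour target) while the RTO is missed (5 hours of downtime against a 4-hour target).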

NEC ran a PoC to try out a few DR solutions and check their feasibility and stability. The PoC was executed on single-node and multinode environments, and the results of the DR scenarios were collected.

  Market Trend for DR Solutions:

Although the concept of DR in the cloud is still nascent, many SMBs are beginning to opt for DR to protect their business applications.

It is an attractive alternative for companies that are short of IT resources and whose secondary infrastructure is not effectively utilized.

Hosting DR sites in the cloud reduces data centre cost and space, IT resources, and IT infrastructure, leading to a significant overall cost reduction.

Each leading cloud service provider (AWS, Azure, OpenStack) now provides DR solutions, either natively or via third-party software.

  Multiple options for DR mechanism:

       Option 1: Use Backup Images to recover at DR cloud

  • Primary cloud creates incremental backup images and saves them to backup storage at every predefined interval
  • Backup storage is expected to replicate backup images consistently between primary site and DR site
  • RPO is determined by how frequently incremental backups are performed
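The incremental backup idea in Option 1 can be sketched with a toy model in which a volume is a dictionary of blocks. This is an illustrative simplification, not how any particular backup tool stores images: the function names are hypothetical, and deleted blocks are not tracked.

```python
def take_incremental(current, previous):
    """Return only the blocks that changed since the previous backup."""
    return {blk: data for blk, data in current.items() if previous.get(blk) != data}


def restore(full_backup, incrementals):
    """Rebuild a volume at the DR site from a full backup plus its incrementals."""
    volume = dict(full_backup)
    for inc in incrementals:  # apply oldest first
        volume.update(inc)
    return volume


# Volume state at three consecutive backup intervals (toy data).
state0 = {0: "a", 1: "b"}
state1 = {0: "a", 1: "B", 2: "c"}
state2 = {0: "A2", 1: "B", 2: "c"}

full = dict(state0)
inc1 = take_incremental(state1, state0)
inc2 = take_incremental(state2, state1)

# A write that lands after the last backup and before the disaster.
state3 = dict(state2)
state3[3] = "d"

restored = restore(full, [inc1, inc2])
print(restored == state2, restored == state3)
```

The restored volume matches the state at the last backup, not the state at the time of the disaster, which is why the backup interval bounds the RPO.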

       Option 2: Use storage replication to replicate data continuously

  • Storage replication is established between two storage volumes upfront
  • Every write that happens at the primary site is also written to the storage volume at the DR site
  • Primary and secondary volumes are always in sync
  • During DR, the secondary volume is made read/write, new VMs are created with the secondary volumes, and the VMs are restarted
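The replication flow in Option 2 can be sketched with in-memory stand-ins for the two volumes. This is a toy model under stated assumptions: `Volume`, `ReplicatedPair`, and `boot_vm` are hypothetical names, and a real storage backend performs the replication below the OpenStack layer.

```python
class Volume:
    """Toy in-memory stand-in for a storage volume (hypothetical)."""

    def __init__(self, name, writable):
        self.name = name
        self.writable = writable
        self.blocks = {}


class ReplicatedPair:
    """Mirrors every primary write to the secondary volume."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block, data):
        if not self.primary.writable:
            raise PermissionError(f"{self.primary.name} is not writable")
        # The write is applied to both volumes, so they stay in sync.
        self.secondary.blocks[block] = data
        self.primary.blocks[block] = data

    def fail_over(self):
        """Promote the secondary volume to read/write during DR."""
        self.primary.writable = False
        self.secondary.writable = True
        return self.secondary


def boot_vm(name, volume):
    """Hypothetical helper: 'create' a VM record backed by the given volume."""
    if not volume.writable:
        raise PermissionError(f"{volume.name} must be read/write before boot")
    return {"name": name, "volume": volume.name, "state": "ACTIVE"}


pair = ReplicatedPair(Volume("vol-primary", True), Volume("vol-dr", False))
pair.write(0, b"boot sector")
pair.write(1, b"app data")
in_sync = pair.primary.blocks == pair.secondary.blocks

dr_volume = pair.fail_over()
vm = boot_vm("app-vm-dr", dr_volume)
print(in_sync, vm)
```

Because every write reaches the secondary volume before the disaster, the VM created at the DR site sees the latest data, which is why this approach gives a small RPO.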

        Option 3: Combine the two approaches above into a hybrid solution

  • A hybrid DR solution combines backup and replication into a single solution
  • Replication copies the data in volumes / the data backend from the primary cloud to the DR cloud
  • VM metadata can be copied using backup tooling
  • All other metadata can be copied directly from the primary cloud to the DR cloud
  • The data storage/backend can be used to synchronize the backup images from the primary cloud to the DR cloud
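One way to picture the hybrid approach is as a recovery plan that pairs VM metadata restored from backup with the volumes that replication has already delivered to the DR cloud. The sketch below is purely illustrative: the record layout, the function name, and the volume identifiers are assumptions, not part of any tool used in the PoC.

```python
def plan_recovery(vm_metadata_backup, replicated_volumes):
    """Split VMs into those ready to recover and those still missing volumes.

    vm_metadata_backup: VM records restored by the backup tooling.
    replicated_volumes: IDs of volumes present at the DR site via replication.
    """
    recoverable, blocked = [], []
    for vm in vm_metadata_backup:
        missing = [v for v in vm["volumes"] if v not in replicated_volumes]
        if missing:
            blocked.append((vm["name"], missing))
        else:
            recoverable.append(vm["name"])
    return recoverable, blocked


# Hypothetical inputs for illustration.
vm_metadata = [
    {"name": "web-1", "flavor": "m1.small", "volumes": ["vol-a"]},
    {"name": "db-1", "flavor": "m1.large", "volumes": ["vol-b", "vol-c"]},
]
replicated = {"vol-a", "vol-b"}

recoverable, blocked = plan_recovery(vm_metadata, replicated)
print(recoverable, blocked)
```

A supporting script of this kind is what lets the hybrid approach recover VMs without manual intervention, since it joins the metadata and data paths that the two mechanisms deliver separately.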

  Comparison of different DR approaches:

| Criteria | Backup DR | Replication DR | Hybrid DR |
|---|---|---|---|
| Dependency on Storage Backend | No dependency on backend for replication | Dependency on backend for replication | Dependency on backend for replication |
| Recovery Point Objective | RPO is large | RPO is small | RPO similar to Replication DR |
| Recovery | Completely recoverable without manual intervention | Needs manual intervention, as VM metadata may not be recoverable | Completely recoverable if supporting scripts exist |

 

Of the three DR approaches above, this PoC tried backup-based DR using the Trilio data solution. Trilio is not open source, but we found it to be one of the most stable and reliable solutions for the backup approach to OpenStack cloud DR.

We tried it in both single-site and multi-site environments:

                      [Figure: Single Site]

                      [Figure: Multi Site]

It was tested with ~35 tests covering both functional and system testing. Below is a summary of the outcome:

    [Figure: Overall Test Summary]

    [Figure: System Level stats]

Complete details of this PoC, including the tests and their results, can be found @OpenStack_Cloud_DR_PoC.
