When disaster strikes: How IT process automation helps you recover fast

What do these things have in common?

  • Power failure
  • IT hardware failure
  • Network failure
  • IT software failure
  • Human error

 

There’s a high probability that at least one of those things will be the cause of a major IT disruption. According to a 2013 Forrester Research survey, one third of respondents declared a disaster in the past five years, and the top five culprits are the ones listed above. Now imagine being out for 30-plus hours, as 1 in 5 survey respondents were.

 

Don’t be scared, be prepared

 

With the risks so high, IT organizations must implement and continuously test and update disaster recovery (DR) and business continuity (BC) procedures. But testing DR procedures is a time-consuming and resource-intensive task that involves multiple subject-matter experts (SMEs) from across IT. A typical test for a large organization can involve dozens of people on multiple conference calls for up to a full day.

 

It’s little wonder that testing happens so infrequently. The Forrester survey revealed that 39 percent of firms conduct a full test (a live or simulated failover of all infrastructure at a site) only once a year. Yet DR procedures really should be tested every time a major change is implemented on an application.

 

Orchestrate your recovery

 

In many ways, IT process automation and orchestration are a perfect fit for DR procedure testing: they reduce the resources you expend and improve success rates.

 

Consider the characteristics of any disaster recovery or failover exercise:

  • Requires a number of tasks that need to be performed in a very specific sequence
  • Tasks often span a number of different IT domains — server, network, storage, and others
  • Tasks require a number of different SMEs, including network engineers, database administrators, server administrators, and others
  • Success depends on coordination and handoffs between these SMEs

 

IT process automation and orchestration make all of this faster and easier. Workflows that tie together diverse tools, processes and domains significantly reduce the risk of failure. And because workflows capture and essentially document the process, they also protect you from the risk that key personnel or groups will be unavailable.
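To make the "workflows document the process" point concrete, here is a minimal Python sketch. It is not HP OO syntax; the `Task` class, the task names, and the owner roles are hypothetical and chosen only for illustration. It shows how an ordered list of tasks, once captured as data, can be rendered as a readable runbook and inspected for handoffs between SMEs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str    # what has to be done
    domain: str  # IT domain the task belongs to, e.g. "network" or "server"
    owner: str   # SME role that would perform the task manually


# Hypothetical tasks, for illustration only.
TASKS: List[Task] = [
    Task("Verify that the network is operational", "network", "network engineer"),
    Task("Validate destination server health", "server", "server administrator"),
    Task("Verify destination server configuration", "server", "server administrator"),
    Task("Verify database configuration", "database", "database administrator"),
]


def runbook(tasks: List[Task]) -> List[str]:
    """Render the ordered task list as numbered, human-readable lines."""
    return [
        f"{number}. [{task.domain}] {task.name} ({task.owner})"
        for number, task in enumerate(tasks, start=1)
    ]


def handoffs(tasks: List[Task]) -> int:
    """Count the points where consecutive tasks belong to different owners."""
    return sum(1 for a, b in zip(tasks, tasks[1:]) if a.owner != b.owner)


if __name__ == "__main__":
    print("\n".join(runbook(TASKS)))
    print(f"Handoffs between SMEs: {handoffs(TASKS)}")
```

Because the sequence lives in one place, the same definition can drive both the automation and the written procedure, so the two are less likely to drift apart.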

 


 

How OO workflows automate disaster recovery of an email system

Let’s look at an example of how HP Operations Orchestration drives efficiency and reduces errors by automating a number of repetitive and tedious tasks. Below is an HP OO workflow for automating the disaster recovery procedure for an email system:

 

 


Figure 1: Implementation of a disaster recovery process using HP OO

 

The HP OO workflow above can be triggered when a change ticket declaring the DRP event is approved. Here are the steps it follows:

 

  1. The DR event is declared (real or test)
  2. Verify that the change requests in service desk systems (such as HP Service Manager) are approved
  3. Verify that the network is operational
  4. Validate the health of the destination systems, including server and storage
  5. Verify that the configuration of the destination system is the same as that of the source system, including the database (SQL Server), application servers (Exchange) and Web servers
  6. Clone the destination server if the source and destination are not the same
  7. Disable monitoring and clustering on the primary systems
  8. Perform failover tasks:
    1. Disconnect users and disable new connections
    2. Open connections into destination systems
    3. Update Domain Name System (DNS) records to point to destination systems
    4. Deactivate primary systems
  9. Validate the availability of service for the new system
  10. Update change request ticket in service desk system
  11. Update the configuration management database (CMDB) with current status, and view reports to verify that failover completed successfully
  12. Re-enable monitoring and clustering
  13. Notify users and stakeholders
  14. Declare DR event complete
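The sequence above can be sketched in a few lines of Python. This is an illustrative sketch only, not HP OO content: the function names, the `change_request_status` key, and the placeholder actions are all hypothetical, and the choice to halt at the first failed step is an assumption rather than something the workflow in Figure 1 is documented to do.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Step = Tuple[str, Callable[[Context], bool]]


def run_workflow(steps: List[Step], context: Context) -> Tuple[bool, List[str]]:
    """Run steps strictly in order and stop at the first one that fails."""
    log: List[str] = []
    for name, action in steps:
        ok = action(context)
        log.append(f"{'OK' if ok else 'FAILED'}: {name}")
        if not ok:
            return False, log
    return True, log


def change_requests_approved(context: Context) -> bool:
    # Stand-in for a service desk lookup: the status is read from the context.
    return context.get("change_request_status") == "approved"


def placeholder(context: Context) -> bool:
    # Stand-in for a real check or action; always succeeds in this sketch.
    return True


# Step names paraphrase the numbered list; the actions are placeholders.
DR_STEPS: List[Step] = [
    ("Verify change requests are approved", change_requests_approved),
    ("Verify network is operational", placeholder),
    ("Validate health of destination systems", placeholder),
    ("Verify destination configuration matches source", placeholder),
    ("Disable monitoring and clustering on primary systems", placeholder),
    ("Perform failover tasks", placeholder),
    ("Validate availability of service", placeholder),
    ("Update change request ticket", placeholder),
    ("Re-enable monitoring and clustering", placeholder),
    ("Notify users and stakeholders", placeholder),
]

if __name__ == "__main__":
    succeeded, log = run_workflow(DR_STEPS, {"change_request_status": "approved"})
    print("\n".join(log))
    print("DR event complete" if succeeded else "DR event halted")
```

The log returned by `run_workflow` doubles as a record of what ran and where a test stopped, which is the kind of evidence a change ticket update would need.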

 

There are also two automated sub-workflows built in: one at Step 6, for cloning the destination server, and one at Step 8, for the failover from source to destination:

 

Figure 2: Sub-workflow for cloning destination server

 

Figure 3: Sub-workflow for failover from source to destination
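As a rough illustration of how a parent workflow can call sub-workflows, the Python sketch below models the conditional clone at Step 6 and the four failover tasks at Step 8. The function names and the `source_config` and `destination_config` keys are hypothetical, and treating a clone as "destination now matches source" is an assumption made for the example, not a description of what the HP OO sub-workflows do internally.

```python
from typing import Dict, List

# The four failover tasks listed under Step 8.
FAILOVER_TASKS = [
    "disconnect users and disable new connections",
    "open connections into destination systems",
    "reroute DNS to point to destination systems",
    "deactivate primary systems",
]


def clone_destination(state: Dict[str, str], log: List[str]) -> None:
    """Sub-workflow for Step 6 (placeholder)."""
    log.append("clone destination server")
    # Assumption: after cloning, the destination configuration matches the source.
    state["destination_config"] = state["source_config"]


def failover(log: List[str]) -> None:
    """Sub-workflow for Step 8: record each failover task in order."""
    for task in FAILOVER_TASKS:
        log.append(task)


def run_parent(state: Dict[str, str]) -> List[str]:
    """Parent workflow fragment: clone only if needed, then fail over."""
    log: List[str] = []
    if state["source_config"] != state["destination_config"]:
        clone_destination(state, log)
    failover(log)
    return log


if __name__ == "__main__":
    for line in run_parent({"source_config": "v2", "destination_config": "v1"}):
        print(line)
```

Keeping each sub-workflow as its own unit means it can be tested on its own and reused by other parent workflows.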

 

Conducting such a complex disaster recovery procedure manually would clearly require a significant amount of time and resources, and chances are that your organization would not get around to testing its effectiveness as often as it should.

 

Automating and orchestrating a large number of disaster-recovery tasks drives down the costs of performing critical disaster-recovery planning. Furthermore, the procedures are more reliable and ready for an actual recovery event. Institutionalizing disaster-recovery procedures in orchestration workflows also helps to communicate and document the procedure, and reduces how much you must depend on specific individuals or groups.

 

 

Experience HP Operations Orchestration NOW!

The new HP Operations Orchestration Community Edition is a free download of the platform with out-of-the-box content packs for automating incident remediation. Designed for easy self-installation, it lets you begin experiencing the power of IT process automation and IT operations orchestration within two hours.

 


 

About the Author
Nimish Shelat is currently focused on Datacenter Automation and IT Process Automation solutions. Shelat strives to help customers, tradition...