What You Need to Know About Disaster Recovery Planning

Author: Anonym/Friday, October 18, 2013/Categories: Business Security Services

The term “disaster” gets tossed around rather loosely these days, but when it comes to IT operations it’s worth taking seriously. Major data loss can cripple or even kill an organization. It can stop revenue flow and destroy customer trust. And, depending on the industry, it can create regulatory nightmares. Virtually every aspect of business (finance, planning, personnel, marketing, and manufacturing, among others) depends on data access and communication, and as that dependence increases, so do the risks posed by data loss, even temporary loss. At the same time, technology provides more and better means of protection.

One of the challenges in justifying disaster preparedness is the fact that it addresses the unpredictable. Man-made hazards include equipment failure, power loss, fire, and malicious attack. Then there are natural disasters like floods and tornadoes; we’ve seen plenty of those recently. Preparing for any of these can be costly. Business loves a nice neat ROI, and it can be difficult to calculate a meaningful return on investment in disaster preparedness. If no disaster occurs, the only real return is better sleep for system managers. On the other hand, if disaster does occur and the preparation works, the return can be enormous.

Disaster recovery planning is an exercise in resource allocation, a multi-dimensional calculation involving likelihood of the event and its cost to the organization versus cost of recovery or mitigation. The calculation begins with the organization’s operations and IT’s place within them. How does IT’s disaster preparedness plan serve the organization’s business continuity plan or, to put it another way, when disaster strikes who’s first into the lifeboats? The allocation of resources is further complicated by the fact that it takes place both within IT and between IT and the rest of the organization. The biggest challenge for IT may be selling the value of preparedness to the rest of the company, especially when there are demands for more services today.
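To make the likelihood-versus-cost calculation concrete, here is a minimal sketch that ranks risks by the expected yearly loss a mitigation avoids, minus what the mitigation costs. The `Risk` fields, the function name, and every number are hypothetical, chosen only for illustration; real figures would come from your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float      # assumed chance of the event in a given year (0..1)
    loss_if_unmitigated: float     # assumed cost of one occurrence with no preparation
    loss_if_mitigated: float       # assumed cost of one occurrence with the measure in place
    annual_mitigation_cost: float  # assumed yearly cost of the protective measure


def expected_annual_benefit(risk: Risk) -> float:
    """Expected yearly loss avoided by the mitigation, minus its yearly cost."""
    avoided = risk.annual_probability * (
        risk.loss_if_unmitigated - risk.loss_if_mitigated
    )
    return avoided - risk.annual_mitigation_cost


# Illustrative inputs only.
risks = [
    Risk("power outage", 0.5, 40_000, 5_000, 10_000),
    Risk("data center fire", 0.01, 2_000_000, 200_000, 25_000),
]

# Highest expected net benefit first.
ranked = sorted(risks, key=expected_annual_benefit, reverse=True)
for r in ranked:
    print(f"{r.name}: {expected_annual_benefit(r):,.0f}")
```

An expected-value figure like this is only a starting point. A rare event can score poorly here and still deserve funding if a single occurrence would be more than the organization could survive.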

Measures for dealing with disaster fall into three categories: prevention, recognition, and correction. Prevention consists of the design decisions that proactively protect systems and data and can include:

·         RAID configuration of storage within data centers to reduce the risk of data loss due to disk failure

·         UPS and surge protection to guard against data loss or equipment damage caused by power problems

·         Fire protection

·         Anti-virus protection throughout the system

Recognition consists of all measures to identify problems as they occur. To use an automotive analogy, not all data protection challenges are blowouts; some are slow leaks. Without effective monitoring, systems can degrade or data can be lost before you even realize there is a problem. This can occur both in active systems and in deep storage and can become progressively worse until the problem is addressed.
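One way to catch a "slow leak" in stored data is to record a checksum for each file and re-verify it on a schedule. The sketch below assumes the data under `root` is archival, so any change is suspicious; the function names and the manifest format are hypothetical, not part of any particular monitoring product.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_problems(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare current files against a saved manifest.

    Returns {relative_path: "missing" | "changed"} for every file that no
    longer matches; an empty dict means nothing was detected.
    """
    problems = {}
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems[rel] = "missing"
        elif file_digest(p) != expected:
            problems[rel] = "changed"
    return problems
```

In practice you would run `build_manifest` once after a backup completes, store the result separately from the data, and run `find_problems` periodically, raising an alert whenever it returns anything. A check like this cannot tell corruption from a legitimate edit, which is why it suits archived copies better than live data.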

Correction covers a range of responses, including repair or replacement of lost systems and restoration of data, ideally as quickly and completely as possible. There is a wide range of approaches to data and system protection.

·         You can back up to tape and ship the physical tape to a safe off-site location. Obviously this is for deep storage rather than for data needed quickly or on a regular basis.

·         You can copy data to off-site disk storage. This allows quicker access to data for recovery.

·         You can replicate data to an operational off-site data center, which can actually handle operations without the need to restore data to the primary site. This allows faster restoration of service but at a significantly higher cost. If you have multiple centers they can provide one another with reciprocal backup at lower incremental cost.

·         High availability (HA) system design is reserved for your most critical operations. This will typically entail high levels of redundancy in systems and communications and will minimize the need for human intervention to keep systems running in the event of failures. Such systems can be quite expensive, but may be justified when the costs of unavailability are even higher.

In highly simplified form, the steps in planning for disaster are:

1.       Define and prioritize business continuity requirements

2.       Determine recovery time objective (RTO), the time within which a business process must be restored, and recovery point objective (RPO), the maximum acceptable age of the data you recover, which is to say how much recent work you can afford to lose.

3.       Decide how and where you will back up data to meet these goals.

4.       Identify and train staff to execute the plan.

5.       Document all aspects of the plan.

6.       Implement necessary monitoring to ensure that you know when problems occur.

7.       Review all aspects of the plan regularly. Company needs change, technology changes, and your implementation may require changes in the plan to continue to reach company goals, whatever they may be.
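Steps 2 and 3 can be sketched as a simple filter: given an RTO and an RPO, keep the backup approaches that satisfy both and pick the cheapest. The `Strategy` type, the function names, and all intervals, restore times, and costs below are hypothetical placeholders, not measurements of any real product or service.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Strategy:
    name: str
    backup_interval: timedelta  # worst-case age of the newest recoverable copy
    restore_time: timedelta     # estimated time to bring the service back
    annual_cost: float


def meets_objectives(s: Strategy, rto: timedelta, rpo: timedelta) -> bool:
    """True if the strategy restores fast enough (RTO) with fresh enough data (RPO)."""
    return s.restore_time <= rto and s.backup_interval <= rpo


def cheapest_adequate(
    strategies: list[Strategy], rto: timedelta, rpo: timedelta
) -> Optional[Strategy]:
    """Return the lowest-cost strategy meeting both objectives, or None."""
    adequate = [s for s in strategies if meets_objectives(s, rto, rpo)]
    return min(adequate, key=lambda s: s.annual_cost, default=None)


# Illustrative figures only.
strategies = [
    Strategy("off-site tape", timedelta(hours=24), timedelta(hours=72), 5_000),
    Strategy("off-site disk", timedelta(hours=4), timedelta(hours=8), 20_000),
    Strategy("replicated data center", timedelta(minutes=5), timedelta(hours=1), 150_000),
]

choice = cheapest_adequate(strategies, rto=timedelta(hours=12), rpo=timedelta(hours=6))
print(choice.name if choice else "no strategy meets these objectives")
```

If the function returns `None`, either the objectives need to be relaxed or a more capable (and usually more expensive) approach has to be added to the list. That is the same trade-off between cost and recovery speed described above.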
