How to Establish Effective Disaster Recovery Plans and Prevent Failures

Ernest Hamilton, Tech Times 24 September 2020, 02:09 pm

According to Ponemon Institute's 2019 Global State of Cybersecurity in Small and Medium-Sized Businesses, nearly half of SMEs do not have incident response plans. They are not prepared to deal with disruptions in the IT side of their businesses. Those that have plans, however, are not guaranteed an immediate return to regular operations.

Unfortunately, many companies and organizations still suffer from disruptions that affect their data or the overall functioning of their IT systems. The reasons for these failures can be any or a combination of the following.

Wrongly identified and poorly understood recovery dependencies

One of the most important aspects of disaster recovery planning is data backup and identification of restore points. A plan that does not match specific restore expectations can still lead to problems.

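This is where the terms recovery point objective (RPO) and recovery time objective (RTO) come in. The RPO is the maximum amount of data loss, measured in time, that a business can tolerate. The RTO is the maximum acceptable time to bring systems back into operation. To optimize recovery plans, it is essential to define these objectives carefully. Proper determination usually takes into account the industry a company works in, its mode of data storage, and compliance considerations.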
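As a rough illustration of checking a backup schedule against an RPO, the sketch below computes the largest gap between consecutive backups and compares it with a target. The function names, the sample timestamps, and the six-hour target are assumptions made for the example, not figures from any cited report.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times):
    """Return the largest gap between consecutive backups, or None if fewer than two."""
    ordered = sorted(backup_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps) if gaps else None


def meets_rpo(backup_times, rpo):
    """True when no gap between consecutive backups exceeds the RPO."""
    gap = worst_case_data_loss(backup_times)
    return gap is not None and gap <= rpo


# Hypothetical schedule: backups at 00:00, 06:00, 12:00, and 20:00.
backups = [datetime(2020, 9, 24, hour) for hour in (0, 6, 12, 20)]

print(worst_case_data_loss(backups))            # 8:00:00
print(meets_rpo(backups, timedelta(hours=6)))   # False
```

In this example the evening gap of eight hours breaks a six-hour RPO even though most of the day is covered, which is the kind of mismatch between plan and expectation described above.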

It would be a mistake to think only about restoring server operations at the outset. This is particularly true for multi-tier applications, which run across different machines for data hosting, processing, management, and presentation functions. Uncoordinated backup schedules or the wrong boot order can disrupt communication between these machines and cause the disaster recovery plan to fail.

Additionally, disaster recovery plans may be riddled with configuration issues. For example, not properly allocating space for snapshots when setting up virtual server environments as backup targets can result in a failure.

To avoid problems attributable to recovery dependencies, it is vital to discuss the recovery plan with everyone involved and to thoroughly scrutinize the steps and requirements. It is also advisable to document critical dependencies such as app requirements and boot sequences to come up with clear references for discussion and troubleshooting.
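Documented dependencies can also be turned into a boot sequence automatically. The following is a minimal sketch that assumes Python 3.9 or later (for the standard library `graphlib` module). The tier names in `DEPENDENCIES` are hypothetical and stand in for whatever machines a real multi-tier application uses.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tiers: each machine maps to the machines that must be up first.
DEPENDENCIES = {
    "database": set(),
    "app-server": {"database"},
    "management": {"database"},
    "web-frontend": {"app-server"},
}


def boot_order(dependencies):
    """Return a start-up order in which every machine follows its dependencies."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


order = boot_order(DEPENDENCIES)
print(order)  # "database" always comes first; the order of independent tiers may vary
```

A circular dependency raises an error instead of producing an order, which surfaces a documentation problem before a real recovery is attempted.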

Lack of disaster recovery plan testing

A State of Recovery report by disaster recovery solutions provider Zetta reveals that 40 percent of companies do not have documented disaster recovery plans and 40 percent of those that do only conduct tests once every year. "This study reveals that even as organizations improve their disaster readiness, they still fall behind in planning and testing their strategy," it reports.

Disaster recovery plans cannot be purely theoretical or dependent on protocols copied from other organizations or presented by a third-party provider. Many aspects of a plan can turn out to differ from what users expect.

The testing of data backup availability, for example, is different from the testing of virtual machine recoverability, which in turn is different from the testing of the specific restore points of entire systems. Many problems can surface along the way, including situations that may not have been taken into account when the plan was developed.

Also, there may be changes in organizational protocols, IT infrastructure, or live systems that are not reflected in the current disaster recovery plan. If a considerable number of organizations fail to conduct regular tests, it is only logical to expect that many also fail to consider adjusting their plans when they implement changes in their networks or IT systems.

The solution for this lack of testing is obviously to conduct more testing. The right frequency depends on the industry an organization is in. How often an organization undergoes changes also matters.
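One way to make that rule concrete is a small check that flags a plan for retesting when it is too old or when the environment has changed since the last test. This is only a sketch: the function name and the 90-day default are assumptions chosen for illustration, and the right interval depends on the organization.

```python
from datetime import date, timedelta


def retest_due(last_test, last_change, today, max_age=timedelta(days=90)):
    """Return True when the recovery plan should be tested again.

    A retest is due if infrastructure changed after the last test,
    or if the last test is older than max_age (an assumed default).
    """
    if last_change > last_test:
        return True
    return today - last_test > max_age


# Hypothetical dates: the plan was tested in June, systems changed in August.
print(retest_due(date(2020, 6, 1), date(2020, 8, 15), date(2020, 9, 24)))  # True
```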

Data corruption, malware, and software compatibility issues

Power outages, hardware failures, XFS and other filesystem issues, and further technical problems can result in data corruption, which can ruin an otherwise working disaster recovery plan. Unfortunately, some do not consider data corruption a threat to recovery, as they think it is part of the problem that the plan covers.
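A common safeguard is to record a checksum when a backup is written and to verify it before relying on that backup. The sketch below uses SHA-256 from Python's standard library. The helper names are hypothetical, and it assumes the expected digest was stored somewhere separate from the backup itself.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_digest):
    """True when the file still matches the digest recorded at backup time."""
    return sha256_of(path) == expected_digest
```

A checksum only shows that a file is unchanged since the digest was recorded. It does not show that the data was sound when it was backed up, so it complements restore testing and does not replace it.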

Similarly, some view malware infection as another challenge that can be overcome by going back to a restore point. However, malicious software can be designed to remain hidden and lie dormant for so long that it gets embedded in the backups. Never underestimate the patience and persistence of cybercriminals. There have been actual ransomware attacks that targeted NAS devices and backup storage.

Moreover, disaster recovery plans should consider the possibility of non-recoverability because of software compatibility issues. One of the common sources of this problem is the Microsoft Volume Shadow Copy Service (VSS). It can be affected by data conflicts or misconfigurations that prevent normal recovery.

To avoid data corruption and malware problems, the solution should be both hardware- and software-based. An uninterruptible power supply (UPS) and regular equipment maintenance are a must for businesses. On the software side, a proven and regularly updated security system is compulsory. It also helps to prefer Linux-based backup and recovery technologies, since most malware tends to focus on Windows systems. When it comes to compatibility issues, consider backup and recovery technologies that integrate self-healing systems.

The need for sensible planning

It is not enough to simply come up with a plan or to use a recovery system supplied by a vendor. Preferably, organizations need to spend time and effort to understand how the plan works.

Making sure that the plan works is not a matter of accountability for the party that developed the plan. Rather, it is common sense for organizations that seek to avoid damaging disruptions in their operations. Additionally, sensible planning should include testing based on industry best practices and a mindset that takes into account the possibility that hardware and software issues can destroy the plan itself.
