2021 Backup and Disaster Recovery Checklist

A BDR Plan is Essential

All business IT systems and data are at risk from natural disasters such as hurricanes, floods and fires; from technical failures such as human error or equipment failure (the most common cause); and from both physical crime and cybercrime.

That’s why every organization should plan for the worst and have a tested process for getting out of trouble as quickly and efficiently as possible, including the ability to isolate and contain compromised systems.

In 2020, the global push to remote work completely altered the cyber-threat landscape. For most businesses, this calls for a full review of your Business Continuity and Disaster Recovery (BCDR) plans and resources.

Business Continuity Assurance

Stakeholders expect assurances from IT that their data and systems are protected from threats, and that there is a recovery plan for any kind of disaster scenario.

Trouble is, disaster recovery is often out of sight and out of mind. It is never urgent enough to rise to the top of the to-do list unless you make time for it.

As a result, BDR procedures are not tested often enough as your network and cloud resources evolve. Ultimately this leads to an emergency response fiasco that falls short of your business continuity objectives.

To avoid this, MOHSO recommends that you schedule BDR audits every 6-9 months and allocate sufficient resources to see through changes, testing and verification.

Properly documented strategies and specific tactics to mitigate the risks and consequences of unexpected events are at the heart of business continuity management. It begins with talking to stakeholders about risk assessment and, in particular, the impact assessment for each IT system or service.

Assess the Risks

This involves identifying potential threats to your business or processes. Any threat that could harm day-to-day operations should be evaluated: how likely is it to materialize?

Quantify the Impact

Now you need to do impact analysis to estimate the costs – not necessarily monetary – if any of the identified threats were to transpire.

Will the business lose revenue, and if so how much per hour or day of downtime? Will it be able to continue functioning, or will operations be temporarily suspended? How much will the incident affect the business’s reputation?
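
One way to make the risk and impact assessments comparable across systems is to score each threat as likelihood multiplied by impact. The sketch below is only an illustration: the 1-5 scales, the threat names and the revenue figure are all hypothetical, and the `Threat`, `rank_threats` and `downtime_cost` names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (business-stopping)

    @property
    def score(self) -> int:
        # Likelihood x impact product, a common risk-matrix convention.
        return self.likelihood * self.impact


def rank_threats(threats):
    """Return threats ordered from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.score, reverse=True)


def downtime_cost(revenue_per_hour: float, hours_down: float) -> float:
    """Estimate lost revenue only; reputational cost needs separate judgment."""
    return revenue_per_hour * hours_down


# Hypothetical entries for illustration only.
threats = [
    Threat("Flood in server room", likelihood=1, impact=5),
    Threat("Accidental file deletion", likelihood=4, impact=2),
    Threat("Ransomware", likelihood=3, impact=5),
]

for t in rank_threats(threats):
    print(f"{t.score:>2}  {t.name}")

print(downtime_cost(revenue_per_hour=2000.0, hours_down=3))
```

The scores are only as good as the stakeholder input behind them, so treat the ranking as a conversation starter rather than a verdict.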

Disaster Recovery Plan (DRP)

Recovery Objectives

Now, with a good understanding of the risk and probable cost of different threats (from the stakeholders’ perspective, not yours), you can focus your attention on protecting the systems and resources that are most critical to business continuity, and work down the list from there.

You may not get to them all, and that’s OK. The secret is to create a separate mini Disaster Recovery Plan for each critical system, application or resource, and eventually to roll up all your mini plans into one master plan.

This mini-plan approach makes the task small enough that you can get it done in between other things. So, bit by bit you will get to review, update and optimize backup and recovery for every aspect of your IT infrastructure.

Same goes for testing and verification… when each server (or cluster) or application has its own plan, testing and verification is doable in a few hours.

Defining RPO and RTO

While Gartner claims network downtime costs businesses $5,600 per minute on average, truth is, the cost depends greatly on what data you are dealing with.

A Recovery Point Objective (RPO) defines the most recent point in time you must be able to restore to; in other words, the maximum amount of data loss, measured in time, that is acceptable. Accepting that instant recovery with zero data loss is rarely achievable, what can the business tolerate?

For some applications you may tolerate 24 hours of lost data, provided it can be re-entered. For others it is hours or minutes, whereas for inactive unstructured data (such as compliance data, surveillance footage) you may be able to do without it for weeks, before access to it is needed again.

The Recovery Time Objective (RTO), in turn, defines the maximum time allowed to recover successfully before the outage has additional consequences that the business cannot tolerate, e.g. impacting client services and customer trust.

It is important to realize that failure to meet the RTO likely leads to additional mitigation steps (such as PR damage control) being needed due to the additional impact of extended downtime.
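
To make the two objectives concrete, the sketch below checks one incident (or one recovery drill) against an RPO and an RTO. It assumes you have a list of backup completion times plus the incident and restore timestamps; the function names and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, incident_time):
    """Time between the last backup before the incident and the incident itself."""
    earlier = [t for t in backup_times if t <= incident_time]
    if not earlier:
        raise ValueError("no backup exists before the incident")
    return incident_time - max(earlier)


def meets_objectives(backup_times, incident_time, restored_time, rpo, rto):
    """Return (rpo_met, rto_met) for one incident or one recovery drill."""
    rpo_met = data_loss_window(backup_times, incident_time) <= rpo
    rto_met = (restored_time - incident_time) <= rto
    return rpo_met, rto_met


# Hypothetical drill: nightly backups at 01:00, failure at 15:30, restored at 19:00.
backups = [datetime(2021, 3, 1, 1, 0), datetime(2021, 3, 2, 1, 0)]
incident = datetime(2021, 3, 2, 15, 30)
restored = datetime(2021, 3, 2, 19, 0)

result = meets_objectives(
    backups, incident, restored,
    rpo=timedelta(hours=24),
    rto=timedelta(hours=4),
)
print(result)
```

In this made-up drill 14.5 hours of data would be lost and recovery took 3.5 hours, so both objectives are met; tightening the RPO to 12 hours would fail the first check.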

Backup Technologies

Obviously, the RPO and RTO established for each application directly affect the technology choices and the related costs of meeting those objectives. In some cases you need real-time replication with instant failover, while for others an incremental daily or weekly file-system backup to an off-site location may suffice.
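
As a rough illustration of how objectives drive the technology choice, the sketch below maps an RPO and RTO to one of the broad approaches mentioned above. The cut-off values and the `suggest_backup_approach` name are assumptions made up for this example, not recommendations.

```python
from datetime import timedelta


def suggest_backup_approach(rpo: timedelta, rto: timedelta) -> str:
    """Map recovery objectives to a broad approach.

    The thresholds are illustrative assumptions; set your own with stakeholders.
    """
    # A very tight objective of either kind rules out periodic backups.
    if rpo <= timedelta(minutes=5) or rto <= timedelta(minutes=15):
        return "real-time replication with failover"
    if rpo <= timedelta(hours=24):
        return "daily incremental off-site backup"
    return "weekly incremental off-site backup"


print(suggest_backup_approach(timedelta(minutes=1), timedelta(minutes=10)))
print(suggest_backup_approach(timedelta(hours=24), timedelta(hours=8)))
print(suggest_backup_approach(timedelta(days=7), timedelta(days=14)))
```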

While businesses are increasingly moving to cloud backups because they can deliver shorter RTOs, there is growing concern about the rising cost of cloud storage.

Damage Control

Maybe your customers are affected in some way by delays or a temporary impediment to their business, or worse, their confidential data is leaked. In either case you also need a plan to manage the damage and set expectations about what you are doing in response. This may be as simple as a courtesy call, or it may require a full-on PR campaign.

Whatever its extent, your response must recognize the severity of the incident and make your customers feel heard and supported. In addition to the technical matters, it is crucial that you have capabilities in place to call on the right people and create the right messaging to manage a crisis effectively.

DRP Repository

Don’t forget to put emergency response contacts, call trees and recovery procedures in a secure resource that any incident team member can access in a crisis, regardless of the state of your own network. You’ll be surprised how many organizations make this rookie mistake and get locked out of their own plans.

Talk to Stakeholders

When planning your backup policies, testing and verification, talk to stakeholders to fully consider the following criteria:

1. Criticality of the system, data, database and information

2. How the business functions or data usage will evolve

3. The likelihood of needing the data in an emergency

4. How quickly the system and data must be recovered

5. The data backup technology and methodology; and

6. The location of the backed-up system or data.
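
The six criteria above can be captured as one record per system, which also fits the mini-plan approach of rolling individual plans up into a master plan. The sketch below is a minimal illustration; the field names, the 1-5 criticality scale and the sample systems are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class MiniPlan:
    system: str
    criticality: int       # 1. assumed scale: 1 (low) to 5 (critical)
    expected_change: str   # 2. how business functions or data usage will evolve
    emergency_need: str    # 3. likelihood of needing the data in an emergency
    rto: timedelta         # 4. how quickly the system and data must be recovered
    backup_method: str     # 5. backup technology and methodology
    backup_location: str   # 6. where the backed-up system or data lives


def master_plan(plans):
    """Roll mini plans up into one list: most critical first, then shortest RTO."""
    return sorted(plans, key=lambda p: (-p.criticality, p.rto))


# Hypothetical systems for illustration only.
plans = [
    MiniPlan("File archive", 2, "slow growth", "low", timedelta(days=7),
             "weekly incremental", "off-site"),
    MiniPlan("Order database", 5, "rapid growth", "high", timedelta(hours=1),
             "replication", "secondary site"),
    MiniPlan("Email", 5, "stable", "high", timedelta(hours=4),
             "daily incremental", "cloud"),
]

for plan in master_plan(plans):
    print(plan.system, plan.rto)
```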

Plan for Tomorrow’s Data

Whenever you review your BDR capabilities and emergency response plan, you should also consider future capacity needs and usage patterns, because these are evolving quite rapidly as businesses begin to exploit AI.

With businesses increasingly moving to cloud backups due to shorter RTOs, there is growing concern about the rising costs of enterprise cloud storage.

Consider these 2025 global data predictions from IDC:

• About 30% of all data will be created in real time.

• About 60% of data will be created by enterprises.

• Half of the world’s data will reside in the public cloud.

• Enterprise data storage needs will grow about 60% annually.

• Cost of cloud storage is only falling 5-10% annually.

These factors have an important bearing on the optimum backup approach for different classes of data, depending on the applicable RPO and RTO, and this is crucial to avoid out-of-control costs. New backup technologies can expand your options with continuous data protection, instant recovery, data reduction and virtualization, each with different benefits, costs or savings. The myriad choices make it all the more important to understand stakeholders’ usage patterns.
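
To see why these figures matter for cost, combine the last two predictions: if stored data grows about 60% a year while the unit price falls only 5-10%, annual spend is multiplied by roughly 1.6 × 0.95 = 1.52 or 1.6 × 0.90 = 1.44, that is, it grows by about 44-52% per year. The sketch below projects this under constant rates; the starting volume, the price per terabyte and the `project_storage_spend` name are hypothetical.

```python
def project_storage_spend(start_tb, price_per_tb, years,
                          data_growth=0.60, price_decline=0.05):
    """Return a list of (year, terabytes, annual spend) under constant rates."""
    tb, price = start_tb, price_per_tb
    rows = []
    for year in range(years + 1):
        rows.append((year, tb, tb * price))
        tb *= 1 + data_growth      # data volume grows each year
        price *= 1 - price_decline  # unit price falls each year
    return rows


# Hypothetical starting point: 100 TB at 250 per TB per year.
for year, tb, spend in project_storage_spend(100, 250.0, years=4):
    print(f"year {year}: {tb:8.1f} TB  spend {spend:10.0f}")
```

Real growth and pricing will not be constant, so the projection is best used to compare backup approaches for different classes of data rather than as a budget forecast.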

Nuances in requirements together with a large range of backup and storage solutions can make technology selection quite complex. If you truly want to maximize control and minimize cost, there is no one-size-fits-all solution.

DRP Testing is Critical

Your Disaster Recovery Plan is only as good as your last successful test. Business continuity testing isn’t about pass or fail. It’s about continuous improvement by learning from findings uncovered in a live exercise.

A well-orchestrated test strategy helps protect your brand and reduce risk.

• Identify gaps, and areas for improvement.

• Continually validate and improve plans.

• Satisfy compliance requirements and regulators.

• Reduce recovery time and cost.
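
One simple check that can be part of a recovery test is confirming that a restored file is byte-for-byte identical to its source by comparing checksums. The sketch below is a minimal illustration using temporary files in place of real data; the function names and file names are hypothetical, and it covers only file integrity, not whether an application actually works after the restore.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True when the restored file has the same SHA-256 digest as the original."""
    return sha256_of(original) == sha256_of(restored)


# Self-contained demo with temporary files standing in for real data.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.db"
    good = Path(tmp) / "restored_ok.db"
    bad = Path(tmp) / "restored_bad.db"
    source.write_bytes(b"customer records")
    good.write_bytes(b"customer records")
    bad.write_bytes(b"customer recor")  # simulated truncated restore
    ok_result = restore_matches(source, good)
    bad_result = restore_matches(source, bad)

print(ok_result, bad_result)
```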

Post Mortems

Following any incident, once things return to normal, always review what caused it. How might this be prevented in future? Also review the recovery procedures you actually followed and re-examine the steps that could have gone more smoothly. Was something missing? Is there a better way?

 

Checklists

You can adapt these worksheets for the specific systems, applications and infrastructure in your environment.

We have blank Excel checklists for you to download.

 
