Do you know the difference between a BCP and a DRP?
I recently received the following email from a reader. Other people might like the answer, so here’s the question and the answer:
I have read through the section in the book a couple times and tried to research online and this seems to be an area I am struggling with and I am planning to test in a couple of days and want to be fully prepared. Thank you for your time and for the great book you wrote on Security+.
You’re Ready
First, your level of understanding is probably good enough for the Security+ exam, but I’m happy to help you clarify it. I recently completed a chapter on the updated SSCP: Systems Security Certified Practitioner Study Guide, and this deeper level of knowledge is needed for the SSCP. That said, after you complete the Security+ exam, you’ll be well prepared for the SSCP.
Here are some key points you have that are accurate:
- It seems like Business Impact Analysis (BIA) is a part of Business Continuity Planning
Yes.
- A BIA is used to identify critical systems so you know which systems to restore first.
Yes.
Overview
Part of the challenge is that many people combine a business continuity plan (BCP) and a disaster recovery plan (DRP) as though they are a single document. However, they are different. Here are some key points:
- In short, the BCP has a wide scope and helps an organization continue to operate even if a disaster occurs.
- The BIA is part of the BCP and identifies critical systems and services.
- You then create DRPs to ensure you have methods, procedures, and processes to restore these critical systems in the event of a disaster.
Start with a BCP
As an example, imagine an organization doesn’t have a BCP, BIA, or DRPs. They hire a business continuity expert to help them develop a BCP.
The BCP Requires a BIA
One of the first things the expert completes is the BIA to identify the critical systems and services.
Let’s say the BIA identifies an ecommerce web site as a critical system. The BIA then identifies the underlying functions and services for the web site. They might include a web server, a back-end database hosted on an internal database server, Internet access, and network infrastructure providing connectivity and including a DMZ.
Create DRPs for Critical Functions and Services
Now that you’ve identified a critical service and the underlying critical functions and systems, how do you plan for an outage of this ecommerce web site?
You need to create one or more DRPs. Here’s one way.
- Create a DRP that allows someone to restore the web server after a catastrophic failure of the server. It will include the detailed steps that a technician can use to restore it.
- Create another DRP that allows someone to restore the database server after a catastrophic failure of that server. It will include the detailed steps that a technician can use to restore it.
- Create another DRP that allows someone to restore the firewalls to recreate the DMZ after a catastrophic failure of the firewalls. It will include the detailed steps that a technician can use to restore them.
It’s also possible to create another overarching DRP that identifies how to restore the full functionality of the web site after a catastrophic failure takes out all the components.
Could an organization create DRPs without a BCP? Yes, but they might be misguided. If the organization hasn’t taken the time to identify what services are critical, they might end up creating DRPs for non-critical systems. Worse, they might not create DRPs for critical systems.
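One way to picture this is as a simple comparison between what the BIA flagged and which DRPs actually exist. The following is only a minimal sketch in Python: the component names come from the example above, while the `print server` entry, the variable names, and the `drp_gaps` function are hypothetical and included just to show both kinds of mismatch.

```python
# Hypothetical BIA output: each critical service and the components it depends on.
bia_critical_services = {
    "ecommerce web site": [
        "web server",
        "database server",
        "firewalls (DMZ)",
    ],
}

# Hypothetical inventory of the DRPs written so far.
existing_drps = {"web server", "database server", "print server"}


def drp_gaps(critical_services, drps):
    """Return (missing, unneeded) DRP names compared with the BIA."""
    critical_components = {
        component
        for components in critical_services.values()
        for component in components
    }
    # Critical components that nobody has written a DRP for yet.
    missing = sorted(critical_components - drps)
    # DRPs that exist for components the BIA did not flag as critical.
    unneeded = sorted(drps - critical_components)
    return missing, unneeded


missing, unneeded = drp_gaps(bia_critical_services, existing_drps)
print("Critical components without a DRP:", missing)
print("DRPs for components the BIA did not flag:", unneeded)
```

In this made-up inventory, the firewalls have no DRP (the worse problem) and effort was spent on a print server the BIA never identified as critical.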
RTO
What is an acceptable timeframe to restore these services? Ten minutes? Ten days?
The BIA identifies the maximum acceptable outage time. Imagine it is 60 minutes for the web site. If so, the recovery time objective (RTO) is 60 minutes. This means that the DRPs for these components must be able to restore these critical services and functions within 60 minutes.
The RTO also drives the implementation of other security controls to prevent an outage. For example, implementing RAID subsystems and server clusters can prevent an outage even if some individual components fail.
Does that mean you should automatically install RAIDs in all your systems? No. You would only install them on systems identified as critical in the BIA.
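To make the 60-minute RTO concrete, here is a rough sketch of how you might check DRP restore estimates against it. The restore times are invented numbers for illustration, and the `meets_rto` function and its `parallel` flag are assumptions of mine, not something from the study guide. Whether components can be restored at the same time depends on staffing and dependencies, so the sketch shows both cases.

```python
RTO_MINUTES = 60  # the example RTO for the web site

# Hypothetical restore-time estimates, for example measured during DRP test runs.
estimated_restore_minutes = {
    "web server": 25,
    "database server": 40,
    "firewalls (DMZ)": 15,
}


def meets_rto(estimates, rto_minutes, parallel=True):
    """Return (ok, outage_minutes) for a set of component restore estimates.

    parallel=True assumes every component is restored at the same time,
    so the slowest component sets the length of the outage.
    parallel=False assumes components are restored one after another.
    """
    times = estimates.values()
    outage = max(times) if parallel else sum(times)
    return outage <= rto_minutes, outage


print(meets_rto(estimated_restore_minutes, RTO_MINUTES, parallel=True))
print(meets_rto(estimated_restore_minutes, RTO_MINUTES, parallel=False))
```

With these numbers the DRPs meet the RTO only if the restores run in parallel (40 minutes). Run one after another, they take 80 minutes and miss it, which would be a reason to revise the DRPs or add preventive controls.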
RPO
Recovery point objective (RPO) looks similar to RTO, but it is different. It identifies how much data you can afford to lose, so it is primarily focused on databases.
Consider the online webserver with the backend database. How much data can you afford to lose on the backend database? Ten minutes? Zero minutes? The BIA identifies this and this is your RPO. If the RPO is ten minutes, you can afford to lose up to ten minutes of data.
You then implement methods to ensure that you can restore the data to a point within that window. If the RPO is zero, you must be able to restore the data up to the moment of failure. This is expensive, but if the BIA determines the RPO is zero, the cost is justified.
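The relationship between RPO and a backup schedule can be sketched with a little arithmetic. This is a simplified illustration that assumes plain periodic backups; the function names and the intervals are hypothetical. With periodic backups, the worst case is a failure just before the next backup, which loses one full interval of data.

```python
def worst_case_data_loss(backup_interval_minutes):
    """A failure just before the next backup loses everything written
    since the previous backup, which is one full backup interval."""
    return backup_interval_minutes


def meets_rpo(backup_interval_minutes, rpo_minutes):
    """Return True if periodic backups at this interval satisfy the RPO."""
    if rpo_minutes == 0:
        # Assumption: no periodic schedule can guarantee zero data loss,
        # so an RPO of zero needs some continuous method instead.
        return False
    return worst_case_data_loss(backup_interval_minutes) <= rpo_minutes


print(meets_rpo(backup_interval_minutes=5, rpo_minutes=10))
print(meets_rpo(backup_interval_minutes=60, rpo_minutes=10))
print(meets_rpo(backup_interval_minutes=1, rpo_minutes=0))
```

Only the first schedule satisfies its RPO. Hourly backups can lose up to 60 minutes of data against a ten-minute RPO, and an RPO of zero cannot be met by any periodic schedule in this simplified model.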
Alternate Locations
The BCP also considers catastrophic events such as fires, floods, hurricanes, tornadoes, and earthquakes.
Is it acceptable to shut down all functions and services after one of these events?
If not, what are the critical functions that need to continue to operate? How much time can you take to restore these services at an alternate location? How much money are you willing to spend on the alternate location?
These decisions help you decide what type of alternate location to use such as a hot, warm, cold, or mobile site.
DRPs might come into play here, too. If you need to designate an alternate location, what services will be relocated there? What are the steps to get these services up and running? Documenting these details within a DRP to be used when activating an alternate location helps ensure that the process is as seamless as possible.
RTO and RPO are mixed up in this article.
RTO is the maximum time allowed to restore the system (if the system must be restored within 1 hour after failure, the RTO is 1 hour).
RPO is the point to which the system must be restored. If the RPO for a database server is 1 day, then the database server must make (at least) daily backups, so that if the server fails, we do not lose more than 1 day of data.
Thanks. You’re correct. I must have typed that part of the post from memory without verifying it.
I pulled up my source document and verified that I swapped them in this post. The study guide is accurate and I just fixed this post.
Thanks again.