After reading the Disaster Resource Guide - Executive Issue (Volume 12, Issue 3), I wanted to comment on one of its articles. The article, "Business Continuity Planning and Enterprise Risk Management," by John R. Phelps, describes how Blue Cross and Blue Shield of Florida, Inc. is allowing its Business Continuity Management (BCM) and Enterprise Risk Management (ERM) programs to collaborate and integrate.
Usually, Risk Management (RM) and Business Continuity (BC) plan in silos: each works with business units towards a goal, but the BC goal and the RM goal may not be the same. This causes frustration and long-term planning gaps for the business units because they do not see how Risk Management and Business Continuity can work together to make their plans more viable, usable and realistic.
Risk Management assists business units in identifying the vulnerabilities to their business processes. Like Business Continuity, Risk Management owns neither the business process nor the vulnerabilities. It is responsible only for helping the business unit identify them and for providing information on how to mitigate them within acceptable means. The business units remain responsible for the continuity of critical processes and for mitigating potential risks.
It makes sense that Business Continuity and Disaster Recovery are components of Risk Management. By using information from the risk analysis, recovery plans can be written for critical applications, services or systems. These critical processes can be determined by completing a Business Impact Analysis (BIA), but the BIA only helps determine the impact of an outage, not its likelihood, so the business units should also complete a Risk Assessment. The Risk Assessment helps identify local vulnerabilities, while an Enterprise Risk Management analysis should be able to see the big picture: how likely an event is to occur and how it may affect various business processes.
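The difference between impact (from the BIA) and likelihood (from the Risk Assessment) can be illustrated with a small sketch. The 1-to-5 scales, the sample processes and their scores, and the choice to multiply the two values are all illustrative assumptions; they do not come from the article or from any Penn State method.

```python
from dataclasses import dataclass


@dataclass
class ProcessRisk:
    name: str
    impact: int      # from the BIA; hypothetical scale, 1 (minor) to 5 (severe)
    likelihood: int  # from the Risk Assessment; hypothetical scale, 1 (rare) to 5 (frequent)

    @property
    def score(self) -> int:
        # Assumed scoring rule: impact multiplied by likelihood.
        return self.impact * self.likelihood


def rank_by_risk(processes):
    """Return processes ordered from highest to lowest combined score."""
    return sorted(processes, key=lambda p: p.score, reverse=True)


# Made-up sample data for illustration only.
processes = [
    ProcessRisk("Payroll", impact=5, likelihood=1),
    ProcessRisk("Course registration", impact=4, likelihood=3),
    ProcessRisk("Campus newsletter", impact=1, likelihood=4),
]

for p in rank_by_risk(processes):
    print(f"{p.name}: impact={p.impact}, likelihood={p.likelihood}, score={p.score}")
```

In this made-up data, the process with the highest impact (Payroll) does not rank first once likelihood is considered, which is the gap a BIA alone would leave.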
Within the article, John Phelps refers to three models of how BCM and ERM can work together:
1. Have central management for both BCM and ERM. This is the model Blue Cross and Blue Shield of Florida, Inc. uses.
2. Create a shared responsibility with BCM and integrate the functionality into ERM.
3. Maintain BCM and ERM programs in separate silos. This is the model Penn State uses. According to John Phelps, this model is the least effective and efficient.
How can we move in this direction? All groups working on any type of recovery at Penn State meet on a regular basis to ensure we all understand the objectives of each type of recovery we are working towards. At some point, it would be beneficial for Penn State to integrate some of this recovery planning effort so that we are all working towards the same goal: ensuring Penn State can continue to instruct students, conduct research and provide outreach. "We" (all Penn State employees, faculty and staff) need to ensure that we continue Penn State's mission when an outage puts that mission at risk, while still maintaining a safe environment for all. That is what we are all working towards, and hopefully we will begin to align our efforts so that we can accomplish it.
As part of the training curriculum at Penn State for departmental Business Continuity Planning, we encourage participants to attend our table-top exercise. In this exercise, we break the training class into four groups. Three groups are part of the Information Technology (IT) group and one is a business unit that relies on IT.
Throughout the training, the groups play roles in a fictitious college and need to make decisions as a power outage shuts down some of the critical applications that the business unit needs. Some groups have recovery plans to refer to for decision-making, but others do not. This makes it interesting, as the groups without plans need to make decisions ad hoc.
We have presented this training approximately five times, and every time I am fascinated by the different points of view each group brings to the same training. It makes this table-top exercise enjoyable to run, since every group focuses on different aspects of the scenario. I wanted to share some of the most notable "lessons learned," since they are universal for any institution in any event.
Lessons Learned
- All business units should have plans. It is important for IT to have disaster recovery plans, but that doesn't mean that the business units do not need to plan to continue their critical services.
- Communication, communication, communication. It is the first thing that breaks down when an event occurs. Sometimes we forget to keep key areas in the loop as decisions that impact them are being made.
- If your unit cannot make decisions during an event, someone else will. Sometimes these decisions are not best for your area because the individuals making them are not familiar with your operations.
- We have the potential to create our own disasters. By making poor decisions on how the situation is handled, we can escalate the event into a full-blown disaster rather than letting the event play out. The situation needs to be evaluated and the best course of action determined.
- The recovery strategies for IT and critical services should be realistic with respect to the Recovery Time Objective* (RTO). * The RTO is the length of time within which critical services/systems must be recovered after an outage.
- Make sure your employees know what to say when approached by a reporter. Make sure they point the reporter to a single point of contact for all external communication.
- Sometimes, key individuals who must make critical decisions will not be available during the event. Have a succession plan that accounts for key individuals not being available.
These are just a few of the lessons learned from our scenario. It is fun to run both IT and business units through this scenario. Once the departments that have participated in this training have plans for their own critical services, they will be able to run exercises on their own plans. Hopefully, they will remember these lessons learned when writing those plans.
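The lesson about keeping recovery strategies realistic against the RTO can be expressed as a simple check. This is only a sketch: the service names, the hour values and the helper name `meets_rto` are hypothetical, and a real plan would take its estimates from tested recovery procedures.

```python
def meets_rto(estimated_recovery_hours: float, rto_hours: float) -> bool:
    """A strategy is realistic only if recovery is expected to finish within the RTO."""
    return estimated_recovery_hours <= rto_hours


# Made-up values: service -> (estimated recovery hours, RTO hours)
strategies = {
    "Email": (4, 8),
    "Student portal": (48, 24),
}

# Collect the services whose current strategy cannot meet the stated RTO.
unrealistic = [
    name
    for name, (estimated, rto) in strategies.items()
    if not meets_rto(estimated, rto)
]
print("Strategies that cannot meet their RTO:", unrealistic)
```

Any service flagged this way needs either a faster recovery strategy or a revised RTO agreed with the business unit.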
What is a BIA?
According to the DRII and DRJ glossary of terms, a BIA is: A process designed to prioritize business functions by assessing the potential quantitative (financial) and qualitative (non-financial) impact that might result if an organization was to experience a business continuity event.
Penn State has adopted a 5-phase approach to business continuity planning. See the Recovery Planning Process Flow Chart for the 5-phase process. The BIA is completed in Phase 2 of the process. We provide departments and campuses with a survey that helps them identify the critical services they run within their area. They need to judge what the impact to Penn State would be both financially and operationally if that service is not operational for a specified period of time. Peak periods of operation would need to be identified for each critical service. For example, students registering for classes would be a higher priority service at the beginning of the semester than the ability to post grades. However, posting grades would take priority at the end of the semester when registering for classes would be lower priority.
A BIA is designed to help departments and campuses understand their "business" better and help them prioritize their critical services. It will also help determine which services need funding for recovery strategies.
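The registration and grade-posting example above can be sketched as a priority lookup that depends on the time of year. The period names, the numeric priorities (lower means recover first) and the `Service` class are assumptions made for illustration; they are not taken from Penn State's survey or tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    base_priority: int                      # hypothetical: lower number = recover first
    peak_periods: set = field(default_factory=set)
    peak_priority: int = 1                  # priority used during a peak period

    def priority_in(self, period: str) -> int:
        # A service moves up the list while it is in one of its peak periods.
        return self.peak_priority if period in self.peak_periods else self.base_priority


def recovery_order(services, period):
    """Return service names in the order they should be recovered for a given period."""
    return [s.name for s in sorted(services, key=lambda s: s.priority_in(period))]


# Made-up sample data mirroring the example in the text.
services = [
    Service("Class registration", base_priority=3, peak_periods={"start_of_semester"}),
    Service("Grade posting", base_priority=3, peak_periods={"end_of_semester"}),
]

print(recovery_order(services, "start_of_semester"))
print(recovery_order(services, "end_of_semester"))
```

The same two services swap places depending on the period, which is why the survey asks units to identify peak periods for each critical service.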
Why is the BIA misunderstood?
If the BIA provides a lot of answers, then why is it the least understood step? Most units want to skip this step, primarily because they do not see the "value" or because IT units do not want the business units to get involved.
Most units feel they already understand their business, and in most cases they are correct, but prioritizing services based on intuition does not provide concrete reasoning for decision-making. Nor can intuition show that decisions were soundly justified after an event occurs and hindsight comes into play. Knowing up front which impacts would occur and how severe they would be gives units the authority to make decisions that are in the University's best interest.
By understanding the priorities of the business units, IT can adjust its priorities and strategies to fit the business needs. Also, cost can be evaluated based on the time needed to recover the technology that supports the business requirements. Typically, the faster you need technology recovered, the higher the cost of recovery, and vice versa. If you need technology back fast, the BIA will provide the data that supports the reasoning for spending the money to recover quickly.
How can Administrative Information Services (AIS) help?
We work with departments and campuses to show them how critical the BIA is. We also provide an easy method of collecting this information: a web-based, 20-question survey that we ask managers within the departments to complete. It should take no longer than 4 hours to complete. Penn State uses Strohl Systems BIA Professional to create the survey, collect the data and analyze it. We hope that by making this process as "easy" as possible, units will start to see the benefit and will want the survey to assist them on many levels, from short-term decision-making to long-term strategy.
Next week: BIA and the RTO!
When we go out to talk with departments and campuses, we hear the terms business continuity and disaster recovery used interchangeably. We are trying to raise awareness that these terms are NOT the same.
To stay consistent with industry-standard terminology, we use the Disaster Recovery Journal's glossary definitions for both Business Continuity and Disaster Recovery.
- BUSINESS CONTINUITY: The ability of an organization to provide service and support for its customers and to maintain its viability before, during, and after a business continuity event.
- DISASTER RECOVERY: The technical component of business continuity planning.
Penn State's Business Continuity Plans include both Service Recovery Plans, which recover critical university functions, and Disaster Recovery Plans, which recover the technology functions for these critical services.
The priority of services at the university is determined from the Business Impact Analysis. Once the services are identified and prioritized, the technology that supports them is identified.
In the past, most people focused on disaster recovery plans. They felt if they had plans for the technology, they didn't need anything else. Recent events, however, proved that kind of thinking has many drawbacks. Technology is just a piece of the overall plan. Communicating with your stakeholders, having workaround procedures and creating policies for employees to follow during an outage are critical to sustaining university business. In a world of 24/7 operations, our customers expect minimal inconvenience during an unanticipated outage.
