Recently in Business Continuity Planning Category
After reading the Disaster Resource Guide - Executive Issue (Volume 12, Issue 3), I wanted to comment on one of its articles. The article, "Business Continuity Planning and Enterprise Risk Management", by John R Phelps, gave an account of how Blue Cross and Blue Shield of Florida, Inc is allowing its Business Continuity Management (BCM) and Enterprise Risk Management (ERM) programs to collaborate and integrate.
Usually, Risk Management (RM) and Business Continuity (BC) plan in silos: each works with business units towards a goal, but the BC goal and the RM goal may not be the same. This causes frustration and long-term planning gaps for the business units because they do not see how Risk Management and Business Continuity can work together to make their plans more viable, usable and realistic.
Risk Management assists business units in identifying the vulnerabilities to the business processes. Like Business Continuity, Risk Management does not own the business process, nor the vulnerabilities. They are just responsible for assisting the business unit in identifying them and providing information on how to mitigate them within acceptable means. The business units are still responsible for the continuity of critical processes and mitigating potential risks.
It makes sense that Business Continuity and Disaster Recovery are components of Risk Management. By using information from the risk analysis, recovery plans can be written for critical applications, services or systems. These critical processes can be determined by completing a Business Impact Analysis (BIA), but the BIA will only help determine the impact of an outage, not its likelihood. That is why the business units should also complete a Risk Assessment. The Risk Assessment helps identify the local vulnerabilities, but an Enterprise Risk Management analysis should be able to see the big picture and determine how likely an event is to occur and how it may affect various business processes.
Within the article, John Phelps refers to three models of how BCM and ERM can work together:
1. Having a central management for both BCM and ERM. This is the model Blue Cross and Blue Shield of Florida, Inc. uses.
2. Create a shared responsibility with BCM and integrate the functionality into ERM.
3. Maintain BCM and ERM programs in separate silos. This is the model Penn State uses. According to John Phelps, this model is the least effective and efficient.
How can we move in this direction? All groups working on any type of recovery at Penn State are meeting on a regular basis to ensure we all understand the objectives of the type of recovery we are working towards. At some point, it would be beneficial for Penn State to integrate some of this recovery planning effort so that we are all working towards the same goal to ensure Penn State will be able to instruct students, provide research and outreach. "We" (all Penn State employees, faculty and staff) need to ensure we will continue Penn State's mission in the event an outage occurs which could risk that mission and still maintain a safe environment for all. That is what we are all working towards and hopefully we will begin to align our efforts so that we can accomplish that.
Last month, several of us from Administrative Information Services (AIS) attended the Disaster Recovery Journal (DRJ) Spring World Conference in Orlando. I have had the opportunity to attend this conference for the past several years, and thought I would share some of the information we were able to bring back.
First, I would like to mention that the conference draws about 1300 business continuity professionals from various industries. Over the past few years, I have expected the number of higher education institutions attending this event to increase, especially with all the recent events in the news. I was disappointed again this year by the low turnout of higher education attendees. Here's hoping for a better year next year!
Our group had an opportunity to sit in on many great sessions. Topics included IT strategies for recovery, BIA, Risk Assessment, Risk Management and Global Warming - yes, a wonderful speaker talked about how business continuity professionals will need to deal with the risks and vulnerabilities that are, and will be, caused by global warming.
The greatest piece of knowledge I was able to bring back from the conference is that the process we are developing at Penn State for business continuity planning is in line with the process that large corporations are using. It provided validation of the process we are rolling out across the University, which gives our team more confidence in pushing ahead and continuing to make progress.
It was interesting to hear that corporations still struggle to get executive support for business continuity planning. Though it is getting easier with all the events that have occurred across the world, executive management still does not see the value in using the plans for more than just recovery. They have a hard time using this information for strategic planning and for understanding the breadth of their operations. The corporations that are successful in planning have executive support, and their executives seem to understand the importance of not only having these plans but actually using them to their fullest potential.
My final observation is that corporations that believe in the process give authority to the business continuity professionals. They create a Business Continuity Office, which has the responsibility and authority to oversee all the planning efforts for the Enterprise. As Penn State matures in Business Continuity Planning, the process will be more accepted into the culture. How can Penn State ensure that critical functions will be there for students, faculty and staff when they need them? Planning, planning, planning....
As part of the training curriculum at Penn State for departmental Business Continuity Planning, we encourage participants to attend our table-top exercise. In this exercise, we break the training class into 4 groups. Three groups are part of the Information Technology (IT) group and one is a business unit that relies on IT.
Throughout the training, the groups play roles in a fictitious college and need to make decisions as a power outage shuts down some of the critical applications that the business unit needs. Some groups have recovery plans to refer to for decision-making, but some groups do not. This makes it interesting, as those groups need to make decisions ad hoc.
We have presented this training approximately 5 times, and every time I am fascinated by the different points of view each group brings to the same training. It makes this table-top exercise enjoyable to run, since every group focuses on different aspects of the scenario. I wanted to share some of the most notable "lessons learned", since they are universal for any institution in any event.
Lessons Learned
- All business units should have plans. It is important for IT to have disaster recovery plans, but that doesn't mean that the business units do not need to plan to continue their critical services.
- Communication, communication, communication.... it is the first thing that breaks down when an event occurs. Sometimes we forget to keep key areas in the loop as decisions that impact them are being made.
- If your unit cannot make decisions during an event, someone else will. Sometimes, these decisions are not best for your area because the individuals making them are not familiar with your operations.
- We have the potential to create our own disasters. By making poor decisions on how the situation is handled, we can escalate the event into a full-blown disaster rather than letting the event play out. The situation needs to be evaluated and the best course of action needs to be determined.
- The recovery strategies for IT and critical services should be realistic with the Recovery Time Objective* (RTO). * The RTO is the length of time within which critical services/systems must be recovered after an outage.
- Make sure your employees know what to say when approached by a reporter. Make sure they point them to a single point of contact for all external communication.
- Sometimes, key individuals who must make critical decisions will not be available during the event. Have a succession plan that is able to deal with key individuals not being available.
These are just a few of the lessons learned from our scenario. It is fun to run both IT and business units through this scenario. Once the departments that have participated in this training have plans for their own critical services, they will be able to run exercises on their own plans. Hopefully, they will remember these lessons learned when writing their own plans.
People are the most valuable asset any organization has. In planning, sometimes, we forget to include the importance of people and focus on the Information Technology (IT) or the business functions. These can't happen if the right people with the right skill sets aren't available.
After reading a white paper from IBM Global Services, "In the spotlight: the human side of business continuity planning", I wanted to pull out some of its important points.
The paper discusses human capital resiliency and defines it as "the organization's ability to respond and adapt rapidly to threats posed to its workforce." The people that make up the organization know the critical systems that run the business. They know the customers and stakeholders who would be impacted by any outage and they know how to communicate with each other under normal circumstances.
When an event occurs, it may be anything but normal for individuals involved directly and indirectly. Tasks that we accomplish with ease today can become a struggle. Decisions that would be easy to make can become complex. It's important to have policies and procedures in place that all employees know, so that they understand what to do during an event. These should be exercised in advance; don't wait until an event occurs to put them into action. These policies include sick time, paid leave, flex time, childcare, elder care, etc.
Communicating with your employees during an event is also critical. Make sure the message that the employees receive about the event's response and recovery is clear and concise. Allow employees to communicate via multiple communication vehicles like voice, intranet, email, etc. "Adopting a virtual working environment requires that your company address specific technology and communication requirements, including providing remote access and support, on line tools and collaborative workspace."
The white paper lists questions we need to ask ourselves to answer "Are we prepared?":
- Does our organization have critical policies identified and alternatives designed specifically for use during a crisis?
- How will our employees receive critical information in the event of a crisis?
- How will employees communicate with colleagues to keep the business running?
- Have we provided the right preparatory advice to employees in the event of a crisis? Is it kept up to date?
- Are we able to provide immediate support to our employees and their families if a crisis would occur? What kind of support would we provide?
- How is critical job training being rolled out so that personnel gaps can be filled or capabilities outsourced to business partners at a moment's notice?
- Do we have short- and long-term succession plans for critical management and operational roles?
- How should our resource plans and sourcing strategies change to accommodate crisis?
- What plans are in place to provide critical services?
- What components of our organization's culture do we believe will support or hinder individuals in the event of a crisis?
If you are an IT leader, ask yourself these questions as well:
- In the event of a disaster, will our company be able to keep critical communications systems up and running?
- What can we do to establish and optimize virtual infrastructure, so employees can work effectively in remote locations if required?
- Do our third-party providers have business continuity plans to ensure that critical systems, reporting and processes can operate during a crisis?
- Can our company provide crisis response materials and training on demand?
By focusing on the people as well as the critical services and infrastructure needed to run the business, our plans become more robust and more useful. It brings the planning back to the people, who will ultimately be the ones ensuring the recovery plans are followed in the first place.
A Risk Assessment is a survey which assists in analyzing potential threats, determining the impacts of those threats and identifying controls that are in place to minimize the impact of the threat.
When we work with some of the departments/campuses at the University on the risk assessment survey for their area, it's hard for them to see the real benefit of this assessment. This tool can be used for multiple purposes. The Risk Assessment:
- provides insight into which threat is most probable and will have the largest impact on areas such as employees, facilities and operations of critical services.
- presents a "gap analysis" of which mitigation controls exist vs. which are lacking
- assists in defining which of these threats should be used in table top exercise scenarios for recovery plans.
By using information provided by our insurance company, departments and campuses at Penn State are able to determine the impact based on the probability and severity of both environmental and man-made threats.
Environmental threats are anything produced by a force of nature, such as floods, ice storms and earthquakes. Man-made threats are self-explanatory; examples are fire in buildings, civil unrest and chemical spills.
After the impact of threats is determined and the identification of mitigation controls is complete, the most challenging piece is how to use this information in business continuity planning. Management needs to determine how they will handle each of the threats. Management should use the information gathered to:
- List the threats in order of greatest probability and impact
- Determine what the risks would be if they chose to do nothing to mitigate or lessen the threat or impact
- Determine how to deal with the threat
  - Mitigation - implement a procedure or infrastructure that lessens the probability or impact of a threat
  - Avoidance - stop performing an activity or operation that carries the threat
  - Acceptance - accept the risk and do nothing to mitigate the threat or lessen the impact
  - Transference - transfer the risk to another group
- Perform cost benefit analysis to ensure the decision makes sense from a business standpoint
If mitigation controls already exist, they should be analyzed, both to understand how the existing control minimizes the possibility of the threat or lessens its impact, and to identify whether any new mitigation controls could be more efficient or more economical. If a new mitigation control costs money (depending on the amount), a cost benefit analysis should be performed to determine whether implementing the change is worthwhile.
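The ranking and gap-analysis steps described above can be sketched in a few lines of Python. This is only an illustration: the 1-to-5 scoring scale, the probability-times-severity score, the sample threats and controls, and the names `Threat`, `rank_threats` and `control_gaps` are all assumptions made for the example, not part of the Penn State survey.

```python
from dataclasses import dataclass, field


@dataclass
class Threat:
    name: str
    probability: int  # assumed scale: 1 (rare) to 5 (very likely)
    severity: int     # assumed scale: 1 (minor) to 5 (catastrophic)
    controls: list = field(default_factory=list)  # existing mitigation controls

    @property
    def score(self) -> int:
        # Assumed scoring rule: probability multiplied by severity.
        return self.probability * self.severity


def rank_threats(threats):
    """Return threats ordered from highest to lowest score."""
    return sorted(threats, key=lambda t: t.score, reverse=True)


def control_gaps(threats):
    """Return the names of threats with no mitigation control (the gap analysis)."""
    return [t.name for t in threats if not t.controls]


# Hypothetical sample data, for illustration only.
threats = [
    Threat("flood", probability=2, severity=4, controls=["sump pumps"]),
    Threat("ice storm", probability=4, severity=3),
    Threat("chemical spill", probability=1, severity=5, controls=["spill kits"]),
]

for t in rank_threats(threats):
    print(f"{t.name}: score {t.score}")
print("No controls:", control_gaps(threats))
```

Threats near the top of the list that also appear in the "no controls" output would be the natural first candidates for a mitigation, avoidance, acceptance or transference decision.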
At Penn State, we encourage departments and campuses to complete the Risk Assessment at the same time as the Business Impact Analysis (BIA). It is important to use the BIA to identify the critical business functions the University provides and to identify what could cause an outage of those business functions. We use a standard survey for all units going through the business continuity planning process, but the probability of environmental and man-made threats varies based on the location of the participant.
Once the Risk Assessment and BIA are complete, the management team from the department or campus will analyze the results to make recommendations for recovery strategies in Phase 3 of our planning process.
Though the title of this submission may not sound like a bedtime story you may read to your children, there is a moral to this story. In my last submission, I explained how important it is to complete a Business Impact Analysis (BIA) for the organization. I listed a number of reasons why this is important, but the most important reason gets its own blog submission, this one!
So what is an RTO, you may ask? According to the DRII and DRJ glossary, the RTO is...
RECOVERY TIME OBJECTIVE (RTO): The period of time within which systems, applications, or functions must be recovered after an outage (e.g. one business day). RTO’s are often used as the basis for the development of recovery strategies, and as a determinant as to whether or not to implement the recovery strategies during a disaster situation.
The RTO is a powerful number that should NOT be randomly chosen. By completing the BIA, you should be able to determine what critical functions your area is responsible for. Through the analysis, you should be able to determine within a reasonable timeframe how long a critical function could be down without negatively impacting your customers.
Most units want to skip this part. Just as with prioritizing the critical services, they want to pick a number for the RTO time frame that they think would make their management happy. By choosing RTOs instead of really understanding the critical function, we create false criteria on which recovery plans are built. Those recovery plans have no hope of working because everyone knows they cannot possibly hit the targeted RTO stated in the plan. Knowing this, the recovery teams do not take the recovery plans seriously. Like "The Boy Who Cried Wolf", how will you know when your recovery plan is telling the truth or just making it up?
How can this all be prevented? Taking the BIA seriously and understanding what it is to be used for. The BIA is more than a silly survey that slows you down in the planning process. Like "The Tortoise and the Hare", slow and steady DOES win the race. In this case, the race is to create a recovery plan that is actually usable and practical.
Here are some critical decisions that use the RTO:
- The strategies you need to implement
- The cost of implementing those strategies
- Whether you should declare a disaster or wait out the event
Don't let your Recovery Time Objectives (RTO) be a "Wolf in Sheep's Clothing". Complete the BIA and analyze the results for a better understanding of the services you provide. The moral of this story is: will your plans be built out of straw, sticks or bricks and can they withstand the huffing and puffing of an outage?
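As a rough sketch of the declare-or-wait decision, the RTO can be compared with how long the outage has already lasted and how long the recovery strategy takes to carry out. The rule below (declare once waiting any longer would make the RTO unreachable) and the names `latest_declaration_hour` and `should_declare` are assumptions for illustration, not a DRII or DRJ procedure.

```python
def latest_declaration_hour(rto_hours: float, recovery_hours: float) -> float:
    """Hours into the outage by which recovery must start to still meet the RTO.

    A negative result means the strategy cannot meet the RTO at all.
    """
    return rto_hours - recovery_hours


def should_declare(elapsed_hours: float, rto_hours: float, recovery_hours: float) -> bool:
    """Assumed rule: declare once waiting longer would make the RTO unreachable."""
    return elapsed_hours >= latest_declaration_hour(rto_hours, recovery_hours)


# Hypothetical numbers: a 24-hour RTO and a strategy that takes 8 hours to execute.
print(latest_declaration_hour(24, 8))  # 16
print(should_declare(4, 24, 8))        # False
print(should_declare(16, 24, 8))       # True
```

The sketch also shows why a randomly chosen RTO is a problem: if the RTO is shorter than the time the strategy needs, the first function returns a negative number and no declaration time can save the plan.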
What is a BIA?
According to DRII and DRJ glossary of terms, a BIA is: A process designed to prioritize business functions by assessing the potential quantitative (financial) and qualitative (non-financial) impact that might result if an organization was to experience a business continuity event.
Penn State has adopted a 5-phase approach to business continuity planning. See the Recovery Planning Process Flow Chart for the 5-phase process. The BIA is completed in Phase 2 of the process. We provide departments and campuses with a survey that helps them identify the critical services they run within their area. They need to judge what the impact to Penn State would be both financially and operationally if that service is not operational for a specified period of time. Peak periods of operation would need to be identified for each critical service. For example, students registering for classes would be a higher priority service at the beginning of the semester than the ability to post grades. However, posting grades would take priority at the end of the semester when registering for classes would be lower priority.
A BIA is designed to help departments and campuses understand their "business" better and help them prioritize their critical services. It will also help determine which services need funding for recovery strategies.
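The registration and grade-posting example can be written down as a small lookup. This is a sketch only: the period names, the numeric priorities and the function `priority_order` are hypothetical, not fields from the Penn State survey.

```python
# Assumed priorities per period of the semester: lower number = recover first.
PRIORITY_BY_PERIOD = {
    "class registration": {"start_of_semester": 1, "end_of_semester": 2},
    "grade posting": {"start_of_semester": 2, "end_of_semester": 1},
}


def priority_order(period: str) -> list:
    """Return service names ordered by priority for the given period."""
    return sorted(
        PRIORITY_BY_PERIOD,
        key=lambda service: PRIORITY_BY_PERIOD[service][period],
    )


print(priority_order("start_of_semester"))
print(priority_order("end_of_semester"))
```

The point of the sketch is that a service does not have a single fixed priority; the same outage calls for a different recovery order depending on when it happens.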
Why is the BIA misunderstood?
If the BIA provides a lot of answers, then why is it the least understood? Most units want to skip this step, primarily because they do not see the "value" or because IT units do not want the business units to get involved.
Most units feel they already understand their business, and in most cases they are correct, but trying to prioritize services based on intuition does not provide concrete reasoning for decision-making. Nor can intuition justify decisions after an event occurs and hindsight comes into play. Knowing up front what the impacts would be gives units the authority to make decisions that are in the University's best interest.
By understanding the priorities of the business units, IT can adjust their priorities and strategies to fit the business needs. Also, cost can be evaluated based off of the time needed for recovery of the technology which supports the business requirements. Typically, the faster you need technology the higher the cost to recover and vice versa. If you need technology back fast, the BIA will provide the data that supports the reasoning for spending the money to recover quickly.
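The trade-off between recovery speed and cost can be illustrated with a short sketch that picks the least expensive strategy that still meets the RTO coming out of the BIA. The strategy names, recovery times and costs below are invented for the example, and `cheapest_strategy` is a hypothetical helper, not a feature of any BIA tool.

```python
from typing import Optional

# Hypothetical strategies: (name, hours to recover, annual cost in dollars).
STRATEGIES = [
    ("hot site", 4, 200_000),
    ("warm site", 24, 80_000),
    ("rebuild from backups", 96, 15_000),
]


def cheapest_strategy(rto_hours: float) -> Optional[str]:
    """Return the lowest-cost strategy whose recovery time fits within the RTO."""
    candidates = [s for s in STRATEGIES if s[1] <= rto_hours]
    if not candidates:
        return None  # no listed strategy is fast enough
    return min(candidates, key=lambda s: s[2])[0]


print(cheapest_strategy(8))   # hot site
print(cheapest_strategy(48))  # warm site
print(cheapest_strategy(2))   # None
```

With a well-founded RTO, the extra spending on a faster strategy is backed by BIA data; with an arbitrary one, the same function would simply recommend spending money that may not be needed.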
How can Administrative Information Services (AIS) help?
We work with departments and campuses to show them how critical the BIA is. We also provide an easy method of collecting this information. We provide a 20-question, web-based survey to managers within the departments and ask them to complete it. This should take no longer than 4 hours. Penn State uses Strohl Systems BIA Professional to create the survey, collect the data and analyze it. We hope that in making this process as "easy" as possible, units will start to see the benefit and will want the survey to assist them on many levels, from short-term decision-making to long-term strategy.
Next week: BIA and the RTO!
Every book I read regarding business continuity planning mentions that the corporate culture needs to change to embrace business continuity planning in the day-to-day work environment. It makes perfect sense why this is necessary; however, it is also the most difficult thing to do.
There are many reasons why changing a culture is difficult to do, even when people know the change is the right thing to do. We need to move away from the cultural barriers we have created. We must remove the fear and distrust we all have for each other, move forward from holding onto the past and refrain from playing the blame game.
To change the culture across a University requires leaders at all levels of management and staff to make business continuity planning a priority. Communication across university departments is extremely critical, which means politics and past negative relationships need to be set aside for the good of the University as a whole. It is easy to forget why we come to work every day because we get caught up working only in our silos. At Penn State, we need to keep Penn State's mission in mind. Our mission is to improve people's lives through teaching, research and service.
In order to successfully embed business continuity planning into the culture, you need to make sure these pieces are in place:
1. Executive-level management MUST believe business continuity planning is critical. The University should continue providing critical functions in the event of an outage. Adjustments would be made on recovery efforts based on the severity of the event.
2. Policies and procedures for business continuity planning MUST be in place for departments to follow. There must be standards for recovery plans to ensure compatibility with other units' plans; otherwise, communications and dependencies across organizations will not work together during an event.
3. Awareness of high-level business continuity plans for the University should be shared with all employees including stakeholders (students, parents, etc). Details do not need to be shared, but an overall understanding of how the University will continue operations is critical for building TRUST and CONFIDENCE. Employees will feel the importance of building their own plan for their own business unit's critical services and contributing to something on a larger scale.
4. Show full support for the idea that business continuity planning is a University-wide responsibility across all departments and NOT just an IT function. IT may support business services, but these critical services need to plan for workaround procedures.
5. The University needs to believe that business continuity is NOT a project, but an on-going process in which everyone plays a part. The success of business continuity planning rests with the leaders of the University.
When we go out to talk with departments and campuses, we hear the terms business continuity and disaster recovery used interchangeably. We are trying to raise awareness that these terms are NOT the same.
Using industry standard terminology and definitions, we use the Disaster Recovery Journal's glossary for both Business Continuity and Disaster Recovery.
- BUSINESS CONTINUITY: The ability of an organization to provide service and support for its customers and to maintain its viability before, during, and after a business continuity event.
- DISASTER RECOVERY: The technical component of business continuity planning
Penn State's Business Continuity Plans include both Service Recovery Plans, which recover critical university functions, and Disaster Recovery Plans, which recover the technology functions for these critical services.
The priority of Services at the university is determined from the Business Impact Analysis. Once the services are identified and prioritized, the technology is identified which supports these services.
In the past, most people focused on disaster recovery plans. They felt if they had plans for the technology, they didn't need anything else. Recent events, however, proved that kind of thinking has many drawbacks. Technology is just a piece of the overall plan. Communicating with your stakeholders, having workaround procedures and creating policies for employees to follow during an outage are critical to sustaining university business. In a world of 24/7 operations, our customers expect minimal inconvenience during an unanticipated outage.
The purpose of this blog is to share information about Penn State planning efforts in the area of business continuity.
Although my expertise focuses on business continuity, I will, from time to time, post information about other aspects of planning as I come across it. Some of these areas will include emergency notification, crisis communication and crisis management.
Please feel free to comment on any of the information. This tool is for information sharing so I would like to read your thoughts and comments in these areas as I am sure other readers would too.
Please visit our website for more information:
http://ais.its.psu.edu/disaster_recovery/index.html
