IT Disaster Recovery Methodology and Guidelines
These guidelines define acceptable methods for disaster recovery development, planning, preparedness, management, mitigation and maintenance of Information Technology (IT) systems and services at Bryant University. The guidelines address the key elements of a disaster recovery framework for the systems and services managed by Information Services (IS).
For the Disaster Recovery Plan to be viable and effective, all groups within IS must follow a consistent methodology, from identifying the applications used throughout the university to reviewing and testing the plan on a regular basis. As infrastructure is updated and new software applications are implemented, the plan must be revised to reflect these changes. Management teams must also reach consensus on both the critical business definitions and the role that management will play in integrating, coordinating, and executing the overall plan to mitigate unexpected disruptions to normal operations.
Disaster Recovery planning is a program with a continuous lifecycle. The high-level responsibilities for the DR lifecycle are as follows:
- All IS managed systems must comply with these guidelines.
- All IS managers are responsible for developing plans specific to their IT domains in compliance with these guidelines.
- All IS managers will review and update DR plans as necessary, and at least once a year. All modifications must be approved by IS senior management.
- IS senior management is responsible for ensuring DR development, planning, coordination, and testing.
- IS senior management is responsible for ensuring sufficient financial, personnel and other resources are available as needed.
- IS senior management will review DR plans and guidelines as necessary, and at least once a year.
Guidelines and Recommendations
Plan Development Responsibilities
IT managers must develop disaster recovery plans for their domains of responsibility so that each domain has a documented, detailed, and tested blueprint for directing the IT recovery process in the event of a man-made or natural disaster. IT managers must consider key elements in their planning, including network requirements, infrastructure needs, data recovery, data and records management, security, and compliance. In general:
- IT managers must maintain a single, comprehensive electronic inventory of all servers and network equipment, including relevant configuration and model information and the applications they support. This inventory should be aligned with the centralized configuration management database (CMDB).
- All backup/recovery data/media must be identified, logged, and available for use during an emergency within stated recovery time objectives.
- DR plans should be stored in a single, comprehensive electronic repository, with a paper copy stored in a secure location.
- DR plan owners need to be able to access a copy of emergency and recovery plan(s) independent of IT services and/or the network.
- Upon completion or update, DR plans should be sent to the Disaster Recovery Manager and the IT Change Process committee for review.
- Plan information should be reviewed and updated as warranted by changes in the business or information systems environment, and at least annually.
IT managers need to identify critical applications and their data recovery objectives. Plans must state a Recovery Point Objective (RPO), the maximum acceptable amount of data loss measured in time, and a Recovery Time Objective (RTO), the maximum acceptable time to restore a service. Plans must specify the strategies and solutions that will recover applications, network, and data within these timeframes. Meeting these recovery objectives may involve deploying different architectures, tools, and infrastructure, either internally or externally with the assistance of an external service provider.
Plans should account for the services, systems, and assets that support critical business processes. These IT services, systems, and assets must be inventoried, prioritized, and ranked based on their criticality and impact to the business organization.
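The RPO and RTO requirements above reduce to two comparisons for each system. The sketch below is a minimal illustration, assuming periodic backups (so the worst-case data loss equals one backup interval, ignoring backup run time and replication lag) and an estimated restore duration. The class and function names and all sample values are hypothetical and are not taken from any IS system.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryStrategy:
    """Hypothetical record of one system's backup and restore approach."""
    name: str
    backup_interval: timedelta   # time between successive recovery points
    restore_duration: timedelta  # estimated time to bring the service back online


@dataclass
class RecoveryObjectives:
    """Hypothetical per-system objectives taken from a DR plan."""
    rpo: timedelta  # maximum tolerable data loss, measured in time
    rto: timedelta  # maximum tolerable time to restore service


def meets_objectives(strategy: RecoveryStrategy, objectives: RecoveryObjectives) -> dict:
    # Assumption: backups run periodically, so in the worst case the disruption
    # happens just before the next backup and one full interval of data is lost.
    worst_case_data_loss = strategy.backup_interval
    return {
        "rpo_met": worst_case_data_loss <= objectives.rpo,
        "rto_met": strategy.restore_duration <= objectives.rto,
    }


# Sample values are illustrative only.
objectives = RecoveryObjectives(rpo=timedelta(hours=4), rto=timedelta(hours=8))
nightly = RecoveryStrategy("nightly-backup", timedelta(hours=24), timedelta(hours=6))
replica = RecoveryStrategy("hourly-replica", timedelta(hours=1), timedelta(minutes=30))

for strategy in (nightly, replica):
    print(strategy.name, meets_objectives(strategy, objectives))
# nightly-backup {'rpo_met': False, 'rto_met': True}
# hourly-replica {'rpo_met': True, 'rto_met': True}
```

In this example the nightly backup restores quickly enough to meet the RTO but could lose up to a day of data, which fails a four-hour RPO. A gap of this kind is what may call for a different architecture or an external service provider.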
The disaster recovery methodology should be followed when the data center, network, systems, or applications experience a significant interruption in service that results from unexpected circumstances and requires recovery efforts beyond those needed for normal day-to-day operations or for a short-term unplanned interruption.
If the disruption results from a catastrophic event such as a fire, severe weather, or an earthquake, the university would likely invoke a university-wide emergency management response that overrides the protocol described here. The parties responsible for managing that incident will determine if and when it is safe for Information Technology (IT) personnel to report to campus to address the IT disruption. The safety of IT personnel is paramount and supersedes concerns about hardware, software, and other recovery needs.
Plan Execution, Coordination and Control
The Manager of Infrastructure Services has the authority to make decisions regarding the use of university resources and, once the plan is invoked, will supervise technical operations and serve as the Disaster Recovery (DR) Manager. In the absence of the Manager of Infrastructure Services, the Vice President for Information Services or a designee will assume the DR Manager role and supervise operations. The DR plan will be activated once the Manager of Infrastructure Services, or the designee responsible for supervising operations, is alerted to or determines that a significant interruption in service has resulted from unexpected circumstances and requires recovery efforts beyond those needed on a normal day-to-day basis. Recovery teams will then proceed within their areas of responsibility, with primary emphasis on assessing their services and returning them to a normal, secure state as quickly as possible while minimizing the adverse impact on the University.
The recovery teams specific to each IT domain will be activated as required, and their recovery plans invoked, following a situational assessment by the Manager of Infrastructure Services or designee. Recovery activities shall proceed according to the following priorities unless modified by the DR Manager.
- Data Center and Server Recovery: During a disruptive event, reestablishing the Data Center will be the highest priority and a prerequisite for any IT disruption recovery.
- Network & Telecommunication Recovery: During a disruptive event, reestablishing network and/or telecommunications connectivity will be a high priority and a prerequisite for full IT disruption recovery. These services will be recovered in parallel with, or immediately following, recovery of the Data Center.
- Application Recovery: During a disruptive event, reestablishing applications and application services will be a high priority and a prerequisite for full IT disruption recovery. These services will be recovered immediately following recovery of the Data Center and Network Services.
- Desktop/Mobile Recovery: During a disruptive event, reestablishing the desktop and mobile environment will be a high priority and a prerequisite for full IT disruption recovery. Desktop and mobile services will be recovered immediately following recovery of the Data Center, Network, and Application Services. (Recovery of this environment will likely occur during business hours only.)
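The recovery priorities above form a dependency ordering. A minimal sketch using the Python standard library's graphlib module (Python 3.9+) groups the four recovery areas into stages whose members may proceed in parallel. It assumes Network & Telecommunication Recovery has no hard dependency on the Data Center, reflecting the option to recover it in parallel; the dictionary and function names are hypothetical. The DR Manager's authority to modify priorities would correspond to editing the dependency map.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each recovery area lists the areas that must be recovered first.
# Network has no prerequisite so that it can proceed in parallel with the Data Center.
RECOVERY_DEPENDENCIES = {
    "Data Center and Server": set(),
    "Network & Telecommunication": set(),
    "Application": {"Data Center and Server", "Network & Telecommunication"},
    "Desktop/Mobile": {"Data Center and Server", "Network & Telecommunication", "Application"},
}


def recovery_stages(dependencies):
    """Group recovery areas into stages; areas in the same stage may run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # also raises CycleError if the priorities are contradictory
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # areas whose prerequisites are all recovered
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage recovered before computing the next
    return stages


for number, stage in enumerate(recovery_stages(RECOVERY_DEPENDENCIES), start=1):
    print(f"Stage {number}: {', '.join(stage)}")
# Stage 1: Data Center and Server, Network & Telecommunication
# Stage 2: Application
# Stage 3: Desktop/Mobile
```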
The IS manager of each recovery team is responsible for keeping the Manager of Infrastructure Services, or designee, and the other recovery teams informed of the team's activities. As the campus resumes full functionality, the Manager of Infrastructure Services, or designee, will apprise the Vice President for Information Services and determine, based on prior communication, how the resolution will be communicated to the campus population. If appropriate, the DR Manager may decide to invoke the IT Disruption Response Protocol.
Maintenance of Plans
- Plans must contain current and accurate information.
- Planning must be integrated into all phases of the IT system lifecycle.
- IT DR tests that demonstrate recoverability commensurate with the documented IT DR plans should be conducted regularly, as well as when warranted by changes in the business or information systems environment.
- Backup media supporting critical business processes should be tested semi-annually. Reviews to correct any deficiencies exposed by a test should be completed within 60 days after the test.
- Plan revisions should be completed within 60 days after a DR test is completed.
- The following maintenance activities should be conducted annually:
  - Updating the documented DR plan.
  - Reviewing the DR objectives and strategy.
  - Updating the internal and external contacts lists.
  - Conducting a simulation/desktop exercise.
  - Conducting a telecommunication exercise.
  - Conducting an application recovery test.
  - Verifying the alternate site technology.
  - Verifying the hardware platform requirements.
- IT managers are responsible for briefing staff on their roles and responsibilities related to DR planning, including developing, updating, and testing plans.
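The 60-day windows in the maintenance requirements above are simple date arithmetic. A minimal sketch, assuming the window counts calendar days from the date the test is completed (the guidelines do not say whether calendar or business days apply); the function names and dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: the 60-day window counts calendar days from the test date.
POST_TEST_WINDOW = timedelta(days=60)


def post_test_deadline(test_date: date) -> date:
    """Date by which deficiency reviews and plan revisions should be complete."""
    return test_date + POST_TEST_WINDOW


def revision_status(test_date: date, completed_on: Optional[date], today: date) -> str:
    """Classify post-test follow-up as 'on time', 'late', 'open', or 'overdue'."""
    deadline = post_test_deadline(test_date)
    if completed_on is not None:
        return "on time" if completed_on <= deadline else "late"
    return "overdue" if today > deadline else "open"


test_day = date(2024, 3, 1)  # hypothetical DR test date
print(post_test_deadline(test_day))                                      # 2024-04-30
print(revision_status(test_day, date(2024, 4, 20), date(2024, 5, 15)))  # on time
print(revision_status(test_day, None, date(2024, 5, 15)))               # overdue
```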
The University considers any violation of the directives outlined within this document to be an objectionable offense. Failure to comply shall subject the violator to disciplinary action by the University.
Any exceptions to directives outlined within this document are to be reviewed and approved by the Information Security Program Committee as needed.