How to Identify and Record Critical Incidents – Phase III
Resolution and Recovery of Critical Incidents
Background
Resolution and Recovery of Critical Incidents continues my earlier blogs on Phase I (Identification and Recording) and Phase II (Investigation and Diagnosis). Because this blog builds on them, I recommend reading those first.
We use a RACI matrix to assign responsibility to the following personnel or groups:
Technical Restoration Manager
Incident Manager/IT Operations
Support Group
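The RACI rows that follow each step (for example "RA | I | CI |") can be read mechanically. The sketch below is only an illustration: it assumes the three columns follow the order in which the roles are listed above, which the original does not state explicitly, and it uses the conventional RACI meanings (Responsible, Accountable, Consulted, Informed). The names ROLES, RACI_MEANING and parse_raci_row are hypothetical.

```python
# Assumption: the columns of each RACI row follow the order of the role list above.
ROLES = (
    "Technical Restoration Manager",
    "Incident Manager/IT Operations",
    "Support Group",
)

# Conventional meaning of each RACI letter.
RACI_MEANING = {
    "R": "Responsible",
    "A": "Accountable",
    "C": "Consulted",
    "I": "Informed",
}


def parse_raci_row(row: str) -> dict:
    """Turn a row such as 'RA | I | CI |' into {role: [meanings]}."""
    # Drop surrounding whitespace and the trailing pipe, then split into cells.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    if len(cells) != len(ROLES):
        raise ValueError(f"expected {len(ROLES)} columns, got {len(cells)}")
    return {
        role: [RACI_MEANING[letter] for letter in cell]
        for role, cell in zip(ROLES, cells)
    }


if __name__ == "__main__":
    for role, duties in parse_raci_row("RA | I | CI |").items():
        print(f"{role}: {', '.join(duties)}")
```

Read this way, a row of "RA | I | CI |" would make the Technical Restoration Manager both Responsible and Accountable, keep the Incident Manager/IT Operations Informed, and have the Support Group Consulted and Informed.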
Process flow for Resolution and Recovery of critical incidents
1. The Technical Restoration Manager reviews the proposed recovery activities to determine whether a Technology Change is required.
RA | I | CI |
2. A decision is made on whether service restoration activities must progress through the implementation of a Change. (a) If Yes, go to step 3. (b) If No, go to step 7.
RA | I | CI |
3. A decision regarding the type of change is made based upon the Change Management Procedure and the Change Management Process Policy.
(a) If Latent, go to step 4
(b) If Emergency, go to step 6
A | CI | CI |
4. The change has been judged Latent in nature. Change activity is reviewed and a Latent Change is created to document historic change activities.
A | R | CI |
5. Either a Latent Change or an Emergency Change is created. The Change Management process is engaged to manage the lifecycle of all Technology Change activity.
CI | A | R |
6. Restoration activities are reviewed and an Emergency Change is created to restore or protect service. The Change Management process is engaged to manage the lifecycle of all Technology Change activity.
CI | CI | RA |
7. The Support Group verifies that a temporary workaround or solution can be applied without the need for a Change, while meeting the requirements of the Change Management Process Policy. The process moves to closure of the incident.
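The branching in steps 1 to 7 can be sketched as a small lookup table. This is an illustrative model rather than a definitive implementation: the branches out of steps 2 and 3 come directly from the text, while the assumptions that step 1 leads to step 2, that step 4 leads to step 5, and that steps 5, 6 and 7 end the numbered flow are mine. The names BRANCHES, SEQUENTIAL, next_step and walk are hypothetical.

```python
from typing import Optional

# Branches stated explicitly in the steps above: (step, decision) -> next step.
BRANCHES = {
    (2, "yes"): 3,
    (2, "no"): 7,
    (3, "latent"): 4,
    (3, "emergency"): 6,
}

# Assumption: steps with no stated branch continue to the next numbered step.
SEQUENTIAL = {1: 2, 4: 5}


def next_step(step: int, decision: Optional[str] = None) -> Optional[int]:
    """Return the following step, or None when the numbered flow ends."""
    if step in (2, 3):
        key = (step, (decision or "").lower())
        if key not in BRANCHES:
            raise ValueError(f"step {step} needs a valid decision, got {decision!r}")
        return BRANCHES[key]
    # Assumption: steps 5, 6 and 7 have no successor in this flow.
    return SEQUENTIAL.get(step)


def walk(decisions: dict) -> list:
    """Follow the flow from step 1 using the decisions taken at steps 2 and 3."""
    path = [1]
    while True:
        following = next_step(path[-1], decisions.get(path[-1]))
        if following is None:
            return path
        path.append(following)


if __name__ == "__main__":
    print(walk({2: "yes", 3: "latent"}))     # Latent Change route
    print(walk({2: "yes", 3: "emergency"}))  # Emergency Change route
    print(walk({2: "no"}))                   # workaround without a Change
```

Keeping the transitions in plain dictionaries makes it easy to compare the model against the written procedure line by line.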
Additional Tasks to be performed
i) Following service restoration, all Major Incident Bridges are closed.
ii) The Problem Management process commences through: 1) Creating a Problem Record where root cause is not known. 2) Linking the Incident to an existing Problem Record where appropriate. 3) Creating a Known Error record where root cause has been identified. 4) Linking the Incident to an existing Known Error record where appropriate.
iii) The Major Incident Manager reviews the Incident criteria against the Incident Review entry criteria. Upon review: (a) If a review is required, go to step (iv). (b) If a review is not required, go to step (v).
iv) The Major Incident Manager initiates an Incident Review following the resolution of the incident. A completed Incident Review is distributed to agreed recipients within the agreed timeframe.
v) The Major Incident Manager reviews the Incident criteria against the Major Incident Review entry criteria. Upon review: (a) If a Major Incident Review is required, go to step (vi). (b) If a Major Incident Review is not required, the process ends.
vi) The Major Incident Manager initiates a Major Incident Review following the resolution of the incident. A completed Major Incident Review is distributed to agreed recipients within the scheduled timeframe.
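The decisions in tasks (ii) to (vi) can be sketched as two small functions. This is a hedged illustration only: the entry criteria for each review are not defined in this blog, so they are passed in as plain booleans, and "where appropriate" is interpreted here as "a matching record already exists", which is my assumption. The function names are hypothetical.

```python
def problem_management_action(root_cause_known: bool, matching_record_exists: bool) -> str:
    """Pick one of the four Problem Management outcomes listed in task (ii)."""
    if root_cause_known:
        if matching_record_exists:
            return "Link the Incident to the existing Known Error record"
        return "Create a Known Error record"
    if matching_record_exists:
        return "Link the Incident to the existing Problem Record"
    return "Create a Problem Record"


def reviews_to_initiate(meets_incident_review_criteria: bool,
                        meets_major_incident_review_criteria: bool) -> list:
    """Mirror tasks (iii) to (vi): each review is checked in turn."""
    reviews = []
    if meets_incident_review_criteria:
        reviews.append("Incident Review")        # task (iv)
    if meets_major_incident_review_criteria:
        reviews.append("Major Incident Review")  # task (vi)
    return reviews


if __name__ == "__main__":
    print(problem_management_action(root_cause_known=False, matching_record_exists=False))
    print(reviews_to_initiate(True, False))
```

An empty list from reviews_to_initiate corresponds to the case where neither review is required and the process ends.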
Authored by Vijay Chander – All rights reserved – 2023