ITIL Incident Management Process for IT teams – A complete guide

What is Incident Management Process?

 

Purpose & Scope

Purpose

The objective of the Incident Management process is to restore normal service operation (as defined in the SLA with <Customer’s Name>) as quickly as possible, thus ensuring that the best possible levels of service quality and availability are maintained for <Customer’s Name>’s business.

Scope

All incidents with a (possible) negative effect on <Customer’s Name>’s service for which a Service Level Agreement (SLA) is signed lie within the scope of ITIL Incident Management.

The process applies to all incidents and performance-related requests raised by the customer, as well as to internal automated alerts.

As incident volumes grow and IT environments become more complex, even a well-designed process can drift from its intended outcomes. Periodic evaluation ensures that incident response remains aligned with business priorities and operational realities. I would suggest considering a short ITSM Process Maturity and Gap Analysis: an end-to-end review of key ITSM practices, including Incident, Problem, and Change Management, that pinpoints performance gaps, tool limitations, and automation opportunities.

This type of engagement delivers a prioritized improvement roadmap tailored to your organization’s maturity level. We have laid out what a typical 4-week engagement looks like, and anyone can carry it out for you. It is a strategic first step toward optimizing your ITSM ecosystem.


Definitions

Incident

An incident is an unplanned interruption to an IT service, a reduction in the quality of an IT service, or a failure of a configuration item that has not yet impacted service.

Major Incident

The highest category of impact for an incident. A Major Incident results in significant disruption to <Customer Name>’s business.

A definition of what constitutes a major incident must be agreed and documented.

Configuration Item

A Configuration Item is any component that needs to be managed in order to deliver <Customer’s Name>’s IT Service.

Workaround

A workaround is an identified means of restoring normal service for a particular incident. It does not, however, resolve the underlying issue that caused the incident in the first place.

Service Request

A Service Request is a request from a user for information or advice, for a Standard Change, or for access to <Customer’s Name>’s IT Service.

Supplier

A Third party responsible for supplying goods or Services that are required to deliver <Customer Name>’s IT Services.

Service Knowledge Management System (SKMS)

The SKMS is a set of tools and databases used to manage knowledge and information. It includes the Configuration Management System (CMS), the Known Error Database (KEDB), and other tools and databases. The SKMS stores, manages, updates, and presents all information that an IT Service provider needs to manage the full lifecycle of <Customer Name>’s IT Services.

Roles and Responsibilities

Incident Manager

Responsibilities

  • Driving the efficiency and effectiveness of the Incident Management process.
  • Producing management information.
  • Managing the work of incident support staff.
  • Monitoring the effectiveness of the Incident Management process and making recommendations for improvement.
  • Managing Major Incidents.
  • Developing and managing the Incident Management process and procedures.

Service Desk

Responsibilities

  • Accepting and registering incidents
  • Categorizing and prioritizing incidents
  • Performing initial diagnosis to restore service
  • Referring incidents to the appropriate Support Group
  • Tracking the progress of each incident throughout its lifecycle (from registration to closure) to ensure that it is resolved within the agreed Service Level Agreement (SLA), and updating incident records as necessary
  • Keeping communication open with users, key departments, and other stakeholders by informing them of progress promptly
  • Escalating to the appropriate management level when thresholds are breached
  • Closing all resolved incidents, requests, and other calls
  • Conducting customer/user satisfaction call-backs and surveys as agreed
  • Reporting on incidents
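
The tracking and escalation duties above can be sketched in code. This is a minimal illustration rather than a prescribed implementation: the target times in `SLA_TARGETS`, the 80% `ESCALATION_THRESHOLD`, and the fields of the `Incident` class are all assumptions made for the example. The real values come from the SLA agreed with <Customer’s Name>.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical resolution targets per priority; real values come from the agreed SLA.
SLA_TARGETS = {
    "P1": timedelta(hours=4),
    "P2": timedelta(hours=8),
    "P3": timedelta(hours=24),
}

# Assumed warning threshold: escalate once 80% of the target time has been used.
ESCALATION_THRESHOLD = 0.8


@dataclass
class Incident:
    incident_id: str   # hypothetical identifier format
    priority: str      # one of the keys in SLA_TARGETS
    logged_at: datetime


def sla_status(incident: Incident, now: datetime) -> str:
    """Return 'ok', 'escalate', or 'breached' for an open incident."""
    target = SLA_TARGETS[incident.priority]
    elapsed = now - incident.logged_at
    if elapsed >= target:
        return "breached"
    if elapsed >= target * ESCALATION_THRESHOLD:
        return "escalate"
    return "ok"
```

A Service Desk tool could run a check like `sla_status` on every open incident at regular intervals and notify the appropriate management level when the result changes from `ok`.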

Application Management Team (2nd & 3rd Level Support)

Responsibilities

  • Diagnosing and resolving incidents.
  • Identifying the changes required to resolve an incident.
  • Identifying when partner involvement is required and initiating that process.

Input, Output

Inputs

  • Phone calls
  • Emails
  • Web Interface
  • Event
  • Input from Technical Staff
  • Knowledge base, Known Error Database

Outputs

  • Service Request
  • Communications
  • Notifications
  • Solutions/Workarounds
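
To make the inputs concrete, the sketch below shows one way an incident record could capture the channel it arrived through, together with a timestamp and a running history. It is an illustration only: the `IncidentRecord` class, its field names, and the `INC-0001` identifier format are assumptions for this example, not part of any specific ITSM tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    """Channels through which an incident can arrive (from the Inputs list)."""
    PHONE = "phone call"
    EMAIL = "email"
    WEB = "web interface"
    EVENT = "event"
    TECHNICAL_STAFF = "technical staff"


@dataclass
class IncidentRecord:
    incident_id: str      # hypothetical identifier format
    source: Source
    description: str
    # Every record is date/time stamped when it is created.
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Each entry is (timestamp, author, note), so a full history is kept.
    history: list = field(default_factory=list)

    def add_note(self, author: str, note: str) -> None:
        self.history.append((datetime.now(timezone.utc), author, note))


record = IncidentRecord("INC-0001", Source.EVENT, "Disk usage alert on a monitored server")
record.add_note("service desk", "Logged from automated event alert")
```

Keeping the history on the record itself means that any support group receiving the incident sees every earlier action without asking the user again.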

Incident Management Process

Generic Incident Management Process

The ability to detect and resolve incidents results in lower downtime for <Customer Name>’s business, which in turn means higher availability of the service.

This means that the business is able to exploit the functionality of the service as designed.

Incident Management Process Flow



Activity 1: Incident Identification
Description: All key components should be monitored so that failures or potential failures are detected early and the incident management process can be started quickly. An incident can be identified through proactive event detection, the web interface provided to users, a user phone call, an e-mail, or technical staff observation.
Input: Event management, web interface, e-mail, phone call, input from technical staff
Output: Incident
Role: User

Activity 2: Incident Logging
Description: All incidents must be fully logged and date/time stamped, regardless of whether they are raised through a Service Desk telephone call or detected automatically via an event alert. All relevant information relating to the nature of the incident must be logged so that a full historical record is maintained and so that, if the incident has to be referred to other support groups, they have all relevant information to hand. The minimum information required when logging an incident can be found in <Incident Logging Checklist>.
Output: Incident
Role: Service Desk

Activity 3: Incident Categorization
Description: Allocate a suitable incident categorization code so that the exact type of the incident is recorded.
Input: Incident
Output: Categorized incident
Role: Service Desk

Activity 4: Service Request?
Description: Service Requests, which belong to ITIL Service Request Management, are sometimes incorrectly logged as incidents (e.g. a user incorrectly enters the request as an incident from the web interface). This check detects any such requests and ensures that they are passed to the Request Fulfilment process.
Input: -
Output: -
Role: Service Desk

Activity 5: Incident Prioritization
Description: Allocate an appropriate prioritization code using <Priority Guidelines>. Prioritization can normally be determined by taking into account both the urgency of the incident (how quickly the business needs a resolution) and the level of impact it is causing. Note that an incident’s priority may be dynamic: if circumstances change, or if an incident is not resolved within SLA target times, the priority must be altered to reflect the new situation.
Input: Incident
Output: Prioritization
Role: Service Desk

Activity 6: Major Incident Management?
Description: A separate procedure, with shorter timescales and greater urgency, must be used for ‘major’ incidents. The process is discussed in detail in Section 5.2.
Input: Major Incident, SKMS
Output: Closed Incident
Role: Service Desk

Activity 7: Initial Diagnosis
Description: The Service Desk Analyst must carry out initial diagnosis, trying to discover the full symptoms of the incident and to determine exactly what has gone wrong and how to correct it. At this stage, diagnostic scripts, the knowledge base, and known error information are most valuable in allowing earlier and more accurate diagnosis.
Input: Incident, SKMS
Output: Symptoms, resolution steps
Role: Service Desk

Activity 8: Functional Escalation Level 2/3
Description: As soon as it becomes clear that the Service Desk is unable to resolve the incident itself (or when target times for first-point resolution have been exceeded, whichever comes first), the incident must be immediately escalated for further support.
Input: Incident, <Functional Escalation Matrix>
Output: Functional escalation
Role: Service Desk

Activity 9: Investigation & Diagnosis
Description: Each support group involved in handling the incident investigates and diagnoses what has gone wrong. All such activities (including details of any actions taken to try to resolve or re-create the incident) should be fully documented in the incident record so that a complete historical record is maintained at all times. During this step, the support group identifies any changes required to restore the service. If a change is required, an RFC should be raised through the Change Management process, and the resolution should be reviewed after the change is implemented. Support groups may also identify the need to involve suppliers or third parties to restore the service. In that case, the Supplier Management process or third-party communication should be invoked.
Input: Incident, SKMS
Output: RFC, partner involvement, resolution steps
Role: 2nd/3rd Level Support

Activity 10: Unable to find Workaround/Permanent Fix?
Description: If the incident support groups (Level 1/2/3) are unable to identify a workaround or permanent fix within the SLA time, the Problem Management process should be involved in the investigation to find the root cause.
Input: Incident, SKMS
Output: Problem Record
Role: Service Desk

Activity 11: Resolution & Recovery
Description: When a potential resolution has been identified, it should be applied and tested. Sufficient testing must be performed to ensure that the recovery action is complete and that the service has been fully restored to the user(s). Regardless of the actions taken, or who performs them, the Incident Record must be updated with all relevant information and details so that a full history is maintained. The resolving group should pass the incident back to the Service Desk for closure.
Input: Incident, resolution steps
Output: Incident restored
Role: 2nd/3rd Level Support

Activity 12: Hierarchic Escalation
Description: Hierarchic escalation is used if the ‘Investigation and Diagnosis’ and ‘Resolution and Recovery’ steps are taking too long or proving too difficult. It should continue up the management chain so that senior managers are aware and can prepare and take any necessary action, such as allocating additional resources or involving suppliers/maintainers. Hierarchic escalation is also used when there is contention about to whom the incident is allocated.
Input: Incident, <Management Escalation Matrix>
Output: Management escalation
Role: Service Desk

Activity 13: User Confirmation?
Description: After the incident is resolved, the resolution must be confirmed with the user to check its effectiveness. If the incident is solved from the user’s perspective, it can proceed to closure. If the user does not confirm within the agreed timelines, the incident can also proceed to closure.
Input: Incident, incident resolution
Output: User confirmation
Role: Service Desk

Activity 14: Need to Re-open the ticket?
Description: If the user does not accept the resolution (the incident is not solved), a decision has to be taken to re-open the ticket. If the user comes back within the agreed timelines, the same incident is re-opened (the rules for re-opening should be agreed and documented). Otherwise, a new incident ticket is opened and follows the process.
Input: Incident
Output: Re-opened incident, new incident
Role: Service Desk

Activity 15: Workaround/Chance of Reoccurring/P1 Call
Description: If the incident is resolved with a workaround (not a permanent fix), or the support staff or Service Desk identify that the incident may recur, the update should be passed to Problem Management (to create a problem ticket) for a permanent fix. If the incident is a P1 call, it should be passed to Problem Management for Root Cause Analysis (RCA) as a proactive measure.
Input: Incident
Output: Problem ticket
Role: Service Desk

Activity 16: Incident Closure
Description: The Service Desk should check that the incident is fully resolved and that the users are satisfied and willing to agree that the incident can be closed. The Service Desk should also execute the <Incident Closure Checklist>.
Input: Incident
Output: User satisfaction, problem ticket, closed incident
Role: Service Desk
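
Activity 5 derives priority from impact and urgency. Since <Priority Guidelines> is not reproduced in this guide, the sketch below assumes a common 3x3 matrix with priority codes P1 to P5. The level names and the mapping are illustrative assumptions and should be replaced by the guidelines agreed with the customer.

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed three-point scale used for both impact and urgency."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Hypothetical matrix: (impact, urgency) -> priority code.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): "P1",
    (Level.HIGH, Level.MEDIUM): "P2",
    (Level.HIGH, Level.LOW): "P3",
    (Level.MEDIUM, Level.HIGH): "P2",
    (Level.MEDIUM, Level.MEDIUM): "P3",
    (Level.MEDIUM, Level.LOW): "P4",
    (Level.LOW, Level.HIGH): "P3",
    (Level.LOW, Level.MEDIUM): "P4",
    (Level.LOW, Level.LOW): "P5",
}


def prioritize(impact: Level, urgency: Level) -> str:
    """Look up the priority code for a given impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]
```

Because priority may be dynamic, `prioritize` can simply be called again whenever the assessed impact or urgency changes, and the incident record updated with the new code.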

Another representation, showing the interactions with other processes, is given in the diagram below:

Incident Management Interaction with other ITSM Processes

Major Incident Management Process

A separate procedure, with shorter timescales and greater urgency, must be used for ‘major’ incidents. A definition of what constitutes a major incident must be agreed and ideally mapped onto the overall incident prioritization system, so that such incidents are dealt with through the major incident management process.

Major Incident Management Process flow


Note: Most of the activities are explained in the ITIL Incident Management process above. Only the activities not discussed earlier are covered here.

Activity 1: Functional Escalation to Level 3
Description: Because the business impact of the incident is high, the incident should be assigned directly to 3rd Level Support for quick resolution.
Input: Incident
Output: Assignment
Role: Service Desk

Activity 2: Create Bridge Call
Description: Taking the impact and urgency into consideration, a bridge call (conference call) should be initiated, and all stakeholders should participate. The Problem Management team should also be involved and try to find the underlying cause.
Input: Incident
Output: Bridge call
Role: Incident Manager

Activity 3: Workaround/Permanent Fix?
Description: During investigation and diagnosis, if a workaround or permanent fix is available, it is applied. If neither is available, the incident should be referred to Problem Management for Root Cause Analysis (RCA).
Input: Incident
Output: Problem ticket
Role: Service Desk
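
The three major incident activities above amount to a short decision flow, which can be written down as a checklist generator. This is only a sketch: the function name `major_incident_actions` and the wording of each action are assumptions for illustration, and the actual procedure should follow the agreed major incident definition.

```python
from typing import List


def major_incident_actions(fix_available: bool) -> List[str]:
    """Return the ordered actions for a major incident (illustrative wording)."""
    actions = [
        "Assign the incident directly to 3rd Level Support",
        "Open a bridge call with all stakeholders (Incident Manager)",
        "Involve Problem Management to look for the underlying cause",
    ]
    if fix_available:
        # A workaround or permanent fix was found during investigation.
        actions.append("Apply the workaround or permanent fix")
    else:
        # No workaround or fix: hand over for Root Cause Analysis.
        actions.append("Refer to Problem Management for Root Cause Analysis (RCA)")
    return actions
```

Calling `major_incident_actions(False)` yields the escalation path for the case where neither a workaround nor a permanent fix has been found.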

FAQs

What is Incident Management?

Incident management is a systematic process for identifying, analyzing, and resolving unplanned disruptions to normal operations or services to restore them quickly and prevent recurrence. Its primary goal is to minimize downtime and the impact of these disruptions on business functions, customers, and overall operations.

Incident Management Process vs. Incident Management?

Incident management is the overarching practice of restoring normal IT service operation as quickly as possible after an unplanned disruption, aiming to minimize negative impact on business operations. The incident management process refers to the specific, structured steps and activities undertaken to achieve this goal, including logging, categorizing, prioritizing, investigating, resolving, and closing incidents, often following frameworks like ITIL.

What is the difference between an incident and a service request?

The core difference between an incident and a service request lies in their purpose: an incident is an unplanned interruption or reduction in service quality requiring immediate restoration, while a service request is a formal user request for something new or additional that is typically pre-approved and part of regular IT operations.
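
This distinction can be applied as a routing check when a ticket is logged, so that requests entered as incidents by mistake still reach the right process. The sketch below is a minimal illustration: the `Ticket` class and its `request_type` field are hypothetical, and the set of service request types is taken from the Service Request definition earlier in this guide.

```python
from dataclasses import dataclass

# Request types treated as Service Requests (from the definition in this guide).
SERVICE_REQUEST_TYPES = {"information", "advice", "standard change", "access"}


@dataclass
class Ticket:
    ticket_id: str      # hypothetical identifier format
    logged_as: str      # what the user selected: "incident" or "service request"
    request_type: str   # e.g. "outage", "degradation", "access"


def route(ticket: Ticket) -> str:
    """Route by what the ticket actually is, not by how it was logged."""
    if ticket.request_type.lower() in SERVICE_REQUEST_TYPES:
        return "Request Fulfilment"
    return "Incident Management"
```

For example, a ticket logged as an incident with `request_type="access"` is routed to Request Fulfilment, while one with `request_type="outage"` stays in Incident Management.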

What are the key stages of Incident Management Process?

The typical incident management process involves:
Identification: Detecting and logging the incident, including details like name, ID, description, and date.
Categorization: Assigning a logical category to the incident for analysis and future reference.
Prioritization: Assessing the incident’s impact on the business, number of users affected, and any relevant SLAs to determine urgency.
Response: Sending the incident to the appropriate team for troubleshooting, with escalation if needed.
Resolution and Closure: Resolving the incident, confirming with the end-user, and documenting all steps for future reference.
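
The stages above can be modeled as a small state machine that rejects out-of-order transitions. This is a sketch under stated assumptions: the `Stage` names mirror the list above, resolution and closure are split into two states so that a rejected resolution can return to the response stage, and the transition table is illustrative rather than a rule set from any particular tool.

```python
from enum import Enum


class Stage(Enum):
    IDENTIFIED = "identified"
    CATEGORIZED = "categorized"
    PRIORITIZED = "prioritized"
    IN_RESPONSE = "in response"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Assumed transition table: each stage maps to the stages it may move to.
ALLOWED = {
    Stage.IDENTIFIED: {Stage.CATEGORIZED},
    Stage.CATEGORIZED: {Stage.PRIORITIZED},
    Stage.PRIORITIZED: {Stage.IN_RESPONSE},
    Stage.IN_RESPONSE: {Stage.RESOLVED},
    # A resolved incident is closed, or returns to response if the user rejects the fix.
    Stage.RESOLVED: {Stage.CLOSED, Stage.IN_RESPONSE},
    # A closed incident is final here; later issues are logged as a new incident.
    Stage.CLOSED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to the target stage, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target
```

Encoding the allowed moves in one table makes it easy to review the lifecycle with stakeholders and to adjust it when re-open rules are agreed.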

What are some of the best practices of Incident Management Process?

Some best practices are listed below. You may have your own, depending on your organization:
* Establishing clear incident escalation and notification procedures.
* Implementing effective incident categorization and prioritization methods.
* Encouraging effective IT communication and collaboration among teams.
* Regularly reviewing and updating incident response plans and procedures.
* Capturing lessons learned after incidents for continuous improvement.
* Automating incident management processes where possible.

If your IT team is struggling with recurring incidents, slow resolution times, or rising SLA breaches, our incident management services can help you stabilize operations quickly. We assess your current IM workflow, identify root causes of delays, and design a standardized, ITIL-aligned process tailored to your environment. Our team helps set up categorization, prioritization, escalation paths, and major incident handling frameworks. With improved visibility, dashboards, and team training, you can reduce firefighting, improve MTTR, and deliver a consistent, reliable service experience. Contact us.

Vijay Chander

Authored by Vijay Chander – All rights Reserved 2023
