Incident Data Presentation

  • Writer: David Peček
  • Jan 14, 2020
  • 3 min read

Updated: Sep 12, 2020


We all hate it when they happen, but major issues do arise in our systems. How we present information during these events can make or break confidence in our ability to resolve them. An operations team must keep the data it presents during a crisis consistent, clear, and up to date. Part of setting up an operations team is defining how incidents will be presented, which is the subject of this article.

Design your presentation strategies for incidents to be consistent, easy to understand, and customized to the intended audience.

Presenting the current state of incidents can be broken down into three sets of data, based on the audience you are communicating with: external customers, internal support staff, and management (for analysis). All three presentations should derive from the same data source, so nobody has to enter the same information twice. Collect the needed data as the incident progresses, at the points when it is freshest in people's minds.
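
A minimal Python sketch of the single-source idea follows. It assumes a hypothetical `Incident` record rather than any particular ticketing system's schema. Details are logged once as the incident progresses, and the three audience views (`customer_view`, `support_view`, `management_view`, all illustrative names) are derived from that one record instead of being written up separately.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative, not from a real tool.
    key: str
    title: str
    component: str
    origin: str = "unknown"  # "internal" or "external" once known
    status: str = "investigating"
    timeline: list = field(default_factory=list)

    def log(self, status: str, note: str) -> None:
        # Record each change as it happens, while the details are fresh.
        self.status = status
        self.timeline.append((datetime.now(timezone.utc), status, note))


def customer_view(inc: Incident) -> dict:
    # External customers: affected component, status and the latest note
    # (this sketch assumes every note is written to be customer-safe).
    message = inc.timeline[-1][2] if inc.timeline else ""
    return {"component": inc.component, "status": inc.status, "message": message}


def support_view(inc: Incident) -> dict:
    # Internal support staff: ticket key, title and the full timeline.
    return {
        "key": inc.key,
        "title": inc.title,
        "status": inc.status,
        "timeline": [(ts.isoformat(), s, n) for ts, s, n in inc.timeline],
    }


def management_view(inc: Incident) -> dict:
    # Management: component, origin, number of updates and elapsed time.
    elapsed = 0.0
    if len(inc.timeline) > 1:
        elapsed = (inc.timeline[-1][0] - inc.timeline[0][0]).total_seconds()
    return {
        "component": inc.component,
        "origin": inc.origin,
        "updates": len(inc.timeline),
        "elapsed_seconds": elapsed,
    }


inc = Incident("INC-101", "Checkout errors", "payments", origin="internal")
inc.log("identified", "A database failover is in progress.")
inc.log("resolved", "Payments are processing normally.")
print(customer_view(inc))
# {'component': 'payments', 'status': 'resolved', 'message': 'Payments are processing normally.'}
```

In practice the record would live in your ticketing system and the views would be queries or integrations rather than Python functions; the point is only that every view reads from the same data.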


Use Off-the-Shelf (OTS) Incident Software

 

If you already have an operations management system such as OpsGenie, it likely has built-in incident management tools. These tools follow common incident-management practices and guide you toward doing the right thing simply by using them. They handle collecting the correct information, sending it to internal and external customers, and reporting on various aspects of your incidents.


If you don't have anything like this, read on for a guide to setting it up yourself.


Internal Operations Dashboard

 

Employees who want to know which incidents are ongoing need a one-stop shop where they can see the status of all related tickets. Create a dashboard, tied to your incident ticketing system, that visualizes this data for them. The goal is to show people what is coming up, what is actively being worked on, and which items might need bigger fixes.


  • Escalation candidates. A list of tickets internal staff have raised as potential major issues.

  • Active incidents.

  • Operational fix tickets. The open tickets operations personnel are working on to correct an incident.

  • Development tickets. A list of development-facing tickets that will resolve currently open incidents.
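
At its core, this dashboard groups open tickets into those four lists. The sketch below assumes a hypothetical `Ticket` shape with a `kind` field (`candidate`, `incident`, `ops_fix`, `dev`) and a plain-text renderer; a real dashboard would more likely be a saved query or board inside the ticketing tool itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical ticket shape; a real dashboard would pull these from the
    # ticketing system's API or a saved query.
    key: str
    kind: str        # "candidate", "incident", "ops_fix" or "dev"
    status: str      # "open", "in_progress" or "done"
    summary: str
    incident: Optional[str] = None  # key of the incident this ticket serves


# Dashboard lanes, in display order, mapped from the assumed ticket kinds.
LANES = {
    "candidate": "Escalation candidates",
    "incident": "Active incidents",
    "ops_fix": "Operational fix tickets",
    "dev": "Development tickets",
}


def build_dashboard(tickets):
    # Put every ticket that is not done into its lane; unknown kinds are skipped.
    board = {title: [] for title in LANES.values()}
    for t in tickets:
        if t.status != "done" and t.kind in LANES:
            board[LANES[t.kind]].append(t)
    return board


def render(board) -> str:
    # Plain-text rendering: lane title with a count, then one line per ticket.
    lines = []
    for title, items in board.items():
        lines.append(f"{title} ({len(items)})")
        for t in items:
            link = f" -> {t.incident}" if t.incident else ""
            lines.append(f"  {t.key}: {t.summary}{link}")
    return "\n".join(lines)


tickets = [
    Ticket("SUP-7", "candidate", "open", "Login slow for EU users"),
    Ticket("INC-101", "incident", "in_progress", "Checkout errors"),
    Ticket("OPS-55", "ops_fix", "in_progress", "Fail over payments DB", "INC-101"),
    Ticket("DEV-310", "dev", "open", "Retry card authorization", "INC-101"),
    Ticket("OPS-54", "ops_fix", "done", "Restart search nodes", "INC-099"),
]
print(render(build_dashboard(tickets)))
```

Linking each fix ticket back to its incident key is what lets readers see which operational and development work belongs to which active incident.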


Customer Facing Status Page

 

Your customers are your largest audience. The top-level messages you convey here will make or break their faith in your service, its stability, and your ability to solve problems. Many vendors offer a ready-to-go, professional-looking status page for sending out and displaying updates about the status of your products and services. It is best to use one of them rather than trying to code your own. Pick one that integrates with your current ticketing or operations alerting systems, so you don't have to type the same information into two systems.



Management Analysis Dashboard

 

Incidents also need to be analyzed from a strategic point of view, both to see their current status and to learn from them.


  • Escalation candidates.

  • Active incidents.

  • SLA adherence. Is the team able to triage and respond to incidents within an acceptable timeframe?

  • Components impacted by date. Used to see which components are having the most issues within a given timeframe.

  • Origins / root cause. For each incident, capture one simple data point: whether the issue came from an internal or an external source.

  • Description and post-mortem word cloud. Reveals common phrases in incident descriptions or post mortems, pointing to recurring themes that should be addressed.

  • Post-mortem tickets. A list of the tickets created as a result of each post mortem, along with the post-mortem details and any follow-on tickets. Keeping up with the corrections identified afterward is important to prevent the incident from happening again.
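
Most of these management metrics are simple aggregations over incident records. The following Python sketch assumes a hypothetical `IncidentRecord` export with open and first-response timestamps, a component, an origin, and free text, plus an illustrative 30-minute response target (not a recommended SLA). It computes SLA adherence, components impacted within a date range, the internal/external origin split, and the word frequencies a word cloud would be drawn from.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class IncidentRecord:
    # Hypothetical export row; adapt the fields to your ticketing system.
    opened: datetime
    responded: datetime  # first response by the operations team
    component: str
    origin: str          # "internal" or "external"
    text: str            # description plus post-mortem notes


RESPONSE_TARGET = timedelta(minutes=30)  # assumed SLA, for illustration only
STOPWORDS = {"the", "and", "for", "was", "with", "from", "that", "this"}


def sla_adherence(records, target=RESPONSE_TARGET):
    # Fraction of incidents whose first response came within the target.
    if not records:
        return None  # no incidents, nothing to measure
    met = sum(1 for r in records if r.responded - r.opened <= target)
    return met / len(records)


def components_by_date(records, start: date, end: date) -> Counter:
    # Incident counts per component for incidents opened in [start, end].
    return Counter(r.component for r in records if start <= r.opened.date() <= end)


def origin_split(records) -> Counter:
    # How many incidents came from internal versus external sources.
    return Counter(r.origin for r in records)


def common_terms(records, n=10):
    # Word frequencies that could feed a word cloud; words of two letters
    # or fewer and stopwords are dropped.
    words = Counter()
    for r in records:
        for w in re.findall(r"[a-z]+", r.text.lower()):
            if len(w) > 2 and w not in STOPWORDS:
                words[w] += 1
    return words.most_common(n)


records = [
    IncidentRecord(datetime(2020, 1, 6, 9, 0), datetime(2020, 1, 6, 9, 10),
                   "payments", "internal", "Database failover caused checkout timeouts"),
    IncidentRecord(datetime(2020, 1, 8, 14, 0), datetime(2020, 1, 8, 15, 0),
                   "payments", "external", "Card processor outage caused checkout failures"),
    IncidentRecord(datetime(2020, 1, 9, 3, 0), datetime(2020, 1, 9, 3, 20),
                   "search", "internal", "Bad config deploy caused search timeouts"),
]
print(sla_adherence(records))  # 2 of the 3 responses were within 30 minutes
print(components_by_date(records, date(2020, 1, 6), date(2020, 1, 8)))
print(origin_split(records))
print(common_terms(records, 3))
```

A word-cloud tool would size each term by these counts. The stopword list here is deliberately tiny and would need extending for real post-mortem text.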
