Project Management

5 steps to prevent an IT outage through project management

We spoke to QA’s Capability Lead for Project Management, Jackie Hewett, to uncover how project management best practices can help to avert IT crises.

We are all familiar with frequent tech updates, especially when it comes to our mobile devices. Now scale that up to include all of the system updates that a large organisation may require...

This covers a huge range of updates with objectives such as:

  • Improving performance / efficiency
  • Rolling out a new product or service
  • Amending business processes
  • Perhaps most important of all, protecting the business from hackers and viruses

Updates are a necessity. But what happens when they go wrong?

We have seen a recent increase in incidents, such as the CrowdStrike outage of 19th July 2024, which was triggered by a faulty software update rather than a cyber attack. The results of such incidents can be catastrophic for business.

The CrowdStrike incident caused issues globally, with many flights cancelled and banking services impacted. The knock-on effects left retailers unable to take card payments, healthcare providers having to write prescriptions by hand rather than sending them electronically to pharmacists, media channels going down – and it doesn’t end there.

Project Management ‘prevention’ of outages

It is true that all businesses should have robust disaster recovery planning to handle the fall-out of worst-case and unexpected scenarios, including what are often termed ‘acts of god’.

However, what if you could prevent a harmful incident in the first place?

We spoke to QA’s Capability Lead for Project Management, Jackie Hewett, to uncover how fundamental best practices within project management can help to avert such crises. “As they say, prevention is better than cure.”

System updates are usually run as projects, so here are Jackie’s top five tips to prevent an outage through masterful project management.

  1. Risk management – This is the obvious one, Jackie says: “Identifying and handling the potential ‘gotchas’ is fundamental to running any project, as it can prevent a risk from materialising and becoming a live issue!”
  2. Release planning – This includes assessing whether the business is ready for the roll-out and has the resource to deal with any issues.
  3. Quality control – Testing a new product or release is essential. It is often impossible to test absolutely everything, so what’s Jackie’s advice? “Prioritise testing to focus on things that have the potential to cause the greatest pain.”
  4. Training and skills development – Ensure that training covers not only the end users of the new product, but also those who are going to support the roll-out and maintain the new system.
  5. Managing scope and requirements – “Especially those unrealistic user expectations which can lead to problems,” says Jackie. Consider whether the project scope should include making adjustments to a disaster recovery plan if the change is significant enough.

Effective risk management steps for IT outages

Since risk management is tip number one, let’s explore it in more depth by walking through the steps commonly included in a project risk management process:

Risk identification

When identifying individual risks, it’s vital to get the right people involved, especially technicians. A project manager likely won’t have the perspective to identify all project risks themselves.

“Hold workshops and make sure you have someone who provides the perspective for ‘go-live risks’,” Jackie advises. “Part of the risk identification step also entails writing a risk management strategy at the project start – setting out how risks will be managed throughout, detailing plans like risk identification workshops.”

Risk analysis

There are many different scales or mechanisms used to assess how bad a risk could be. Most organisations will consider probability (the likelihood of the risk occurring) and impact (such as the financial cost if it does). Jackie suggests that ‘proximity’, meaning when the risk could occur, is also important in this scenario – specifically when thinking about ‘go-live’ risks.
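As a rough illustration of how probability, impact and proximity might be combined, here is a minimal Python sketch. The 1 to 5 scales, the probability × impact score and the example risks are all assumptions made for illustration; they are not a scheme prescribed by Jackie or by any particular method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: int          # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int               # assumed scale: 1 (minor) to 5 (severe)
    days_until_exposure: int  # proximity: how soon the risk could occur


def severity(risk: Risk) -> int:
    """Combine probability and impact into a single score (1 to 25)."""
    return risk.probability * risk.impact


def rank(risks: list[Risk]) -> list[Risk]:
    """Highest severity first; ties go to the risk that could occur soonest."""
    return sorted(risks, key=lambda r: (-severity(r), r.days_until_exposure))


# Hypothetical entries for illustration only.
risks = [
    Risk("Training material late", probability=4, impact=2, days_until_exposure=30),
    Risk("Update fails at go-live", probability=2, impact=5, days_until_exposure=0),
    Risk("Rollback script untested", probability=2, impact=5, days_until_exposure=7),
]

for r in rank(risks):
    print(f"{severity(r):>2}  {r.days_until_exposure:>3}d  {r.name}")
```

Here proximity is used only as a tie-breaker, so that two equally severe risks are ordered by how soon they could occur. An organisation could just as reasonably weight proximity into the score itself.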

Response planning

“The key here is proportionality,” says Jackie.

Make sure that whatever resources (including people, time and money) you put toward your response plans are proportionate to the risk.

“There are many different response types we can use. When thinking about a risk with the potential for system outage, it’s important not just to think about reducing the probability of that risk occurring if it isn’t entirely preventable. You must also have ready-to-go contingency (or fall-back) plans.”

These are the pre-determined actions you will apply if the worst happens.
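One possible way to sanity-check proportionality is to compare the cost of a response with the reduction in expected loss it buys. The sketch below assumes that probability and financial impact can be estimated as numbers; the function names and figures are hypothetical, and this rule of thumb is only one of several ways to judge proportionality.

```python
def expected_loss(probability: float, impact_cost: float) -> float:
    """Expected loss = chance of the risk occurring x cost if it does."""
    return probability * impact_cost


def is_proportionate(response_cost: float, p_before: float,
                     p_after: float, impact_cost: float) -> bool:
    """True if the response costs no more than the expected loss it removes."""
    reduction = (expected_loss(p_before, impact_cost)
                 - expected_loss(p_after, impact_cost))
    return response_cost <= reduction


# Hypothetical figures: extra pre-release testing cuts the chance of an
# outage from 20% to 5%, where an outage is estimated to cost 400,000.
print(is_proportionate(20_000, p_before=0.20, p_after=0.05, impact_cost=400_000))
print(is_proportionate(90_000, p_before=0.20, p_after=0.05, impact_cost=400_000))
```

A check like this only covers the cost of reducing probability. It says nothing about whether a contingency plan exists, which still needs to be confirmed separately for any risk that could cause an outage.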

Risk response implementation

Carrying out the planned response actions will mean assigning an owner to each risk, and possibly someone else to take the actions. “An important aspect here,” Jackie reminds us, “is to set aside the appropriate risk budget – how will it otherwise be paid for?”

Risk communication

Risks and response actions should be communicated throughout the project. There are many documents project managers use for this. As well as the aforementioned risk management strategy, you will need some kind of risk log to detail individual risks and track associated actions. This can also be used as input for project reporting.
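To make the idea of a risk log concrete, here is a minimal Python sketch of one. The field names, status values and example entries are assumptions for illustration; real risk logs vary between organisations and methods.

```python
from dataclasses import dataclass


@dataclass
class RiskLogEntry:
    risk_id: str
    description: str
    owner: str            # person accountable for the risk
    action_owner: str     # person carrying out the response action
    response: str         # planned response or contingency action
    status: str = "open"  # assumed values: "open" or "closed"


def open_risks_report(log: list[RiskLogEntry]) -> list[str]:
    """Summarise open risks, e.g. as input for a project status report."""
    return [
        f"{e.risk_id}: {e.description} (owner: {e.owner}; "
        f"action: {e.response}, by {e.action_owner})"
        for e in log
        if e.status == "open"
    ]


# Hypothetical entries for illustration only.
log = [
    RiskLogEntry("R1", "Update fails at go-live", "Project manager",
                 "Release engineer", "Rehearse rollback before release"),
    RiskLogEntry("R2", "Support team not trained", "Service lead",
                 "Training coordinator", "Run support workshop",
                 status="closed"),
]

for line in open_risks_report(log):
    print(line)
```

Keeping the risk owner and the action owner as separate fields reflects the point made under risk response implementation: the person accountable for a risk is not always the person carrying out the response.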

In summary

The ‘prevention rather than cure’ approach that good project (and risk) management provides won’t identify, prevent and plan for every potential IT system outage… but it will take you a long way.

“To my mind,” says Jackie, “one of the key things that project managers need to do is manage risks and issues. It may just be even more important than having overly detailed plans.”
