How much does a disruption cost?

Some thoughts on the complexity of determining the costs of downtime.

The Money Question

In designing our new product, the Risk Assessment Toolkit, one of the things we needed to do was model the cost of disruption of an internal activity or process.

External-facing activities are often the easiest to discuss: if our website or phone system goes down, it’s clear that we can’t take customer orders. If the shipping department is under water, then goods can’t be shipped. But what about less well-defined activities such as Human Resources or Marketing? How much does a disruption in one of these departments cost?

A naive approach would be to assume that each hour lost is equal to the cost of the staff sitting (or standing outside in the car park) idle. With this approach we must be careful to include any overheads – payroll taxes, medical insurance and other expenses – which might not show directly on a department’s budget.
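As a minimal sketch of this naive approach (the function name, parameters and figures below are illustrative assumptions, not part of the Risk Assessment Toolkit), the cost is simply the fully loaded hourly rate multiplied by headcount and hours lost:

```python
def naive_idle_cost(headcount, hourly_wage, overhead_rate, hours_lost):
    """Naive estimate: every lost hour costs the fully loaded rate of idle staff.

    overhead_rate is a hypothetical fraction of the wage covering payroll
    taxes, medical insurance and other expenses (0.5 means 50% on top).
    """
    loaded_hourly_rate = hourly_wage * (1 + overhead_rate)
    return headcount * loaded_hourly_rate * hours_lost


# Made-up example: 20 staff at $30/hour with 50% overheads, idle for 2 hours.
print(naive_idle_cost(headcount=20, hourly_wage=30.0, overhead_rate=0.5, hours_lost=2))
```

Leaving out the overhead term would understate the naive figure by a third in this made-up case, which is why overheads matter even for the simplest estimate.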

But this is only half the story.

A digression here: an example from a lecture on Computer Security at Cambridge University made a lasting impression on me. (This was a real-world example, although the names and details have been omitted for reasons of confidentiality.)

A large utility bought an expensive laser printing and folding system to send out its monthly invoices. It was a unique and expensive system. The company could only afford one of them. With the anticipated workload, it was expected to operate 24×7 and be 99% occupied. When one of its parts failed, the system was down for a week. How long did it take before the bills were once again being sent out on time? At the time interest rates were high – around 10% – so the supplemental question was to estimate how much the company would lose if it had a million customers and the average monthly bill was $100.

Don’t worry – you don’t need to work out the answer. But do consider what the implications are for our hypothetical department.
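For readers who do want a rough figure, here is one way to sketch the arithmetic. It rests on assumptions that are mine rather than the lecture's: work arrives at a steady rate, bills are printed strictly in order, each customer pays exactly as late as the bill is sent, and interest is simple rather than compound. The function names are hypothetical.

```python
def catch_up_weeks(outage_weeks, utilisation):
    """Weeks needed after restart to clear the backlog built up during the outage.

    The backlog is utilisation * outage_weeks of machine time, and it can only
    be worked off using the spare capacity (1 - utilisation) each week.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1); at 1.0 the backlog never clears")
    return utilisation * outage_weeks / (1 - utilisation)


def interest_lost(outage_weeks, utilisation, customers, monthly_bill, annual_rate):
    """Rough simple-interest cost of bills being sent (and so paid) late.

    Assumption: the delay on each bill falls linearly from outage_weeks to zero
    over outage_weeks / (1 - utilisation) weeks, measured from the failure.
    """
    weekly_billing = customers * monthly_bill * 12 / 52
    disturbed_weeks = outage_weeks / (1 - utilisation)
    # Area of the triangle "delay versus time", scaled by the weekly billing value.
    value_weeks_delayed = weekly_billing * 0.5 * outage_weeks * disturbed_weeks
    return value_weeks_delayed * annual_rate / 52


print(catch_up_weeks(outage_weeks=1, utilisation=0.99))
print(interest_lost(1, 0.99, customers=1_000_000, monthly_bill=100, annual_rate=0.10))
```

Under these assumptions a one-week failure takes about 99 weeks to work off, and the interest lost is in the region of $2 million. The point is the sensitivity to spare capacity: at 90% occupancy the same failure would be absorbed in about nine weeks.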

Suppose the power goes out for half an hour. People stop working. But by the end of the day, are they significantly less productive? If your department is like most departments, there is a degree of over-staffing. People are not 100% occupied most of the day. This is required for resilience under normal operating conditions. Can the department continue operating with one person off sick or on holiday? Almost certainly. Is there a peak workload which has to be handled at certain times of year? Then at other times of year people aren’t 100% busy. People may be idle for half an hour, but by the end of the day, any small outage is unlikely to have had any noticeable effect on productivity. The monetary loss is quite likely to be zero.

How big can such an outage be without loss of productivity? An instructive example here comes from Britain during the 1970s. There was a shortage of electricity due to a miners’ strike. Businesses were switched to a three-day work week to reduce electricity consumption. Surprisingly, some businesses reported no drop or even an increase in productivity. People were working shorter hours, but working harder. This demonstrates that, depending on the nature of the work, a department may be able to cope with even quite a long outage with negligible effect on productivity.

But there comes a time when a department will not be able to catch up with its backlog of work in a timely manner while working normally. At this point, overtime is required. Typically this will cost more than standard working hours – perhaps 150% or 200% of standard time. However, this cost increase will only apply to salaries and payroll taxes. It won’t apply to other overhead costs such as medical insurance, which is one of the reasons why some companies prefer to increase overtime rather than increase staffing. After an initial period, the amount of overtime required approximately scales with the length of the disruption.
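The effect of the overtime premium applying only to salaries and payroll taxes can be sketched as follows. The function names, and the idea of expressing the other overheads as a flat amount per standard hour, are simplifying assumptions made for illustration.

```python
def standard_hour_cost(wage, payroll_tax_rate, other_overhead_per_hour):
    """Fully loaded cost of one standard working hour."""
    return wage * (1 + payroll_tax_rate) + other_overhead_per_hour


def overtime_hour_cost(wage, payroll_tax_rate, overtime_multiplier):
    """Cost of one overtime hour.

    The premium applies to wage and payroll tax only; overheads such as
    medical insurance are assumed to be already paid for and do not recur.
    """
    return wage * overtime_multiplier * (1 + payroll_tax_rate)


# Made-up figures: $40/hour wage, 10% payroll tax, $16/hour of other overheads.
standard = standard_hour_cost(40.0, 0.10, 16.0)
overtime = overtime_hour_cost(40.0, 0.10, 1.5)
print(standard, overtime, overtime / standard)
```

With these made-up figures, an overtime hour paid at 150% costs only about 10% more than a fully loaded standard hour, not 50% more.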

There are only 168 hours in a week, and even with overtime it is not practical (and generally not even legal) to expect your staff to work all of these hours.

At this point you need more staff. Hopefully you can get the extra staff you need on a temporary or contract basis. Contract staff cost more on an hourly basis, but since this cost includes most overheads the difference in cost is not as great as a simple comparison of hourly rates might suggest.

The conclusion from this is that the model for staffing costs contains three phases: an initial phase of low cost, where lost productivity can be made up without additional overtime; an intermediate phase, where overtime can be used to recover in a timely manner; and a final phase, where additional staff will be required to clear the backlog of work in a timely manner.

However, there are other non-staffing costs to consider if the department must relocate to an alternative location. This will involve transport costs, purchasing of additional or replacement equipment, costs of installing telephone and network services, and perhaps an activation fee for the use of the alternative location. Assuming the department is forced to relocate, there is a sizable cost component which is independent of the length of an outage.

In addition, if facilities are rented, there will be rental costs for the new location. Equipment may need to be hired or leased. Security services may need to be hired. And if the new location is a significant distance from the original location, staff will also require travel and meal expenses.

There are thus additional fixed and variable costs which will be incurred if the department must relocate. The variable costs will continue until a new location is purchased and prepared, or the original location can be repaired for re-use.

Finally there may be some costs which are incurred if deadlines can’t be met: for example, there may be statutory fines if paperwork isn’t submitted on time. There may also be losses in corporate reputation if some non-critical activities (such as updating a website or sending out press releases) don’t happen for an extended period of time.

In summary, we have:

  • An initial hourly cost, when normal reserve capacity can be used to clear the backlog of work after an outage.
  • An interim hourly cost, when overtime must be worked or additional equipment rented to clear the backlog.
  • A final hourly cost, when additional staff or equipment must be brought in to clear the backlog.
  • A fixed relocation cost, which is incurred when work is moved to an alternative location.
  • A daily relocation cost, reflecting the increased expense of operating at a second temporary location. This will persist until a new permanent location is found. Given the time taken to identify and purchase or rent real estate, working for 30 days at an alternative location is not unlikely.
  • Various fixed costs which are incurred if the activity has been delayed for an extended period, such as statutory fines or loss of customer goodwill.
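The six components above can be combined into a single cost function. The sketch below is one possible reading, not the Risk Assessment Toolkit's implementation: the class and field names are hypothetical, the thresholds at which one phase gives way to the next are assumed inputs, and each hour of outage is charged at the rate of the phase it falls in.

```python
from dataclasses import dataclass


@dataclass
class DisruptionCostModel:
    initial_hourly: float         # rate while reserve capacity absorbs the backlog
    interim_hourly: float         # rate once overtime or rented equipment is needed
    final_hourly: float           # rate once extra staff or equipment are brought in
    interim_after_hours: float    # assumed outage length at which the interim phase starts
    final_after_hours: float      # assumed outage length at which the final phase starts
    relocation_fixed: float = 0.0
    relocation_daily: float = 0.0
    deadline_penalties: tuple = ()  # (threshold_hours, fixed_cost) pairs, e.g. statutory fines

    def cost(self, outage_hours, relocated=False, relocation_days=0):
        t1, t2 = self.interim_after_hours, self.final_after_hours
        # Split the outage into the hours that fall within each phase.
        initial = min(outage_hours, t1)
        interim = min(max(outage_hours - t1, 0), t2 - t1)
        final = max(outage_hours - t2, 0)
        total = (initial * self.initial_hourly
                 + interim * self.interim_hourly
                 + final * self.final_hourly)
        if relocated:
            total += self.relocation_fixed + self.relocation_daily * relocation_days
        # Fixed costs triggered once the delay passes a deadline.
        total += sum(amount for threshold, amount in self.deadline_penalties
                     if outage_hours >= threshold)
        return total


# Entirely made-up figures for a hypothetical department.
model = DisruptionCostModel(
    initial_hourly=0, interim_hourly=500, final_hourly=800,
    interim_after_hours=4, final_after_hours=40,
    relocation_fixed=20_000, relocation_daily=1_500,
    deadline_penalties=((120, 10_000),),
)
print(model.cost(2))                                         # short outage, absorbed
print(model.cost(24))                                        # overtime needed
print(model.cost(200, relocated=True, relocation_days=30))   # major disruption
```

With these made-up figures the two-hour outage costs nothing, the one-day outage costs 10,000, and the extended outage with relocation costs 221,000, of which 75,000 is independent of the hourly rates.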

Obviously this is all still a highly simplified model, but it does capture some of the important distinctions. In particular, it captures the distinction between a short disruption and an extended one; and between a disruption which requires relocation and one which only requires staff to wait until the disruption ends. Without these distinctions it would be easy to over-estimate the cost of a minor disruption, or under-estimate the cost of a major one.

28 March 2013

