Risky Thinking
July 2014
Michael Z. Bell
www.RiskyThinking.com

Risky Thinking is a free newsletter providing essays, analysis, insights, and oddities related to Business Continuity, Disaster Recovery, and Risk Management.

To subscribe, visit: http://www.RiskyThinking.com/newsletter/

For more information and articles, visit the RiskyThinking website at
http://www.RiskyThinking.com/.


In This Issue
  • Single Point of Failure in the Cloud
  • Business Continuity Plan Review
  • MH370 Disappearance: No Theories, Only Lessons Learned
  • Does Everyone Know What to Do in an Emergency?
  • News: Hurricanes, Fire, Flood and Power
  • Risk Assessment Toolkit
  • Administrivia, Subscribing and Unsubscribing

Single Point of Failure in the Cloud

On June 17th this year an Australian company called CodeSpaces discovered it had a problem.

CodeSpaces provided cloud-based source control services to software developers. Its infrastructure was built using Amazon Web Services.

On June 17th it was suffering from a distributed denial-of-service attack. Unfortunately, this is not an unusual event for web-based companies.

However, at the same time it started receiving extortion threats from someone who had evidently obtained access to its Amazon Web Services control panel.

It did not end well.

The company attempted to regain exclusive access to its control panel. When the extortionist realized this, he set about deleting everything he could in the company's web services account. This included virtual servers, data stores, and backups. By the time the company had regained exclusive control of its web services account, there was not enough of the company's digital assets left for it to continue in operation.

The company's single point of failure in this case proved to be the credentials required to log in to its web services account. It's not known how the attacker obtained these credentials; perhaps through a phishing or social engineering attack. It's also not clear whether the company had one set of credentials or many, or whether the sets of credentials had been properly limited in scope using the "least privilege" principle.

In the Amazon Web Services security model there is always a master user which, by default, is protected only by an email address and password. This account has total administrative access. Secondary users can be set up with highly restricted roles and, optionally, two-factor authentication. The attacker may well have set up additional accounts to maintain access in case a compromised account was removed.

Some important lessons can be learned here:
  • Accounts with full access need to be very well protected. Preferably they should only be used to set up other accounts with lesser privileges. There will always be a danger that they can be phished, or that the account credentials can be reset by social engineering of the supplier, but if the number of people using credentials with full access is restricted, the risk can be minimized.
  • The list of users should be regularly checked to ensure that no extra users have been added, and that all users (human or machine) have the least privilege needed to perform their roles.
  • Online backups are not a perfect substitute for offline backups, particularly when a human attacker is involved. If the attacker had needed to request that tapes or disks be mounted or shipped before data could be deleted, it is unlikely he could have caused so much damage. (The same applies to human error and to malware.)
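
The second lesson above, regularly checking the list of users, can be partly automated. The following is a minimal Python sketch, assuming you can export each user in your cloud account along with whether two-factor authentication is enabled and the names of the permission policies attached to it. The CloudUser class, the APPROVED baseline, the policy names and audit_users are all hypothetical illustrations, not part of any Amazon Web Services API. The sketch simply compares the current users against an approved baseline and flags anything that has been added or has drifted.

```python
from dataclasses import dataclass, field


@dataclass
class CloudUser:
    """Hypothetical record for one user (human or machine) in a cloud account."""
    name: str
    has_mfa: bool  # True if two-factor authentication is enabled
    policies: set = field(default_factory=set)  # names of attached permission policies


# Hypothetical approved baseline: who should exist, and the most each may hold.
# Keep this somewhere the cloud account's own users cannot edit.
APPROVED = {
    "alice": {"FullAdmin"},
    "deploy-bot": {"ReadOnlyStorage", "DeployServers"},
}


def audit_users(users, approved, admin_policy="FullAdmin"):
    """Compare the current users against the approved baseline and list findings."""
    findings = []
    for user in users:
        if user.name not in approved:
            # Possibly an account added by an attacker to keep a foothold.
            findings.append(f"{user.name}: not in approved list")
            continue
        extra = user.policies - approved[user.name]
        if extra:
            # Privileges have grown beyond what this user was approved for.
            findings.append(f"{user.name}: unexpected policies {sorted(extra)}")
        if admin_policy in user.policies and not user.has_mfa:
            findings.append(f"{user.name}: full access without two-factor authentication")
    return findings


if __name__ == "__main__":
    current = [
        CloudUser("alice", has_mfa=False, policies={"FullAdmin"}),
        CloudUser("deploy-bot", has_mfa=True, policies={"ReadOnlyStorage"}),
        CloudUser("backup-helper", has_mfa=True, policies={"FullAdmin"}),
    ]
    for finding in audit_users(current, APPROVED):
        print(finding)
```

With the example users, this reports two findings: the approved administrator "alice" has full access without two-factor authentication, and "backup-helper" is not on the approved list at all. Keeping the baseline outside the account being audited matters; an attacker with full access would otherwise be able to edit the baseline too.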

For CodeSpaces' own description of the incident, see:
http://www.codespaces.com/


Business Continuity Plan Review

It's often difficult to see your Business Continuity / Disaster Recovery Plans from a different perspective. Are they realistic? Do they miss something obvious? An external pair of eyes can often see problems or solutions which you can't.

We offer an economical, fixed-price service to review your business continuity plan. We will review documentation, interview key staff, and prepare a confidential written report identifying the strengths and any weaknesses we can see in your current plan.

Contact us for further details
http://www.RiskyThinking.com/


MH370 Disappearance: No Theories, Only Lessons Learned

"Malaysian Three Seven Zero contact Ho Chi Minh 120 decimal 9 Good Night"
"Good Night Malaysian Three Seven Zero"

With those words, flight MH370 from Kuala Lumpur to Beijing disappeared, and was never seen or heard from again.

About an hour and twenty minutes later, Subang Air Traffic Control concluded that the aircraft was genuinely missing. After leaving Malaysian airspace, it had apparently never contacted Ho Chi Minh air traffic control. At this point there was still hope: aircraft radios have been known to fail, pilots have been known to divert, and it was too early to conclude that the plane was definitely lost. As the clock ticked on, past the point where the plane's fuel would have run out and with no reports of emergency landings, it became clear that the plane had crashed.

So began one of the most confusing air crashes in history: there are some military radar tracks possibly associated with the plane, and a theoretical track based upon the delays and Doppler shift of a satellite signal, but no evidence to suggest why the aircraft changed course, or where it is now.

However, the impression left on the rest of the world by Malaysian Airlines and the Malaysian authorities is one of total confusion. Everybody seemed to be giving statements and announcing new findings, only to have the information contradicted minutes or hours later. The friends and relatives of those on board were quickly exasperated: only conspiracy theories seemed to explain the lack of reliable information. It was a communications disaster.

There's a quotation from Sherlock Holmes that seems appropriate here:

"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."

(A Scandal in Bohemia, Arthur Conan Doyle, 1892)

However, the problem here was not just a lack of facts: it was a surplus of irrelevant facts.

Debris fields were quickly spotted in the ocean near where the plane was last seen on radar. These proved not to be related to the aircraft. Then debris fields were reported in another area. Then an oil slick. Governments were castigated for not investigating a particular sighting, or announced that plane debris had been found. All these reports were eventually discounted. Unfortunately there is so much debris in the world's oceans that anywhere you look for debris, you will quite likely find some.

Then there were possible pings from the flight recorder. These came and went for tantalizingly short periods, and led to a sea bed search. Nothing was found. Were they from the plane? It's still possible that they were. Unfortunately sound sometimes travels extraordinary distances in the ocean, and the few signals heard could have come from anywhere over a wide area.

The complexity of the conflicting information can be seen in this rather confusing summary, from The Aviation Herald, of all the information known so far: http://avherald.com/h?article=4710c69b

What can we learn from this disaster? We can't learn anything yet about air safety, but we can learn something about disaster communication. From the press reports, it was evident that we should:

  • Communicate through a single point of contact.
    (In the MH370 disaster too many people were commenting and providing unverified information.)
  • Be very clear about what is known, what is not known, and what is simply a theory that is being investigated.
  • Explain normal procedures and technical operations to defuse conspiracy theories.

The last of these is particularly important, and is one of the areas in which the Malaysian authorities had great difficulty. For example:
  • Why weren't friends and relatives told immediately the aircraft was reported missing?
    (Because in most cases the aircraft is simply delayed, a pilot has forgotten to report, or a radio is not working correctly. It's not fun being told your relatives are dead when they aren't.)
  • Why did passengers' mobile phones still appear to ring after the plane disappeared?
    (Because the ring tone indicates that the network is trying to contact the subscriber, not that the subscriber's phone is actually ringing.)
  • Why didn't the authorities know where the plane was at all times?
    (Because ground radar has a limited range, and the curvature of the earth means that much of the world, including most of the open ocean, is outside ground radar coverage.)

Before a disaster happens, it is therefore important to identify in your planning:
  • Who will communicate?
  • What will they say?
  • How will they react to outlandish theories and suggestions?
  • How will they make it very clear what actions are being taken in the public interest?
  • How will they convince a cynical audience that the disaster will never happen again?

This is a good area for tabletop training exercises for those who might be involved in public and internal communications. Until you have had the experience of dealing with a small crowd of noisy mock journalists asking difficult questions and misinterpreting your answers, you are not prepared for crisis communications.


Does Everyone Know What to Do in an Emergency?

We're currently working on a system which will distribute emergency response plans to employees' mobile phones. One of the problems with business continuity plans is that they aren't readily available at the time when they are most needed, and they are too complex and too detailed to use during an incident. By putting a simple emergency response plan on everyone's phone, you make sure that your staff will know what to do when something happens.

If you would like to get involved in this development by answering questions and trying out prototypes, please sign up at
http://www.riskythinking.com/plan424/


News: Hurricanes, Fire, Flood and Power


The Hurricane Season starts, Typhoon season continues

The 2014 hurricane season arrived with Hurricane Arthur creeping up the East Coast.
http://online.wsj.com/articles/hurricane-arthur-heads-toward-nova-scotia-1404567515

The typhoon season continues, with Super Typhoon Neoguri approaching Japan.
http://www.theguardian.com/world/2014/jul/07/super-typhoon-neoguri-japan-okinawa-islands

Don't forget that the best time to make plans and stock up on supplies is before the government issues a warning in your area, not after.


FIRE

You don't have to make any mistakes to have your buildings destroyed by fire. Here are some recent examples.

Massive Industrial Fire in Phoenix
Widespread fire across a city block. Believed to have been caused by the failure of electrical poles during monsoon-like weather.
http://www.azcentral.com/story/news/local/phoenix/2014/07/04/phoenix-fire-multiple-businesses-abrk/12207987/

Lightning Strikes Cause Multiple Fires
http://www.clickondetroit.com/weather/9-clinton-township-businesses-completely-destroyed-by-recent-storms/26561582


Wisconsin fire destroys multiple businesses - cause unknown.
http://www.wearegreenbay.com/1fulltext-news/d/story/businesses-destroyed-in-downtown-sturgeon-bay-fire/48737/jpQxDz21AUaF6qc3p3Mj7g

Trucking business destroyed by fire
http://www.limaohio.com/news/home_top/5200368/Trucking-business-destroyed-by-July-3-fire


FLOOD

Severe weather often brings flooding. Locally, the road we are on is currently closed following a flash flood which washed out a culvert leaving a hole ten feet wide and twelve feet deep.

The UK has been hit by a number of high-profile floods recently. Here is some useful advice from the UK government on preparing a flood plan:
https://www.gov.uk/prepare-for-a-flood/make-a-flood-plan

The UK government also provides a checklist of things companies should keep on hand if they believe they are at risk of flooding. See page 12 of the following:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/292937/LIT_5284_ab06c2.pdf


POWER

It's not unusual for power failures to affect both primary and backup systems. Here are some recent events:

Passengers stuck in the Channel Tunnel
http://www.independent.ie/world-news/europe/hundreds-of-passengers-evacuated-from-channel-tunnel-after-power-failure-30412107.html

Visitors stuck on amusement ride
http://www.theglobeandmail.com/news/world/48-stranded-for-hours-as-power-failure-halts-seaworld-skytower/article19388116/

Squirrel in power substation causes large power outage
http://www.katu.com/news/local/Thousands-lose-power-in-Washington-Co-265535171.html

Falling tree causes five hour power outage
http://www.gastongazette.com/spotlight/felled-tree-power-failure-1.338288

Power outage takes out emergency call center
This report is interesting because it may have been caused by a failure in the backup system,
and the emergency communications center had to revert to "manual mode" during the outage.
http://www.ktvu.com/news/news/local/power-outage-impacts-santa-clara-county-communicat/ngYYr/

Power failure shuts down Australia's Sydney Airport terminal for a number of hours
http://www.dailytelegraph.com.au/travel/travel-news/power-outage-causes-travel-chaos-at-sydney-airport/story-fnjjv9zk-1226968557455



Risk Assessment Toolkit

Our Risk Assessment Toolkit is designed to assist you in creating and maintaining a Risk Register and Business Impact Analysis by modeling dependencies, simulating disruptions, and calculating potential losses. An evaluation copy is available.
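
To illustrate what modeling dependencies and simulating disruptions can involve, here is a minimal Python sketch. It is not how the Risk Assessment Toolkit works internally. It assumes a simple model in which a failure propagates to everything that depends on the failed item, directly or indirectly, and in which each affected business process loses a fixed amount per hour of outage. All process names, dependencies and loss figures (DEPENDS_ON, HOURLY_LOSS) are hypothetical.

```python
from collections import deque

# Hypothetical model: each item lists the resources it directly depends on.
DEPENDS_ON = {
    "order_entry": ["web_server", "database"],
    "shipping": ["order_entry", "warehouse_power"],
    "web_server": ["data_centre_power"],
    "database": ["data_centre_power"],
}

# Hypothetical loss per hour of outage for each business process.
HOURLY_LOSS = {"order_entry": 5000, "shipping": 2000}


def affected_by(failed, depends_on):
    """Return the failed item plus everything that depends on it, directly or indirectly."""
    # Invert the graph so we can walk from a resource to the items that use it.
    dependants = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, []).append(item)
    # Breadth-first walk outward from the failed item.
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for item in dependants.get(current, []):
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return seen


def simulate_loss(failed, hours, depends_on, hourly_loss):
    """Estimated loss if `failed` is unavailable for `hours` hours (linear in time)."""
    return sum(hourly_loss.get(item, 0) * hours
               for item in affected_by(failed, depends_on))


if __name__ == "__main__":
    for resource in ("data_centre_power", "warehouse_power"):
        print(resource, simulate_loss(resource, 4, DEPENDS_ON, HOURLY_LOSS))
```

With these made-up figures, a four-hour loss of data_centre_power disrupts both order entry and shipping, for an estimated loss of 28,000, while a four-hour loss of warehouse_power disrupts only shipping (8,000). A shared dependency such as a common power feed shows up immediately in this kind of walk, which is one way primary and backup systems end up failing together.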

For more information:
http://www.riskythinking.com/risk_assessment_toolkit/


Administrivia, Subscribing, and Unsubscribing

RISKY THINKING is a free newsletter providing essays, analysis, insights, and oddities related to Business Continuity, Disaster Recovery, and Risk Management. You can subscribe on the web at http://www.RiskyThinking.com/newsletter/.

Please feel free to forward RISKY THINKING to colleagues or friends who will find it valuable. You may reprint this newsletter provided it is reprinted in its entirety.