Significant lessons must be learned from major faults that impacted millions of computers across the globe and caused chaos for 999 callers.
Cybersecurity firm CrowdStrike rolled out a security update on Friday 19th July, causing an estimated 8.5 million computers running Microsoft Windows to crash across the world.
The outage has already been described as one of the worst in history, with businesses, banks, airlines, and government departments all affected. Over 1,400 flights into and out of the US were cancelled on Sunday, with Delta and United Airlines bearing the brunt.
In the UK, the NHS and other healthcare providers were also among the victims, leading to services and appointments being cancelled. Some organisations were reportedly still trying to restore systems yesterday, Monday 22nd July.
While incidents like this are extremely rare, analysts have suggested the CrowdStrike outage should be considered a wake-up call about the vulnerabilities of connected and networked IT systems, and in particular about how a single glitch can cause worldwide chaos, placing businesses, livelihoods and, in some instances, lives in danger.
Just days after the situation unfolded, Ofcom confirmed it was fining British telecoms giant BT for a 10-hour outage that disrupted calls to the UK emergency services line, 999. A configuration error in a single file on the company’s server led to call handlers’ systems restarting the moment a call was received, while others were logged out of the system entirely. Some calls were disconnected upon transfer to authorities, and several were simply returned to the back of the initial queue.
Raising further concerns, the watchdog stated in its report that BT had been unable to determine the cause of the issue. An attempt to switch to a backup server and disaster recovery platform then failed due to human error. This was ‘a result of instructions being poorly documented, and the team being unfamiliar with the process.’
The migration eventually happened just before 9AM, four hours after problems began, but difficulties persisted throughout the day until around 5PM as the backup platform struggled to cope with demand. Overall, some 14,000 calls are believed to have been impacted by the incident.
Both the BT and CrowdStrike cases led to potentially life-threatening circumstances, and both incidents were caused by oversights and errors rather than criminal activity by hackers. As such, it’s vital that lessons be learnt from these events, not least the need to improve closed testing, training and preparedness for major outages and cyberattacks.
This month, we have already reported on the vulnerability of the UK’s offshore wind farms to digital attacks. Meanwhile, new research has also pointed to ‘backdoors’ in electric vehicle rapid charging technology systems which could offer unauthorised access to personal data.
As we continue to move from localised to globalised networks, with the rise of cloud storage a prime example, the impact and fallout of outages will inevitably become more widespread and damaging, revealing an urgent need to significantly improve fail-safes and develop more effective ways of identifying and correcting fail points.
‘The dangers of deploying commercial grade software in safety critical systems cannot be understated. The immense body of software developed using Silicon Valley’s ‘move fast and break things’ culture means that the software our lives depend on is riddled with defects and vulnerabilities,’ said secure software expert and founder of advocacy group The Dawn Project, Dan O’Dowd. ‘Defects in this software can result in a mass failure event even more serious than the one we have seen today.
‘We have seen that our healthcare, communications, transportation, water treatment plants, and power grids are all reliant on connected systems built on defective software that can cause the world to grind to a halt from a single defect,’ he continued. ‘We must convince the CEOs and boards of directors of the companies that build the systems our lives depend on to rewrite their software so that it never fails and can’t be hacked. The clock is ticking down to a cyber Armageddon. Secure and reliable software exists – it is already deployed in military applications and on commercial airliners. These companies will not take cybersecurity seriously until the public demands it. And we must demand it now, before a major disaster strikes.’