What is Microsoft's "Blue Screen of Death"?
On July 19, 2024, Windows systems in multiple locations around the world suffered a widespread "blue screen of death" failure. Affected users could not start their operating systems, and many encountered errors related to "csagent.sys". The root cause was identified as a faulty update from the U.S. cybersecurity services provider CrowdStrike. The incident disrupted organizations and services worldwide, including airports, television stations, and hospitals.
On the day of the incident, many users reported that after installing the latest update for the CrowdStrike Falcon Sensor, Windows hosts entered into a boot loop or displayed the Blue Screen of Death (BSOD).
The security provider acknowledged the issue and issued a technical alert, explaining that its engineers "identified the deployment associated with this issue and have rolled back those changes."
How Do I Fix the "Blue Screen of Death" on a Windows Host
CrowdStrike's engineering team responded swiftly to the crisis. According to a pinned post on their forum, the team identified the deployment associated with the issue and rolled it back.
CrowdStrike revealed that the culprit was a driver file containing sensor data. As it was merely a component of the sensor update, the issue could be addressed by handling these files individually, without requiring the removal of the Falcon sensor update.
For affected users, CrowdStrike provided the following remediation steps:
Step 1. Boot Windows into Safe Mode or the Windows Recovery Environment
- Restart your computer.
- When your computer restarts, press F8 (or Shift + F8) to open the Advanced Boot Options menu.
- Select Safe Mode and press Enter.
Step 2. Delete Relevant Files
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
- Locate files matching "C-00000291*.sys" and delete them.
Step 3. Restart Your Computer
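The file cleanup in Step 2 can also be scripted. The following is a minimal sketch, not an official CrowdStrike tool. It assumes a Python interpreter is available on the affected host, which may not be the case in Safe Mode or the Windows Recovery Environment. The function names and the dry_run flag are illustrative choices; only the directory and the file pattern come from the steps above.

```python
from pathlib import Path

# Directory and pattern as given in the remediation steps above.
CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_channel_files(directory: Path, pattern: str = PATTERN) -> list[Path]:
    """Return the files in `directory` whose names match `pattern`, sorted by name."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def delete_channel_files(directory: Path, dry_run: bool = True) -> list[Path]:
    """Delete matching files and return them.

    With dry_run=True (the default) nothing is removed; the matches are
    only reported so they can be reviewed first.
    """
    matches = find_channel_files(directory)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Hypothetical usage: list what would be removed, then rerun with dry_run=False.
    for path in delete_channel_files(CROWDSTRIKE_DIR, dry_run=True):
        print(f"would delete: {path}")
```

Defaulting to a dry run is a deliberate choice here: the script touches a driver directory, so it seems safer to review the matches before anything is deleted.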
CrowdStrike's CEO confirmed the release of a fix and advised customers to download the latest updates.
CrowdStrike CEO comments on the Windows host crashes caused by faulty updates.
In their updated statement, CrowdStrike mentioned that "the problematic file [C-00000291.sys, timestamp 0409 UTC] has been reverted" and that the correct version is C-00000291.sys with a timestamp of 0527 UTC or later.
How to Fix the "Blue Screen of Death" on Cloud and Virtual Environments
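The timestamp rule in that statement can be expressed as a small check. This is a sketch under stated assumptions. The statement gives only times of day, so the date (July 19, 2024) is taken from the incident date in this article. It is also an assumption that the file's last-modified time is the timestamp CrowdStrike refers to. The names classify_timestamp and classify_file are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

# Times from CrowdStrike's statement; the date is an assumption (incident day).
PROBLEMATIC_AT = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
REVERTED_FROM = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)


def classify_timestamp(ts: datetime) -> str:
    """Classify a timezone-aware timestamp as 'problematic', 'reverted', or 'unknown'."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    ts_utc = ts.astimezone(timezone.utc)
    if ts_utc >= REVERTED_FROM:
        return "reverted"
    # Compare at minute precision, since the statement gives only HHMM.
    if ts_utc.replace(second=0, microsecond=0) == PROBLEMATIC_AT:
        return "problematic"
    return "unknown"


def classify_file(path: Path) -> str:
    """Classify a file by its last-modified time, read as UTC."""
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return classify_timestamp(mtime)
```

Timestamps that match neither rule are reported as "unknown" rather than guessed, because the statement does not say how to treat them.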
The company also provided two options for addressing the issue in cloud and virtual environments: rolling back to a snapshot taken before 04:09 UTC, or following this seven-step procedure:
- Detach the operating system disk volume from the affected virtual server.
- Before proceeding, create a snapshot or backup of the disk volume to prevent any accidental changes.
- Attach/mount the volume to a new virtual server.
- Navigate to the %WINDIR%\System32\drivers\CrowdStrike directory.
- Find the file matching "C-00000291*.sys" and delete it.
- Detach the volume from the new virtual server.
- Reattach the fixed volume to the affected virtual server.
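One detail worth watching in this procedure: on the new virtual server, %WINDIR% resolves to that server's own Windows directory, not to the one on the attached volume. The sketch below builds the target path from the drive letter the attached volume received instead. It assumes the affected system's Windows directory is named "Windows" at the root of that volume. The function name is hypothetical.

```python
from pathlib import PureWindowsPath


def crowdstrike_dir_on_volume(drive_letter: str) -> PureWindowsPath:
    """Build the CrowdStrike driver directory path on an attached volume.

    `drive_letter` is the letter assigned to the attached disk on the new
    virtual server, for example "E" or "E:".
    """
    letter = drive_letter.strip().rstrip(":\\").upper()
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    # Assumption: the Windows directory on the affected disk is named "Windows".
    return PureWindowsPath(f"{letter}:\\") / "Windows" / "System32" / "drivers" / "CrowdStrike"


if __name__ == "__main__":
    print(crowdstrike_dir_on_volume("E"))
```

PureWindowsPath is used so the path can be constructed and inspected on any operating system; it does not touch the file system.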
Expert Comments
For this incident involving Microsoft's blue screen of death, Wang Lijun, a security expert from Qihoo 360, stated:
The main cause of the Windows computer crashes due to CrowdStrike software updates was a bug in its core driver csagent.sys, which prevented the operating system from starting normally and led to blue screens. Unlike typical applications, the operation of security software drivers involves the low-level operations of the operating system, so any problems directly affect system stability.
This incident had a broad impact. It was particularly evident in the Asia-Pacific region, for example in Japan, but it also caused significant disruptions in Europe and the Americas. Those affected were mainly multinational companies using CrowdStrike and their branches in China, as well as some foreign cloud computing environments, particularly those running Windows-based application instances.
While the incident affected multiple versions of Windows, the specific scope of impact may vary due to technical details. Moreover, although the fix itself is simple (manually deleting or renaming the related driver files), the repair process is time-consuming and complex when a large number of machines are involved and management is decentralized.
In conclusion, this incident highlights the systemic risks that security software updates may pose. In large-scale deployments in particular, it tests an organization's ability to manage the impact and respond to emergencies.