After CrowdStrike fiasco, Microsoft tests automated Windows repair tool

Microsoft is testing a new feature, "Quick Machine Recovery" (QMR), intended to automate the repair of Windows systems that cannot boot. The initiative, a response to mass boot failures, aims to make Windows more resilient and to minimise downtime.

Kuba Kowalczyk
Source: Unsplash / Sunrise King

Microsoft is piloting a new feature, ‘Quick Machine Recovery’ (QMR), to automate the recovery of Windows systems that cannot boot. The feature, part of the wider Windows Resiliency Initiative, aims to limit the effects of widespread boot failures, a problem highlighted by the CrowdStrike incident of July 2024.

Automatic recovery in the spotlight

QMR monitors the Windows start-up process and detects critical errors that prevent a successful boot. When such an error occurs, the system automatically restarts into the Windows Recovery Environment (WinRE). Traditionally, recovery from this environment has required manual intervention, a cumbersome process, especially when a failure affects many machines at once.

Microsoft’s innovation is to automate the recovery process in this environment. Affected systems will now attempt to connect to Microsoft servers via Wi-Fi or Ethernet and report data about the failure. This allows Microsoft to analyse the data and identify recurring patterns. If a common cause is detected, Microsoft can develop and deploy a recovery package. Affected systems can then download this package directly from the recovery environment and attempt to repair themselves.
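The flow described above can be sketched in a few lines of Python. This is only an illustrative model, not Microsoft's implementation: the names `RecoveryPackage`, `recover`, `lookup_package` and the failure signature `"bad-driver-update"` are all hypothetical, and the real feature runs inside the Windows recovery environment rather than in a script.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class Outcome(Enum):
    BOOTED_NORMALLY = auto()
    REPAIRED = auto()
    NEEDS_MANUAL_RECOVERY = auto()


@dataclass
class RecoveryPackage:
    """Hypothetical stand-in for a fix published for a known failure."""
    failure_signature: str
    apply: Callable[[], bool]  # returns True if the repair succeeded


def recover(
    boot_ok: bool,
    failure_signature: str,
    network_available: bool,
    lookup_package: Callable[[str], Optional[RecoveryPackage]],
) -> Outcome:
    """Model one pass through the automated recovery flow."""
    if boot_ok:
        return Outcome.BOOTED_NORMALLY
    # Boot failed: the machine is now in the recovery environment.
    if not network_available:
        # Without Wi-Fi or Ethernet there is no way to fetch a fix.
        return Outcome.NEEDS_MANUAL_RECOVERY
    # Ask the (hypothetical) service for a package matching this failure.
    package = lookup_package(failure_signature)
    if package is None:
        return Outcome.NEEDS_MANUAL_RECOVERY
    return Outcome.REPAIRED if package.apply() else Outcome.NEEDS_MANUAL_RECOVERY


# Usage: a catalogue with one known failure and a fix that always succeeds.
catalogue = {
    "bad-driver-update": RecoveryPackage("bad-driver-update", lambda: True),
}
print(recover(False, "bad-driver-update", True, catalogue.get))
```

The point of the sketch is the ordering of the checks: a fix can only be applied if the machine fails to boot, can reach the network, and the failure matches a pattern for which a package already exists. In every other case the machine falls back to manual recovery.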

Reaction to the CrowdStrike incident

The development of QMR appears to be a direct response to incidents such as the one on 19 July 2024, when a faulty update to CrowdStrike’s Falcon sensor prevented some 8.5 million Windows PCs from booting. With no automated recovery mechanism available, administrators had to restore the affected systems manually.

With QMR, Microsoft envisions a more streamlined response to similar incidents. In a scenario like the CrowdStrike outage, affected machines would automatically contact Microsoft, allowing the company to quickly develop and deploy a recovery package that rolls back the problematic update. This would significantly reduce downtime and the burden on IT administrators.

Phased implementation and control

Microsoft is currently testing QMR with Windows Insiders in the Beta channel for Windows 11 24H2. The company plans to eventually enable the feature by default for Windows 11 Home users. Microsoft also recognises the need for control in enterprise environments: Windows 11 Pro and Enterprise customers will retain the ability to enable or disable QMR, both at the local administrator level and at the organisation level. This flexibility addresses concerns about data privacy and the need for IT departments to maintain control over system recovery processes.
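As a rough illustration of how such layered control could be resolved, consider the sketch below. It rests on assumptions the announcement does not spell out: that an organisation-level policy takes precedence over a local administrator's choice, that the feature stays off on Pro and Enterprise unless someone enables it, and that Home machines simply use the default. The names `Edition` and `recovery_enabled` are hypothetical and do not correspond to any Windows API or policy setting.

```python
from enum import Enum
from typing import Optional


class Edition(Enum):
    HOME = "Home"
    PRO = "Pro"
    ENTERPRISE = "Enterprise"


def recovery_enabled(
    edition: Edition,
    local_setting: Optional[bool] = None,
    org_policy: Optional[bool] = None,
) -> bool:
    """Decide whether automated recovery is active on a machine.

    None means "not configured" at that level.
    """
    if edition is Edition.HOME:
        # Planned default for Home users; this sketch assumes no override.
        return True
    if org_policy is not None:
        # Assumption: the organisation-level policy wins over local choices.
        return org_policy
    if local_setting is not None:
        return local_setting
    # Assumption: managed editions stay off until explicitly enabled.
    return False


# Usage: an organisation disables the feature even though a local admin enabled it.
print(recovery_enabled(Edition.ENTERPRISE, local_setting=True, org_policy=False))
```

Checking the organisation level first mirrors the concern raised above, namely that IT departments want the final say over how machines recover.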

Implications and considerations

QMR represents a significant step towards improving the resilience of the Windows ecosystem. By automating the recovery process, Microsoft aims to reduce downtime and minimise the impact of widespread boot failures.
