Azure Virtual Machine Users To Get Failure Analysis Reports
Microsoft this week announced plans to better explain Azure virtual machine failures to customers through forthcoming root cause analysis (RCA) reports.
Currently, IT pros get Azure virtual machine downtime information within the Azure Portal using its "Azure resource health" screen. This screen, with green and red indicators, shows whether an Azure resource is "available" or experiencing problems. Microsoft defines an Azure resource as a "specific instance of an Azure service, such as a virtual machine, web app, or SQL Database," according to a Microsoft document.
The Azure resource health screen exists so that IT pros can see "if an SLA [service-level agreement] was violated," the document added. SLAs are an assurance of uptime for organizations. Microsoft's SLAs sometimes offer service credits when organizations had no recourse to avoid a service downtime.
Root Cause Analysis Reports
Microsoft plans to bolster the Azure resource health screen by also supplying RCA explanations of Azure virtual machine failures. It described the RCA explanations as a new capability.
"While the initial downtime notification functionality has existed for several years, the publishing of a root cause statement is a new addition," explained Nick Swanson, a principal software engineering manager at Microsoft, in the announcement.
Microsoft is currently taking steps to surface these RCAs, which will be sharable with coworkers and customers by e-mail, Swanson noted. It's also aiming to make it possible to subscribe to the RCA announcements.
Based on Azure Data Explorer Service
The engine for generating the RCAs is the Azure Data Explorer service, which processes a high volume (terabytes) of log information from the Azure platform, Swanson explained.
Generating the RCAs takes several steps. First, the downtime needs to be detected. Second, information from other dependent systems needs to be correlated. Third, Microsoft uses "machine learning and anomaly detection mechanisms" to attribute a cause to the failure. Lastly, Microsoft uses its Azure Functions solution to update the results that get published to the Azure resource health display, as well as to publish the RCAs.
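For readers who want a concrete picture of those four stages, the following is a minimal Python sketch, not Microsoft's implementation. The announcement does not describe the actual models or data schemas, so every name here (`Heartbeat`, `detect_downtime`, `correlate`, `attribute_cause`, `publish`) is hypothetical, and a simple z-score check stands in for the unspecified "machine learning and anomaly detection mechanisms."

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Heartbeat:
    """Hypothetical per-minute liveness record for a virtual machine."""
    vm_id: str
    minute: int
    alive: bool


def detect_downtime(heartbeats):
    """Step 1: return (vm_id, minute) pairs where the VM was not alive."""
    return [(h.vm_id, h.minute) for h in heartbeats if not h.alive]


def correlate(outage_minute, dependency_metrics):
    """Step 2: for each dependent system, pair its history before the
    outage with its reading at the outage minute."""
    return {
        name: (series[:outage_minute], series[outage_minute])
        for name, series in dependency_metrics.items()
    }


def attribute_cause(correlated, threshold=3.0):
    """Step 3: pick the dependency whose reading deviates most from its
    own history (z-score above threshold). This is a stand-in technique,
    assumed for illustration only."""
    best_name, best_score = None, threshold
    for name, (history, value) in correlated.items():
        if len(history) < 2:
            continue  # stdev needs at least two points
        spread = stdev(history)
        if spread == 0:
            continue  # a flat history gives no usable z-score
        score = abs(value - mean(history)) / spread
        if score > best_score:
            best_name, best_score = name, score
    return best_name or "undetermined"


def publish(vm_id, minute, cause):
    """Step 4: format the statement that would be surfaced to the user."""
    return f"VM {vm_id} was unavailable at minute {minute}; probable cause: {cause}"


# Made-up sample data: one VM misses its heartbeat at minute 5.
heartbeats = [Heartbeat("vm-1", m, m != 5) for m in range(6)]
metrics = {
    "disk_latency_ms": [4, 5, 4, 5, 4, 60],
    "network_loss_pct": [1, 2, 1, 2, 1, 2],
}

for vm_id, minute in detect_downtime(heartbeats):
    cause = attribute_cause(correlate(minute, metrics))
    print(publish(vm_id, minute, cause))
```

In this toy data, disk latency jumps sharply at the outage minute while network loss stays within its usual range, so the sketch attributes the failure to the disk metric. A production system would work over far larger log volumes and use more robust models than a single z-score.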
Swanson did not say when the RCAs for Azure virtual machines would be available. When available, though, Microsoft intends to "make it possible to sign up for RCAs directly in your inbox."
Kurt Mackie is senior news producer for 1105 Media's Converge360 group.