Configuration Glitch Caused Microsoft's Jan. 25 Exchange Online Disruption
Microsoft this week described a five-hour service incident that occurred on Wednesday, Jan. 25, which it said affected its Exchange Online service, as well as other Microsoft 365 services.
The disruption apparently was a different problem from a nearly three-hour disruption of Microsoft 365 and Power BI services that started on Tuesday, Jan. 24. The Jan. 24 problem was said to have been associated with a Microsoft Wide Area Network glitch.
The Wednesday, Jan. 25 disruption, reported as "MO502273," occurred because of "networking configuration issues," Microsoft indicated:
We've isolated the problem to networking configuration issues, and we're analyzing the best mitigation strategy to address these without causing additional impact. Refer to the admin center MO502273 or https://msft.it/6018eAldp for more information.
Microsoft typically provides details on its service outages only to IT pros having certain role permissions in organizations. However, many people commenting on Microsoft's Wednesday Twitter announcement indicated that they had problems accessing the [Exchange] Admin Center, where Microsoft's MO502273 message describing the problem could be found.
Digital experience monitoring company Exoprise, which uses "headless browsers" to track service health, described the Jan. 25 MO502273 incident as "a networking outage that took down Azure and other services like Teams and Outlook, affecting millions of users globally," in a Jan. 25 Exoprise blog post.
Exoprise's data showed a Microsoft Teams outage that started at 2:21 a.m. on Wednesday. The company added that its tracking began "at least 1 hour 28 mins before Microsoft diagnosed the root cause."
Microsoft suggested that various Microsoft 365 services may have been affected by the incident, and not just Exchange Online. It eventually reported at 9:28 a.m. on Wednesday that the "majority of impacted services have been recovered and remain stable," according to Exoprise's captured data.
The MO502273 problem was due to "networking configuration issues" associated with a "wide-area network (WAN) routing change." Here's Microsoft's chronology, which suggested that the incident was resolved after more than five hours:
Start time: Wednesday, January 25, 2023, 2:05 AM (7:05 AM UTC)
End time: Wednesday, January 25, 2023, 7:43 AM (12:43 PM UTC)
Preliminary root cause: A wide-area networking (WAN) routing change resulted in users being unable to access multiple Microsoft 365 services. We'll publish a post-incident report within five business days.
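For readers who want to check the "more than five hours" figure, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not anything Microsoft published: the two UTC timestamps are copied from the chronology above, while the constant names and the incident_duration helper are hypothetical names chosen for this example.

```python
from datetime import datetime, timezone

# UTC timestamps taken from Microsoft's MO502273 chronology quoted above.
# The names START_UTC and END_UTC are illustrative assumptions.
START_UTC = datetime(2023, 1, 25, 7, 5, tzinfo=timezone.utc)
END_UTC = datetime(2023, 1, 25, 12, 43, tzinfo=timezone.utc)


def incident_duration(start, end):
    """Return the elapsed time between two aware datetimes as (hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


hours, minutes = incident_duration(START_UTC, END_UTC)
print(f"MO502273 lasted {hours} h {minutes} min")
```

Working from the UTC values rather than the local times avoids having to assume which time zone the 2:05 AM and 7:43 AM figures refer to.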
Exoprise uses headless browsers (called "synthetics") to gather service health information. For more on how that's done, read this 2021 Redmond interview with Exoprise's founder and president.
Kurt Mackie is senior news producer for 1105 Media's Converge360 group.