The Schwartz Report

The Irony Behind the Windows Azure Meltdown

Just as I was getting ready to call it a week late Friday afternoon, Microsoft's Windows Azure cloud storage service went down worldwide. As I reported, Windows Azure storage was unavailable because of an expired SSL certificate.

The global outage of Windows Azure late Friday into Saturday is ironic, considering that a study released last week found Windows Azure storage offered the fastest response times of five large cloud networks, the others being those operated by Amazon Web Services, Google, HP and Rackspace. Good thing for Microsoft that Nasuni, the vendor that ran the shootout, wasn't testing Windows Azure at that time.

Once the service was back up Saturday, I posted an update noting that Microsoft had fixed the problem and users could once again access their data. The company said the service was 99 percent available early Saturday and completely restored by 8 p.m. PST. But the damage was already done, and many customers and partners were furious.

In comments posted on a Windows Azure forum, Sepia Labs' Brian Reischl, who first pointed to the SSL certificate as the likely culprit, seemed to feel users should cut Microsoft some slack. Reischl said letting an SSL certificate fall through the cracks is a mistake anyone could make. "I know I have. It's easy to forget, right?" he posted. "It's an amateur mistake, but it happens. You end up with some egg on your face, add a calendar reminder for next year, and move on."

But one has to wonder how Microsoft, which has staked its future on the cloud and has spent billions to build Windows Azure into one of the largest global cloud services, could have failed to put in safeguards against the domino effect that occurred when that cert expired, let alone a mechanism for knowing when any of its certificates is about to expire. Putting the dates in admins' Outlook calendars would be a good start.

Of course, there are more sophisticated tools to make sure SSL certificates don't expire. Among them is the certificate monitoring and expiration management component of SolarWinds' Server & Application Monitor, a Redmond reader favorite. Another option, not so coincidentally, hit my inbox this morning. Matt Watson, founder of Stackify, spent a few hours over the weekend developing a free tool called CertAlert.me, which allows a site owner to scan the Web sites it owns and track SSL and domain name expirations.
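For readers who want to see what the simplest version of such a check looks like, here is a minimal sketch in Python using only the standard library. It is not how CertAlert.me or the SolarWinds product works; the function names and the 30-day warning threshold are assumptions chosen for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days from `now` until the certificate's notAfter timestamp.

    `not_after` uses the format Python reports for certificates,
    e.g. "Feb 22 20:00:00 2013 GMT". The result is negative once expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within `warn_days` (assumed threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to `host` over TLS and return its certificate's notAfter string.

    With the default context, the handshake raises an SSL error if the
    certificate has already expired, which is itself a useful alarm.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call fetch_not_after() for each hostname an organization owns, pass the result and datetime.now(timezone.utc) to needs_renewal(), and send an alert when it returns True.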

"It happens a lot," Watson told me in a brief telephone conversation regarding outages such as the one that struck Friday, which affected Stackify. "All you can do is sit on your hands and pray," he said, adding that years ago he had to deal with an expired SSL certificate himself. "You buy them and you forget about them and the next thing you know your site's gone. It's one of those things that get overlooked."

Asked about the business opportunity in offering this free service, Watson said he saw it as a way to bring exposure to the startup's namesake offering, a Windows Azure-based server monitoring platform targeted at easing access for developers while ensuring they don't have access to production systems.

Indeed, you can bet Microsoft is going to ensure it doesn't happen again. "Our teams are also working hard on a full root cause analysis (RCA), including steps to help prevent any future reoccurrence," said Steven Martin, Microsoft's general manager of Windows Azure business and operations, in a blog post apologizing for the disruption. Given the scope of the outage, Microsoft will offer credits in conformance with its SLAs, Martin said.

This is not the first outage Microsoft has had to explain and probably won't be the last. And we all know the number of well-publicized outages Amazon Web Services has encountered in recent years.

If you're a Windows Azure customer, did last week's slipup erode your confidence in storing your data in Microsoft's cloud? Drop me a line at [email protected]

Note: This post was updated to clarify that the Windows Azure outage affected Stackify.

Posted by Jeffrey Schwartz on 02/25/2013 at 1:15 PM

