Will Microsoft Stand Out In the Big Data Fray?
Microsoft took a key step forward in its quest to bring "big data" to the cloud this week when it released the public preview of its Windows Azure HDInsight offering. The cloud-based service, first made available on a limited basis last fall, aims to let enterprise customers process huge volumes of structured and unstructured data using Microsoft's SQL Server and the Hortonworks distribution of the Hadoop file store.
A growing number of organizations have started using file management systems based on the Apache open source Hadoop Distributed File System (HDFS). The Java-based file repository lets users store huge volumes of unstructured information in massively distributed clusters based on commodity servers. Using query tools, users can rapidly find and access that content.
Organizations of all types -- law enforcement agencies, retailers, financial services firms and health care providers, among others -- are rapidly gravitating to Apache Hadoop to store information gathered from sources such as social media, news feeds and user-generated content, so they can spot trends and deliver insights and intelligence in near real time.
With scores of startups and established players all jumping on the Hadoop bandwagon, Microsoft has hitched its wagon to the Hortonworks distribution, which the company emphasizes is 100 percent Apache-compatible. The HDInsight service lets organizations spin up Hadoop clusters in Windows Azure in a matter of minutes, noted Eron Kelly, general manager for Microsoft's SQL Server group, in a blog post this week.
"These clusters are scaled to fit specific demands and integrate with simple Web-based tools and APIs to ensure customers can easily deploy, monitor and shut down their cloud-based cluster," Kelly noted. "In addition, [the] Windows Azure HDInsight Service integrates with our business intelligence tools including Excel, PowerPivot and Power View, allowing customers to easily analyze and interpret their data to garner valuable insights for their organization."
Among the first to test HDInsight is Ascribe, a U.K.-based Microsoft partner that provides healthcare management systems for hospitals and large medical practices. Its solution handles the lifecycle of patient care using key new components of Microsoft's portfolio including Windows 8-based tablets, SQL Server 2012 and HDInsight to perform trending analysis using anonymous patient data.
Paul Henderson, Ascribe's head of business intelligence, demonstrated the application at the GigaOM Structure Data conference in New York this week. "Rather than building our own server farm or incurring huge capital costs, HDInsight provides us with the ability to process that volume of stuff at scale and that is a huge benefit," Henderson told me after the demonstration.
But at the Structure Data conference, scores of other players were talking up new ways of capturing, analyzing and processing huge amounts of data. While Microsoft once had to worry only about players like Oracle, IBM and Teradata, a vast number of vendors now aim to offer modern alternatives to traditional SQL database stores.
For example, a growing number of customers are using NoSQL databases such as those based on MongoDB (the leading player here is 10gen) to store data in the cloud, as well as a number of other approaches I'll touch upon in future posts. "The majors, as we may call them, Amazon, Google and Microsoft all have multiple plays going on in the cloud database world," noted Blue Badge Insights analyst Andrew Brust, who was on a cloud database panel at Structure Data.
Despite the growing number of players and approaches, Brust believes many customers will look to the mainstream providers to embrace them. "We're seeing specialized products from specialized companies doing things that the major databases have glossed over," Brust said. "That's great, but when it's going to really become actionable for companies is when the mega-vendors either implement this stuff themselves or do some acquisitions and bring these capabilities into their mainstream databases that have the huge installed bases, then it becomes a lot more approachable to enterprise companies."
Noted cloud analyst David Linthicum, also on the panel, was more skeptical. "It pushes them to be more innovative but I haven't seen much innovativeness come out of these larger database organizations in the last couple of years," Linthicum said.
As for Microsoft, the company is addressing growing demand for in-memory databases, brought to the mainstream last year by SAP with HANA. Because they keep data in memory, these databases can perform queries much faster than those that read data from disk. Microsoft revealed its plans to add in-memory capabilities, code-named Hekaton, to the next release of SQL Server at the SQL PASS Summit back in November.
"This is a separate engine that's in the same product in a single database and will have tables optimized for either the conventional engine or the in memory engine," Brust said. "You can join between them so you are going more towards an abstraction."
But with a growing number of startups looking to re-invent the data repository, such as NuoDB, Hadapt and the new Pivotal initiative from EMC, Microsoft is now in a more crowded field. While Microsoft has broadened its data management portfolio with SQL Azure and now HDInsight, the requirement to find, process and analyze new types of information is greater than ever. All eyes will be on Hekaton and Microsoft's ability to deliver new levels of performance to SQL Server.
Posted by Jeffrey Schwartz on 03/22/2013 at 1:15 PM