The Schwartz Report


Microsoft Extends Open Source Support for Big Data Services

Microsoft CEO Satya Nadella "loves" Linux. So it should come as little surprise that Microsoft is planning to support its Azure HDInsight big data analytics offering on the open source server platform. The company announced the preview of HDInsight on Linux at the Strata + Hadoop World conference in San Jose, Calif. Microsoft also announced the release of Azure HDInsight running Storm, the popular Apache platform for streaming analytics.

The open source extensions aim to widen Microsoft's footprint in the growing market for big data services, to let users gather more information that they can parse and analyze to make better decisions, and to bring big data into mainstream use. Microsoft has signaled that last goal with its development of Cortana, now available on Windows Phone and in beta on Windows 10.

In addition to the public preview of HDInsight on Linux and general availability of Apache Storm for HDInsight, Microsoft announced Hadoop 2.6 support in HDInsight, new virtual machine sizes, the ability to grow or reduce clusters running in HDInsight and a Hadoop connector for DocumentDB.

"This is particularly compelling for people that already use Hadoop on Linux on-premises like on Hortonworks Data Platform because they can use common Linux tools, documentation and templates and extend their deployment to Azure with hybrid cloud connections," wrote T. K. "Ranga" Rengarajan, corporate vice president for Microsoft's Data Platform, and Joseph Sirosh, corporate vice president for Machine Learning, in a blog post.

Support for Storm is another key advance for Microsoft, as Storm has emerged as a widely adopted open source standard for streaming analytics. "Storm is an open source stream analytics platform that can process millions of data 'events' in real time as they are generated by sensors and devices," according to Ranga. "Using Storm with HDInsight, customers can deploy and manage applications for real-time analytics and Internet-of-Things scenarios in a few minutes with just a few clicks."

Despite its open source push, Microsoft isn't part of the Open Data Platform Initiative that was announced this week to ensure an interoperable Apache Hadoop core. Among those on board are GE, Hortonworks, IBM, Infosys, Pivotal, SAS, Altiscale, Capgemini, CenturyLink, EMC, Splunk, Verizon Enterprise Solutions, Teradata and VMware.

Asked why, a Microsoft spokeswoman stated, "Microsoft is already partnered with Hortonworks to use HDP which will utilize the Hadoop core from the Open Data Platform Initiative moving forward. We also will continue to contribute to the broader Apache Hadoop ecosystem." The statement also offered support for the project, saying Microsoft sees the Open Data Platform Initiative as a good step toward having everyone run on the same Hadoop core, including HDFS, YARN and Ambari. "We see standardization in the Hadoop space as a good thing as it reduces fragmentation and makes adoption of the technologies easier."

In addition, Microsoft is focused on contributing to Hadoop projects like Hive (Project Stinger, Tez), YARN, REEF and others, as well as partnering with Hortonworks, she said. "We see this Open Data Platform Initiative as complementary to these efforts and will help the overall Hadoop ecosystem."

Posted by Jeffrey Schwartz on 02/20/2015 at 12:20 PM

