Microsoft Releases Windows HDInsight 'Previews' for Hadoop

Microsoft and its partner Hortonworks took a step forward this week in bringing enhanced Hadoop "big data" functionality to Windows and Windows Azure.

Preview test releases of the new Microsoft HDInsight Server for Windows and the new Windows Azure HDInsight Service were made available on Wednesday and announced at the Strata Conference + Hadoop World event in New York City. The server preview is currently available for download, whereas the service preview requires an online sign-up to get an invitation to try it free for five days. Both previews can be accessed via Microsoft's big data portal page.

Hadoop is an Apache Software Foundation-sponsored open source framework that implements the MapReduce programming model, which supports scale-out processing of large volumes of unstructured and structured data across clusters. Companies such as Yahoo have long been using Hadoop to crunch through user clickstream data to glean marketing insights. Microsoft's partner, Hortonworks, sells open source Hadoop products and was formed by software engineers from the Yahoo Hadoop team. Microsoft and Hortonworks plan to contribute all of the code they have developed for the Windows and Windows Azure HDInsight previews back to the open source community, according to a Hortonworks Web page.
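
For readers unfamiliar with MapReduce, the idea can be sketched in a few lines of plain Python. This is an illustration only and does not use Hadoop or HDInsight code. The record format ("user,page") and the function names map_click, shuffle, reduce_clicks and run_job are assumptions made up for this example. It counts page views in a tiny clickstream sample, the kind of workload described above, by mapping each record to a key/value pair, grouping the pairs by key and reducing each group to a total.

```python
from collections import defaultdict


def map_click(record):
    """Map step: emit one (page, 1) pair per clickstream record.

    The "user,page" record layout is a hypothetical format for this sketch.
    """
    _user, page = record.split(",")
    yield page.strip(), 1


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_clicks(page, counts):
    """Reduce step: sum the counts emitted for one page."""
    return page, sum(counts)


def run_job(records):
    """Run map, shuffle and reduce in a single process."""
    pairs = (pair for record in records for pair in map_click(record))
    return dict(reduce_clicks(page, counts) for page, counts in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["alice,/home", "bob,/home", "alice,/pricing"]
    print(run_job(sample))
```

In a real cluster, the map and reduce steps would run in parallel on many machines and the framework would handle the grouping; here everything runs in one process to keep the sketch readable.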

Microsoft integrated the Hortonworks Data Platform 1.1 with the previews, which are designed to make it easier for organizations to use Microsoft's familiar business intelligence tools to get insights from masses of Hadoop data. Those Microsoft tools include PowerPivot for Excel and Power View for SharePoint Services, which can display Hadoop query results. The tools also work when used to pull Hadoop data from Linux servers, according to Hortonworks. Power View allows users to visualize data on an ad hoc query basis, while PowerPivot is for structuring multidimensional data using Excel's workbooks.

HDInsight Server has a bidirectional connector that allows users to move data between Hadoop and Microsoft SQL Server 2012 or SQL Server 2012 Parallel Data Warehouse for analyzing structured data. Microsoft's StreamInsight technology can be used with Hadoop and SQL Server to manage streaming data, such as in complex event processing.

Microsoft has also collaborated on management integration with its System Center products. It's possible to use Apache Ambari with System Center to manage Hadoop clusters alongside other computing environment assets. Ambari is "a web-based tool for installing, managing, and monitoring Apache Hadoop clusters," according to the Apache project description. Security also has been enhanced via integration with Microsoft Active Directory in Windows Server.

The Windows Azure HDInsight Service includes "stable versions" of the Hadoop Distributed File System (HDFS) and MapReduce, as well as Hive, Pig and Sqoop, according to Matt Winkler, a principal program manager at Microsoft, in a blog post. Microsoft has also delivered a .NET software developer kit that can be used with Visual Studio to write Hadoop jobs.

Winkler explained that developers can use JavaScript to write MapReduce jobs, as well as Hive and Pig queries, and that using JavaScript allows the data to be viewed using a Web browser. Developers using JavaScript can reduce their code by "up to 10 times compared to Java," according to Microsoft, as described in its brochure, "From Data to Insights."

About the Author

Kurt Mackie is senior news producer for 1105 Media's Converge360 group.


