Windows Azure HDInsight Now Available
Microsoft on Monday announced the "general availability" of Windows Azure HDInsight, the company's cloud-based distribution of Hadoop.
Hadoop is an Apache open source project for so-called "big data" implementations using MapReduce technology. It uses clusters of commodity hardware to analyze heaps of structured or unstructured data, typically at large scale. Companies use Hadoop to query petabytes of data to gain insights, such as sifting through clickstream Web data. The project grew out of ideas Google published on MapReduce and distributed storage, and it was further developed and deployed at scale by Yahoo and Facebook.
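To make the MapReduce idea concrete, here is a minimal single-machine sketch in Python of the kind of clickstream counting mentioned above. It is an illustration only, not Hadoop or HDInsight code: the function names (map_phase, shuffle, reduce_phase, run_job) and the "user,page" record format are assumptions made up for this example. A real Hadoop job would run the map and reduce steps in parallel across a cluster rather than in one process.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (page, 1) pair for one clickstream record.

    Assumes a hypothetical "user,page" record format.
    """
    _user, page = record.split(",")
    yield page.strip(), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: total the hit counts collected for one page."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up sample data for illustration.
clicks = ["u1,/home", "u2,/home", "u1,/pricing", "u3,/home"]
page_hits = run_job(clicks)
print(page_hits)
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is what lets the approach scale to very large data sets.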
For its part, "Microsoft recognizes Hadoop as a standard," according to a blog post by Quentin Clark, corporate vice president of the Data Platform Group at Microsoft. The company worked with its partner, Hortonworks, to make HDInsight "100 percent Apache Hadoop offered as an Azure cloud service," according to Clark.
Multiple vendors offer various Hadoop-related products, but Microsoft is claiming full Hadoop compatibility with its HDInsight product. "HDInsight builds on top of the Hortonworks Data Platform (HDP) to ensure full compatibility with Apache Hadoop," Microsoft's SQL Server blog claimed. Hortonworks, in addition to partnering with Microsoft, offers its own Hadoop product, which includes Pig, Hive and Sqoop components, among others. Its HDP 2.0 product will be available for Windows next month, according to Clark.
Microsoft stopped developing its own "HDInsight Server for Windows" product in favor of Hortonworks' product, according to a post by veteran Microsoft watcher Mary Jo Foley.
Until today, Microsoft had offered HDInsight only as a preview test release. The product supports both Java and .NET, and Microsoft has released an SDK that allows developers to use Visual Studio to write their Hadoop jobs. Early reviews, though, described the developer experience as a little bumpy. HDInsight supports PowerShell scripting and can use LINQ to Hive for queries. Hive is an Apache project for data warehouses running atop Hadoop that supports querying and managing large data sets.
HDInsight notably works with Microsoft's Power BI tools. However, that integration works by taking a small segment of Hadoop data and putting it into SQL Server. The amount of data that can be analyzed by Excel, one of Power BI's principal tools, is limited to 250 MB, which is a limitation of Excel's workbook size, Microsoft explained back in August.
Microsoft's HDInsight product rollout today coincides with the opening of the Strata + Hadoop World event, which runs this week in New York City. Clark is scheduled to speak at Strata + Hadoop World on Tuesday, and his talk will be streamed at 9:35 a.m. Eastern Time.
Kurt Mackie is online news editor for the 1105 Enterprise Computing Group.