SQL Server 2008 R2 PDW To Get Hadoop Support
Microsoft today announced an update to SQL Server 2008 R2 Parallel Data Warehouse (PDW), along with related news that includes planned Hadoop support.
Appliance Update 2, the second such update for PDW, was released today. It is both a software and hardware refresh that adds feature improvements and extended interoperability based on customer requests, according to Microsoft's announcement. It follows Appliance Update 1, which was released four months earlier.
The list of new features in Appliance Update 2 includes support for "multi-statement batches, T-SQL variables, temporary tables, conditional logic and control flow statements," according to the announcement. New connectors were added for interoperability with Informatica systems and SAP Business Objects systems. MicroStrategy has rolled out some certified connectors for PDW too, according to Microsoft.
Microsoft also plans to deliver on Hadoop interoperability, which its customers have requested. Hadoop is an open source Apache project that provides a framework for running applications on clusters using commodity hardware, according to the Apache wiki description. Microsoft, for its part, plans to release two Community Technology Previews (CTPs) of Hadoop connectors. One CTP will showcase a Hadoop connector for SQL Server, with the other demonstrating a Hadoop connector for PDW. The connectors will enable the transfer of data between Hadoop and Microsoft's two SQL Server products, supporting both structured and unstructured data. No indication was provided about when these CTPs will be launched.
Finally, Microsoft announced a new hardware vendor product release implementing PDW. Dell apparently released its Parallel Data Warehouse appliance today, using Microsoft's PDW software, although no Web site describing the product was apparent. Microsoft's announcement indicated that Dell's PDW appliance can scale beyond 600 terabytes at a cost starting at $12,000 per terabyte.
PDW is a latecomer to Microsoft's SQL Server 2008 R2 product family. It was known earlier by its code name "Madison." Microsoft took time to incorporate massively parallel processing technology into the product, building on technology it acquired when it bought DatAllegro Inc. HP was the first hardware vendor to roll out PDW on metal, which happened back in December.
PDW is a solution for increasing the scale of operations when traditional clustering technology hits the wall. That limit starts at around 100 terabytes of data, according to James Kobielus, a senior analyst with Forrester Research.
Kobielus, who tracks enterprise data warehousing solutions in a first-quarter Forrester Wave report, has had lots to say about Hadoop in a series of blog posts. He noted on June 3 that EMC Greenplum and IBM have already incorporated support for Hadoop "into the core of their solution portfolios," with only partial support shown by some of the other enterprise data warehousing vendors. In his June 9 post, Kobielus predicted that Hadoop will be ready to meet enterprise needs in about two to three years, but that it will first have to overcome a hype cycle and a possible backlash to the "big data" concept.
Still, Hadoop will gain enterprise traction in the long term, he suggests.
"No kidding. Hadoop is the core platform for Big Data, and it's a core convergence focus for enterprise application, analytics, and middleware vendors everywhere," Kobielus wrote in a June 8 blog post.
Kurt Mackie is senior news producer for the 1105 Enterprise Computing Group.