Microsoft Readies Big Data Service That Includes New U-SQL Language
Earlier this year, Microsoft revealed plans to offer Azure Data Lake, a new data store compatible with the Hadoop Distributed File System (HDFS) that could run large analytics workloads. The technical preview has yet to appear, but the company today reiterated that the service, which it will call Azure Data Lake Store, will be available later this year. It also announced some new services planned for its Azure-based Big Data portfolio.
Microsoft describes the Azure Data Lake Store as a single repository that lets users capture data of any size or format without requiring changes to the application as data scales. Data can be securely stored and shared and can be processed and queried from HDFS-based applications and tools, said T. K. "Ranga" Rengarajan, Microsoft's corporate vice president for data platform, in a blog post today outlining the new Azure Data Lake Store.
"Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages," Rengarajan said. "It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications."
Complementing Azure Data Lake Store, Microsoft announced its new Azure Data Lake Analytics, an Apache YARN-based service that's designed to scale dynamically to handle large Big Data workloads. The new Azure Data Lake Analytics service will be based on U-SQL, a language that will "unify the benefits of SQL with the power of expressive code," Rengarajan said. "U-SQL's scalable distributed query capability enables you to efficiently analyze data in the store and across SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse."
In an MSDN blog post today, Michael Rys, a principal program manager for big data at Microsoft, explained why U-SQL is suited for Azure Data Lake Analytics:
Taking the issues of both SQL-based and procedural languages into account, we designed U-SQL from the ground-up as an evolution of the declarative SQL language with native extensibility through user code written in C#. This unifies both paradigms, unifies structured, unstructured, and remote data processing, unifies the declarative and custom imperative coding experience, and unifies the experience around extending your language capabilities.
U-SQL is built on the learnings from Microsoft's internal experience with SCOPE and existing languages such as T-SQL, ANSI SQL, and Hive. For example, we base our SQL and programming language integration and the execution and optimization framework for U-SQL on SCOPE, which currently runs hundred thousands of jobs each day internally. We also align the metadata system (databases, tables, etc.), the SQL syntax, and language semantics with T-SQL and ANSI SQL, the query languages most of our SQL Server customers are familiar with. And we use C# data types and the C# expression language so you can seamlessly write C# predicates and expressions inside SELECT statements and use C# to add your custom logic. Finally, we looked to Hive and other Big Data languages to identify patterns and data processing requirements and integrate them into our framework.
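To make the idea of mixing a declarative query with custom imperative code more concrete, here is a rough analogy in Python. It is not U-SQL and does not reflect Microsoft's implementation; the `SearchEvent` row type, the `select` helper, the `is_slow_uk_search` predicate and the sample data are all hypothetical names invented for this sketch. The point is only that the query states what to filter and return, while the filter itself is ordinary user code, much as Rys describes C# predicates sitting inside SELECT statements.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class SearchEvent:
    # Hypothetical row shape, invented for this illustration.
    region: str
    query: str
    duration_ms: int


def select(
    rows: Iterable[SearchEvent],
    where: Callable[[SearchEvent], bool],
    project: Callable[[SearchEvent], str],
) -> Iterator[str]:
    """Declarative-style SELECT: the caller says what to keep and what to
    return; how the rows are scanned is left to this function."""
    for row in rows:
        if where(row):
            yield project(row)


def is_slow_uk_search(event: SearchEvent) -> bool:
    # Custom imperative logic, standing in for the role a C# predicate
    # is described as playing inside a U-SQL SELECT statement.
    return event.region.lower() == "en-gb" and event.duration_ms > 500


# Made-up sample rows, used only to exercise the sketch.
events = [
    SearchEvent("en-GB", "azure data lake", 730),
    SearchEvent("en-US", "u-sql tutorial", 910),
    SearchEvent("en-gb", "hdinsight linux", 120),
]

slow_uk_queries = list(
    select(events, where=is_slow_uk_search, project=lambda e: e.query)
)
```

In this sketch only the first sample row passes the predicate. A distributed engine such as the one the article describes would also plan and parallelize the scan, which this single-process analogy makes no attempt to model.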
Microsoft also announced the general availability of managed clusters for its Azure HDInsight service on Linux, which the company says carries a 99.9 percent uptime SLA. The company is also offering Azure Data Lake Tools for Visual Studio and said that ISV solutions can be offered in the Azure Marketplace.
Posted by Jeffrey Schwartz on 09/28/2015 at 2:01 PM