The Schwartz Report


Microsoft Readies Big Data Service That Includes New U-SQL Language

Earlier this year, Microsoft revealed plans to offer Azure Data Lake, a new data store compatible with the Hadoop Distributed File System (HDFS) that could run large analytics workloads. The technical preview hasn't appeared yet, but the company today reiterated that the service, which it will actually call Azure Data Lake Store, will be available later this year. It also announced some new services planned for its Azure-based Big Data portfolio.

Microsoft describes the Azure Data Lake Store as a single repository that lets users capture data of any size or format without requiring changes to the application as data scales. Data can be securely stored and shared and can be processed and queried from HDFS-based applications and tools, said T. K. "Ranga" Rengarajan, Microsoft's corporate vice president for data platform, in a blog post today outlining the new Azure Data Lake Store.

"Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages," Rengarajan said. "It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications."

Complementing Azure Data Lake Store, Microsoft announced its new Azure Data Lake Analytics, an Apache YARN-based service that's designed to scale dynamically to handle large Big Data workloads. The new Azure Data Lake Analytics service will be based on U-SQL, a language that will "unify the benefits of SQL with the power of expressive code," Rengarajan said. "U-SQL's scalable distributed query capability enables you to efficiently analyze data in the store and across SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse."

In an MSDN blog post today, Michael Rys, a principal program manager for Big Data at Microsoft, explained why U-SQL is suited for Azure Data Lake Analytics:

Taking the issues of both SQL-based and procedural languages into account, we designed U-SQL from the ground-up as an evolution of the declarative SQL language with native extensibility through user code written in C#. This unifies both paradigms, unifies structured, unstructured, and remote data processing, unifies the declarative and custom imperative coding experience, and unifies the experience around extending your language capabilities.

U-SQL is built on the learnings from Microsoft's internal experience with SCOPE and existing languages such as T-SQL, ANSI SQL, and Hive. For example, we base our SQL and programming language integration and the execution and optimization framework for U-SQL on SCOPE, which currently runs hundred thousands of jobs each day internally. We also align the metadata system (databases, tables, etc.), the SQL syntax, and language semantics with T-SQL and ANSI SQL, the query languages most of our SQL Server customers are familiar with. And we use C# data types and the C# expression language so you can seamlessly write C# predicates and expressions inside SELECT statements and use C# to add your custom logic. Finally, we looked to Hive and other Big Data languages to identify patterns and data processing requirements and integrate them into our framework.
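The pattern Rys describes, a declarative query that calls custom code inside a predicate, can be illustrated by loose analogy. The sketch below is not U-SQL and does not use any Azure service. It assumes Python's built-in sqlite3 module as a stand-in, with Python playing the role that C# plays in U-SQL. The table name, column names, sample rows and the is_long_query function are all hypothetical.

```python
import sqlite3


def is_long_query(duration_seconds):
    """Custom predicate written in the host language.

    Hypothetical rule: a search counts as "long" if it ran more than 60 seconds.
    SQLite passes NULL in as None, which is treated as not long.
    """
    return 1 if duration_seconds is not None and duration_seconds > 60 else 0


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL statements can call it by name.
    conn.create_function("is_long_query", 1, is_long_query)

    # Hypothetical sample data standing in for a search log.
    conn.execute("CREATE TABLE searchlog (region TEXT, duration INTEGER)")
    conn.executemany(
        "INSERT INTO searchlog VALUES (?, ?)",
        [("en-gb", 30), ("en-us", 90), ("en-gb", 120), ("en-us", None)],
    )

    # The query stays declarative; the custom logic runs inside the WHERE clause.
    rows = conn.execute(
        "SELECT region, duration FROM searchlog "
        "WHERE is_long_query(duration) ORDER BY duration"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

The SELECT statement decides what to retrieve while the registered function supplies logic that would be awkward to express in SQL alone. According to Rys, U-SQL applies the same division of labor at distributed scale, with C# expressions in place of the Python function.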

Microsoft also announced the general availability of managed clusters for its Azure HDInsight service on Linux, which the company says is backed by a 99.9 percent uptime SLA. The company is also offering Azure Data Lake Tools for Visual Studio and said that ISV solutions can be offered in the Azure Marketplace.

Posted by Jeffrey Schwartz on 09/28/2015 at 2:01 PM

