Azure Data Lake Service for Big Data Analyses Now Available -- Redmondmag.com

By Kurt Mackie
11/17/2016

Microsoft this week released Azure Data Lake as a generally available (GA), production-ready service, backed by Microsoft's 99.9 percent service-level agreement.
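For a sense of scale, the 99.9 percent figure can be converted into a downtime budget. The sketch below is plain arithmetic, not Microsoft's SLA calculation: the function name, the 30-day month and the 365-day year are assumptions made for illustration, and the actual SLA terms define how availability is measured.

```python
from fractions import Fraction


def allowed_downtime_minutes(sla_percent: str, days: int) -> Fraction:
    """Return the downtime (in minutes) that an availability target permits.

    sla_percent is passed as a string such as "99.9" so the arithmetic
    stays exact instead of picking up floating-point rounding error.
    """
    total_minutes = days * 24 * 60
    unavailable_share = 1 - Fraction(sla_percent) / 100
    return unavailable_share * total_minutes


# Assumed periods: a 30-day month and a 365-day year.
per_month = allowed_downtime_minutes("99.9", 30)
per_year = allowed_downtime_minutes("99.9", 365)

print(float(per_month))      # 43.2 minutes per 30-day month
print(float(per_year) / 60)  # 8.76 hours per 365-day year
```

Under those assumptions, a 99.9 percent target leaves room for roughly 43 minutes of unavailability per month.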

Azure Data Lake is a service for massively parallel "Big Data" analyses, with the ability to tap into pools of structured and unstructured data without limits. The service has been at the preview stage since November of 2015, according to a Microsoft Channel 9 presentation, so it's taken one year to arrive fully baked.

Microsoft is marketing the Azure Data Lake service as enabling "Big Cognition." The idea is to glean insights from multiple inputs of various data types. It's about "joining all the extracted cognitive data with other types of data, so you can do some really powerful analytics with it," according to a Microsoft announcement.

Azure Data Lake Components
Azure Data Lake is composed of three Azure services, according to the presentation. It has HDInsight, which is Microsoft's Hadoop-based Big Data service. Another component is the new Data Lake Store (GA this week), a repository for structured and unstructured data that can scale to meet developer needs. Lastly, there's the new Data Lake Analytics (GA this week), which permits users to run "massively parallel data transformation and processing programs in U-SQL, R, Python and .NET over petabytes of data," per the announcement.

The overall system is based on the open Apache Hadoop Distributed File System. Microsoft illustrates the Azure Data Lake components in the following diagram:

Azure Data Lake components. Source: Microsoft blog post.

The U-SQL query language is frequently mentioned in Microsoft's Azure Data Lake announcements. It may seem new, but Microsoft has been using U-SQL internally since 2008, according to the Channel 9 presentation. U-SQL is supported in Azure Data Lake Tools for Visual Studio Code at the preview stage and "combines the declarative advantage of T-SQL and extensibility of C#."

A Reddit Ask Me Anything session conducted by Microsoft Azure team members this week offered an additional definition of U-SQL, as follows:

It [U-SQL] has an official meaning which is that it unifies:

structured and unstructured data processing

declarative SQL with user code (written in C#, Python, R etc.)

querying data in Azure Data Lake with querying data from Windows Azure Blob Store, SQL Server in Azure

The inofficial meaning is that you need a submarine to explore the depth of your data lake and discover your treasures. And in German and Swedish, submarines are called U-Boot :).
It is not true that it is called U-SQL because the U comes after the T.

Developers love the U-SQL query language and pick it up very fast, according to Microsoft's presentation. Microsoft offers a tutorial here.
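The idea of pairing "declarative SQL with user code" can be illustrated outside of U-SQL. The sketch below is an analogy only, not U-SQL and not an Azure API: it uses Python's built-in sqlite3 module, and the table name, column name and the domain_of function are hypothetical. A user-written function is registered with the SQL engine and then called from an otherwise declarative query.

```python
import sqlite3


def domain_of(email: str) -> str:
    """User code: return the lower-cased domain part of an e-mail address."""
    return email.rsplit("@", 1)[-1].lower()


def count_signups_by_domain(emails):
    """Run a declarative GROUP BY query that calls the user function."""
    conn = sqlite3.connect(":memory:")
    try:
        # Expose the Python function to SQL under the name domain_of (1 argument).
        conn.create_function("domain_of", 1, domain_of)
        conn.execute("CREATE TABLE signups (email TEXT)")
        conn.executemany(
            "INSERT INTO signups (email) VALUES (?)",
            [(e,) for e in emails],
        )
        return conn.execute(
            "SELECT domain_of(email) AS domain, COUNT(*) AS n "
            "FROM signups GROUP BY domain ORDER BY n DESC, domain"
        ).fetchall()
    finally:
        conn.close()


rows = count_signups_by_domain(
    ["a@Example.com", "b@example.com", "c@contoso.com"]
)
print(rows)  # [('example.com', 2), ('contoso.com', 1)]
```

The query itself stays declarative (what to group and count), while the custom logic lives in ordinary user code, which is the general pattern the U-SQL description points to.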

Spinning up an Azure Data Lake workload takes "30 seconds," Microsoft claims. Essentially, Microsoft takes care of managing the cluster for developers or data scientists. Additionally, Data Lake Store is designed to have "no limits" on the data size or the number of files or objects used in the analysis. No repartitioning of the data is required to run analyses. Developers don't have to define a schema up front.

In addition, Azure Data Lake is integrated with Azure Active Directory. It has role-based access controls over the Data Lake Store via "POSIX-based ACLs for all data" or "Apache Ranger in HDInsight," Microsoft's announcement explained. Organizations have single sign-on and multifactor authentication access options. Data are encrypted at rest via the service or Azure Key Vault. Data are encrypted in motion using the Secure Sockets Layer protocol.
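To show roughly what a POSIX-style ACL check involves, here is a simplified sketch. It is not Data Lake Store's implementation: the Acl class, the is_allowed function and the user and group names are all hypothetical, and the logic follows the general POSIX ACL evaluation order (owner, named users, groups, then others, with a mask limiting named and group entries) in reduced form.

```python
from dataclasses import dataclass, field

# Permission bits, as in classic Unix modes.
R, W, X = 4, 2, 1


@dataclass
class Acl:
    owner: str
    group: str
    owner_perms: int
    group_perms: int
    other_perms: int
    named_users: dict = field(default_factory=dict)   # user name -> permission bits
    named_groups: dict = field(default_factory=dict)  # group name -> permission bits
    mask: int = R | W | X  # upper bound for named-user and group entries


def is_allowed(acl: Acl, user: str, groups: set, wanted: int) -> bool:
    """Return True if `user` (a member of `groups`) holds all `wanted` bits."""
    # 1. The owning user is checked first and is not limited by the mask.
    if user == acl.owner:
        return (acl.owner_perms & wanted) == wanted
    # 2. A named-user entry applies next, limited by the mask.
    if user in acl.named_users:
        return (acl.named_users[user] & acl.mask & wanted) == wanted
    # 3. Owning-group and named-group entries: any single matching entry
    #    may grant access; if entries match but none grants, access is denied.
    matching = []
    if acl.group in groups:
        matching.append(acl.group_perms)
    matching += [p for g, p in acl.named_groups.items() if g in groups]
    if matching:
        return any((p & acl.mask & wanted) == wanted for p in matching)
    # 4. Everyone else falls through to the "other" permissions.
    return (acl.other_perms & wanted) == wanted


# Hypothetical example: the mask caps named and group entries at read-only.
acl = Acl(owner="alice", group="analysts",
          owner_perms=R | W, group_perms=R, other_perms=0,
          named_users={"bob": R | W}, mask=R)

print(is_allowed(acl, "alice", set(), W))         # True: owner ignores the mask
print(is_allowed(acl, "bob", set(), W))           # False: mask removes write
print(is_allowed(acl, "carol", {"analysts"}, R))  # True: owning group can read
print(is_allowed(acl, "dave", set(), R))          # False: others have no access
```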

Azure HDInsight Support
Microsoft also announced some additions to Azure HDInsight this week. One addition is R Server for HDInsight, which is now generally available. It's Microsoft's implementation of the R programming language "integrated with Spark clusters created from HDInsight," which can process terabytes of data. When it is run on Apache Spark, R Server "enables handling up to 1000x more data and up to 50x faster speeds than open source R," Microsoft claimed. A new capability is support for Spark SQL data sources. Developers and data scientists can work with the included RStudio Server Community Edition, but they can also use R Tools for Visual Studio.

Microsoft also issued a preview of the open source Kafka for HDInsight. It's designed to enable the ingestion of "massive amounts of real-time data." It can be used for operations such as "fraud detection, click-stream analysis, financial alerts, or social analytics solutions," the announcement indicated. It's also designed to work with stream analytics solutions built on Storm for HDInsight or Spark Streaming for HDInsight.

About the Author

Kurt Mackie is senior news producer for 1105 Media's Converge360 group.
