Joey on SQL Server
Build 2023: Meet Fabric, the AI-fication of Microsoft's Data Business
Microsoft is transforming its data portfolio for its Copilot era. Here's what that means for DBAs.
- By Joey D'Antoni
• Microsoft's Fabric is a comprehensive data analytics platform that integrates various data tools into a single SaaS offering, aiming to eliminate data silos and promote data sharing within organizations.
• Fabric leverages the concept of data fabric, combining modern trends like data lakes, the delta store, and parquet file formats, presented through a set of standard APIs.
• Fabric incorporates AI capabilities, including Power BI Copilot for DAX language and natural language query functionality. It also introduces Data Activator, a service that monitors data changes and triggers actions, resembling Logic Apps or Power Automate.
Microsoft introduced a brand-new data analytics platform on Tuesday, during Day 1 of the 2023 Microsoft Build event in Seattle, Wash.
Microsoft Fabric, previously called "Project Trident," is a new product built around Microsoft's data platform tools like Azure Data Factory, Azure Synapse Analytics, Power BI and some new services like OneLake. Fabric aims to be an end-to-end platform that combines some of the best data tools Microsoft has and integrates them into a single software-as-a-service (SaaS) offering. Microsoft built Fabric around the data fabric concept -- a new wave in data analytics that blends many modern trends like data lakes, the delta store and parquet file formats, presented behind a standard set of APIs.
Microsoft has been investing heavily in this space for years, going back a decade or more to PerformancePoint and then the VertiPaq engine that became the backbone of so many of Microsoft's business intelligence efforts. Power BI has been a very successful product and tightly integrates with the rest of the Microsoft 365 suite.
In my opinion, Fabric is somewhat similar conceptually to the late Big Data Clusters concept in that it supports Spark data engineering and T-SQL for querying at scale against any of its sources. What's the big difference? BDC required the customer to deploy complicated infrastructure, while the notion of Fabric is that it is an easily consumed SaaS offering. The other difference is that I see more investment from Microsoft in Fabric.
The driving technology platform behind Fabric is OneLake, shown above. All of the related compute engines -- whether it's Spark, SQL or Power BI -- store their data in this storage pool in delta file format, allowing for data updates in a transactionally consistent file. One of the demos shows a user uploading an image file from their desktop into OneLake, using Windows Explorer. Optimizations have been made to each of the data engines to work with data in this file format, which should help reduce the need for complex data engineering tasks. The design of OneLake allows for security to be applied at the lake level, using the unsurprisingly named "OneSecurity" feature.
The design goal of any data fabric is to avoid data silos in large organizations. A standard data storage and analytics layer allows organizations to share data more easily between divisions and potentially even with partners. However, another trend that Microsoft Fabric supports is the data mesh concept. In a data mesh, teams specify which domain their data belongs to and then group the data into those domains. Each domain team then builds its own data products. Fabric supports domain tagging, allowing organizations to develop their products using a single data store.
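To make the domain idea concrete, here is a minimal Python sketch of grouping datasets by a domain tag. It is an illustration only and does not use any Fabric API; the function name `group_by_domain` and the dataset and domain names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_domain(datasets: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (dataset_name, domain_tag) pairs by their domain tag.

    Hypothetical helper for illustration; not part of any Fabric SDK.
    """
    domains: Dict[str, List[str]] = defaultdict(list)
    for name, domain in datasets:
        # Every dataset stays in the one shared store; only the tag differs.
        domains[domain].append(name)
    return dict(domains)


if __name__ == "__main__":
    # Hypothetical datasets tagged by the teams that own them.
    tagged = [("orders", "Sales"), ("invoices", "Finance"), ("returns", "Sales")]
    for domain, names in group_by_domain(tagged).items():
        print(domain, names)
```

The point of the sketch is that the tag, not a separate storage system, defines the domain boundary, which is how a mesh can sit on top of a single data store.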
One of the concepts Microsoft is introducing with Fabric is personas. For anyone who has developed commercial software, personas commonly come up as part of the design process to represent the various users of your software and how they will use your product. In Fabric, however, this goes a step further: there are dedicated user experiences that users can select from a drop-down menu. For example, a data scientist will land in a Spark notebook using a Lakehouse store, as shown below:
The data scientist could then use Spark to create the visualization shown below:
A business analyst would have an experience analogous to Excel and Power BI, while a data engineer might have a T-SQL querying experience. These personas are built into the Fabric user experience and can be easily changed.
While not shown in the above graphic, I expect Fabric will have integrations with Azure Purview's data governance solution and the security provided by Azure Active Directory. Purview provides both data governance and information protection features for Fabric, and this functionality will become more tightly integrated over time. Currently, sensitivity labels cascade into the current data protection components of Microsoft 365. This may become a more native experience in the future. Governance is a very underrated element of data security; these tools can help protect your data wherever it lives.
Of course, it wouldn't be a Build 2023 announcement without an element of AI and Copilot. Microsoft Copilot now supports the DAX language used by Power BI (which is a very fiddly language, in my experience). This Power BI Copilot for DAX feature is currently available in public preview. Microsoft also highlighted some natural language query functionality in Fabric, now in private preview. Power BI has flirted with features like this for a long time; for example, Power BI Q&A has been around for several years. While the Q&A service used labels and synonyms in the data, it doesn't take a lot of imagination to extend that concept to large language models (LLMs), like those used in ChatGPT, to build a natural language query option for Fabric data models.
Another exciting component is Data Activator, another service that is in private preview. This service harkens back to SQL Server Notification Services (which I'm old enough to have heard of but never used): it monitors your data for changes (for example, sales falling below 80 percent of your forecasts) and then takes action. Conceptually similar to Logic Apps or Power Automate, Data Activator will integrate with those services and offer low-code experiences as well as more complex scenarios. It will be interesting to monitor the evolution of this service; it has a lot of potential. Still, it can be computationally challenging to calculate such notifications across large data sets.
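The sales-versus-forecast rule can be sketched in a few lines of Python. This is an illustration of the monitor-then-act pattern only, not Data Activator's actual interface, which was still in private preview at the time of writing; the names `SalesReading`, `below_threshold` and `evaluate`, and the sample figures, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class SalesReading:
    """One observation of actual sales against forecast (hypothetical shape)."""
    region: str
    actual: float
    forecast: float


def below_threshold(reading: SalesReading, ratio: float = 0.80) -> bool:
    """Return True when actual sales fall below `ratio` of the forecast."""
    return reading.actual < ratio * reading.forecast


def evaluate(
    readings: Iterable[SalesReading],
    action: Callable[[SalesReading], None],
    ratio: float = 0.80,
) -> List[SalesReading]:
    """Run `action` for each reading that breaches the rule; return those readings."""
    triggered = []
    for reading in readings:
        if below_threshold(reading, ratio):
            action(reading)  # e.g. send an alert or start a workflow
            triggered.append(reading)
    return triggered


if __name__ == "__main__":
    sample = [
        SalesReading("East", 790.0, 1000.0),   # 79 percent of forecast
        SalesReading("West", 810.0, 1000.0),   # 81 percent of forecast
    ]
    evaluate(sample, lambda r: print(f"Alert: {r.region} is below 80% of forecast"))
```

This version rescans every reading on each pass, which hints at the scaling concern above: evaluating such rules continuously across large data sets takes far more engineering than a simple loop.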
Fabric is a big, bold step from Microsoft. Competitors like Snowflake and partners like Databricks have been making inroads into a traditionally strong business intelligence and analytics market for Microsoft. While Fabric will remain a work in progress for some time, the level of investment and direction from the company in data analytics is promising.
My primary concern is to see what Fabric's pricing model is and if it is a good fit for companies of all sizes, or if it will be exclusively an enterprise play. My other question is how well it will work with the messy heterogeneous data platforms that I see in most enterprises. Stay tuned to this space -- I plan to dive deeper into Fabric in my July column.
Joseph D'Antoni is an Architect and SQL Server MVP with over a decade of experience working in both Fortune 500 and smaller firms. He is currently Principal Consultant for Denny Cherry and Associates Consulting. He holds a BS in Computer Information Systems from Louisiana Tech University and an MBA from North Carolina State University. Joey is the co-president of the Philadelphia SQL Server Users Group. He is a frequent speaker at PASS Summit, TechEd, Code Camps, and SQLSaturday events.