Joey on SQL Server
The Microsoft Azure Data Platform Lineup: Key Technologies To Know
Microsoft's portfolio of PaaS databases has grown way beyond Azure SQL. Here are its most important cloud-based database services, and exactly how each one works differently from the other.
- By Joey D'Antoni
- 09/16/2019
As you know, Microsoft offers a wide variety of Azure Data Platform offerings. I recently gave a full-day talk at a conference in India on this topic, and I thought it might be useful to summarize that talk in this column. Some of this is based on my personal experience working with Azure, and some of it is from official and unofficial guidance from Microsoft.
First, what do platform-as-a-service (PaaS) databases have in common? All of Azure's PaaS offerings have a few common architectural tenets. Microsoft (and, frankly, Amazon Web Services) have architectural standards that ensure all PaaS offerings are highly available and backed up. This also means the services are all patched to current software levels with minimal downtime.
While each service may use slightly different means to reach these goals, these architectural tenets should be considered a baseline for data platform services.
Azure SQL Database, OG Edition
Azure SQL Database dates back to the Mesozoic Era of cloud computing, or 2010. The service has improved dramatically over the last decade, but in general its purpose is still best-suited to new application development, ideally for applications that have their inbound data coming from other cloud services. There are a few nuances to Azure SQL Database that make it different from SQL Server, the major ones being that you can't do cross-database queries or use common language runtime (CLR) functionality.
Azure SQL Database is built around support for online transactional processing (OLTP) databases, but can also be used for data marts and online analytical processing (OLAP) workloads. You purchase Azure SQL Database either by the number of virtual cores (the newer model going forward) or by the legacy model (which is based on database transaction units or DTUs).
Recently, Microsoft introduced a "serverless" option to Azure SQL Database, which effectively acts like an auto-scaler in addition to pausing your database after a defined idle period. This feature is really useful. However, you should note that if you are running your database 24/7/365, your costs will be higher running serverless than they would be if you were using reserved capacity.
Azure SQL Database Elastic Pools
This is the same Azure SQL Database as mentioned above but instead of paying for each database you deploy, you pay for a pool.
Think of it like a traditional SQL Server, with multiple user databases on your instance. You still have the same limitations as Azure SQL Database. But if, for example, you have a database for each of your customers, this can be more a cost-effective approach to deployment and management since you pay for and manage the pool versus each database. Elastic pools are somewhat limited by size, which brings us to...
Azure SQL Database Hyperscale
You can read more about this option in a column I wrote about hyperscale a few months ago. Essentially, it's Azure SQL Database with its size limit removed. Azure SQL Database was previously limited to 4TB for a single database due to a variety of underlying limitations. Hyperscale removes this limit -- which leads many to question the difference between Azure SQL Database Hyperscale and Azure SQL Data Warehouse.
Azure SQL Database Hyperscale is designed around OLTP and mixed-used workloads, while Azure SQL Data Warehouse is optimized around traditional star schema models. However, this is not an absolute as some mixed-use workloads may work better in Azure SQL Database Hyperscale than they would in Azure SQL Data Warehouse.
Azure SQL Database Managed Instance
The goal of Azure SQL Database Managed Instance is for you to be able to move your existing applications to a PaaS offering with virtually no changes from on-premises. Azure SQL Database Managed Instance supports multiple databases per instance, just like SQL Server does, and supports CLR and linked servers.
However, there are currently some caveats to using legacy features like distributed transactions, and your total database size is currently limited to 8TB, which can be problematic for many enterprise workloads. Additionally, if you are using the general-purpose tier, you need to pay careful attention to your file sizes when creating your databases, as it will affect your storage performance.
Azure SQL Database Managed Instance also uses a different network security model than most of the other Azure services, so you will need to ensure you have a good understanding of networking in Azure to get connected back to on-premises sources.
Azure SQL Data Warehouse
Azure SQL Data Warehouse is a massively parallel processing database engine that was designed to quickly load data and deliver very fast query response times. The service achieves this by using a scale-out architecture that partitions data across compute nodes and uses PolyBase to load data directly from Azure blob storage.
While Azure SQL Data Warehouse is very powerful and effective at what it does best, many organizations fail to realize that to get maximum performance out of the product, they typically need to re-engineer both their data structure and their extract, transform and load (ETL) process, at least to some degree. Also, in order to take advantage of the parallelism, you need to have at least 1TB of data, if not more.
From an operational perspective, Azure SQL Data Warehouse operates on the same platform as Azure SQL Database. Many companies deploy Azure Analysis Services in conjunction with Azure SQL Data Warehouse to offer business analysts an easy-to-use semantic layer to get their business intelligence answers quickly.
Azure Cosmos DB
While all the aforementioned services fall into the relational database realm, Azure Cosmos DB offers a family of nonrelational offerings such as graph, key-value, column family and document databases. Azure Cosmos DB also offers tunable consistency and global scale.
Microsoft has this excellent document about the best use cases for Azure Cosmos DB. You will note that it leans toward data structures that require extremely low write latency in exchange for reduced consistency or recovery time.
Summary
As you can see, there are a lot of cloud database services. If you were keeping track, you will notice that I even left off several native Azure services like MySQL and PostgreSQL, and that I didn't talk about Amazon RDS. It is just for the sake of brevity.
Much like in the old days of buying hardware, it's important to understand all of your requirements and build accordingly. I also typically say that once you have your workloads running in the cloud, it can be easy to switch between varying services as all of your data is there. However with PaaS databases, the migration process usually involves taking an export and reloading data (to move across services), so choosing the correct pattern upfront is more important than ever.
About the Author
Joseph D'Antoni is an Architect and SQL Server MVP with over two decades of experience working in both Fortune 500 and smaller firms. He holds a BS in Computer Information Systems from Louisiana Tech University and an MBA from North Carolina State University. He is a Microsoft Data Platform MVP and VMware vExpert. He is a frequent speaker at PASS Summit, Ignite, Code Camps, and SQL Saturday events around the world.