Joey on SQL Server
Platform as a Service Databases -- Handle with Care!
Let's take a look at the limitations of PaaS database offerings for organizations with robust DBA teams and mission-critical needs.
- By Joey D'Antoni
- 06/28/2023
One of the things that I value about having lots of experience (which is an excellent way of saying, "I'm old") is having a great deal of perspective on technology, especially around databases, which I have worked on for most of the past 25 years.
There are always a few consistent truths: more memory is always better, you always need more IO throughput and multi-master write is always difficult, if not impossible. This experience lets me see through marketing and sales hype to drill into engineering challenges.
I've wanted to write this column for several months but held off until I could get the framing right. Friends and colleagues at both Amazon and Microsoft work on these products. For most workloads, all the platform-as-a-service (PaaS) database offerings from the major vendors will meet your needs successfully and with reduced management overhead. You know, the stuff you see in the marketing for these services.
However, for organizations with robust DBA or database reliability engineering teams and mission critical latency needs, you may need help to get the performance required. Even if you can, choosing a PaaS solution may cost much more than just using cloud virtual machines (VMs). While I won't do a deep economic analysis, PaaS solutions typically cost more than a VM solution. However, the PaaS offerings include benefits like high availability and automated backups, which also affect the cost.
To delve into this further, let's understand the tradeoffs for PaaS offerings. Specifically, in this article, I'm talking about Amazon RDS for SQL Server and Azure SQL Managed Instance (MI) -- the two most popular PaaS migration targets for existing SQL Server installations. You might note I'm not including Azure SQL Database in this list -- it's an excellent service with a different target market than the other two services. It is a familiar axiom that when you adopt cloud services, you are trading control for lower management, but what does that mean?
- You no longer have complete control of the instance -- you have a modified version of the sysadmin role. Still, you may not be able to control everything you need in the case of an emergency. For example, RDS doesn't allow you to use resource governor to manage resources. For example, collecting extended event or profiler traces can be more challenging to write to a local disk on the host. Both Azure and RDS account for this, but you may need to modify scripts you have used in the past.
- A similar concern is encryption certificates for databases protected with transparent data encryption. This function limits migration options for RDS.
- The service takes backups—RDS allows you to manually manage your backups (which is the only way to restore a single database on that service), and Azure Managed Instance enables you to take a copy-only backup.
- Tempdb configuration and performance can be a limitation. Cloud vendors choose tempdb sizing based on median workloads and available storage. However, tempdb utilization can be highly variable across workloads.
You have limited options for high availability (HA) and disaster recovery on both platforms. HA is inherent to the service and is not configurable by the administrator. Managed Instance takes advantage of the same architecture that Azure SQL Database uses. In the general-purpose tier, you have one compute node and three copies of your data on remote disks. The business-critical tiers use the Always On availability group technology to have three or four replicas.
AWS RDS uses the native Windows clustering functionality combined with availability groups while using some of AWS' native network functionality to provide connectivity. You should note that RDS requires the Availability Zone deployment to provide failover capabilities. If you deploy a standalone instance, there is a lower SLA. My assumption is that standalone RDS is like Azure's general purpose, where the storage is highly available, and there is a single VM running with your RDS instance. For disaster recovery, Azure Managed Instance allows you to create a failover group to one other Azure region. Currently, Amazon does not offer multi-region configurations for RDS for SQL Server.
Each of these services has challenges around high-end IO performance. Amazon RDS for SQL Server tops out at 64,000 IO Operations per second (IOPS), which requires using the most expensive instances. Oddly, the RDS Oracle offering tops out at 256,000 IOPs, roughly the maximum number of IOPS you can purchase for an EC2 or Azure virtual machine. Both services currently have a 16 TB limit on the total size of your data.
In MI, business critical offers excellent IO performance, but at a cost -- to increase data volume and IOPs, you need to purchase additional cores of the service. The storage is local and, therefore, proportional to CPU cores. Hence, this pricing makes sense but does not make it less painful for customers. The general-purpose tier of managed instance currently sets storage performance based on the original size of the database and log file, which is esoteric. Effectively to get good IO performance, you need to oversize your databases, which directly limits the capacity of your instance. Microsoft is working on a solution for this general purpose storage issue, but has not announced anything publicly.
I have encountered a big problem with customers of both services: runaway tempdb growth.
Tempdb acts as a page file for SQL Server, and when a query does not receive enough memory to complete, it uses tempdb instead of the RAM it doesn't have. This behavior means that large queries with poor execution plans can generate terabytes (on the very high end) of tempdb activity. In MI, the general purpose tier has a fixed size for tempdb, which means you can't resize in the event of a runaway query. In RDS and the business critical tier of MI, you can add additional files and even increase the size of existing files; however, if you are close to the 16 TB limit of your instance, adding enough tempdb space to your instance may not be possible.
Platform-as-a-service databases come with many promises. And for the median customer, they deliver on those promises. While there are some quirks around the right way to configure services, most of the workloads I've seen that fit into the middle of the performance/volume curve are happy with the benefits they've seen. However, those customers with high-end workloads pushing the service's limits are typically better served with virtual machines that offer more flexibility and control over their databases.
About the Author
Joseph D'Antoni is an Architect and SQL Server MVP with over two decades of experience working in both Fortune 500 and smaller firms. He holds a BS in Computer Information Systems from Louisiana Tech University and an MBA from North Carolina State University. He is a Microsoft Data Platform MVP and VMware vExpert. He is a frequent speaker at PASS Summit, Ignite, Code Camps, and SQL Saturday events around the world.