The Current State of the Art Around Cloud Storage and Databases -- Redmondmag.com

The Current State of the Art Around Cloud Storage and Databases

Let's break down performance metrics, cost considerations, and the ever-changing best practices of cloud storage and database solutions.

By Joey D'Antoni
10/20/2023

I recently finished a project doing some evaluation of various storage offerings as they relate to database servers in the cloud. While that project mostly looked at AWS' offerings, in this column, you will learn about the various types of storage offerings, their price points and their trade-offs. Because it is the cloud, offerings, costs and features change regularly, so staying updated with the current best practices is essential.

If you are new to databases or just learning about infrastructure, databases are incredibly demanding of storage subsystems. While databases like SQL Server, Oracle, MySQL and PostgreSQL heavily use memory to buffer data from disk, storage performance can quickly become a bottleneck. Databases can have ever-growing data volumes and demand high bandwidth and storage performance. In the cloud (and on-premises), the way to measure storage performance is in I/O Operations Per Second, or IOPs for short. If you are old enough to remember spinning disks, you may remember all sorts of exotic tricks to maximize performance from those mechanical drives. If you aren't that old, be thankful for modern solid-state storage.

In the early days of cloud virtual machines, storage performance was a significant problem. I remember seeing Mark Russinovich do a demo at TechEd in 2013 where he allocated 26 terabytes of storage, which, at the time, seemed terrific. However, when used in conjunction with a database server, that storage would have latency figured in the high hundreds to low thousands of milliseconds (ms) of latency. For SQL Server, Microsoft recommends disk latency be around 10 to 20ms, which is a good start for other database engines.

Because the performance of individual disks was poor, the way you improved performance was to stripe across multiple disks -- in some cases, you might need to allocate 5 TB for 200 GB of data because the only way to get the 25,000 IOPs you required, was to create that five 1 TB disks. Over time, in Azure and AWS, storage became more performant (mostly moving from spinning disk to solid-state disks). However, performance and capacity were correlated, meaning you still needed to overprovision disks to get the required performance metrics. It is important to note that you will likely never match on-premises direct attached storage performance in the cloud. All cloud storage is network-connected, which will inherently add some latency because of our pesky friend, the laws of physics.

In Windows, you use a feature called Storage Spaces (you can use Windows disk management tools, though it is not a best practice) to stripe disks, which works fine, especially for new VMs (it can have some issues with clustered servers). But extending those volumes can be painful. Note -- this is standalone Storage Spaces, not to be confused with Storage Spaces Direct, which is the basis of Microsoft's hyper-converged infrastructure offering. Additionally, many customers need to learn about these best practices. Still, I run into customers with multiple single disks allocated to their cloud VMs, as though they were on-premises machines connected to a SAN.

This overallocation and striping of disks is painful for cloud providers (who had to vastly overprovision storage) and customers who had to deal with much more complex storage configurations than needed to reach desired performance levels. While this took more time than I thought it would evolve, some services on both Azure and AWS allow you to specify IOPs and bandwidth as independent dimensions. Azure has Managed Disks v2 and Ultra Disks, as this component and Amazon's Elastic Block Service (EBS) GP3, IO1 and IO2 volumes are their equivalent service offering.

One important note about your storage performance is that your VM is frequently the limiting factor. Each cloud VM has a specific quality of service around both storage IOPs and bandwidth. Further complicating this is "burst" capacity, which can provide additional levels of performance for brief timeframes (in the case of Azure, this is 30 minutes per day).

The other consideration for your database servers is that most of the VMs you would consider for database use also have a local, direct-attached, ephemeral (which means any data goes away on deallocation) flash storage with its own IOPs and bandwidth capacity. For SQL Server, this has the obvious use case for storing the tempdb database. However, any other database engine with scratch space that gets recreated on the restart of the engine should consider taking advantage of this storage. This direct attached storage is the lowest latency storage you have available and, more importantly, is included in the monthly cost of your VM.

Also, part of my recent storage assessment project is NetApp OnTap cloud volumes. You may ask why I'm mentioning a non-native cloud offering -- and yes, the cost of NetApp is much more expensive than the native offerings. However, I was genuinely impressed with the performance, which matched and beat some of the best-in-class native cloud offerings. The downside of the NetApp solution is that the cost of entry is very high -- it can be two to three times the most commonly used cloud block storage offerings (AWS GP3/Azure Managed Disks). However, if you already use NetApp for file storage, it can make sense for you financially and technically. Some of the other major storage vendors have similar offerings. However, my sources tell me that they are not as tightly integrated into the cloud providers' infrastructure as NetApp.

Storage performance is one of the most important parts of database performance and can quickly become a workload bottleneck. The fact that cloud storage presents so many choices and can require a lot of specialist knowledge that many organizations need to gain means that cloud workloads can struggle to match the performance of on-premises databases. Admins also face the additional complication of changing types of storage -- it is crucial to understand if you need to stripe your storage to meet your performance requirements or if you can leverage newer storage types that require less configuration.

About the Author

Joseph D'Antoni is an Architect and SQL Server MVP with over two decades of experience working in both Fortune 500 and smaller firms. He holds a BS in Computer Information Systems from Louisiana Tech University and an MBA from North Carolina State University. He is a Microsoft Data Platform MVP and VMware vExpert. He is a frequent speaker at PASS Summit, Ignite, Code Camps, and SQL Saturday events around the world.

Featured

Microsoft Releases Scareware Blocker Preview

Microsoft this week announced that its scareware blocker, which aims to identify potential scam sites and messages, is now in public preview.
Microsoft Shares Guidance on Proactive Identity Management

With the continual rise of nation-state attacks and increasingly sophisticated cybercriminal rings, Microsoft has released recommendations for enterprises to keep their data secure in 2025.
Dealing with Problems Related to Expanding a Windows Volume, Part 1

Expanding a Windows volume can reveal some unforeseen pitfalls, including file system cluster limitations.
2025 Roles Based Copilot, Microsoft Dynamics 365 and Power Platform Roadmap Announced

Microsoft has revealed what it is calling its "release wave 1" plans for Microsoft Dynamics 365, roles-based Copilot offerings and Power Platform.
Fabric's Growing Pains: Meeting the Needs of Power BI and Enterprise Customers

Microsoft's challenge with Fabric lies in balancing the needs of data engineers and smaller Power BI-focused users, while addressing concerns around feature gaps, security and usability in its expansive analytics suite.