What's the Future of Database Hardware Post-Cloud? -- Redmondmag.com

What's the Future of Database Hardware Post-Cloud?

IT is moving toward the cloud and virtualization, but don't write off hardware -- especially for highly critical systems. Here's a look at the newest generation of database hardware and how it impacts performance.

By Joey D'Antoni
02/05/2019

I'm currently involved in a project where we're working on actual physical hardware on some pretty big, mission-critical systems. This type of work is a rarity in a cloud-first, mostly virtualized world. However, there are still many high-end systems that run on physical hardware.

The performance differences between physical and virtual (or cloud) platforms are currently minimal. But for systems with this level of criticality, the extra layers of performance can be worth the additional cost and loss of flexibility. For logistical reasons, cloud hardware also tends to trail the leading edge of technology. (Have you ever tried to buy hardware for 53 datacenters across the world with enough capacity for thousands of customers?)

So on this project, I've gotten to explore some of the newer hardware options and how they impact database performance. I'll be talking about this in the context of SQL Server. However, most of this applies to any relational database platform, if the vendor supports the hardware technology.

NVMe Storage
"NVMe" stands for "non-volatile memory express." It is a host controller interface and storage protocol that is designed to accelerate the data between the compute tier and the storage layer. What this means in practice is drives with microsecond latency and a previously unheard-of number of IOPS; the Intel Optane drives support 550,000 random write I/Os per second.

Just five or so years ago, you would have had to buy an entire enterprise-class storage area network (SAN) in order to reach that level of IOPS. Now you can install a single disk device, which costs about the same as a nice laptop (approximately $2,000 for 750GB), that reaches that level of IOPS at a tenth of the latency.

These devices come in a couple of form factors -- either normal 2.5 drives or PCI-Express (PCI-E) cards. The PCI-E cards will have slightly lower latency than the drives, but you may be more limited in the number of drives you can use in your server.

In order to take advantage of this technology, you will probably need servers from the most recent generation or two, and the controller software on your server may or may not support creating a RAID array on these volumes. You can use Storage Spaces in Windows or Logical Volume Management in Linux to protect your data across disks. If I were prioritizing how to use these devices for SQL Server, I would first place your TempDB data files on these volumes. Secondly, I would place busy data or log files on the volume.

NVDimm
Short for "non-volatile dual in-line memory module," NVDimm technology allows you to persist data on special memory sticks. What is the benefit of this? Memory is much closer to the CPU compared to disk. In fact, in terms of latency, even an NVMe disk has orders of magnitude more latency than a memory chip (this is one of the places in computing where the laws of physics are our ultimate limiter).

SQL Server 2016 introduced some support for NVDimm; you could put the tail of your transaction log file in memory, which minimizes the latency to complete transactions.

Just like NVMe, you will need newer generations of hardware to take advantage of this technology. Starting with SQL Server 2019 (currently Linux only), SQL Server will support full enlightenment of data and transaction log files on NVDimm chips. The other benefit of this is that SQL Server uses direct access (DAX) on memory to bypass the file system, further reducing the latency of the overall system.

While NVDimm is a little bit newer than NVMe storage, I think it is definitely the future of in-memory database technology. There are some hurdles that may need to be overcome in order to scale the technology. However, persisted memory will greatly reduce the storage bottleneck of most systems, allowing busy transactional systems to process even more transactions per second.

So, What's the Bottleneck Now?
Taking advantage of these technologies, we have greatly reduced I/O overhead. So what happens when we remove that?

As my colleague Kevin Closson (check out his blog and Twitter) said once, "Everything is a CPU problem." When you remove the bottlenecks from the storage layer, data comes back to the process much faster and the utilization of the CPU goes up. From my experiences working with busy OLTP systems in the past, you start to see different wait types within SQL Server -- things like spinlocks and other CPU-related activity you may not be used to. You will still need to tune your queries, as less data read means less CPU activity.

I had some discussions with colleagues recently about whether these technologies will lead to a change in the way database structures are designed. The traditional B-tree index structure was designed with sequential spinning disk access in mind, rather than much faster, random SSD access. I would look to academic papers in the next year or two to get some idea of what sorts of changes you will see in the future.

What About the Cloud?
It's rare that I don't mention Microsoft Azure or Amazon Web Services (AWS) in a column...so here we go.

While we've focused on physical hardware so far, you can already find some of this technology in AWS (and likely Azure). The large hosts you can rent in AWS are already loaded with local NVMe storage. Also, given the IOPS and latency that Azure's UltraSSD drives promise, that leads me to believe that the underlying storage is NVMe.

Just last night, I learned that Microsoft has introduced NVMe storage to the Lv2 series of Azure virtual machines. They are currently available in limited regions, but offer fantastic storage performance based on my initial testing.

NVDimm is not currently available, but as it becomes more broadly adopted in the physical world, you will see wider adoption by major cloud providers. Also, as AWS or Microsoft implement a particular technology, the other vendor will be quick to match its competitor.

About the Author

Joseph D'Antoni is an Architect and SQL Server MVP with over two decades of experience working in both Fortune 500 and smaller firms. He holds a BS in Computer Information Systems from Louisiana Tech University and an MBA from North Carolina State University. He is a Microsoft Data Platform MVP and VMware vExpert. He is a frequent speaker at PASS Summit, Ignite, Code Camps, and SQL Saturday events around the world.

Featured

Q&A: Protecting Data with Microsoft 365

Microsoft 365 solutions architect Vanessa Toves breaks down some of the biggest hurdles for enterprise IT when it comes to keeping internal data safe.
OpenAI Secures Historic $40 Billion Investment

OpenAI announced Monday that it has closed a record-breaking $40 billion funding round led by Japan’s SoftBank Group, setting a new benchmark for private capital in the tech sector.
Do-It-Yourself AI, Part 4: Building the Bot

It's time to put everything in the previous lessons together to build your very own chat bot.
Security Firm Reveals Voice Phishing Attack Targeting Microsoft Teams

Security researchers at Ontinue's Cyber Defense Center have uncovered a complex, multi-stage cyberattack that leveraged social engineering, remote access tools and signed binaries to infiltrate and persist within a target network.
Microsoft Announces Researcher and Analyst Agents for Microsoft 365 Copilot

Microsoft on Tuesday announced two new AI-powered reasoning agents: Researcher and Analyst.