Joey on SQL Server
Backups and Consistency Checks: The DBA's Top Priorities
Database protection goes far beyond backups; ensuring the quality and consistency of those backups is equally important. Here are the maintenance tasks that will make sure your databases stay in top shape.
- By Joey D'Antoni
- 11/16/2020
If you are a database administrator, or a sysadmin or developer who is responsible for databases, the most important responsibility you have is to protect your data. This means you need to ensure not just the security of your databases, but also that your databases are backed up and have no corruption.
In order to ensure consistency, you need to run the DBCC CHECKDB SQL Server command regularly. I'll talk more about CHECKDB later in this article. First, let's talk about database corruption.
SQL Server stores data in 8Kb pages that are an on-disk structure. When a transaction is processed and data is changed, that data is written to the transaction log on-disk so that in the event of a power loss or subsequent storage failure, you have all of the changed data on physical disk. The operating system acknowledges that the write took place, which in turn transmits that acknowledgment back to the SQL engine, which then marks that transaction as committed.
Corruption most commonly occurs when the storage device acknowledges a write that did not happen or happened incorrectly. This can be caused by a physical storage failure, a bug in storage microcode, power failures or any number of other things. Corruption can also be caused by problems with physical memory or, rarely, a bug in SQL Server. Of all the instances of database corruption that I have observed, 99.99 percent were caused by problems with storage -- including the one where the datacenter team said, "The disk controller has burn marks on it."
In order to protect your databases against corruption, there are two things you need to do. Backing up your databases is always very important as it provides you the ability to recover your database, but there's a big caveat when it comes to database corruption: SQL Server will let you back up -- and add additional data to -- a corrupt database, but the only way you would know that there are corrupt pages in the database will be when SQL Server reads those pages into memory.
You can add the CHECKSUM option, which can detect some types of corruption, to your backup commands. When you use CHECKSUM, the backup process will test the page checksums for each page being backed up. This does impact the performance of your backup operations, but can provide an additional layer of protection. However, not all types of page corruption will be detected by this process.
To perform a definitive check for database corruption, you need to execute the DBCC CHECKDB command. This command performs a series of consistency checks against your database to ensure that all of your pages are in a consistent state. Because this process inspects every page of the database and the consistency between pages, this process is quite I/O-intensive.
There are several alternatives to the command, including PHYSICAL_ONLY, which only inspects the physical structure of the page and record headers, and can identity the most common storage-related corruption. Microsoft recommends running the PHYSICAL_ONLY option frequently and running a full consistency check periodically. You may also consider offloading these checks to a restored backup, or a secondary copy in an availability group. If you have Software Assurance, you can run consistency checks on a server without a license.
I recommend running CHECKDB on your primary database. While this may be impactful to performance, it is the only way to ensure that there is no corruption on the physical storage where your database exists.
The correlation to backups exists because until you have restored a backup and done a consistency check on the restored copy of the database, you don't know if you have corruption. Just this week, I was working with a customer who had not run CHECKDB and had a month's worth of backups with corruption. You can check to see the last completed CHECKDB data in database properties, starting with SQL Server 2016 Service Pack 2. Prior to that, it was part of the DBCC INFO command, and could also be found by running some undocumented commands that allow you to inspect page headers.
What I typically recommend for most companies is to execute CHECKDB weekly. I do this on a system that has about 25TB of storage. The checks take a good chunk of the day to run, but the systems remain operational and other workloads can keep running. On these systems, I keep eight days of backups so that if corruption occurs, I can still perform a restore of a known good copy of the database. On a very busy OLTP system that supports 24x7 workloads, you may need to get a little more creative in terms of running on a secondary copy or restoring a backup to another server.
There are a couple of ways you can run integrity checks. The first is to use the built-in functionality in SQL Server called maintenance plans. Another option, one preferred by more experienced DBAs, is the SQL Server Maintenance Solution, which is an open source option from Ola Hallengren, a Microsoft Data Platform MVP based in Sweden.
Protecting your data requires more than just any one step. Just like data security requires a layered defense-in-depth approach, data protection requires a multilayered approach. It's not enough to have backups -- you need to ensure your backups are stored on a separate storage device from your database and log files, and you need to ensure those backups are not corrupt. Beyond backups, you need to regularly run consistency checks against your databases to ensure all of the pages are valid.
Ensuring that your data is protected is the No. 1 job of the database administrator, so make sure your maintenance tasks are lined up.
About the Author
Joseph D'Antoni is an Architect and SQL Server MVP with over two decades of experience working in both Fortune 500 and smaller firms. He holds a BS in Computer Information Systems from Louisiana Tech University and an MBA from North Carolina State University. He is a Microsoft Data Platform MVP and VMware vExpert. He is a frequent speaker at PASS Summit, Ignite, Code Camps, and SQL Saturday events around the world.