Never Again

SQL Slip Up

What happens when a query goes wrong -- very, very wrong.

One of our largest customers was using our software to support its training department. The software let thousands of employees register for classes, checkout resources and see their progress toward certification. They needed some customizations done to support a government client, so as the only person with experience in Microsoft SQL Server, I traveled to the customer site. Our company president, who also doubled as our head programmer, came to support the database side.

The customizations had been completed, tested and saved to floppy disk before we left our office in Chicago. Before we could install the code changes, however, we had to massage the live production data. Simple enough. The first step was to run a SELECT query that looked for records in the STUDENT table that would not fit into the new customizations. We needed to make sure our final query would successfully update the table with the correct information.

My boss worked up a query that returned all students who did not have an employee ID registered. A second UPDATE query then altered the data in the LAST_NAME field for all the matched records. This way, the records would remain in the database, but would not display in the application.

My boss was flying through the process, which, to be honest, he was very good at. However, he accidentally placed a carriage return in the SQL query text, causing the UPDATE query look to overwrite the last name for every record in the customer's database and made it TERMINATED.

I watched him prep the query and remember wanting to scream "Nooooo!" as his hand hovered over the Enter button. I said to my boss, "David, do you realize what you just did?" His eyes flicked back and forth over the screen and he whispered, "Oh my god."

It was bad. We had been working on the production database. That's never a good idea, especially when the database belongs to your largest client.

My boss was frantic, trying to keep the customer from finding out what happened while we tried to recover the lost data. I searched for recent database dumps. There were none. After a couple of hours, we knew we had to tell the client.

They were ... unhappy. They had to go to their tape storage facility and perform a full restore of their database. And back then, restoring from tape was really hard.

Needless to say, all of our professional services during that trip were gratis, including our expenses. I think the only reason they didn't kick us to the curb right then was that our software was so heavily embedded into their business. To appease them, we offered deep discounts and free professional services for a year. We lost a lot of money on that trip, but we did keep the customer.

Like most disasters, a lot of little mistakes were made on the way to a big calamity. But one error stands out: The decision to run an untested query on a production database. We also failed to confirm the presence of a recent data backup. Had we taken that step, we might have been able to quietly recover from our error without involving the client. Instead, we learned a hard lesson in customer retention.

As for the president of the company, he stopped working on live data, providing only pre-tested scripts to customers that need data customizations.

About the Author

Jim Desmond, CISSP, works in information security and contingency planning in the San Francisco Bay area.

comments powered by Disqus

Reader Comments:

Wed, Sep 12, 2007 John Michigan


Any time I need to do a query like that on a live database, I execute 3-5 statements, individually (after testing in a dev database)

1) Am I *really* concerned about losing this data? do a SELECT INTO to a new table to save pre-modifcation data

2) begin transaction

3) the update/insert statement I want to run

3) did it affect the # of rows I expected? perhaps a SELECT now to verify results are as expected

4) commit or rollback transaction depending on #3

Of course backups are done frequently as well, but I consider having to load a backup to be a major screwup if it wasn't hardware failure and something to avoid.

Thu, Oct 12, 2006 Matthias Germany

Golden Rules of System Administration

1. Plan everything out in detail before you do it
2. Make sure there's a way to undo everything you've done
3. Test everything before you put it on the production system
4. Know how everything really works

you violated rule 2 und 3 - and got to pay for it.

Mon, Oct 9, 2006 Shaun Warrington

Ever thought about taking a backup before starting work on a live database? That guy should have been sacked!

Thu, Oct 5, 2006 JavaJoe Cleveland, OH

Did any of the additional "free services" involve you performing work while only wearing lavendar briefs? That would've made the whole incident worth while!

Add Your Comment Now:

Your Name:(optional)
Your Email:(optional)
Your Location:(optional)
Comment:
Please type the letters/numbers you see above

Redmond Tech Watch

Sign up for our newsletter.

I agree to this site's Privacy Policy.