Denny's SQL Secrets

Exploring SQL Server's Distributed Transactions

Here's what you need to know about distributed transaction and when you should use one.

Distributed transactions are one of the more powerful and important functions within SQL Server. A distributed transaction is just like any other transaction within the SQL Server database, with a single key difference. The difference is this type of transaction can exist across multiple different servers. The transactions across different servers are all tied together so the operations included within the transactions must all succeed for the transaction to be completed.

We normally think of distributed transactions just within the scope of a SQL Server transaction. However, you can also put things like file system access by a .NET application within a distributed transaction. An example of this might be an application that creates PDFs of receipts for a billing application. The application that creates the files might begin a distributed transaction within the .NET application code, get the data needed for the receipt from the database, update the database with the new timestamp on which the receipt was created and then create the PDF for the receipt.

If writing the PDF fails, we want to ensure changes to the database are rolled back. That's why we use the distributed transaction. In the event of any problems along the way, we'll want to roll back the transaction and undo all changes, including automatically creating the PDF.

You can also use distributed transactions whenever there's SQL Server to SQL Server communication and changes will be made to data during the course of the transaction, often done via a linked server. However, you only need distributed transactions when data is being changed. Select statements issued across linked servers will simply use two different transactions, one on the source server and one on the remote server.

If data is modified during the transaction, the two transactions will be escalated to a distributed transaction. The transactions are effectively merged so either server can cause the other server to roll back the transaction on both servers. Within T-SQL, there's no requirement that transactions are manually started as distributed transactions. SQL Server will automatically escalate transactions from normal transactions to distributed transactions as needed. This escalation can be expensive, however. If you're going to use distributed transactions frequently within an application, it's often better to specifically tell the SQL Server at the beginning of the transaction it needs to be a distributed transaction so that the escalation cost isn't there.

Using a Distributed Transaction
Creating a transaction manually as a distributed transaction simply requires adding the keyword DISTRIBUTED into the BEGIN TRANSACTION command (see Figure 1). Committing or rolling back the distributed transaction is done via the normal COMMIT and ROLLBACK keywords.

 

  Figure 1. Beginning a distributed transaction manually.

 

If a transaction has escalated to a distributed transaction, there's nothing special you have to do within the T-SQL code to commit or rollback the transaction. Simply use the COMMIT or ROLLBACK keywords to cause the transaction on all the servers involved with the transaction to commit or rollback as specified.

Why Do You Need Distributed Transactions?
Distributed transactions ensure that data changes within databases on multiple servers are kept in sync within the context of the single T-SQL transaction. This ensures that applications made up of multiple databases on multiple servers are transactionally consistent with each other. If this wasn't a requirement, then data could mismatch and application data would be compromised.

Imagine you have an order entry and fulfillment system that didn't require distributed transactions. The order entry system is on server1. The fulfillment system is on server2. We have our application built so when orders are entered into SQL Server by calling a stored procedure. After writing the data to the tables on server1, it calls a stored procedure on server2 via a linked server that inputs the needed information into the fulfillment systems tables. When the fulfillment system is done running its stored procedure, output from that stored procedure is returned via output parameters. Then the order system uses those values to update some additional tables within the order entry system.

If this system was able to run without a distributed transaction, and there was a problem using the values the fulfillment passed to back to the order entry system, the transaction on the order entry system would fail and rollback. That would prevent the order from being entered, even though it had been entered into the fulfillment system. The end result would be that the order would ship, but without any order showing in the orders table. Therefore, there would be no way to charge the customer who placed the order.

As we have distributed transactions, when the same situation is put into place and there's a problem processing the values from the fulfillment system, both the order and the data written to the fulfillment system rolls back. So there is no order placed, and there is no order shipped.

Coordinating Distributed Transactions
Within the Microsoft platform, all distributed transactions are controlled by the Microsoft Distributed Transaction Coordinator (MSDTC). The Distributed Transaction Coordinator talks to the application using transactions such as SQL Server, custom .NET applications and so on.

It also provides the transaction coordination link to the distributed transaction coordinator on the remote machine. The distributed transaction coordinator on the remote machine doesn't need to be the Microsoft Distributed Transaction Coordinator. It could be a distributed transaction coordinator running on a Linux server as an example.

No matter what distributed transaction coordinator the Microsoft Distributed Transaction Coordinator is using to connect to the application (SQL Server or .NET), it still maintains its connections to the remote resource as normal. Only transaction management, not the actual transaction, is done via the Microsoft Distributed Transaction Coordinator.

About the Author

Denny Cherry is the owner and principal consultant for Denny Cherry & Associates Consulting and has over a decade of experience working with platforms such as Microsoft SQL Server, Hyper-V, vSphere and Enterprise Storage solutions. Denny's areas of technical expertise include system architecture, performance tuning, security, replication and troubleshooting. Denny currently holds several of the Microsoft Certifications related to SQL Server for versions 2000 through 2008 including the Microsoft Certified Master as well as being a Microsoft MVP for several years.  Denny has written several books and dozens of technical articles on SQL Server management and how SQL Server integrates with various other technologies.

Featured

comments powered by Disqus

Subscribe on YouTube