Tour de SQL Finale, Part IV: Replication
Our final sprint to the finish line stretches over replication: its limitations, high availability and the many improvements since the last version.
- By Eric Johnson
Over our previous three stages you made it through the rolling countryside,
the cobblestone streets and the Pyrenees unscathed. Now it's time to head into
Paris and take a victory lap on this year's Tour de SQL. Here's what we've covered
so far: Each new version of SQL Server 2005 and which one's right for you; SQL
Server's new administration features; how to fine-tune SQL Server for maximum
Your journey is now at its end, and for our final sprint, we're going to tackle SQL 2005 replication. Since its inception, replication has been a topic of debate among database administrators (DBAs): Should replication be limited to only moving small subsets of data for reporting, or is it a viable high-availability solution? Despite the debates, one sure fact is that replication has improved with each new version, and SQL 2005 is no exception. It comes packed with new features, much-needed improvements and fixes to old replication problems.
Monitoring the Situation
Microsoft has taken steps to improve the manageability of replication in SQL
2005. The first such change is the new replication monitor, shown in Figure
[Click on image for larger view.]
|Figure 1. The Replication
Monitor can keep a close eye on your replication environment.
The replication monitor is a new tool that shows everything about your publications
on all your servers in one place. Here's some of what it can do:
- View the status and history of all your replication agents, and set alert
thresholds to notify you of any problems.
- View latency and inject tracer tokens. These are new features. Latency shows
the amount of time the subscriber's data lags behind the publisher's. You
can configure alerts to tell when the latency exceeds a set amount of time;
the default is 30 seconds.
- If that's not enough, you can also inject a tracer token in the replication
data stream and get metrics on how long it took the subscriber to receive
the token. You'll see the breakdown of the time it took the token to go from
the publisher to the distributor, time from distributor to subscriber, and
total latency. Using the replication monitor, you can make sure your replication
environment is healthy and that everything is running as expected.
- View undistributed commands. Viewing the details of a subscription (Figure
2), you can see how many commands are waiting to be distributed and an
estimate of how long the distribution will take.
Upgraded Identity Management
In older versions of SQL Server, identities and replication didn't get
along. If you had an identity in your article, it wouldn't replicate the identity;
instead, it would replicate only the base data type and value. That meant the
subscriber was up-to-date with current data from the publisher, but you couldn't
insert records into the subscriber without causing problems.
That's all a thing of the past. SQL 2005 now offers identity range management for replication. You can assign each subscriber a pool of identities that are all managed on the publisher. For example, your published database could be using an identity pool of 1-1000 while two subscribers are using pools 1001-2000 and 2001-3000. This way, records can be inserted in any database and the identities won't overlap.
Another issue was that very large datatypes (text, ntext and image) couldn't
be updated by subscribers in an updating subscriber subscription. With the introduction
of the varchar(max) and varbinary(max) datatypes, this problem is also extinct.
Error reporting has also been significantly enhanced. Many of the error messages
have been rewritten to provide more detail that can help resolve problems.
All these upgrades are helped by the new replication interface, which has been completely retooled and streamlined. For example, the replication configuration wizard has been reduced by about 40 percent. Many of the dialog boxes, such as the ones used for filtering articles, have also been simplified. This translates to a replication environment that's easier and quicker to set up.
These new features help take some of the mystery out of replication and allow this very useful tool to be utilized in new areas. DBAs who were previously intimidated should now feel confident that replication can be managed. No longer is it a black box that's difficult, if not impossible, to open.
Getting Along with Peers
Peer-to-peer replication is a new replication topology that picks up
where transactional replication with updating subscribers left off.
Peer-to-peer replication is what happens when several databases share the same
data and schema with one another. Using this topology, several servers can maintain
the same database and all the updates can be pushed to all the other peers.
[Click on image for larger view.]
|Figure 2. Among its many
uses, Replication Monitor allows you to easily see how many transactions
have yet to distribute.
How is peer-to-peer different from transactional replication with updating
subscribers? There's no hierarchy requirement in peer-to-peer. With transactional
replication, data is replicated from the publisher to one or more subscribers.
Changes made on the subscribers can be sent back to the publisher and subsequently
sent to the other subscribers. That makes the publisher a single point of failure.
If the publisher goes down, the subscribers are on their own and will no longer
see data from the publisher or other subscribers.
With peer-to-peer, that hierarchy is gone; every peer talks with every other peer directly. This significantly enhances efficiency for many operations, starting with the ability to load balance. You can set up several peers and put them behind an application server that directs traffic.
Because peer-to-peer allows changes on any peer, you have options when load balancing:
- Multiple servers can be sent Write and Read requests, reducing the burden
on any one server.
- Send Write requests to a single, dedicated server and spread out the Read
requests among the remaining boxes.
The sky's the limit on how you implement load balancing, each implementation will vary based on your environment.
Another use of peer-to-peer replication is site-specific servers. All the clients at various sites can Read and Write from their own SQL Server, and the transactions can be replicated between sites. This allows all the sites to have the data from the other sites, in addition to giving the DBA control over when that data replicates.
Availability of replication is another advantage to the latest SQL version.
Historically, replication needed to be stopped to make many SQL changes. Of utmost
importance are Schema changes. Previously, you could add or drop a column; beyond
that, a Schema change required that replication be taken down and rebuilt. Now,
the following statements can all be replicated from the publisher to the subscriber:
ALTER TABLE, ALTER VIEW, ALTER PROCEDURE, ALTER FUNCTION and ALTER TRIGGER. In
order to replicate, these changes need to be made via T-SQL. Also note that this
functionality isn't limited to SQL 2005 subscribers. As long as the change you're
making is supported in the previous version of the subscriber, it will replicate.
Another big boost in availability is brought about by resumable snapshot delivery. Previously, if a snapshot failed at any point, it would start again from the beginning. If you've spent any time around replication, this problem has likely cropped up -- the data is bulk copied out of the publisher, the scripts are prepared and applied to the subscriber, but the subscriber is blocking and the data can't be copied over. For large databases, the blocking could mean hours of work wasted. Now SQL automatically resumes the snapshot from the point of failure. This puts a diminished load on your publisher and network from the extra snapshots and data copy.
Microsoft has improved the replication agents themselves. Overall, the agents seem more reliable and less likely to fail in the new version. In addition, the error recovery is much improved. Now agents will retry their previous operation when deadlocks, network timeouts or query timeouts occur. Lastly, some of the contention between replication agent jobs has been reduced to prevent replication from breaking itself.
Like so many of Microsoft's other recent products, replication security
is much improved. One of the biggest improvements concerns agents: Now each
agent can have its own security context to run under, as you can see in Figure
[Click on image for larger view.]
|Figure 3. Replication security
in SQL 2005 is far more flexible than in previous versions.
Previously, by default, everything ran under whatever account the SQL Server
Agent used. Now the model has been expanded, and each agent can be independent.
That allows you to set up accounts that have the bare minimum of permissions
required for the agent to run.
Like the SQL Agent, replication agents can be configured to use Windows accounts only; you can't run under a SQL account. You can still configure the agents to run under the SQL Agent account, but that isn't recommended.
Another security sore spot is passwords. In SQL 2005, changing the password is easy via SQL Server Management Studio or a stored procedure.
We've Come a Long Way
Over the years, each new incarnation of replication has brought with it
an improved interface and increased reliability. Replication in SQL 2005 feels
more like a refinement to a product that has become pretty good over time. With
the new features and enhancements, SQL 2005 replication may now be ready for high
I hope you've enjoyed our four-month Tour de SQL. Like the Tour de France, SQL Server 2005 can be daunting. But, in the same way that new technology and training has helped Tour riders become better than ever, SQL Server is also the best it's ever been. Putting in the kind of dedication the riders do will help you become a SQL champion. Happy racing!