Ensuring High Availability in Exchange Server 2010 -- Redmondmag.com

Exchange Server 2010 incorporates a number of high-availability features that system admins will find useful, including a new feature called database availability groups (DAGs).

By William R. Stanek
12/02/2009

Mailbox databases and the data they contain are critical to any Exchange organization. To ensure high availability for mailbox databases, Exchange Server 2007 provided a variety of replication and clustering options, including local continuous replication, single-copy clusters and clustered Mailbox servers.

Although these features represented improvements over earlier offerings, they still posed many implementation challenges. For starters, each approach to high availability was managed in a different way. With single-copy clusters, all Mailbox servers in a cluster used shared storage. Implementing clustering meant that Exchange administrators had to configure Windows failover clustering, which is fairly complex and can take a great deal of an administrator's time to tune for a high level of uptime. With continuous replication, Exchange 2007 used built-in asynchronous replication to create copies of data and then maintained those copies through transaction log shipping and replay. You used local continuous replication to create local copies in a nonclustered environment and cluster continuous replication or standby continuous replication in a clustered environment, and each type of continuous replication was managed differently.

Exchange Server 2010 takes a radically different approach: high availability is integrated into its core architecture, creating an end-to-end solution that provides service availability, data availability and automatic recovery. The result is that a single, unified high-availability solution replaces the many different solutions used previously. This solution is the database availability group (DAG).

DAGs provide automatic failover and recovery at the database level (rather than the server level) when you deploy multiple Mailbox servers with multiple copies of mailbox databases, and they don't require you to build and manage a traditional cluster. Because of these changes, building a high-availability Mailbox server solution no longer requires cluster hardware or advanced cluster configuration. Instead, DAGs provide the base component for high availability, and failover is automatic for mailbox databases whose copies are hosted in the same DAG. DAGs can also span multiple Active Directory sites, and related architectural changes to Mailbox servers allow a mailbox database to move between sites. As a result, a mailbox database in one Active Directory site can fail over to another Active Directory site.

Keep in mind that database copies are for mailbox databases only. For redundancy and high availability of public folder databases, you use public folder replication. Unlike cluster continuous replication, in which multiple copies of a public folder database can't exist in the same cluster, a DAG lets you replicate public folder databases between its member servers.

Before I delve into the details of DAGs, let's look at other ways high-availability options have changed for Exchange 2010.

Whirlwind Tour of High-Availability Features in Exchange 2010
In previous versions, Exchange operated as a cluster application that used the cluster resource management model. In this approach, you implemented high availability for Mailbox servers by first creating a Windows failover cluster and then running Exchange setup in clustered mode. As part of the setup process, the Exchange cluster resource DLL (exres.dll) was registered, allowing the creation of a clustered Mailbox server. In contrast, Exchange 2010 doesn't operate as a clustered application, and the cluster resource management model is no longer used for high availability. The Exchange cluster resource DLL and all the cluster resources it provided no longer exist. Instead, Exchange 2010 uses its own internal high-availability model. Although some components of Windows failover clustering are still used in this model, they are now managed exclusively by Exchange 2010.

Interestingly enough, many of the underlying replication technologies remain; they've simply evolved and now work in significantly different ways. Because storage groups have been removed from Exchange 2010, continuous replication runs at the database level. Instead of using Server Message Block (SMB) for log shipping and seeding, Exchange 2010 uses a single administrator-defined TCP port for data transfer. Rather than having passive copies pull closed log files from the active copy, the active copy pushes log files to the passive copies, and the data stream can be encrypted, compressed to reduce the size of replicated data, or both. Whereas earlier versions of Exchange could use only the active copy of a database as the source for seeding and reseeding, Exchange Server 2010 can use either an active or a passive copy of a mailbox database as the source, making it easier to add a copy of a database to another Mailbox server.
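
To make the push model concrete, here's a minimal Python sketch under simplifying assumptions: the PassiveCopy and ActiveCopy classes are hypothetical, the network transfer over the replication TCP port is replaced by a direct method call, and encryption is omitted. It shows only the direction of flow (the active copy pushes each log to every passive copy) and the bandwidth effect of optional compression.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class PassiveCopy:
    """Hypothetical stand-in for a passive database copy on another DAG member."""
    server: str
    received_logs: list = field(default_factory=list)

    def receive(self, payload: bytes, compressed: bool) -> None:
        # Undo compression before keeping the log for replay.
        data = zlib.decompress(payload) if compressed else payload
        self.received_logs.append(data)


@dataclass
class ActiveCopy:
    """Hypothetical active copy that pushes each closed log file to its passive copies."""
    passives: list
    compress: bool = True
    bytes_sent: int = 0

    def ship_log(self, log_file: bytes) -> None:
        # Push model: the active copy sends the log; passive copies never pull it.
        payload = zlib.compress(log_file) if self.compress else log_file
        for passive in self.passives:
            passive.receive(payload, self.compress)
            self.bytes_sent += len(payload)


# Example: one active copy pushing a highly compressible log to two passive copies.
log = b"log record " * 100
targets = [PassiveCopy("MBX2"), PassiveCopy("MBX3")]
active = ActiveCopy(targets, compress=True)
active.ship_log(log)
```

Each passive copy ends up with an identical log, while far fewer bytes cross the (simulated) wire than the log's raw size times the number of copies.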

Another significant change has to do with the way replicated data is applied. In Exchange 2007, the Microsoft Exchange Replication service replayed logs into passive database copies and built up a cache of read/write operations that was used to reduce read I/O operations. However, when a passive copy was activated, this cache was lost, because the Microsoft Exchange Information Store service that mounted the database didn't have access to it. The newly activated copy therefore came online in a cold state, without a ready cache. This is the same state the database cache is in after a restart of the server or of the services performing the caching. With a cold cache, the server had no cached read/write operations, which typically increased read I/O until the cache grew large enough to reduce disk I/O again. In Exchange 2010, the Microsoft Exchange Information Store service both replays logs and mounts databases, so the cache is already available when a passive copy is activated. As a result, the server is more likely to be able to use the cache to reduce read I/O operations after a switchover or failover.

With highly available Mailbox servers, e-mail messages are safe once they arrive in a mailbox; protecting e-mail messages in transit is another matter. If a Hub Transport server fails while processing messages and can't be recovered, messages could be lost. As a safeguard against data loss, Exchange 2007 introduced the transport dumpster feature, which ensured that Hub Transport servers maintained a queue of messages recently delivered to recipients whose mailboxes were protected by local continuous replication or cluster continuous replication. Messages were retained in the transport dumpster until an administrator-defined time limit or size limit was reached. In the case of a failover, a clustered Mailbox server automatically requested every Hub Transport server in the Active Directory site to resubmit mail from the transport dumpster queue. This approach prevented mail from being lost during the time required for the cluster to fail over. Although this approach works, it's available only for message delivery in a continuous replication environment, and it doesn't address potential message loss when messages are in transit between Hub Transport and Edge Transport servers.

Exchange 2010 addresses these shortcomings in several ways. The transport dumpster now receives feedback to determine which messages have been delivered and replicated. Hub Transport servers maintain a copy of messages being sent to a replicated mailbox database in a DAG. The copy is kept in the transport queue (mail.que) until the Hub Transport server has been notified that the transaction logs representing the message have been successfully replicated to and inspected by all copies of the mailbox database. The message is then removed from the transport dumpster, ensuring that the transport dumpster queue holds only copies of messages whose transaction logs haven't yet been replicated. Additionally, when a mailbox database in one Active Directory site fails over to another Active Directory site, transport dumpster redelivery requests are sent to both the original site and the new site.
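
The feedback loop can be sketched in a few lines of Python. This is an illustrative model, not Exchange's implementation: TransportDumpster, retain, on_replicated and resubmit are hypothetical names, and each replication notification is assumed to identify one database copy by server name.

```python
from dataclasses import dataclass, field


@dataclass
class DumpsterEntry:
    message_id: str
    pending_copies: set  # database copies that haven't confirmed replication yet


@dataclass
class TransportDumpster:
    """Hypothetical model of a transport dumpster driven by replication feedback."""
    entries: dict = field(default_factory=dict)

    def retain(self, message_id: str, database_copies: set) -> None:
        # Keep a copy of each message sent to a replicated database in the DAG.
        self.entries[message_id] = DumpsterEntry(message_id, set(database_copies))

    def on_replicated(self, message_id: str, copy_name: str) -> None:
        # Feedback: one copy has replicated and inspected the logs for this message.
        entry = self.entries.get(message_id)
        if entry is None:
            return
        entry.pending_copies.discard(copy_name)
        if not entry.pending_copies:
            # Every copy now has the message, so the dumpster can drop it.
            del self.entries[message_id]

    def resubmit(self) -> list:
        # After a failover, only messages not yet replicated everywhere are redelivered.
        return sorted(self.entries)


dumpster = TransportDumpster()
dumpster.retain("msg-1", {"MBX1", "MBX2", "MBX3"})
dumpster.retain("msg-2", {"MBX1", "MBX2", "MBX3"})
for copy_name in ("MBX1", "MBX2", "MBX3"):
    dumpster.on_replicated("msg-1", copy_name)
dumpster.on_replicated("msg-2", "MBX1")
print(dumpster.resubmit())  # ['msg-2']
```

Compared with a purely time- or size-based retention limit, this keeps the queue small: a message leaves as soon as every copy confirms it, and only unconfirmed messages are candidates for redelivery.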

To provide redundancy for messages during the entire time they're in transit, Exchange 2010 adds the shadow redundancy feature. Shadow redundancy uses an approach similar to the transport dumpster, except the deletion of messages from the transport databases is delayed until the Transport server verifies that all of the next hops for that message have completed delivery. If the Transport server is unable to verify next-hop delivery, the message is resubmitted for delivery to the next hop. This approach uses less network bandwidth than creating duplicate copies of messages on multiple servers. Here, the only additional network traffic generated is the exchange of discard status messages between Transport servers. Discard status messages are generated by the Shadow Redundancy Manager and indicate when an e-mail message is ready to be discarded from the transport database.
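
A minimal Python sketch of the shadow-queue bookkeeping follows, assuming a single next hop and discard status delivered as a set of message IDs. The ShadowQueue class and its methods are hypothetical; they illustrate only the hold-until-confirmed rule, not the actual SMTP exchange.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowQueue:
    """Hypothetical shadow queue held by the server that handed messages to a next hop."""
    next_hop: str
    shadows: set = field(default_factory=set)

    def handed_off(self, message_id: str) -> None:
        # The next hop accepted the message; keep a shadow copy instead of deleting it.
        self.shadows.add(message_id)

    def apply_discard_status(self, delivered_ids: set) -> None:
        # Discard status from the next hop: these messages finished onward delivery.
        self.shadows -= delivered_ids

    def next_hop_failed(self) -> list:
        # Next-hop delivery can't be verified: take ownership and resubmit what's left.
        to_resubmit = sorted(self.shadows)
        self.shadows.clear()
        return to_resubmit


queue = ShadowQueue("HUB2")
for message_id in ("a", "b", "c"):
    queue.handed_off(message_id)
queue.apply_discard_status({"a", "c"})
resubmitted = queue.next_hop_failed()
print(resubmitted)  # ['b']
```

Note that the only traffic the model needs beyond normal delivery is the discard status itself, which matches the bandwidth argument above.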

Shadow redundancy is an extension of the Simple Mail Transfer Protocol (SMTP) service and is used as long as both servers in an SMTP connection support the feature. When you have redundant message paths in your routing topology, shadow redundancy makes any Transport server disposable by eliminating reliance on the state of any specific Hub or Edge Transport server. If a Transport server fails, or you want to take it offline for maintenance, you can remove, replace or upgrade it at any time without having to drain its queues or worry that messages will be lost.

The Shadow Redundancy Manager uses a heartbeat approach in determining the availability of servers for which shadow messages are queued. The initiating server issues an XQUERYDISCARD message and in response the target server returns discard notifications. This notification exchange is the heartbeat.

If a server can't establish a connection to a primary server within the heartbeat timeout interval, which is 300 seconds by default, the server resets the timer and tries again, up to three times (the default value of the heartbeat retry count). If the primary server still hasn't responded when the retry count is reached, the server determines that the primary server has failed, assumes ownership of the shadow messages and resubmits them. The messages are then delivered to their destinations as appropriate. In some scenarios, such as when the original server comes back online with its original database, messages can be delivered more than once. Because of the duplicate message detection features in Exchange, Exchange mailbox users don't see duplicate messages. However, recipients whose mailboxes are on non-Exchange servers might receive duplicate copies.
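
The heartbeat logic reduces to a bounded retry loop. In this Python sketch, check_primary and the probe callable are hypothetical; I've assumed the retry count is the number of timed-out attempts tolerated before the primary is declared failed, and the loop doesn't actually wait.

```python
from typing import Callable, List

HEARTBEAT_TIMEOUT_SECONDS = 300  # default heartbeat timeout interval
HEARTBEAT_RETRY_COUNT = 3        # default heartbeat retry count


def check_primary(probe: Callable[[int], bool],
                  shadow_messages: List[str],
                  retry_count: int = HEARTBEAT_RETRY_COUNT,
                  timeout: int = HEARTBEAT_TIMEOUT_SECONDS) -> List[str]:
    """Return the shadow messages to resubmit, or [] if the primary responded.

    `probe(timeout)` is a hypothetical stand-in for one discard-notification
    exchange (the heartbeat); it returns True if the primary answers in time.
    """
    for _ in range(retry_count):
        if probe(timeout):
            # Heartbeat succeeded: the primary still owns the messages.
            return []
        # The timeout expired: reset the timer and try again.
    # Retry count reached: treat the primary as failed and take ownership.
    return list(shadow_messages)


attempts = []


def unreachable_primary(timeout: int) -> bool:
    attempts.append(timeout)
    return False


to_resubmit = check_primary(unreachable_primary, ["m1", "m2"])
print(len(attempts), to_resubmit)  # 3 ['m1', 'm2']
```

Under that reading, with the default values a server waits up to 3 × 300 = 900 seconds (15 minutes) before it takes ownership of the shadow messages.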

Digging into DAGs
Although the many high-availability enhancements I've described so far are important, no single feature has as much impact on the way you manage Exchange 2010 as database availability groups. DAGs are the base component of high availability in Exchange 2010, and the rules for DAGs are simple. Each DAG can have up to 16 Mailbox servers as members. Each Mailbox server can be a member of only one DAG and can host only one copy of any given database. The hosted copy can be either an active copy or a passive copy. An active copy differs from a passive copy in that it's mounted and being accessed by users, whereas a passive copy is kept up to date through log replay but isn't accessed by users. You can't create two copies of the same database on the same server. Within these limits, any server in a DAG can host a copy of any mailbox database from any other server in the DAG. Although multiple databases can be active simultaneously, only one copy of any particular database can be active at any one time, and up to 15 passive copies of that database can reside on other servers in the DAG.
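
These membership rules are easy to express as validation checks. The Python sketch below is a hypothetical model (Dag, add_member and add_copy aren't Exchange APIs); it enforces the 16-member limit, one DAG per server, one copy of a given database per server, and a single active copy per database.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_DAG_MEMBERS = 16


@dataclass
class Dag:
    """Hypothetical model of a DAG and the database copies its members host."""
    name: str
    members: set = field(default_factory=set)
    # database name -> {server name: "active" or "passive"}
    copies: Dict[str, Dict[str, str]] = field(default_factory=dict)


def add_member(dag: Dag, server: str, membership: Dict[str, str]) -> None:
    # `membership` maps each server in the organization to the DAG it belongs to.
    current = membership.get(server)
    if current is not None and current != dag.name:
        raise ValueError(f"{server} already belongs to {current}")
    if server not in dag.members and len(dag.members) >= MAX_DAG_MEMBERS:
        raise ValueError(f"{dag.name} already has {MAX_DAG_MEMBERS} members")
    dag.members.add(server)
    membership[server] = dag.name


def add_copy(dag: Dag, database: str, server: str, active: bool = False) -> None:
    if server not in dag.members:
        raise ValueError(f"{server} isn't a member of {dag.name}")
    hosts = dag.copies.setdefault(database, {})
    if server in hosts:
        raise ValueError(f"{server} already hosts a copy of {database}")
    if active and "active" in hosts.values():
        raise ValueError(f"{database} already has an active copy")
    hosts[server] = "active" if active else "passive"


membership: Dict[str, str] = {}
dag1 = Dag("DAG1")
for name in ("MBX1", "MBX2"):
    add_member(dag1, name, membership)
add_copy(dag1, "DB1", "MBX1", active=True)
add_copy(dag1, "DB1", "MBX2")
```

Because each server can hold only one copy of a database and a DAG has at most 16 members, the limit of one active copy plus 15 passive copies falls out of these checks without a separate rule.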

When you create your first DAG in an Exchange organization, Exchange creates a Windows failover cluster, but there are no cluster groups for Exchange and no storage resources in the cluster. The DAG uses only the cluster heartbeat, cluster networks and cluster database features of Windows failover clustering. The cluster heartbeat is used to detect failures. The cluster networks carry DAG traffic; although a single network can be used, Microsoft recommends at least one network for replication traffic and at least one network for MAPI and other traffic. The cluster database stores database state changes and other important information. As you add other servers to the DAG, the servers are joined to the underlying cluster, and the cluster's quorum model is modified automatically as needed based on the number of member servers.
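
The article doesn't name the quorum models, but as Exchange 2010 DAGs are documented, a DAG with an odd number of members uses the Node Majority model, while a DAG with an even number of members adds a file share witness and uses Node and File Share Majority. This hypothetical Python helper captures that rule, along with the number of votes needed to keep quorum.

```python
def dag_quorum_model(member_count: int) -> str:
    """Return the failover-cluster quorum model used for a DAG of this size."""
    if not 1 <= member_count <= 16:
        raise ValueError("expected 1 to 16 DAG members")
    if member_count % 2 == 1:
        # An odd number of members can always form a majority on its own.
        return "Node Majority"
    # An even number of members adds a file share witness as a tie-breaking vote.
    return "Node and File Share Majority"


def votes_needed(member_count: int) -> int:
    """Votes required to keep quorum, counting the witness for even-sized DAGs."""
    total_votes = member_count + (1 if member_count % 2 == 0 else 0)
    return total_votes // 2 + 1


for n in (2, 3, 4):
    print(n, dag_quorum_model(n), votes_needed(n))
# 2 Node and File Share Majority 2
# 3 Node Majority 2
# 4 Node and File Share Majority 3
```

The witness vote is what lets a two-member DAG survive the loss of one server: the surviving member plus the witness still make up two of three votes.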

Active Manager is the Exchange 2010 component that provides the resource model and failover management features. Active Manager runs on all Mailbox servers that are members of a DAG. On one server it operates as the primary role holder (the Primary Active Manager, or PAM); on the others it operates as a standby secondary role holder (the Standby Active Manager, or SAM). The primary decides which database copies are active and which copy to activate when a failure occurs. It receives topology change notifications, reacts to server failures and owns the cluster quorum resource. If the server acting as the primary fails, the primary role automatically moves to another server in the DAG, and that server takes ownership of the cluster quorum resource.

The secondary detects failures of local, replicated databases and of the local information store, and it issues failure notifications to the primary, asking the primary to initiate a failover. The secondary doesn't determine which server takes over, nor does it update the database's location state; the primary performs these tasks. When an active database fails, Active Manager uses a Best Copy Selection algorithm to choose a database copy to activate. This algorithm identifies the best copy based on the database status, the content index status, the copy queue length and the replay queue length of each copy. If more than one database copy meets the selection criteria, the activation preference value breaks the tie: the copy with the lowest preference value is activated and mounted.
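
Here's a simplified Python sketch of a best-copy pass. The criteria mirror the ones listed above, but the tiering, the CopyStatus fields and the queue-length thresholds (10 and 50 logs) are illustrative assumptions, not confirmed Exchange values.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CopyStatus:
    server: str
    healthy: bool              # database copy status
    content_index_ok: bool     # content index status
    copy_queue: int            # logs not yet copied to this server
    replay_queue: int          # logs copied but not yet replayed
    activation_preference: int


# Illustrative thresholds only; not confirmed Exchange values.
COPY_QUEUE_LIMIT = 10
REPLAY_QUEUE_LIMIT = 50


def queues_short(c: CopyStatus) -> bool:
    return c.copy_queue < COPY_QUEUE_LIMIT and c.replay_queue < REPLAY_QUEUE_LIMIT


# Criteria tiers, strictest first; the first tier with any match wins.
TIERS: List[Callable[[CopyStatus], bool]] = [
    lambda c: c.content_index_ok and queues_short(c),
    queues_short,
    lambda c: True,
]


def best_copy(copies: List[CopyStatus]) -> Optional[CopyStatus]:
    """Pick a passive copy to activate after the active copy fails."""
    candidates = [c for c in copies if c.healthy]
    for meets in TIERS:
        matching = [c for c in candidates if meets(c)]
        if matching:
            # Tie-break within a tier: the lowest activation preference value wins.
            return min(matching, key=lambda c: c.activation_preference)
    return None


copies = [
    CopyStatus("MBX2", True, True, 2, 5, 2),
    CopyStatus("MBX3", True, True, 1, 3, 3),
    CopyStatus("MBX4", True, False, 0, 0, 1),
    CopyStatus("MBX5", False, True, 0, 0, 4),
]
print(best_copy(copies).server)  # MBX2
```

MBX4 has the best activation preference but misses the first tier because its content index isn't healthy, and MBX5 is excluded outright, so MBX2 wins on preference over MBX3.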

After you add servers to a DAG, the active databases on each server can be replicated to other servers in the DAG, and you can configure other DAG properties, such as network encryption or network compression for database replication. Within a DAG, transaction logs are replicated to each member server that has a copy of a mailbox database and replayed into that copy. Once you've created multiple database copies, you can use the Exchange Management Console and the Exchange Management Shell to monitor the replication and health status of your DAGs. Database failover can occur automatically in the event of an outage, or you can manually initiate a switchover. In a switchover, the active copy is dismounted, and a passive copy on another server in the DAG is mounted and made the active copy.
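
In the Exchange Management Shell, a switchover is performed with the Move-ActiveMailboxDatabase cmdlet. The Python sketch below models only the state change described above; the switchover function and the "mounted"/"healthy" state labels are hypothetical simplifications.

```python
from typing import Dict


def switchover(copies: Dict[str, str], target: str) -> Dict[str, str]:
    """Return the copy states after making `target` the active copy.

    `copies` maps server name -> "mounted" (active) or "healthy" (passive);
    these labels are a simplification for illustration.
    """
    if copies.get(target) != "healthy":
        raise ValueError(f"{target} has no healthy passive copy to activate")
    new_states = dict(copies)
    for server, state in copies.items():
        if state == "mounted":
            # Step 1: dismount the current active copy; it becomes a passive copy.
            new_states[server] = "healthy"
    # Step 2: mount the chosen passive copy as the new active copy.
    new_states[target] = "mounted"
    return new_states


before = {"MBX1": "mounted", "MBX2": "healthy", "MBX3": "healthy"}
after = switchover(before, "MBX3")
print(after)  # {'MBX1': 'healthy', 'MBX2': 'healthy', 'MBX3': 'mounted'}
```

The function returns a new mapping rather than mutating its input, so the before and after states can be compared directly.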

True Simplification
As I've explained in this article, Exchange 2010 includes many important enhancements that improve availability, including high-availability features integrated into the core architecture, a transport dumpster that tracks replication, shadow redundancy for messages in transit and more. Of all the new and changed features, DAGs are my favorite. DAGs truly simplify cluster implementations and allow you to focus on what matters most: the data.