Examining multisite database availability groups in Exchange 2010

Multisite database availability groups are a solid high availability choice, but can pose problems. Fortunately, options are available.

Multisite database availability groups in Exchange 2010 allow organizations to replicate mailboxes across different locations in close to real time. They also significantly reduce the time it takes to recover when disaster strikes a data center. If you're contemplating stretching a database availability group across multiple sites with active users, it's important to familiarize yourself with potential problems a multisite DAG could cause, as well as the options available.

Common scenarios for a multisite database availability group

There are a few good reasons to consider stretching your DAG across multiple sites, especially if you have the resources to do so.

Those reasons include:

  • Replicating mailbox databases to a disaster recovery site
  • Using two or more data centers to provide a resilient service to remote offices
  • Splitting a DAG across two main offices to provide local services to users in both, while also providing high availability and disaster recovery

In this tip, I'll focus on the final use case. It's very common for an Exchange shop to prefer a single DAG that services two offices, each with its own data center.

Where a multisite DAG can cause issues

Let's look at a multisite DAG that provides its users with a highly available email service (Figure 1). This is a six-node DAG with three nodes per site, and databases are active in the site closest to their users.

Figure 1: A quick look at our sample DAG.

As you can see, the DAG is spread across two main offices: one in London and one in Birmingham. There is also a fast, low-latency link between the two sites. The goal here is a single DAG that hosts the mailbox databases for users at each site.

As part of the DAG design, there are three copies of each mailbox database: two copies colocated with the database's users at their office and a third copy at the opposite site. Unless there's a problem, the active database copies will be those closest to the actual users.

There's also the File Share Witness (FSW) to take into account. We have the option of placing it at the London site, at the Birmingham site or at a third site that has good connectivity to both the London and Birmingham sites.

That said, these options can cause problems. Let's see what can go wrong if we have a single DAG split across sites with active databases in each.

The first thing that can go wrong is a WAN link failure (Figure 2).

Figure 2: The link between the sites has been broken.

As you can see, the connectivity link between sites has been severed. One of the following will happen, depending on the placement of the FSW:

  • If the FSW is located in London, the nodes in Birmingham will stop and the DAG will remain online for London users.
  • If the FSW is located in Birmingham, the nodes in London will stop and the DAG will remain online for Birmingham users.
  • If the FSW is at a third site, it's very possible that both London and Birmingham will go offline. With luck, one site might retain connectivity to the third site that hosts the FSW and remain online.

These results go against the grain of what the design originally hoped to achieve, namely that it would provide highly available Exchange email for users in both offices. A simple WAN link failure has brought down services for at least one site and in some scenarios, both sites.
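The outcomes listed above follow from quorum arithmetic. Here is a rough sketch in Python, assuming a simple majority-vote model in which each of the six DAG members and the FSW holds one vote. The function and variable names are illustrative only and are not part of any Exchange or Windows clustering API.

```python
def has_quorum(reachable_nodes: int, holds_witness: bool, total_nodes: int = 6) -> bool:
    """Return True if a partition keeps quorum under a simple majority-vote model.

    Assumption: every DAG member has one vote and the File Share Witness
    adds one more, so a six-node DAG has seven votes in total.
    """
    total_votes = total_nodes + 1                      # six members plus the FSW
    votes = reachable_nodes + (1 if holds_witness else 0)
    return votes > total_votes // 2                    # strict majority: 4 of 7


# WAN link fails and the FSW is located in London.
london = has_quorum(reachable_nodes=3, holds_witness=True)
birmingham = has_quorum(reachable_nodes=3, holds_witness=False)
print("London online:", london, "| Birmingham online:", birmingham)

# FSW at a third site that neither office can reach.
print("Either site online:", has_quorum(3, False))
```

In this model, three nodes alone never reach the four votes needed, so whichever side can still reach the FSW is the only side that stays up.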

Other types of failures produce similar results. For example, a data center failure at one of the two main offices has the potential to stop the entire DAG if the failed site is the one hosting the FSW, because the three surviving nodes no longer hold a majority of the votes.

Using multiple database availability groups to protect against failures

So, what options are available to protect against similar failures while providing resilient services for both sites, where a failure at one site doesn't affect users at the other?

If you deploy two DAGs, each with its databases active only at the site it primarily serves, you ensure that a major data center failure doesn't leave users at the surviving site offline, and that a WAN link failure -- or something similar -- doesn't leave users at either site unable to access Exchange email.

As Figure 3 shows, we've altered our design to continue to use the same number of mailbox servers and the same number of database copies. However, we've also configured two separate DAGs, with the majority of each DAG's nodes in the site where most of its users are located.

Figure 3: An example of two DAGs that use the same number of mailbox servers and database copies.

In the event of a WAN link failure, both sites can still serve the local users in each office. In the event of a complete data center failure, the users at the other site will be unaffected, while the remaining DAG node for the single failed DAG can be brought online in a controlled fashion.
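The same kind of vote counting shows why the two-DAG layout behaves this way. The sketch below is a simplified model, not Exchange code. It assumes each DAG has three members, split two at its home site and one at the other site, and that a three-member DAG relies on node majority alone with no witness vote. The DAG names and the `quorum_sites` helper are hypothetical.

```python
# Hypothetical layout: nodes per site for each DAG (assumed 2 + 1 split).
DAGS = {
    "DAG-London": {"London": 2, "Birmingham": 1},
    "DAG-Birmingham": {"Birmingham": 2, "London": 1},
}


def quorum_sites(layout, failed_site=None, wan_up=True):
    """Return the sites where a DAG still holds a node majority."""
    total = sum(layout.values())
    majority = total // 2 + 1                  # 2 of 3 for a three-node DAG
    # Drop every node in a failed data center.
    live = {site: n for site, n in layout.items() if site != failed_site}
    # With the WAN down, each site is its own partition.
    partitions = [live] if wan_up else [{site: n} for site, n in live.items()]
    online = []
    for part in partitions:
        if sum(part.values()) >= majority:
            online.extend(part)                # adds the site names
    return sorted(online)


for name, layout in DAGS.items():
    print(name, "WAN failure:", quorum_sites(layout, wan_up=False))
    print(name, "London data center lost:", quorum_sites(layout, failed_site="London"))
```

Under these assumptions, a WAN failure leaves each DAG online at its home site, while losing the London data center leaves the London DAG with a single node in Birmingham and no majority, which is the case that needs manual intervention.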

The one downside to this approach is that if you have a data center failure at one site, you must bring the remaining DAG node of the single failed DAG online manually. That said, this failure scenario is not only predictable, it's also less serious than the alternatives we've examined.

About the author
Steve Goodman is an Exchange MVP, and works as a technical architect for one of the UK's leading Microsoft Gold partners, Phoenix IT Group. Goodman has worked in the IT industry for 14 years and has worked extensively with Microsoft Exchange since version 5.5.
