Published: 25 Aug 2010
A large part of an Exchange Server administrator's job involves preparing for and recovering from disaster -- setting up a backup strategy or configuring Exchange servers for failover. In Exchange Server 2010, Microsoft added new high-availability and site resilience features for backup and disaster recovery that do not involve any additional configuration or extraordinary work to establish.
High availability (HA) in Exchange Server 2007 was a function of cluster continuous replication (CCR), which wasn't perfect but was still a huge improvement over Windows Server 2003 active/passive failover. With CCR, you could leverage Exchange 2007's log shipping and replay functions in conjunction with clustering, but you were still stuck with only one database per storage group.
Exchange Server 2010 gives you multiple databases through the use of database availability groups, or DAGs, in which database copies automatically stay in sync with each other. These groups of up to 16 Exchange 2010 mailbox servers automatically replicate Exchange databases and make a given mailbox less dependent on a specific database or server. You can add or remove servers from the DAG at any time with minimal effort. DAGs therefore make it possible to combine an HA solution with disaster recovery (DR), because they have elements of both in one package.
DAGs are also useful because they allow more flexibility in how a particular cluster or server setup can be rolled out. You can start with a single Exchange server on Windows Server 2008 R2 Enterprise Edition, and then add more machines at a later time to increase system uptime and availability or for data protection. Mailbox servers can also act as multiuse machines, allowing them to assume other Exchange roles such as unified messaging. Therefore, you don't need to dedicate them exclusively as a failover for other machines.
A DAG can span more than one Active Directory (AD) site. For example, if you have multiple Exchange servers in different data centers, you could include them all in a DAG. Redundancy among the different data centers would add that much more resilience to your setup: If one data center fails, the other continues to run as expected. If you do this, you'll typically need to turn on Datacenter Activation Coordination Mode, which allows a DAG that has been divided across two data centers to survive an outage at one site and lets both data centers recover gracefully without each site assuming that it's the only surviving site. Microsoft calls that failure, in which each site acts as if it were the sole survivor, split-brain syndrome.
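The split-brain risk can be illustrated with a small model. The Python sketch below is a simplified illustration, not Exchange's actual Datacenter Activation Coordination logic: it assumes a hypothetical rule under which a site may mount databases only if it can reach a strict majority of the DAG's members. The server names and the may_mount function are invented for the example.

```python
def may_mount(reachable_members, all_members):
    """Return True if the reachable members form a strict majority of the DAG.

    Hypothetical rule for illustration only; not Exchange's real protocol.
    """
    reachable = set(reachable_members) & set(all_members)
    return len(reachable) * 2 > len(all_members)


# Hypothetical DAG: three servers in site A, two in site B.
dag = ["A1", "A2", "A3", "B1", "B2"]
site_a = ["A1", "A2", "A3"]
site_b = ["B1", "B2"]

# The link between the data centers fails, so each site
# can reach only its own servers.
site_a_mounts = may_mount(site_a, dag)
site_b_mounts = may_mount(site_b, dag)

print("Site A may mount databases:", site_a_mounts)
print("Site B may mount databases:", site_b_mounts)
```

Under this assumed rule, two disjoint groups of servers can never both hold a strict majority, so the two sites cannot both decide they are the sole survivor. With an even split, neither side would qualify, which is why a tie-breaking vote is usually needed in practice.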
Mailbox continuity via dial tone portability
Many of the disaster recovery/continuity features of Exchange Server 2010 will be familiar to trained Exchange admins, but there are a few end-user continuity features -- like dial tone portability -- that will stand out.
Dial tone portability creates a temporary mailbox for a user whose original mailbox lived on a failed database or server. All message traffic is redirected seamlessly to the new mailbox. Users running Microsoft Outlook 2007 or later don't need to reconfigure anything on their end -- they're automatically connected to the new mailbox.
A dial tone recovery can be performed on the server where the database failed (the recommended option, because the database doesn't have to be copied to another server) or on another server, which can in turn become the new permanent home for that user's mailbox if needed. The new way in which databases are managed in Exchange Server 2010 makes this feature possible.
Exchange Server 2010's database availability groups were also designed so that updates can be applied to machines in the DAG without interrupting services. You still have to apply updates manually on each machine in the DAG in succession, but automatic failover between these machines means you can simply apply the updates, let the DAG handle the failover gracefully each time and continue on without having to take additional steps.
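The update sequence described above can be sketched as a simple loop. This Python model is only an illustration under stated assumptions: the server names, the rolling_update function and the rule for choosing a failover target (the first other server in the list) are hypothetical, and the model does not call any Exchange or Windows API.

```python
def rolling_update(servers, active):
    """Plan updates one server at a time, moving the active role away first.

    Returns the ordered list of (action, server) steps and the server
    that is active once every machine has been updated.
    """
    if len(servers) < 2:
        raise ValueError("a rolling update needs at least two servers")

    steps = []
    for server in servers:
        if server == active:
            # Stand-in for the DAG's automatic failover: hand the active
            # role to the first other server in the list.
            target = next(s for s in servers if s != server)
            steps.append(("failover", target))
            active = target
        # The server is now passive, so it can be updated safely.
        steps.append(("update", server))
    return steps, active


plan, final_active = rolling_update(["MBX1", "MBX2", "MBX3"], active="MBX1")
for action, server in plan:
    print(action, server)
print("active after updates:", final_active)
```

In this model a server is never updated while it holds the active role, which is the property that keeps service uninterrupted during the update cycle.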
Windows Server integration and DAG configuration
Exchange Server 2010 integrates very well with Windows Server's conventional backup functions. A DAG's failover and continuity functions aren't meant to be substitutes for conventional backup -- just as RAID isn't a substitute for a proper server backup plan. This is why Microsoft created a Windows Server Backup plug-in for Exchange 2010 that uses the Volume Shadow Copy Service (VSS).
The Windows Server Backup plug-in isn't perfect. Its limitations include an incompatibility with Windows Server Backup's command-line interface (which still doesn't work with Exchange 2010). In addition, backups only work at the volume level (i.e., no backups of only the database or the logs). But despite these limitations, the plug-in is still useful.
DAGs can be configured in a few different ways. The easiest way, and the one most familiar to Exchange admins, is a simple two-member (two-server) DAG. On the higher end, there is a four-member DAG -- two local machines and two other machines placed in a remote data center.
The local machines are for availability (if one server goes down, the other keeps chugging); the remote servers are for site resilience (if your onsite data center fails, the other one can pick up where the first left off). All of the work that needs to be done with DAGs can be done from the Exchange Management Console or from a PowerShell prompt -- the former for ease of use, the latter for fine-grained control and scripting.
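The difference between availability and site resilience in the four-member layout can be made concrete with a short model. The Python sketch below assumes, for illustration, that every server holds a copy of the database and ignores how the DAG decides which copy to activate. The server and site names and the survivors function are hypothetical.

```python
# Hypothetical four-member DAG: two servers on site, two in a remote data center.
dag = {
    "LOCAL1": "primary",
    "LOCAL2": "primary",
    "REMOTE1": "remote",
    "REMOTE2": "remote",
}


def survivors(members, failed_servers=(), failed_sites=()):
    """Return the servers still able to host the database after a failure."""
    return sorted(
        name
        for name, site in members.items()
        if name not in failed_servers and site not in failed_sites
    )


# Availability: one local server fails, the other local server carries on.
one_server_down = survivors(dag, failed_servers=["LOCAL1"])

# Site resilience: the whole onsite data center fails, the remote pair remains.
site_down = survivors(dag, failed_sites=["primary"])

print("after a server failure:", one_server_down)
print("after a site failure:", site_down)
```

In the first case a local copy is still available, so users stay in the same data center. In the second case only the remote servers remain, which is the situation the remote pair exists to cover.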
These changes in Exchange Server 2010 mean that many long-time Exchange admins have had to adjust the ways in which they perform certain tasks. The most significant change is that clustered mailbox servers and storage groups no longer exist. This probably sounds extreme at first, but in practice it means there are fewer obstacles to maintaining replication and consistency. You don't have to manually administer Exchange as a clustered application -- for the most part, that is handled under the hood.
Microsoft also made a lot of under-the-hood changes in how replication works in Exchange Server 2010. For example, it has streamlined the process of populating passive copies of a database from the database cache. That way, if a failover occurs, the backup database is available more quickly than in the past.
Exchange admins who have sweated blood in the past getting backup and disaster recovery features to work ought to be intrigued, to say the least, about what Exchange Server 2010 has to offer in this vein. If you're curious about putting it to work in your organization, start by checking out the prerequisites for an Exchange 2010 installation and give it a try. Microsoft offers various trial environments, from a 120-day evaluation copy to a pre-loaded virtual hard disk you can use in your virtual machine of choice.