Anytime someone mentions the words "database architecture" in a sentence, my eyes glaze over and it's just a matter of time before I am sawing logs.
But the truth is this topic is vitally important to an Exchange admin. In fact, understanding the internal database architecture is absolutely critical to running an Exchange organization.
If you don't have at least a basic understanding of how the Exchange database structure works, you will have trouble getting Exchange to perform the way it should. That said, let's get down to business.
All databases not created equal
Exchange databases come in two basic flavors. Exchange 5.5 and earlier use one type of database architecture, while Exchange 2000 and 2003 use a different architecture. Of course, there are subtle differences from version to version, but there is a huge structural difference in the way that databases work in Exchange 5.5 and in Exchange 2000.
This difference stems from the fact that Exchange 2000 and Exchange 2003 integrate themselves into the Active Directory, while Exchange 5.5 and earlier versions do not.
In Exchange 5.5 and earlier versions, there were three primary database files: PRIV.EDB, PUB.EDB and DIR.EDB. The PRIV.EDB file contained the private information store (the actual mailboxes). The PUB.EDB file contained the public information store (the public folders) and the DIR.EDB file contained the directory of users who had mailboxes on the server.
These three databases were very closely related and had to be kept in sync. Often, a problem in one database would prevent the other two from being mounted. If a database isn't mounted, the information within it isn't accessible.
Pluses of multiple storage groups
I want to give you a brief overview of how Exchange 5.5 databases work, since many organizations still run Exchange 5.5 or have a mix of Exchange 5.5 and Exchange 2000 or 2003 servers. You will also appreciate the current database structure more if you know something about Exchange 5.5's structure.
You probably noticed that each of the three Exchange 5.5 databases used the .EDB file extension. This is the file extension used by the Extensible Storage Engine (ESE), the Microsoft database engine also known as JET Blue. Exchange 2000 and Exchange 2003 still use the .EDB file format, but each .EDB file has a corresponding STM file. The .EDB file holds the actual messages, the rich text information and the MAPI information. The STM file is known as the streaming file, and its job is to hold all non-MAPI information.
To illustrate the importance of the STM file, consider this: If someone opens Outlook, composes a message and sends it to someone on your server, the message is stored in MAPI format within the EDB database.
However, suppose that the recipient isn't connecting to the server through a MAPI client, but rather through a Web client. Since the message is stored in MAPI format, it has to be converted to the appropriate format before the recipient can read it. This is where the STM file comes in. When the client attempts to read the message, Exchange uses on-demand conversion to convert the message to the correct format. The reformatted message is then stored in the STM file for the recipient to read.
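Because every .EDB file is supposed to have a matching STM file, a quick sanity check is to confirm that the pairs are all there. The sketch below is only an illustration in Python: the function name and the sample file names are made up for this example, and it works on a plain list of names rather than touching a real server.

```python
from pathlib import PurePath


def find_unpaired_edb(file_names):
    """Return the .edb files that have no .stm file with the same base name.

    Names are compared case-insensitively, so PRIV1.EDB pairs with priv1.stm.
    """
    stems_by_ext = {".edb": set(), ".stm": set()}
    for name in file_names:
        path = PurePath(name.lower())
        if path.suffix in stems_by_ext:
            stems_by_ext[path.suffix].add(path.stem)
    missing = stems_by_ext[".edb"] - stems_by_ext[".stm"]
    return sorted(stem + ".edb" for stem in missing)


# Hypothetical directory listing; these file names are illustrative only.
listing = ["priv1.edb", "priv1.stm", "pub1.edb", "E00.log"]
print(find_unpaired_edb(listing))
```

Here the second database has no streaming file next to it, so it is the one the function reports.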
Another difference between Exchange 5.5 and Exchange 2000/2003 is in the names of the databases themselves. As noted earlier, Exchange 2000 and 2003 rely on the Active Directory rather than on their own directory database, so DIR.EDB no longer exists. The public and private information stores have evolved as well. The private information store has been renamed the mailbox store, and the public information store has been renamed the public folder store. The names aren't the only things that have changed, though. Exchange 2000 and 2003 also support the use of storage groups.
In Exchange 5.5, there was one public store and one private store. In Exchange 2000 and 2003, you can have multiple storage groups, each with its own mailbox store and public folder store. There are several advantages to this. One is that separate storage groups make it easier to host mail services for several companies on a single server while mitigating the security risks. Even if your server serves only your own company, storage groups are worth using: they can increase both performance and reliability.
Imagine that your company has 5,000 Exchange mailboxes on a single server. If all 5,000 mailboxes were in the same storage group and the mailbox store within that storage group failed, then all 5,000 mailboxes would be unavailable until the store was brought back online. On the other hand, if you created five different storage groups, each with 1,000 mailboxes, and had the exact same failure, then only 1,000 users would be affected. Exchange also works better with smaller stores. Usually, your server will perform better if you limit the size of individual stores rather than trying to lump all of the mailboxes into a single store.
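The arithmetic behind that example is simple enough to put in a few lines of Python. This is just a back-of-the-envelope sketch: the function name is invented here, and it assumes the mailboxes are split evenly across the stores.

```python
def mailboxes_affected(total_mailboxes, store_count, failed_stores=1):
    """Mailboxes knocked offline when some stores fail, assuming an even split."""
    if store_count < 1:
        raise ValueError("store_count must be at least 1")
    if not 0 <= failed_stores <= store_count:
        raise ValueError("failed_stores must be between 0 and store_count")
    per_store = total_mailboxes // store_count
    return per_store * failed_stores


print(mailboxes_affected(5000, 1))  # one big store fails: 5000
print(mailboxes_affected(5000, 5))  # one of five stores fails: 1000
```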
Transaction logs do more than log
One last concept you should be familiar with is transaction logs. Transaction logs, used in all versions of Exchange, can be tricky. I could probably write an entire book on how they work, but in keeping with the spirit of this article, I will give you the short and sweet version.
Imagine that you are sending a message to someone with a mailbox on your Exchange server and halfway through the transmission, the power fails. Without some safeguard, not only would the person not get the message, but the partial message could corrupt the database. This is where transaction logs come into play. Rather than writing messages directly to a database, messages are written to transaction logs instead. Only after a transaction log fills up are the messages within it committed to the database. The idea behind this is that should a power failure occur, there won't be any corruption. Exchange can look at the transaction logs to remove and then rewrite any incomplete transactions.
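If you like to see ideas in code, here is a toy model of the "write to the log first" approach in Python. It is emphatically not how Exchange's database engine is implemented; the class, its methods and the message keys are all invented for this illustration. It only shows why a half-finished delivery never reaches the database after a crash.

```python
class ToyStore:
    """Toy write-ahead log. Illustrates the idea only, not Exchange internals."""

    def __init__(self):
        self.log = []        # stands in for the transaction log files
        self.database = {}   # stands in for the database file

    def begin(self, txn_id):
        self.log.append(("begin", txn_id, None, None))

    def write(self, txn_id, key, value):
        # Changes go to the log first, never straight into the database.
        self.log.append(("write", txn_id, key, value))

    def commit(self, txn_id):
        self.log.append(("commit", txn_id, None, None))

    def recover(self):
        """Rebuild the database from the log, ignoring unfinished transactions."""
        committed = {txn for op, txn, _, _ in self.log if op == "commit"}
        self.database = {}
        for op, txn, key, value in self.log:
            if op == "write" and txn in committed:
                self.database[key] = value
        return self.database


store = ToyStore()

# Transaction 1 completes normally.
store.begin(1)
store.write(1, "msg-1", "Hello")
store.commit(1)

# Transaction 2 is interrupted: the "power fails" before the commit record.
store.begin(2)
store.write(2, "msg-2", "Half a mess")

print(store.recover())
```

After recovery, only the first message is in the database. The interrupted one is simply skipped because its commit record never made it into the log.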
Transaction logs also make it possible to back up the server. You usually can't back up an open file. Since Exchange servers constantly use the databases, it would be very difficult to back up the server if it weren't for the transaction logs. Messages arriving during the backup are written to the transaction logs, not to the databases. This guarantees that the database does not change during the backup.
After the database backup is complete, data from the transaction logs is added to the backup and then the checkpoint file is updated. The checkpoint file is a simple file that tells Exchange which transaction logs are current.
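One way to picture the checkpoint is as a marker that splits the logs into "already in the database" and "still needed." The short Python sketch below rests on that simplified assumption; the function name and the plain integer log numbers are hypothetical and do not reflect the real checkpoint file format.

```python
def logs_to_replay(log_numbers, checkpoint):
    """Return the logs at or after the checkpoint, i.e. those still considered current."""
    return sorted(n for n in log_numbers if n >= checkpoint)


# Hypothetical example: five logs on disk, checkpoint pointing at log 4.
print(logs_to_replay([1, 2, 3, 4, 5], checkpoint=4))  # [4, 5]
```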
As you can see, the databases Exchange uses tend to be somewhat complicated. However, having a basic understanding of the various database files and what they do is crucial to successful server operations.
Editor's Note: "Exchange Admin 101" is a new category of tip that we created for the Exchange newbie. This regularly featured tip will discuss a topic of particular importance to the new Exchange admin or to the admin who has upgraded to a new version of Exchange and wants to learn the basics.
If you have suggested topics for Exchange Admin 101 tips, please send them along to editor@SearchExchange.com.
Brien M. Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Windows 2000 Server and IIS. Brien has served as the CIO for a nationwide chain of hospitals and was once in charge of IT security for Fort Knox. As a freelance technical writer he has written for Microsoft, CNET, ZDNet, TechTarget, MSD2D, Relevant Technologies and other technology companies. You can visit Brien's personal Web site at http://www.brienposey.com.