Best practices for SAN configuration and administration

Storage headaches tend to get worse over time, but following the proper configuration and management steps can be the perfect remedy for your SAN-related woes.

Storage access and reliability are critical to the success of your data center -- and the overall success of your business. Sure, servers perform work on business data and network hardware moves it from place to place, but all of that data ultimately "lives" in a storage infrastructure such as a SAN or NAS.

Failed servers or switch modules are easily exchanged, but lost data cannot be replaced and trouble with storage access can cripple application availability -- grinding the entire organization to a halt. The best way to avoid problems and keep storage running smoothly is to apply some best practices to SAN configuration and administration.

Storage configuration tips to live by


Without access to storage, applications simply won't work. So perhaps the most important issue with any storage system configuration is to identify and eliminate single points of failure between the application and disks. This may include redundant network interface cards (NICs) or host bus adapters (HBAs) at each server and storage system, redundant switch ports for iSCSI (Ethernet) or Fibre Channel storage traffic, and even redundant storage systems that are replicated or kept synchronized with each other. A fault at any point should then trigger failover to an alternate network pathway, maintaining storage availability.
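The failover behavior described above can be sketched in a few lines: given an ordered list of redundant paths between a host and its storage, traffic moves to the first path still reported healthy. The path names and health map below are hypothetical illustrations, not output from any real multipath driver.

```python
# Minimal sketch of failover path selection across redundant HBA paths.
# Path names and the health map are illustrative assumptions.

def select_path(paths, healthy):
    """Return the first path reported healthy, or None if all have failed."""
    for path in paths:
        if healthy.get(path, False):
            return path
    return None

# Two redundant paths; the primary has failed, so traffic fails over.
paths = ["hba0:controllerA", "hba1:controllerB"]
health = {"hba0:controllerA": False, "hba1:controllerB": True}
print(select_path(paths, health))  # → hba1:controllerB
```

In a real deployment this logic lives in the multipath I/O layer (e.g., MPIO on Windows), but the principle is the same: redundancy only helps if something actively detects the fault and reroutes.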

As physical servers host more virtual machines, it's easy for storage and other user traffic to bottleneck at the network interface. Advance workload testing should evaluate the networking requirements at the server and analyze performance in terms of IOPS and bandwidth for the assortment of workloads planned. It may make sense to redistribute the virtual workloads and ease any traffic bottlenecks at critical servers.
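The advance workload testing described above boils down to simple aggregation: sum the planned IOPS and bandwidth of every VM on each host and compare the total against what the host's link can actually deliver. The per-VM figures and the 80% usable-link assumption below are illustrative, not measured values.

```python
# Hedged sketch: flag hosts whose aggregate planned VM bandwidth exceeds
# the usable throughput of a 10 GbE link. All numbers are assumptions.

LINK_MBPS = 10_000        # 10 Gigabit Ethernet link, in megabits/sec
USABLE_FRACTION = 0.8     # assume ~80% of line rate is achievable

vms = [                   # (host, planned IOPS, planned MB/s) -- hypothetical
    ("host1", 5000, 600),
    ("host1", 7000, 500),
    ("host2", 2000, 150),
]

totals = {}
for host, iops, mbps in vms:
    t = totals.setdefault(host, {"iops": 0, "mbps": 0})
    t["iops"] += iops
    t["mbps"] += mbps

usable_mbs = LINK_MBPS / 8 * USABLE_FRACTION  # convert megabits to megabytes
for host, t in sorted(totals.items()):
    flag = "BOTTLENECK" if t["mbps"] > usable_mbs else "ok"
    print(f"{host}: {t['iops']} IOPS, {t['mbps']} MB/s -> {flag}")
```

Here host1's two VMs together demand 1,100 MB/s against roughly 1,000 MB/s of usable link, so it would be a candidate for redistributing workloads.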

It is also possible to mitigate network bottlenecks by moving to a higher link speed, such as 8 Gbps Fibre Channel or 10 Gigabit Ethernet. And while redundant network connectivity is recommended primarily for resilience, it can also be used for load balancing, allowing multiple pathways to share the total data traffic rather than leaving redundant pathways idle until a failover occurs. This is particularly important for virtualized servers that host 10, 20 or even more virtual machines and contend with substantial storage traffic demands.

Another means of easing traffic conflicts is to segregate storage and user traffic. This isn't an issue with traditional SANs because Fibre Channel and Ethernet exist as separate networks. The trick is to isolate traffic in an iSCSI or Fibre Channel over Ethernet (FCoE) SAN. Logical isolation is normally achieved by restricting storage traffic to a virtual LAN (VLAN), but that does nothing to ease bottlenecks on the shared physical links. It's more effective to implement physically separate networks to carry user and storage traffic. This raises the cost of deployment, but pays dividends in security, manageability and performance.

Another factor that should be considered is LUN utilization with SAN configurations. It's generally poor practice to allow all of the virtual machines on a server to access the same LUN in a storage subsystem. Even when there is adequate Fibre Channel or LAN connectivity between the server and storage, multiple VMs simultaneously accessing the same disks can produce some serious performance problems on the storage side. Configure storage to maintain a small VM-to-LUN ratio -- a ratio of 1:1 where each VM is assigned to a separate LUN is probably the most flexible arrangement, and it offers versatile snapshot strategies. Don't forget to leverage tiered storage to balance performance and cost, assign mission-critical VMs to the fastest and best-performing disk storage, and relegate everyday VMs to more economical disk groups depending on your organization's needs.
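The 1:1 VM-to-LUN mapping and tiering policy described above can be expressed as a simple assignment rule: every VM gets its own LUN, and the LUN's tier follows the VM's criticality. The tier names and VM labels below are hypothetical.

```python
# Illustrative sketch of a 1:1 VM-to-LUN mapping with tiered storage:
# mission-critical VMs land on the fast tier, everything else on the
# economical tier. Tier and VM names are assumptions for illustration.

def assign_luns(vms):
    """Map each VM to its own LUN, choosing a tier by criticality."""
    assignments = {}
    for name, critical in vms:
        tier = "tier1-ssd" if critical else "tier2-sata"
        assignments[name] = f"{tier}/lun-{name}"   # one LUN per VM
    return assignments

vms = [("sql01", True), ("web01", False), ("file01", False)]
for vm, lun in assign_luns(vms).items():
    print(f"{vm} -> {lun}")
```

The payoff of the 1:1 arrangement is that each LUN can be snapshotted, replicated or migrated independently, without dragging unrelated VMs along.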

Fibre Channel storage configurations also involve zoning, which, together with LUN masking, controls which storage each server can see. It's a means of organizing storage and enhancing security, but it can also become a serious problem for virtualization when live migration moves virtual machines between physical host servers. For example, a VM on a certain server may access particular storage LUNs. If the VM migrates to another server (perhaps as the result of automated workload balancing) and the new server does not share the same zoning, the VM may stop functioning on the new server. This presents administrators with a storage management conundrum -- one that often results in disabling automated migration features.

Key storage administration practices

Storage administration should always start with a comprehensive performance baseline. For example, establish a baseline when migrating to or implementing a SAN, while the SAN is known to be "healthy." Use tools to monitor short-term and long-term storage performance against the baseline. Short-term anomalies in storage performance may indicate an unexpected change or fault, and can usually be investigated and corrected as they occur. Long-term storage performance changes are more indicative of growth, such as more users, new applications and cyclical business demands. Attention to long-term performance changes often serves as the basis for storage capacity planning.
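A baseline comparison of this sort can be sketched as a simple deviation check: flag any metric that drifts from its healthy baseline value by more than some tolerance. The metric names, baseline values and the 25% tolerance below are all assumptions chosen for illustration.

```python
# Sketch of baseline monitoring: flag metrics whose relative change from
# the healthy baseline exceeds a tolerance. Values are illustrative.

TOLERANCE = 0.25  # flag anything more than 25% off baseline

baseline = {"read_latency_ms": 4.0, "write_latency_ms": 6.0, "iops": 20000}
current  = {"read_latency_ms": 4.5, "write_latency_ms": 9.5, "iops": 19000}

def deviations(baseline, current, tolerance):
    """Return metrics whose relative change from baseline exceeds tolerance."""
    flagged = {}
    for metric, base in baseline.items():
        change = (current[metric] - base) / base
        if abs(change) > tolerance:
            flagged[metric] = round(change, 2)
    return flagged

print(deviations(baseline, current, TOLERANCE))  # → {'write_latency_ms': 0.58}
```

A short-term spike like the write-latency jump here warrants immediate investigation; a slow, steady drift across weeks feeds capacity planning instead.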

Thin provisioning allows administrators to create LUNs that are logically larger than the physical storage capacity allocated to them. For example, an administrator can carve a 2 TB SAN into several 2 TB LUNs for various virtual machines or other non-virtualized applications. It's an important technology and it works well -- as long as the physical pool doesn't fill up. When it does, applications can crash and data loss can occur, so administrators need to monitor thinly provisioned LUNs carefully and add physical storage to the pool as needed.
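The risk described above can be tracked with two numbers: the oversubscription ratio (total logical LUN capacity over physical pool capacity) and the pool's physical utilization. The LUN sizes, consumed capacity and 90% alert threshold below are illustrative assumptions matching the 2 TB example.

```python
# Hedged sketch: monitor a thin-provisioned pool's oversubscription and
# physical utilization, warning before the pool fills. Values assumed.

POOL_TB = 2.0                    # physical capacity of the pool
ALERT_AT = 0.9                   # warn at 90% physical utilization

luns = {"vm-sql": 2.0, "vm-web": 2.0, "vm-file": 2.0}  # logical TB each
consumed_tb = 1.85               # data physically written across all LUNs

oversub = sum(luns.values()) / POOL_TB
utilization = consumed_tb / POOL_TB
print(f"oversubscription {oversub:.1f}:1, pool {utilization:.1%} full")
if utilization >= ALERT_AT:
    print("ALERT: add physical storage before the pool fills")
```

Here three 2 TB LUNs sit on a 2 TB pool (3:1 oversubscription), and at 92.5% physical utilization the check fires well before the first application-crashing write failure.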

Finally, pay attention to factors that demand significant storage. In a virtual data center, one of the biggest culprits of storage waste is the uncontrolled proliferation of VM instances, or virtual machine sprawl. Each new VM requires space for its image along with space for data protection, such as snapshots and off-site data replication.

VM sprawl can quickly deplete available storage resources. Organizations can regulate VM sprawl by limiting the number of IT administrators with authorization to create new virtual machines, and implementing policies and procedures that justify each VM and track its lifecycle.
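The lifecycle tracking suggested above can be as simple as an inventory that records who created each VM, why, and when its justification expires, then periodically lists VMs due for review. The records and dates below are hypothetical; a real inventory would be pulled from the hypervisor's management API.

```python
# Sketch of VM sprawl tracking: record owner, justification and expiry
# for each VM, then list VMs overdue for review. Data is illustrative.

from datetime import date

inventory = [
    {"vm": "test-build", "owner": "alice", "reason": "CI testing",
     "expires": date(2024, 1, 31)},
    {"vm": "sql01", "owner": "bob", "reason": "production DB",
     "expires": date(2026, 12, 31)},
]

def due_for_review(inventory, today):
    """Return VMs whose justification has expired as of `today`."""
    return [v["vm"] for v in inventory if v["expires"] <= today]

print(due_for_review(inventory, date(2025, 6, 1)))  # → ['test-build']
```

Reclaiming the image, snapshots and replicas of one expired test VM frees several times its nominal disk size, which is why tracking expiry pays off quickly.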

Stephen J. Bigelow
, senior features writer, has more than 15 years of technical writing experience in the PC/technology industry. He holds a BSEE and CompTIA A+, Network+, Security+ and Server+ certifications, and has written hundreds of articles and more than 15 feature books on computer troubleshooting. Contact him at [email protected].
