Keep your systems in check

How can organizations ensure their Web sites remain reliable and fault-free? Focus on both hardware and software, said David Flawn, a Stratus Technologies vice president. In this Q&A, Flawn explains why systems fail and discusses clustering and the use of Windows versus Unix in the enterprise.

As soon as a company's e-commerce Web site goes down, that organization's brand is tainted in the eyes of the end user, said David Flawn. So, how can organizations ensure their Web sites remain reliable and fault-free? "You can't just focus on hardware or just on software because management is multi-variant," said Flawn, vice president of Worldwide Business Development for Stratus Technologies Inc., a server vendor in Maynard, Mass. SearchWindowsManageability asked Flawn for details. He explained why systems fail and discussed clustering and the use of Windows versus Unix in the enterprise.

sWM: Generally speaking, why do most systems fail? Is the operating system, hardware or application at fault?

It depends. Ultimately, you have to recognize that when software fails, there are a number of remedial actions you can take. When software fails, whether it's an application or the operating system itself, there is a set of events that can predict when that failure is going to occur. Part of the answer to how you make systems more available is understanding what's going on: What events are being written to error logs, and what kinds of activities occurred on a system before it ultimately failed? Software, when it fails, can be restarted, and restarting is usually fairly rapid. There are mechanisms you can employ to recover from a software failure much more quickly. If an application fails, sometimes the cause is a single bad process, and killing that process corrects the application. Similarly, if the OS fails, you can restart the system. With most technology that exists today, this can be done remotely.
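As a rough illustration of the restart mechanism Flawn describes, here is a minimal Python sketch of a supervisor that reruns a command whenever it exits with an error. The function name `run_with_restart` and its parameters are hypothetical and chosen for this example; it is not a description of any Stratus or Windows tool.

```python
import subprocess
import sys
import time


def run_with_restart(cmd, max_restarts=3, delay_seconds=0.0):
    """Run cmd, restarting it after each non-zero exit.

    Returns the number of restarts needed before a clean exit.
    Raises RuntimeError once max_restarts has been used up.
    (Hypothetical helper, for illustration only.)
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(cmd).returncode
        if exit_code == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"gave up after {restarts} restarts")
        restarts += 1
        time.sleep(delay_seconds)  # optional pause before the next attempt


if __name__ == "__main__":
    # A child process that exits cleanly, so no restart is needed.
    print(run_with_restart([sys.executable, "-c", "pass"]))
```

A real supervisor would also log each failure, since the events recorded before a crash are what help predict the next one.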

The more catastrophic circumstance is when hardware fails. When it fails, regardless of how well you manage it, it's generally going to require somebody coming out and replacing a motherboard, an input/output (I/O) subsystem, bad memory or a power supply. So the mean time to correct a problem is much longer for hardware than for software. When most NT servers fail from a hardware perspective, you have to get a guy out there with parts to resolve the issue. With software, you can resolve the issues remotely.
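The effect of repair time on uptime can be made concrete with the standard steady-state availability formula, MTBF / (MTBF + MTTR). The sketch below is only an illustration: the figures (a failure every 1,000 hours, a six-minute remote restart, an eight-hour on-site repair) are assumptions picked for the example, not numbers from Flawn or Stratus.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time a system is up.

    mtbf_hours: mean time between failures
    mttr_hours: mean time to repair (or restart) after a failure
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Assumed figures, for illustration only.
    software = availability(mtbf_hours=1000, mttr_hours=0.1)  # remote restart
    hardware = availability(mtbf_hours=1000, mttr_hours=8)    # technician visit
    print(f"software-style recovery: {software:.4%}")
    print(f"hardware-style recovery: {hardware:.4%}")
```

With the same failure rate, the longer repair time alone moves availability from roughly 99.99% to roughly 99.2% under these assumed numbers.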

sWM: Stratus' servers are promoted as fault-tolerant servers, so administrators will not have to worry about hardware failures. Clustering is a way, however, to ensure that if one server fails, its data is failed over to another server to keep uptime at 100%. Can you explain the difference between fault-tolerant computing and clustering, then?

This is an important distinction to make. There's always downtime associated with a clustered environment. That's not to say clusters are bad. They're very good for scaling out, having multiple systems and spreading out workloads. Also, they allow you to take one system down, replace the software, upgrade the OS and start up the cluster again. With this kind of failover environment, though, you're basically waiting for a failure to happen and then recovering from it. A fault-tolerant system, on the other hand, is doing the same thing down multiple paths at the same time. If one of those paths disappears because of hardware failure, you can continue on with the transaction.
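The distinction can be sketched in software terms. The Python below is a loose analogy only, assuming each server is modeled as a callable that may raise an exception; the names `failover_call` and `redundant_call` are hypothetical, and real fault-tolerant hardware duplicates work in parallel components rather than in a loop.

```python
def failover_call(primary, standby, request):
    """Cluster-style: use the primary, switch to the standby only after a failure."""
    try:
        return primary(request)
    except Exception:
        # The failure has already happened; recovery starts here.
        return standby(request)


def redundant_call(replicas, request):
    """Fault-tolerant-style: every path processes the same request."""
    results = []
    for replica in replicas:
        try:
            results.append(replica(request))
        except Exception:
            continue  # a failed path is dropped; the others carry on
    if not results:
        raise RuntimeError("all paths failed")
    return results[0]


if __name__ == "__main__":
    def healthy(request):
        return request.upper()

    def broken(request):
        raise IOError("simulated hardware fault")

    print(failover_call(broken, healthy, "order"))     # recovered after the fault
    print(redundant_call([broken, healthy], "order"))  # result was already in flight
```

In the first function the standby starts work only once the primary has failed, which is where a cluster's recovery time comes from. In the second, a surviving path has already handled the request, so the transaction continues.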

sWM: Since Unix is known to be more reliable and secure than Windows, why do most businesses choose Windows?

That's where you get into a business question more than a technological one. It is difficult for any MIS shop to reconcile itself to multiple systems all having to integrate into a unified system. If you've got IBM mainframes talking to dumb terminals, you've got a unified environment. If you've got Windows desktops, Windows servers, mainframe architectures and different kinds of Unix-based systems, you end up with a heterogeneous environment. That's very hard to control and integrate. So, one of the dreams of most MIS people is a homogeneous environment that extends from the desktop to the data center. The only environment that will allow that is Windows. Most businesses today have standardized on Windows desktops and servers, but they still have Unix-based relational databases and other elements. Unix's ability to extend its reach to all elements of the enterprise is not as great as Windows'. That's why people are more interested in Windows: it offers a common interface and API. For example, Microsoft's new .NET initiative is a unifying architecture for integrating applications and disparate systems using standards like XML.

