We are an Internet company running three Dell servers, each with four processors and 4 GB of RAM, on Windows 2K Advanced with Network Load Balancing (NLB). Having run the same configuration for several years on NT4 and NLBS, we have noticed a new problem in our current W2K setup. Two things, actually. First, at different times on different servers, Performance Monitor shows the current anonymous users counter on one server slowly running away from the other two, reaching nearly double or triple their average. It slowly comes back in line and remains constant for several hours before happening again. The logs do not confirm those as actual visits.
Second, and seemingly unrelated to the problem above, we will start to queue ASP pages for no apparent reason. I've got NetIQ running an IIS Reset to help rectify the problem.
This is a tough question, and there's no obvious answer. It's very hard to troubleshoot a problem based entirely on data gathered from Performance Monitor. I suspect the anonymous user counter and the ASP queue are directly related. If the server cannot process requests, both counters would climb: ASP requests become queued, and because those user requests are never fulfilled, their connections stay open and continue to be counted.
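To illustrate why the two counters could move together, here is a rough toy model in Python. It is only a sketch under simplifying assumptions of my own (a fixed arrival rate, one connection per request, and a connection that stays open until its request is served). The function name and the numbers are hypothetical and are not taken from IIS or from your servers.

```python
def simulate(arrivals_per_tick, capacity_by_tick):
    """Toy model of one server: returns (queued, open_connections) per tick."""
    queued = 0
    history = []
    for capacity in capacity_by_tick:
        queued += arrivals_per_tick      # new requests join the queue
        served = min(queued, capacity)   # the server completes what it can
        queued -= served
        # Assumption: a connection is counted while its request is either
        # waiting or being served during this tick.
        open_connections = queued + served
        history.append((queued, open_connections))
    return history


# Three healthy ticks, five ticks where something blocks all processing,
# then six ticks of recovery at double capacity (all values are made up).
capacity = [10] * 3 + [0] * 5 + [20] * 6
history = simulate(arrivals_per_tick=10, capacity_by_tick=capacity)

for tick, (queued, connections) in enumerate(history, start=1):
    print(f"tick {tick:2d}: queued={queued:3d} open_connections={connections:3d}")
```

In this model the queue and the connection count rise in step while processing is blocked and drain gradually once it resumes, which is the pattern I would look for when comparing the two counters in Performance Monitor.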
Unfortunately, I don't have an understanding of what the end-user is experiencing when you see these symptoms. For example, do ASP pages stop working entirely when the ASP queue begins to increment, or do things seem to continue normally? I'm also not clear on the nature of the changes reflected in Performance Monitor. Does the rate of increase in the anonymous user counter relate directly to incoming requests, or does it skyrocket over a period of a few seconds? I also don't know enough about the application or architecture. It's possible that something is legitimately blocking the processing of requests on the server, but without understanding your architecture better, I can't tell you if it may be something in the application layer or database layer.
With that disclaimer aside, I can toss out a couple of potential causes:
- A blocking process somewhere in your application. This may be part of IIS, or part of the application/database layers. It's common for the number of allocated ODBC connections to become completely consumed. I've also seen situations where a SQL Server table or row is locked by an application, and not promptly released. If other processes need to access that table or row, SQL Server forces them to wait. This, in turn, can cause ASP requests to get queued.
- Legitimate increases in traffic. It's possible that you're seeing spikes in incoming requests that aren't being evenly distributed by NLBS. NLBS supports routing requests to a single server based on source IP address (or source IP network). If you're getting a large burst of traffic from, say, AOL, where many users have the same IP address, all requests may be sent to a single server.
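The second cause can be sketched with a small Python simulation. This is not NLBS's actual distribution algorithm; it assumes a stand-in rule (hash the source IP, take it modulo the number of servers) purely to show how source-IP affinity behaves. The addresses, the request counts, and the `pick_server` and `distribute` helpers are all hypothetical.

```python
import hashlib
from collections import Counter

SERVERS = 3  # three hosts in the cluster, as in the question


def pick_server(source_ip, servers=SERVERS):
    """Stand-in affinity rule: the same source IP always maps to the same server."""
    digest = hashlib.md5(source_ip.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % servers


def distribute(source_ips):
    """Count how many requests each server receives."""
    counts = Counter({server: 0 for server in range(SERVERS)})
    for ip in source_ips:
        counts[pick_server(ip)] += 1
    return counts


# 300 requests from 300 distinct (made-up) client addresses.
distinct_clients = [f"198.51.{i // 250}.{i % 250 + 1}" for i in range(300)]

# The same traffic plus 300 requests that all arrive through one proxy address.
proxy_burst = ["192.0.2.10"] * 300

even = distribute(distinct_clients)
bursty = distribute(distinct_clients + proxy_burst)

print("distinct clients only:", dict(even))
print("with proxy burst:     ", dict(bursty))
```

Under this rule, every request from the shared proxy address lands on one server, so that server's user count climbs well above the other two even though the cluster's total traffic is legitimate.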
I hope that helps. Let us know what you figure out.
This was first published in February 2001