Today's topic is a sneaky one that manifests itself in a variety of symptoms. The cause is always the same: users who belong to more than roughly 70-80 groups fail to authenticate. This shows up as failures to get Group Policy, to open Outlook, and in other situations that require authentication. The root cause is that the user's Kerberos token is sent via UDP (User Datagram Protocol). With a large group membership, the token no longer fits in a single UDP frame and takes two frames to pass. Old router firmware strips the second frame, removing some of the groups that should be part of the token. Since UDP is a connectionless protocol, there is no acknowledgement and no resend if a packet fails to reach the destination.
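To make the two-frame point concrete, here is a minimal Python sketch that counts how many IP fragments a single UDP datagram needs. The 1500-byte MTU, the 20-byte IPv4 header without options, and the example message sizes are assumptions chosen for illustration, not measurements from the cases described here.

```python
import math

IP_HEADER = 20   # assumed: IPv4 header without options
UDP_HEADER = 8   # fixed UDP header size


def udp_fragment_count(payload_bytes: int, mtu: int = 1500) -> int:
    """Return how many IP fragments carry one UDP datagram of the given payload."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    datagram = UDP_HEADER + payload_bytes
    return max(1, math.ceil(datagram / per_fragment))


if __name__ == "__main__":
    # Hypothetical message sizes, in bytes of UDP payload.
    for size in (1200, 1472, 1473, 2400):
        print(size, "bytes ->", udp_fragment_count(size), "fragment(s)")
```

Under these assumptions, anything above 1472 bytes of payload needs a second fragment, and that second fragment is the one the old firmware drops.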
I have seen this cause two different problems. In the first, Group Policy was not being applied and some users could not be authenticated. There was no consistent behavior in the OS itself, although running gpresult would either fail or take a long time on XP. What is consistent, though, is the output in the Userenv log and the errors in the event logs. In the Application log you get event ID 1053, source Userenv, and in the System log, event ID 6, source Kerberos. Note the output of Userenv.log from an XP client here.
USERENV(264.220) 17:56:20:565 PingComputer: Fast link. Exiting.
USERENV(264.220) 17:56:20:752 MyGetUserName: GetUserNameEx failed with 14.
USERENV(264.220) 17:56:20:752 MyGetUserName: Retrying call to GetUserNameEx in 1/2 second.
USERENV(264.220) 17:56:21:409 MyGetUserName: GetUserNameEx failed with 14.
USERENV(264.220) 17:56:21:409 MyGetUserName: Retrying call to GetUserNameEx in 1/2 second.
USERENV(264.220) 17:56:22:081 MyGetUserName: GetUserNameEx failed with 14.
USERENV(264.220) 17:56:22:081 MyGetUserName: Retrying call to GetUserNameEx in 1/2 second.
You get the picture: these entries repeat over and over. Running net helpmsg for error 14 returns:
C:\>net helpmsg 14
Not enough storage is available to complete this operation.
In the second case, users attempting to log on to the Outlook Web Access (OWA) client got HTTP error 400 Bad Request. They discovered that reducing the number of groups the user belonged to fixed the problem. Like the first case, this was an authentication problem, probably because the group that grants access was carried in the second UDP packet, which was dropped.
There are three possible solutions, or more accurately, there is one solution and two workarounds.
The best solution is to contact the router manufacturer and see if they have updated firmware for this issue, then update all routers. If updating the routers is not possible at this time, then you can choose between two workarounds that can be implemented on the client.
One workaround is to increase the Registry value MaxTokenSize to 100,000. This is described in KB 327825. The KB provides a calculation method where you input the number of groups and determine how big this value should be. Microsoft recommends that if you set this value, you set it to 100,000. That way, if the number of groups increases, you don't have to modify the value again. It is also dangerous to set it to a value that is too big. Note that increasing this value only allows the token to grow that large if needed; it doesn't grow the token automatically. As for the impact on your network, the users whose token size is increased will have a few more Kerberos packets delivered. Other than that, there shouldn't be any negative impact from increasing this value.
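As a rough illustration of that calculation, the sketch below uses the estimate as I recall it from KB 327825: TokenSize = 1200 + 40d + 8s, where d counts domain local groups, universal groups outside the user's account domain, and SID history entries, and s counts global groups plus universal groups inside the account domain. The function name and the example group counts are hypothetical, and the 12,000-byte default is an assumption about XP and Windows Server 2003 clients, so check the KB before relying on the numbers.

```python
def estimate_token_size(d: int, s: int) -> int:
    """Estimate the Kerberos token size in bytes as 1200 + 40*d + 8*s."""
    if d < 0 or s < 0:
        raise ValueError("group counts must be non-negative")
    return 1200 + 40 * d + 8 * s


ASSUMED_DEFAULT_MAX_TOKEN_SIZE = 12000  # assumption: default on XP / Server 2003

if __name__ == "__main__":
    # Hypothetical user: 250 domain local groups, 150 global groups.
    size = estimate_token_size(d=250, s=150)
    print("estimated token size:", size)
    print("exceeds assumed default:", size > ASSUMED_DEFAULT_MAX_TOKEN_SIZE)
```

Running the estimate for your heaviest users before touching the Registry shows whether the default is actually being exceeded.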
In my opinion, the best option (other than fixing the routers) is described in KB 244474. In this solution, you define the Registry value MaxPacketSize = 1. This forces Kerberos to use TCP rather than UDP. TCP will resend the dropped packets and authentication will succeed. The advantage here is that, unlike defining MaxTokenSize, you don't have to mess with calculating the token size and possibly changing it later. In addition, raising MaxTokenSize did not solve the problem when I tried it. I'm not saying it never will, but it failed in my experience.
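For testing the MaxPacketSize change on a single machine, it can be expressed as a .reg file. The sketch below only builds the file text. It assumes the value lives under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters as a REG_DWORD, which is how I recall KB 244474 describing it. The helper name is hypothetical, so confirm the key path against the KB before importing anything.

```python
KERBEROS_PARAMS = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Control\Lsa\Kerberos\Parameters"
)  # assumed key path, per my reading of KB 244474


def build_reg_file(name: str, value: int) -> str:
    """Return .reg file text that sets one REG_DWORD under the Kerberos key."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KERBEROS_PARAMS}]",
        f'"{name}"=dword:{value:08x}',  # .reg files store DWORDs as 8 hex digits
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # MaxPacketSize = 1 forces Kerberos to use TCP instead of UDP.
    print(build_reg_file("MaxPacketSize", 1))
```

The same helper would produce the MaxTokenSize entry, for example build_reg_file("MaxTokenSize", 100000), if you want to try that workaround instead.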
It is important to note that these Registry changes are made on the clients -- not the servers. Of course pushing Registry changes to all clients is a challenging task. The best way to accomplish this is to create a custom ADM template and have it applied as a special GPO. Creation of custom ADM templates is described in KB 323639 and 816662.
Of course, any time you implement something like this, it's best to test it in the lab to ensure it corrects the authentication issue. You need to test that the ADM makes the correct Registry changes and that they are properly applied to the clients.
NOTE: Another solution is, of course, to reduce the number of Groups users are members of. It is not uncommon for large enterprises to experience "group creep" where new groups are implemented but rarely cleaned up. It could be that many groups are not used anymore and perhaps some housekeeping could eliminate the problem without implementing the changes noted here.
Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers.