# Server hangs/crashes intermittently



## k00900281 (Feb 13, 2012)

Dear all,

I have configured two Windows Server 2003 R2 servers in cluster mode. The cluster is authenticated by a domain controller. The primary and secondary servers run for some time and then either hang or crash with a blue screen of death (BSOD). The BSOD has been seen on only one of the servers, though.

I have included the event logs below for reference. I must say that the problem started only after the RAM was upgraded from 8 GB to 32 GB. The power supply (UPS) was also changed on the same day.

However, tools like memtest-86 have found no errors in the new RAM.

The primary server has hit the BSOD, while the secondary simply freezes after running for a few hours. The taskbar hangs (although I can still launch some applications from Task Manager). Launching a new explorer.exe does not help, as it also hangs immediately.

Please help.

Below are the event details from when the BSOD happened on the primary server.

Event Type: Warning
Event Source: USER32
Event Category: None
Event ID: 1076
Date: 2/7/2012
Time: 4:05:55 PM
User: S-1-5-21-3433499848-1543794289-3939903927-500
Computer: DRSECSVR
Description:
The reason supplied by user DRDOMAIN\Administrator for the last unexpected shutdown of this computer is: System Failure: Stop error
Reason Code: 0x805000f
Bug ID: 
Bugcheck String: 0x000000d1 (0x00000000, 0xd0000002, 0x00000000, 0xb3c64a6c)
Comment: 0x000000d1 (0x00000000, 0xd0000002, 0x00000000, 0xb3c64a6c)
For more information, see Help and Support Center at Events and Errors Message Center: Basic Search.
Data:
0000: 0f 00 05 08 .... 



The WinDbg analysis (with my limited knowledge) gives the following result:
0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 00000000, memory referenced
Arg2: d0000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: b3c64a6c, address which referenced memory
 


Thank you very much in advance.......


----------



## Maz_- (Nov 4, 2008)

Look at the Application log in Event Viewer and check whether there is an application error at around the same time as the log above.

A 0xD1 error usually points to a driver issue.
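To make the stop code easier to read, the bugcheck string from Event ID 1076 can be split into its code and four arguments. This is a minimal sketch in Python, assuming the string has the `code (arg1, arg2, arg3, arg4)` shape shown in the event above. The function name `parse_bugcheck` and the one-entry `STOP_CODES` table are illustrative and not part of any Windows tool.

```python
import re

# Small lookup of stop codes; only the one seen in this thread is included.
STOP_CODES = {0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL"}

def parse_bugcheck(text):
    """Split 'code (a1, a2, a3, a4)' into an int code and a list of int args."""
    match = re.match(r"\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", text)
    if match is None:
        raise ValueError("unrecognised bugcheck string: %r" % text)
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args

code, args = parse_bugcheck(
    "0x000000d1 (0x00000000, 0xd0000002, 0x00000000, 0xb3c64a6c)"
)
print(STOP_CODES.get(code, "unknown"), [hex(a) for a in args])
```

In this dump the first and third arguments are both 0, which matches the WinDbg text: a read from address 0. The fourth argument is the address of the code that made the access, so in WinDbg `ln b3c64a6c` or the module list from `lm` may show which driver owns that address.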


----------



## k00900281 (Feb 13, 2012)

Thanks for the response...

I checked the application events around the time of the crash, which was 3:59 PM. The last event before the crash was at 9:32 AM and the first event after it was at 4:03 PM. All application events for 2/7/2012 are listed below. Sorry for not attaching the .evt file directly; uploads are blocked from my office network.


Type Date Time Source Category Event User Computer
Information 2/7/2012 4:10:55 PM Ci CI Service 4137 N/A DRSECSVR
Information 2/7/2012 4:10:55 PM Ci CI Service 4137 N/A DRSECSVR
Information 2/7/2012 4:07:36 PM LoadPerf None 1000 N/A DRSECSVR
Information 2/7/2012 4:07:35 PM LoadPerf None 1001 N/A DRSECSVR
Warning 2/7/2012 4:05:56 PM Windows Product Activation None 1005 N/A DRSECSVR
Error 2/7/2012 4:04:58 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:22 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:21 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:21 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:21 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:21 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:21 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:20 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:20 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:20 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:20 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:20 PM crypt32 None 8 N/A DRSECSVR
Error 2/7/2012 4:04:19 PM crypt32 None 8 N/A DRSECSVR
Warning 2/7/2012 4:03:41 PM MSDTC Client Cluster 4148 N/A DRSECSVR
Information 2/7/2012 4:03:41 PM MSDTC SVC 4111 N/A DRSECSVR
Warning 2/7/2012 4:03:32 PM Microsoft Search Gatherer 3055 N/A DRSECSVR
Information 2/7/2012 4:03:32 PM Microsoft Search Search Service 1003 N/A DRSECSVR
Information 2/7/2012 4:03:31 PM EventSystem None 4625 N/A DRSECSVR
Information 2/7/2012 4:03:31 PM MSDTC TM 4193 N/A DRSECSVR
Warning 2/7/2012 4:03:31 PM MSDTC Cluster 4147 N/A DRSECSVR
Information 2/7/2012 9:32:40 AM SceCli None 1704 N/A DRSECSVR
Information 2/7/2012 9:29:50 AM SceCli None 1704 N/A DRPRISVR
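The quiet period around the crash can also be found mechanically from a pasted log like the one above. This is a rough Python sketch, assuming each row is one space-separated line in the order Type, Date, Time, Source, Category, Event, User, Computer. Because source and category names can contain spaces, the sketch keeps them together as one field. The names `parse_row` and `largest_gap` are made up for illustration.

```python
import re
from datetime import datetime

# type, timestamp, source+category, event id, user, computer
ROW = re.compile(
    r"^(\w+) (\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) (.+) (\d+) (\S+) (\S+)$"
)

def parse_row(line):
    """Return (type, timestamp, source_and_category, event_id, computer) or None."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    kind, stamp, middle, event_id, _user, computer = m.groups()
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return kind, when, middle, int(event_id), computer

def largest_gap(rows):
    """Return the two consecutive timestamps with the longest silence between them."""
    times = sorted(r[1] for r in rows)
    return max(zip(times, times[1:]), key=lambda pair: pair[1] - pair[0])

sample = """\
Error 2/7/2012 4:04:19 PM crypt32 None 8 N/A DRSECSVR
Warning 2/7/2012 4:03:41 PM MSDTC Client Cluster 4148 N/A DRSECSVR
Warning 2/7/2012 4:03:31 PM MSDTC Cluster 4147 N/A DRSECSVR
Information 2/7/2012 9:32:40 AM SceCli None 1704 N/A DRSECSVR
Information 2/7/2012 9:29:50 AM SceCli None 1704 N/A DRPRISVR
"""
rows = [r for r in map(parse_row, sample.splitlines()) if r]
before, after = largest_gap(rows)
print("quiet from", before, "to", after)
```

On these sample rows the longest silence runs from 9:32:40 AM to 4:03:31 PM, which brackets the 3:59 PM crash.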


----------



## Maz_- (Nov 4, 2008)

How about the System log? Is there anything in there at or around 3:59?

Did you make changes to the server, or has this problem been occurring since the server was built?


----------



## dai (Jul 2, 2004)

See if there are any minidumps.

Windows 7 and Server run on the same platform:

Blue Screen of Death (BSOD) Posting Instructions - Windows 7 & Vista - Tech Support Forum


----------



## k00900281 (Feb 13, 2012)

Hi,
The OS on the server has been reinstalled almost 10 times. The problem has been occurring for more than two months now. There are four IBM servers: two IBM x3650 and two IBM 3850. One x3650 is configured as the primary domain controller and the other x3650 joins this forest. The two 3850 servers join this domain. One 3850 creates a Windows cluster and the other 3850 joins it. The cluster protects an NMS application. There are errors from the NMS application in the event logs. The behaviour is that the primary application server, when it is active, crashes after running for about 24 hours, whereas the secondary server freezes (explorer.exe freezes).

Another observation: the system froze many times while the secondary server was being added to the cluster. The cluster log does not reveal any useful information.

I have given the system logs below:

Type Date Time Source Category Event User Computer
Information 2/7/2012 4:13:57 PM Clussvc (13) 1202 N/A DRSECSVR
Error 2/7/2012 4:06:07 PM System Error (102) 1003 N/A DRSECSVR
Warning 2/7/2012 4:05:55 PM USER32 None 1076 S-1-5-21-3433499848-1543794289-3939903927-500 DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7036 N/A DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7036 N/A DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7036 N/A DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7035 SYSTEM DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7035 SYSTEM DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7036 N/A DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7036 N/A DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7035 SYSTEM DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7036 N/A DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7036 N/A DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7035 SYSTEM DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7035 SYSTEM DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7035 SYSTEM DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7035 SYSTEM DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7036 N/A DRSECSVR
Information 2/7/2012 4:04:56 PM Service Control Manager None 7035 S-1-5-21-3433499848-1543794289-3939903927-1111 DRSECSVR
Information 2/7/2012 4:04:37 PM Dnsapi None 11156 N/A DRSECSVR
Information 2/7/2012 4:04:01 PM Clussvc (8) 1062 N/A DRSECSVR
Information 2/7/2012 4:03:37 PM Clussvc Shell 1122 N/A DRSECSVR
Information 2/7/2012 4:03:37 PM Clussvc Shell 1122 N/A DRSECSVR
Information 2/7/2012 4:03:35 PM smtpsvc None 4005 N/A DRSECSVR
Information 2/7/2012 4:03:35 PM NNTPSVC None 85 N/A DRSECSVR
Information 2/7/2012 4:03:35 PM NNTPSVC None 93 N/A DRSECSVR
Warning 2/7/2012 4:03:32 PM MSFTPSVC None 101 N/A DRSECSVR
Information 2/7/2012 4:03:32 PM IPSec None 4294 N/A DRSECSVR
Information 2/7/2012 4:03:31 PM AeLookupSvc None 3 N/A DRSECSVR
Information 2/7/2012 4:03:22 PM l2nd None 11 N/A DRSECSVR
Information 2/7/2012 4:03:21 PM IPSec None 4295 N/A DRSECSVR
Information 2/7/2012 4:03:20 PM l2nd None 9 N/A DRSECSVR
Information 2/7/2012 4:03:18 PM l2nd None 16 N/A DRSECSVR
Information 2/7/2012 4:03:18 PM b06bdrv None 18 N/A DRSECSVR
Information 2/7/2012 4:03:18 PM l2nd None 16 N/A DRSECSVR
Information 2/7/2012 4:03:18 PM b06bdrv None 18 N/A DRSECSVR
Information 2/7/2012 4:03:15 PM updsm None 2 N/A DRSECSVR
Warning 2/7/2012 4:03:14 PM updsm None 3 N/A DRSECSVR
Warning 2/7/2012 4:03:14 PM updsm None 3 N/A DRSECSVR
Warning 2/7/2012 4:03:14 PM updsm None 3 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM mpio None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM mpio None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM mpio None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM mpio None 1 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM mpio None 1 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM mpio None 1 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM b06bdrv None 12 N/A DRSECSVR
Information 2/7/2012 4:03:14 PM b06bdrv None 12 N/A DRSECSVR
Information 2/7/2012 4:03:00 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:00 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:00 PM updsm None 2 N/A DRSECSVR
Information 2/7/2012 4:03:25 PM Save Dump None 1001 N/A DRSECSVR
Information 2/7/2012 4:03:25 PM DCOM None 10026 N/A DRSECSVR
Information 2/7/2012 4:03:25 PM eventlog None 6005 N/A DRSECSVR
Information 2/7/2012 4:03:25 PM eventlog None 6009 N/A DRSECSVR
Error 2/7/2012 4:03:25 PM eventlog None 6008 N/A DRSECSVR
Information 2/7/2012 3:51:05 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 3:45:19 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 3:41:10 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 3:36:16 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 3:27:22 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 3:17:15 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 3:10:31 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 3:07:11 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 3:04:26 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 2:58:07 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 2:55:03 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 2:52:01 PM Clussvc (13) 1202 N/A DRSECSVR
Information 2/7/2012 2:40:00 PM Dnsapi None 11156 N/A DRPRISVR
Information 2/7/2012 2:38:46 PM NNTPSVC None 420 N/A DRPRISVR
Information 2/7/2012 2:38:46 PM NNTPSVC None 421 N/A DRPRISVR
Warning 2/7/2012 2:38:43 PM W32Time None 36 N/A DRPRISVR
Information 2/7/2012 2:39:04 PM Clussvc (13) 1202 N/A DRSECSVR
Warning 2/7/2012 2:33:49 PM W32Time None 36 N/A DRSECSVR
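With this many rows it may help to keep only the warnings and errors and count them. This is a small Python sketch, assuming the same space-separated row layout as the log above. `tally_problems` is a hypothetical helper name, and source and category are kept together because the pasted text does not mark where one ends and the other begins.

```python
from collections import Counter

def tally_problems(lines):
    """Count Warning/Error rows by (type, source+category, event id)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # type, date, time, AM/PM, source+category..., event id, user, computer
        if len(fields) < 8 or fields[0] not in ("Warning", "Error"):
            continue
        source_and_category = " ".join(fields[4:-3])
        counts[(fields[0], source_and_category, fields[-3])] += 1
    return counts

sample = """\
Error 2/7/2012 4:06:07 PM System Error (102) 1003 N/A DRSECSVR
Warning 2/7/2012 4:05:55 PM USER32 None 1076 S-1-5-21-3433499848-1543794289-3939903927-500 DRSECSVR
Information 2/7/2012 4:03:15 PM updsm None 2 N/A DRSECSVR
Warning 2/7/2012 4:03:14 PM updsm None 3 N/A DRSECSVR
Warning 2/7/2012 4:03:14 PM updsm None 3 N/A DRSECSVR
Warning 2/7/2012 4:03:14 PM updsm None 3 N/A DRSECSVR
Error 2/7/2012 4:03:25 PM eventlog None 6008 N/A DRSECSVR
"""
counts = tally_problems(sample.splitlines())
for key, n in counts.most_common():
    print(n, *key)
```

Event 6008 from `eventlog` records that the previous shutdown was unexpected, so it marks the reboot after the crash rather than the cause. The repeated driver-level warnings logged at boot (here `updsm` event 3) are more relevant to a 0xD1 stop, since that error usually points to a driver.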


----------



## k00900281 (Feb 13, 2012)

I tried to simplify the setup.
I removed the domain controllers, the secondary application server, and the cluster.
I ran my NMS application on a single server (in a WORKGROUP). In that setup the application runs for days without any hang.

Then I introduced one domain controller (with a freshly installed and configured OS) and joined my server to its domain. And guess what: my server froze (the same explorer.exe freeze) after about 12 hours.

What could be the problem? What is it about the domain controller that can hang my server? I have run a virus scan on both machines, the network cables are brand new and factory made, and I am using a brand new 8-port hub to connect them.
Could the NIC in the DC be the issue? Ping works fine, and an FTP transfer of a 500 MB file from the DC to the server is pretty fast, at about 80 Mbps.

I am glad that the hang has been reproduced with just two machines, but I am wondering how to pinpoint the fault. Any ideas?
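As a sanity check on the FTP figure quoted above, the duration of that transfer can be worked out directly. This is a tiny Python sketch using decimal units (1 MB = 8 megabits); `transfer_seconds` is an illustrative name.

```python
def transfer_seconds(size_megabytes, rate_megabits_per_s):
    """Seconds needed to move a file, using decimal units (1 MB = 8 megabits)."""
    return size_megabytes * 8 / rate_megabits_per_s

elapsed = transfer_seconds(500, 80)
print(elapsed)  # 50.0
```

So the FTP test loads the link for under a minute, while the freeze showed up after about 12 hours. A transfer that short may simply not run long enough to trigger a fault that takes hours to appear.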


----------



## Maz_- (Nov 4, 2008)

This sounds more and more like a registry error somewhere on your application server. Have you tried running a registry cleaner on the machine?

Also, when explorer.exe freezes, can you press Ctrl+Alt+Del and restart explorer.exe? Look at the process list and check whether any process is driving the CPU to 100%, which could cause the crashes.
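If the desktop is too hung to read Task Manager comfortably, the process list can be dumped to a file with `tasklist /v /fo csv` and inspected elsewhere. This is a hedged Python sketch, assuming the verbose CSV output has `Image Name`, `Status` and `CPU Time` columns with CPU time written as h:mm:ss. The sample rows are made up, and `busiest` is an illustrative helper name.

```python
import csv
import io

def cpu_seconds(text):
    """Convert an h:mm:ss CPU time string into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def busiest(csv_text, top=3):
    """Return (image name, status, cpu seconds) for the processes with most CPU time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["Image Name"], r["Status"], cpu_seconds(r["CPU Time"])) for r in reader]
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]

# Made-up sample; real input would come from: tasklist /v /fo csv > tasks.csv
sample = '''"Image Name","PID","Session Name","Session#","Mem Usage","Status","User Name","CPU Time","Window Title"
"explorer.exe","1234","Console","0","20,000 K","Not Responding","Administrator","0:01:05","N/A"
"example.exe","2345","Console","0","50,000 K","Running","Administrator","2:10:00","N/A"
'''
result = busiest(sample)
for name, status, seconds in result:
    print(name, status, seconds)
```

CPU time here is cumulative since the process started, not a live percentage. Taking two dumps a minute apart and comparing them gives a better idea of which process is busy right now, and the `Status` column may also show which processes are reported as not responding.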


----------

