ASP.NET App Slow Response and Application Pool/AppDomain Recycle, Event message: Application is shutting down. Reason: Unknown – Windows Server 2003

Scenario
From time to time, asp.net application response is very slow on Windows Server 2003

Rants and the resolution

After turning on recycle events, logged message in application event log was Event message: Application is shutting down. Reason: Unknown. Slow response is always timed with this message in the application event log so that confirmed that Application Pool is terminating so no wonder asp.net response is slow from time to time.

However, the only missing piece was why? Since, the Reason is unknown :-) . This application pool is configured for web garden with 6 app pools in it so we decided to attach debugger in production box to 2 worker processes.

If you are just starting out with debugging or have not read John Robbins Book on debugging, I would like to stress the followings when using debugger in production environment

1. By Default, ADPlus  writes the call stack on first-chance exception. Walking call stack also results in Symbol loading, symbol loading along with the stack walking causes a performance hit when a debugger is attached. The last thing you want in production environment is to cause performance hit because of  debugger.

2. Don’t just use ADPlus script to attach a debugger to the worker process by name because it will attach the debugger to each worker process in your production server causing  further performance hit.

3. Don’t use DebugDiag in production environment unless you really have a good reason for it.

Continue reading

Share

interesting windows service hang issue because of zone identifier data

The first time I came across this issue, my ex-boss Bill Vieux pointed it out immediately when we were looking at hang dump. I came across this issue again last week so I thought I will blog about it, just in case anyone finds it useful.

Scenario:

Windows Service is configured to watch a folder to export a file, when a file is written to disk it will launch an executable with parameters. First of all, yes it is always a bad idea to launch an exe from a windows service but anyways, sometime you don’t have a choice. Windows service was hanging with the release of new executable and we could never recreate this issue. We asked for hang dump and the following is what we found on a callstack.

0:012> ~6kb
ChildEBP RetAddr  Args to Child
00e8d07c 7e419418 7e42770a 00000000 00000000 ntdll!KiFastSystemCallRet
00e8d0b4 7e4249c4 003d060c 00000000 00000001 USER32!NtUserWaitMessage+0xc
00e8d0dc 7e424a06 03850000 03ab83f8 00000000 USER32!InternalDialogBox+0xd0
00e8d0fc 7e4247ea 03850000 03ab83f8 00000000 USER32!DialogBoxIndirectParamAorW+0×37
00e8d120 77fa9ef1 03850000 00001140 00000000 USER32!DialogBoxParamW+0x3f
00e8d188 7e2f3dc1 03850000 00001140 00000000 SHDOCVW!SHFusionDialogBoxParam+0x3a
00e8d1a8 7e2f43b4 00000000 00001140 00e8d1fc SHDOCVW!_ShowSafeOpenDialog+0×26
00e8f6d0 7ca4d2ec 00000000 0019400c 00194008 SHDOCVW!SafeOpenPromptForShellExec+0x2ce
00e8f6f0 7ca04173 0019400c 00000001 00194008 shell32!CShellExecute::_ZoneCheckFile+0×60
00e8f708 7ca040fa 0019400c 013ca5c0 00194008 shell32!CShellExecute::_VerifyZoneTrust+0x2a
00e8f72c 7ca03071 0019400c 00e8fa0c 00194008 shell32!CShellExecute::_VerifyExecTrust+0xa4
00e8f754 7ca02f6a 013ca5c0 00190f00 00e8fa0c shell32!ShellExecuteNormal+0×30
00e8f770 00a3b0aa 013ca5c0 11ad3aad 00000000 shell32!ShellExecuteExW+0x8d

If you are familiar with alternate data stream, you will notice VerifyZoneTrust immediately. When we dump the parameter passed in internaldialogbox, below is what we have

0:012> dc 03ab83f8 03ab83f8 + 512
03ab83f8  80c80ac0 00000000 00000011 01090000  …………….
03ab8408  000000a9 00000000 004d0008 00200053  ……….M.S. .
03ab8418  00680053 006c0065 0020006c 006c0044  S.h.e.l.l. .D.l.
03ab8428  00000067 50000080 00000000 000a000a  g……P……..
03ab8438  001400f5 ffff1141 00440082 0020006f  ….A…..D.o. .
03ab8448  006f0079 00200075 00610077 0074006e  y.o.u. .w.a.n.t.
03ab8458  00740020 0020006f 0070006f 006e0065   .t.o. .o.p.e.n.
03ab8468  00740020 00690068 00200073 00690066   .t.h.i.s. .f.i.
03ab8478  0065006c 0000003f 00000000 50000003  l.e.?……….P
03ab86f8  0069004c 006b006e 00540000 00690068  L.i.n.k…T.h.i.
03ab8708  00200073 00790074 00650070 006f0020  s. .t.y.p.e. .o.
03ab8718  00200066 00690066 0065006c 00630020  f. .f.i.l.e. .c.
03ab8728  006e0061 00680020 00720061 0020006d  a.n. .h.a.r.m. .
03ab8738  006f0079 00720075 00630020 006d006f  y.o.u.r. .c.o.m.
03ab8748  00750070 00650074 002e0072 004f0020  p.u.t.e.r… .O.
03ab8758  006c006e 00200079 00750072 0020006e  n.l.y. .r.u.n. .
03ab8768  006f0073 00740066 00610077 00650072  s.o.f.t.w.a.r.e.
03ab8778  00660020 006f0072 0020006d 00750070   .f.r.o.m. .p.u.
03ab8788  006c0062 00730069 00650068 00730072  b.l.i.s.h.e.r.s.
03ab8798  00790020 0075006f 00740020 00750072   .y.o.u. .t.r.u.
03ab87a8  00740073 0020002e 0041003c 0048003e  s.t… .<.A.>.H.
03ab87b8  0077006f 00630020 006e0061 00490020  o.w. .c.a.n. .I.
03ab87c8  00640020 00630065 00640069 00200065   .d.e.c.i.d.e. .
03ab87d8  00680077 00740061 00730020 0066006f  w.h.a.t. .s.o.f.
03ab87e8  00770074 00720061 00200065 006f0074  t.w.a.r.e. .t.o.
03ab87f8  00740020 00750072 00740073 003c003f   .t.r.u.s.t.?.<.
03ab8808  0041002f 0000003e 00000000 00000000  /.A.>………..
03ab8818  80c80ac0 00000000 00000011 01090000  …………….
03ab8828  000000a7 00000000 004d0008 00200053  ……….M.S. .
03ab8838  00680053 006c0065 0020006c 006c0044  S.h.e.l.l. .D.l.
03ab8848  00000067 50000080 00000000 000a000a  g……P……..
03ab8858  001400f5 ffff1141 00440082 0020006f  ….A…..D.o. .
03ab8868  006f0079 00200075 00610077 0074006e  y.o.u. .w.a.n.t.
03ab8878  00740020 0020006f 0070006f 006e0065   .t.o. .o.p.e.n.
03ab8888  00740020 00690068 00200073 00690066   .t.h.i.s. .f.i.
03ab8898  0065006c 0000003f 00000000 50000003  l.e.?……….P

We have a modal dialog box with message “Do you want to …, file can harm your computer…” and since windows service is not running in interactive desktop mode, you are not going to see the warning message.

on executing streams( downloadable from http://technet.microsoft.com/en-us/sysinternals/bb897440.aspx) on exe, we see the following output

D:\Tools\Streams>streams “d:\xxxxx.exe”

Streams v1.56 – Enumerate alternate NTFS data streams
Copyright (C) 1999-2007 Mark Russinovich
Sysinternals – www.sysinternals.com

:Zone.Identifier:$DATA       26

Zone.identifier is added as a security measure to every executable when downloaded from http or on a network share.

Since this application could be deployed using xcopy, so someone unzipped on a network share to copy the executable and of course windows service is now hung. Although, it is always a bad idea to launch exe but sometime there is no option so watch out for this scenario

Share

High CPU Usage and Windows Forms application hang with SQLCE database and the SqlCeLockTimeoutException

Problem Description

Windows froms application on a user machine is consuming high cpu resource and application appears to be hung.

Analysis

Usually, the cause of high cpu hang is

1. infinite loop

2. machine is busy

3. High CPU in GC

4. Thread is spinning with sleep time in 100s of milliseconds

We will start with collecting the user mode memory dump of windows forms application. We should always collect more than one dump to analyze hang scenario. I prefer to use adplus script with -r 3 300″ to collect where 3 specifies the number of times ADPlus will run in hang mode and 300 is the interval in seconds between each run.

1. Lets open the memory dump using WinDbg

2. The first command to run to analyze hang is !runaway extension to find out how long each of the threads have been running for

0:000> !runaway 3
User Mode Time
Thread       Time
0:218       0 days 0:08:38.359
8:a50       0 days 0:02:29.968
Kernel Mode Time
Thread       Time
8:a50       0 days 0:03:23.875
0:218       0 days 0:02:06.046

I have listed the interesting threads. Thread # 0 has consumed 2 mins in kernel mode and more than 8 mins in user mode and thread # 8 has consumed nearly equal amount of time around 2:xx in user and kernel mode.

3. Lets look at thread stack of each of thread # 0 and thread # 8

0:000> ~8e!clrstack
OS Thread Id: 0xa50 (8)
ESP       EIP
06d8f604 7c90e4f4 [InlinedCallFrame: 06d8f604] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
06d8f600 7b1d8e48 System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
06d8f69c 7b1d8937 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
06d8f6f0 7b1d8781 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
06d8f720 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
06d8f734 05893609 MyApp.Splasher.ShowThread()

We can ignore thread 8 because this is displaying splasher screen and pumping the message.

0:000> ~0e!clrstack
OS Thread Id: 0×218 (0)
ESP       EIP
0012e4f0 7c90e4f4 [NDirectMethodFrameStandalone: 0012e4f0] System.Data.SqlServerCe.NativeMethods.CompileQueryPlan(IntPtr, System.String, System.Data.SqlServerCe.ResultSetOptions, IntPtr[], IntPtr, Int32, IntPtr ByRef, IntPtr)
0012e518 07d72c57 System.Data.SqlServerCe.SqlCeCommand.CompileQueryPlan()
System.Data.SqlServerCe.SqlCeCommand.ExecuteCommand(System.Data.CommandBehavior, System.String,

In Thread # 0 ExecuteCommand transtions the instruction to Unmanaged SQLCE code to compile query plan(SqlCeCommand.CompileQueryPlan())

We will now look at unmanaged stack

0:000> ~0kb
ChildEBP RetAddr  Args to Child
0012cd18 7c802455 0000007f 00000000 0012cd44 kernel32!SleepEx+0×61
0012cd28 7d57b50e 0000007f 04c105a8 0012cda4 kernel32!Sleep+0xf
0012cd44 7d57ffa0 00001388 00000002 04c105a8 sqlcese35!Session::_WaitForLock+0x5f
0012cd80 7d580b6b 0176a388 01764778 00000001 sqlcese35!Session::_Lock+0×280
0012cdcc 7d5571bb 0012cdf8 0000000c 00000003 sqlcese35!Session::LockRow+0xba
0012ce20 7d56c74e 004ad00c 04c105a8 003ff978 sqlcese35!BaseMover::_SetCurrentBookmark+0×88
0012ce34 7d56cad0 00000000 00000001 04c105a8 sqlcese35!IndexMover::_SetCurrentRow+0×80
0012ceb0 7d56cb52 0000000d 003ffa98 00000001 sqlcese35!IndexMover::_Seek+0×191
0012ced4 7d565e5c 00000000 00000000 04c105a8 sqlcese35!IndexMover::Seek+0x5a
0012cef4 7d57be24 003ff04c 00000000 00000000 sqlcese35!Cursor::Seek+0×88
0012cf24 7d7ba8fa 04bd4788 0180c8ce 0181963c sqlcese35!Session::OpenHistogram+0xcb
0012cf80 7d7b31f2 00000000 0180c8ce 0181963c sqlceqp35!StatMan::LoadHistogram+0×53
0012cfac 7d7b3558 04b9b31c 0180c8a4 0180f874 sqlceqp35!QPIndexSchemaInfo::LoadHistogramHelper+0×57
0012d524 7d79dab5 04b9b31c 0180c8a4 0180c8a4 sqlceqp35!QPIndexSchemaInfo::LoadHistogram+0×62
0012d54c 7d7b8b23 04b9b31c 00024f1c 00000069 sqlceqp35!QPIScanAccessPlan::CostIndexScan+0×56
0012d5fc 7d79dea3 04b9b31c 0012d818 0180c8a4 sqlceqp35!QPIndexList::Matches+0×265
0012d6a8 7d79e339 04b9b31c 0012d818 01802274 sqlceqp35!RelBaseBlockPlan::CreateIScanPlan+0x7d
0012d6f8 7d79f9ff 04b9b31c 0012d818 0012d728 sqlceqp35!RelBaseBlockPlan::CreatePlan+0×130

Let’s understand what’s going on in this call stack

Reader thread calls command.ExecuteReader(CommandBehavior.SingleRow)

ExecuteReader makes a call to ExecuteCommand

ExecuteCommand transtions the instruction to Unmanaged SQLCE code to compile query plan(SqlCeCommand.CompileQueryPlan())

The below steps are getting executed in umanaged

  • Unmanaged code creates the query plan sqlceqp35!RelBaseBlockPlan::CreatePlan
  • Reader Thread’s Query plan has created a query plan
  • Cursor is performing index seek on predicate column = @columnId
  • Index seeker sets the current row during seek
  • sql ce session tries to lock the current row
  • Thread just spins on lock with Sleep time of 127 milliseconds

Since thread sleep time is only 127 milliseconds so that explains the high cpu usage. So why is it spinning on lock forever.

4. Since this is a managed application, we should always look at all the exception objects on managed heap

0:000> !dumpheap -type Exception
Address       MT     Size
………………………………..
173e4aec 07d84d68       76
Statistics:
MT Count TotalSize Class Name
……………………………………………………
07d84d68       32         2432
System.Data.SqlServerCe.SqlCeLockTimeoutException
Total 44 objects

0:000> !pe 173e4aec
Exception object: 173e4aec
Exception type: System.Data.SqlServerCe.SqlCeLockTimeoutException
Message:
InnerException:
StackTrace (generated):
SP IP Function
0012E4A0 07D72D63 System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.CompileQueryPlan()+0x24b
0012E560 07D73D4C System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.ExecuteCommand(System.Data.CommandBehavior, System.String, System.Data.SqlServerCe.ResultSetOptions)+0x23c
0012E5A0 07D73EE6 System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.ExecuteReader(System.Data.CommandBehavior)+0×16
0012E5A4 04A8C52D GlobalPlanning_Sync!GlobalPlanning.Sync.PlanetSyncManager.OnSyncProgress(System.Object, Microsoft.Synchronization.Data.SyncProgressEventArgs)+0x81d

Dumping the exception object confirms the same call stack as thread 0

You can get the error details by dumping the heap for

0:000> !dumpheap -type SqlServerCe.SqlCeError
Address       MT     Size
173e4a74 07d826dc       36
total 64 objects
Statistics:
MT    Count    TotalSize Class Name
……………………………………………………………………………………….
07d826dc       32         1152 System.Data.SqlServerCe.SqlCeError
Total 64 objects

You can get the formatted error message by dumping System.Data.SqlServerCe.SqlCeError.formattedMessage

0:000> !do 173762b0
SQL Server Compact timed out waiting for a lock. The default lock time is 2000ms for devices and 5000ms for desktops. The default lock timeout can be increased in the connection string using the ssce: default lock timeout property. [ Session id = 2,Thread id = 536,Process id = 3564,Table name = __SysObjects,Conflict type = s lock (x blocks),Resource = RID: 1197:12 ]

In above error message, you will notice that Resource = RID which is a row identifier that means a single row in a table is locked. Conflict type = s lock which means that this is in shared lock mode for read access.

This is confirmed by dumping SQLCECommand object on managed heap that select statement is being executed.

Another point to note is, with default isolation level SELECT statement in SQL doesn’t require S lock unless it is in a transaction.

SQL CE compact edition online book says that

“When a statement has waited longer than the LOCK_TIMEOUT setting, the blocked statement is canceled automatically, and the error message SSCE_M_LOCKTIMEOUT, “The system timed out waiting for a lock,” is returned to the application.However, SQL Server Compact 3.5 does not roll back or cancel any transaction that contains the statement. The application must have an error handler that can trap the error message SSCE_M_LOCKTIMEOUT. If an application does not trap the error, it can proceed without knowing that an individual statement within a transaction has been canceled. Errors can occur because statements later in the transaction might depend on the statement that was not executed.”

However, if your database file is fragmented or large its possible that your select query may not run with default time out of 5 seconds in desktop so it will keep on throwing locktimeout exception every 5 seconds for every select query.

There is only one connection object on managed heap and no other threads are executing the SQLCECommand so this is most likely a fragmented database because this issue gets resolved after database is compacted.

However, we should also look at thread stack memory to find any hidden exception to make sure there is no other handled exception resulting in not leaving the spinlock

you can get the Thread Environment Block stack address using !teb command and use dds command to dump words and symbols

0:000> !teb
TEB at 7ffdf000
ExceptionList:        0012cd08
StackBase:            00130000
StackLimit:           00126000

0:000> dds 00126000 00130000

………………………………………………………………………………………….

0012df94  7d58272d sqlcese35!SpinLock::Enter+0xd
0012dfa8  7d55bc0b sqlcese35!AutoSpinLock::AutoSpinLock+0×17
0012dfac  7d55bc48 sqlcese35!AutoSpinLock::Leave+0×12
0012dfb4  7d57c150 sqlcese35!Session::GetSyncSession+0x4c
0012dfc8  7d58ebbd sqlcese35!TrackingUtils::GetSyncSessionInfo+0×19
0012dfcc  7d58ebf4 sqlcese35!TrackingUtils::GetSyncSessionInfo+0×50

……………………………………………………………………………………………….

Only interesting bits are shown above for brevity but there is no exception dispatcher found on raw stack memory so that confirms that no exception has been thrown in this thread call stack

Resolution and Workaround

1. There is no way for you to determine in SQL CE whether a resource is locked before locking it, except to attempt to access the data catch the time out exception, so one workaround is to have a configurable SQL CE lock timeout so that when SQL CE database is fragmented over time and when you get an exception you can increase the timeout value

2.  SQL CE database is file based loaded in process’s address space so  SQL CE database can become fragmented over time and fragmentation causes some serious performance issue. To avoid fragmentation,  you can compact the database when locktimeout exception is thrown

However, if the lock is due to coding error in that case review your code path but in this case there is only one connection object with a command object on one thread so this was a fragmentation issue and locktimeout issue did get resolved after compacting the database.

Share

Not reproducible Windows Forms Hang OnUserPreferenceChanged – If your application has hung once it will hang again

Problem Description

At one of our customer site, windows forms business application hung for a business user.  This issue  occurred only on one machine out of 100s of users. The way we look at it is, If your application has hung once on any machine that means it will hang again, so I volunteered to look into his machine armed with WinDBG.

Note

Executed command to check for syncblock/locks are for illustration purpose only

1. While debugging for hang/deadlock, the very first thing you need to do is look at the call stack for each thread

2.  If call to Systen.Threading.Monitor.Enter is at the top of the managed call stack and there is a corresponding *WaitFor* at the top of unmanaged call stack which you can get either by running command kb or execute !dumpstack to get managed/unmanaged both, in that case you can suspect deadlock in managed code and resolved it using !synblk command to find out which thread is owning the object

3. If the managed thread call stackdoesn’t have Thread.Monitor that means sycnblock will be pretty useless to find out the cause of deadlock

4. If thread has WaitOne or waiting on a handle in a call stack in that case, you should use dd command to find the handles for kernel handle deadlock or use !do command in managed stack along with !handle to find out which thread owns the handle

Analysis

1. Take a memory dump of the application while it is hung, no live debugging – least convenience to user

2. Open the dump file for analysis, since this is a pure managed app so first we will look at syncblock and then criticalsection locks

0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
—————————–
Total           546
CCW             4
RCW             17
ComClassFactory 0
Free            460
0:000> !locks

CritSec <Unloaded_PI32.dll>+a00a92f at 0a00a930
LockCount          0
RecursionCount     0
OwningThread       0
*** Locked

3. Everything is fine so far, let’s look at all the thread stacks and the waithandles on interesting thread

0:000> ~*e!clrstack
OS Thread Id: 0xcf8 (0)

ESP       EIP
0012e4c4 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0012e4d8 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0012e4ec 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0012e58c 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0012e5c0 7b6f322f System.Windows.Forms.Control.Invoke(System.Delegate)
………………………………………………………………………………………………………….

0012eda4 0167834e MyApp.UI.RunStartup()
0012eff0 7b1c8906 System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)

0012f3ac 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0012f3c0 01676974 MyApp.Program.RunSelectPlanForm()
0012f400 0187b525 MyApp.Program.Main(System.String[])

OS Thread Id: 0xf74 (9)
ESP       EIP

0710f204 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0710f218 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0710f22c 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0710f2cc 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0710f300 7b920717 System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback, System.Object)
0710f34c 7a922833 Microsoft.Win32.SystemEvents.RaiseEvent(Boolean, System.Object, System.Object[])
0710f398 7a923d2f Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(Int32, IntPtr, IntPtr)
0710f3b8 7aa8f184 Microsoft.Win32.SystemEvents.WindowProc(IntPtr, Int32, IntPtr, IntPtr)
0710f670 7b1d8781 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0710f6a0 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0710f6b4 04edf271 MyApp.MyClass.Run()
0710f6e4 792d6c74 System.Threading.ThreadHelper.ThreadStart()

4. Just to illustrate, You can get the handles on thread 0 and 9 from stack trace

0:009> ~0kb
ChildEBP RetAddr  Args to Child
0012e07c 7c90df2c 7c809574 00000002 0012e0a8 ntdll!KiFastSystemCallRet
0012e080 7c809574 00000002 0012e0a8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0012e11c 7e4195f9 00000002 0012e144 00000000 kernel32!WaitForMultipleObjectsEx+0x12c
0012e178 7752ebd6 00000001 0012e478 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0x13e
0012e1a0 77557237 0012e478 000003e8 0012e1cc ole32!CCliModalLoop::BlockFn+0×80

0:009> ~9kb
ChildEBP RetAddr  Args to Child
0710edbc 7c90df2c 7c809574 00000002 0710ede8 ntdll!KiFastSystemCallRet
0710edc0 7c809574 00000002 0710ede8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0710ee5c 7e4195f9 00000002 0710ee84 00000000 kernel32!WaitForMultipleObjectsEx+0x12c
0710eeb8 7752ebd6 00000001 0710f1b8 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0x13e
0710eee0 77557237 0710f1b8 000003e8 0710ef0c ole32!CCliModalLoop::BlockFn+0×80

WaitForMultipleObjects take the array of handles and we know the array size from function parameters which is 2. We can use dd to display memory range since we know the size

0:009> dd 0012e144 L2
0012e144  00000ac0 00000038
0:009> dd 0710ee84 L2
0710ee84  00000200 000007c4

Both of these threads are waiting on different handles.

5. After reviewing the call stack again you will see that Thread 0 is calling Invoke to send a message to a window created on Thread 9 and Thread 9 is calling Invoke to send a message to a window created on Thread 0.  After posting the message, Thread 0 is waiting so message is not getting pumped on thread 0 and the same applies to Thread 9.  When you look at MarshaledInvoke implementation , after posting a message it waits on a event handle.

Why did this occur?

Thread 0 is the main UI thread and Thread 9 is a worker thread creating a splasher screen and running in STA apartment to pump message, below is the code snippet which creates a new thread and message loop for splasher on a worker thread.

SplashThread = new Thread(threadStart);
SplashThread.IsBackground = true;
SplashThread.SetApartmentState(ApartmentState.STA);
SplashThread.Start();
By default, Controls subscribe to UserPreferenceChanged Event and this event can get fired when system wide setting is changed, for example – changing the system settings including desktop background or a theme. It just so happened that business user changed his system settings when this app was running with an active splash screen and the timing was such that controls subscribed to UserPreferenceChanged was in the middle of processing a message on MainUI thread. Splasher thread sends a message to MainUI which is not processed yet and MainUI thread sends a message to splasher window to change status which results in a deadlock.

Please refer to a very good article on UserPreferenceChanged event and other gotcha/deadlock/issues which you can run into with a window created on worker thread

Share