interesting windows service hang issue because of zone identifier data

The first time I came across this issue, my ex-boss Bill Vieux pointed it out immediately when we were looking at hang dump. I came across this issue again last week so I thought I will blog about it, just in case anyone finds it useful.

Scenario:

Windows Service is configured to watch a folder to export a file, when a file is written to disk it will launch an executable with parameters. First of all, yes it is always a bad idea to launch an exe from a windows service but anyways, sometime you don’t have a choice. Windows service was hanging with the release of new executable and we could never recreate this issue. We asked for hang dump and the following is what we found on a callstack.

0:012> ~6kb
ChildEBP RetAddr  Args to Child
00e8d07c 7e419418 7e42770a 00000000 00000000 ntdll!KiFastSystemCallRet
00e8d0b4 7e4249c4 003d060c 00000000 00000001 USER32!NtUserWaitMessage+0xc
00e8d0dc 7e424a06 03850000 03ab83f8 00000000 USER32!InternalDialogBox+0xd0
00e8d0fc 7e4247ea 03850000 03ab83f8 00000000 USER32!DialogBoxIndirectParamAorW+0×37
00e8d120 77fa9ef1 03850000 00001140 00000000 USER32!DialogBoxParamW+0x3f
00e8d188 7e2f3dc1 03850000 00001140 00000000 SHDOCVW!SHFusionDialogBoxParam+0x3a
00e8d1a8 7e2f43b4 00000000 00001140 00e8d1fc SHDOCVW!_ShowSafeOpenDialog+0×26
00e8f6d0 7ca4d2ec 00000000 0019400c 00194008 SHDOCVW!SafeOpenPromptForShellExec+0x2ce
00e8f6f0 7ca04173 0019400c 00000001 00194008 shell32!CShellExecute::_ZoneCheckFile+0×60
00e8f708 7ca040fa 0019400c 013ca5c0 00194008 shell32!CShellExecute::_VerifyZoneTrust+0x2a
00e8f72c 7ca03071 0019400c 00e8fa0c 00194008 shell32!CShellExecute::_VerifyExecTrust+0xa4
00e8f754 7ca02f6a 013ca5c0 00190f00 00e8fa0c shell32!ShellExecuteNormal+0×30
00e8f770 00a3b0aa 013ca5c0 11ad3aad 00000000 shell32!ShellExecuteExW+0x8d

If you are familiar with alternate data stream, you will notice VerifyZoneTrust immediately. When we dump the parameter passed in internaldialogbox, below is what we have

0:012> dc 03ab83f8 03ab83f8 + 512
03ab83f8  80c80ac0 00000000 00000011 01090000  …………….
03ab8408  000000a9 00000000 004d0008 00200053  ……….M.S. .
03ab8418  00680053 006c0065 0020006c 006c0044  S.h.e.l.l. .D.l.
03ab8428  00000067 50000080 00000000 000a000a  g……P……..
03ab8438  001400f5 ffff1141 00440082 0020006f  ….A…..D.o. .
03ab8448  006f0079 00200075 00610077 0074006e  y.o.u. .w.a.n.t.
03ab8458  00740020 0020006f 0070006f 006e0065   .t.o. .o.p.e.n.
03ab8468  00740020 00690068 00200073 00690066   .t.h.i.s. .f.i.
03ab8478  0065006c 0000003f 00000000 50000003  l.e.?……….P
03ab86f8  0069004c 006b006e 00540000 00690068  L.i.n.k…T.h.i.
03ab8708  00200073 00790074 00650070 006f0020  s. .t.y.p.e. .o.
03ab8718  00200066 00690066 0065006c 00630020  f. .f.i.l.e. .c.
03ab8728  006e0061 00680020 00720061 0020006d  a.n. .h.a.r.m. .
03ab8738  006f0079 00720075 00630020 006d006f  y.o.u.r. .c.o.m.
03ab8748  00750070 00650074 002e0072 004f0020  p.u.t.e.r… .O.
03ab8758  006c006e 00200079 00750072 0020006e  n.l.y. .r.u.n. .
03ab8768  006f0073 00740066 00610077 00650072  s.o.f.t.w.a.r.e.
03ab8778  00660020 006f0072 0020006d 00750070   .f.r.o.m. .p.u.
03ab8788  006c0062 00730069 00650068 00730072  b.l.i.s.h.e.r.s.
03ab8798  00790020 0075006f 00740020 00750072   .y.o.u. .t.r.u.
03ab87a8  00740073 0020002e 0041003c 0048003e  s.t… .<.A.>.H.
03ab87b8  0077006f 00630020 006e0061 00490020  o.w. .c.a.n. .I.
03ab87c8  00640020 00630065 00640069 00200065   .d.e.c.i.d.e. .
03ab87d8  00680077 00740061 00730020 0066006f  w.h.a.t. .s.o.f.
03ab87e8  00770074 00720061 00200065 006f0074  t.w.a.r.e. .t.o.
03ab87f8  00740020 00750072 00740073 003c003f   .t.r.u.s.t.?.<.
03ab8808  0041002f 0000003e 00000000 00000000  /.A.>………..
03ab8818  80c80ac0 00000000 00000011 01090000  …………….
03ab8828  000000a7 00000000 004d0008 00200053  ……….M.S. .
03ab8838  00680053 006c0065 0020006c 006c0044  S.h.e.l.l. .D.l.
03ab8848  00000067 50000080 00000000 000a000a  g……P……..
03ab8858  001400f5 ffff1141 00440082 0020006f  ….A…..D.o. .
03ab8868  006f0079 00200075 00610077 0074006e  y.o.u. .w.a.n.t.
03ab8878  00740020 0020006f 0070006f 006e0065   .t.o. .o.p.e.n.
03ab8888  00740020 00690068 00200073 00690066   .t.h.i.s. .f.i.
03ab8898  0065006c 0000003f 00000000 50000003  l.e.?……….P

We have a modal dialog box with message “Do you want to …, file can harm your computer…” and since windows service is not running in interactive desktop mode, you are not going to see the warning message.

on executing streams( downloadable from http://technet.microsoft.com/en-us/sysinternals/bb897440.aspx) on exe, we see the following output

D:\Tools\Streams>streams “d:\xxxxx.exe”

Streams v1.56 – Enumerate alternate NTFS data streams
Copyright (C) 1999-2007 Mark Russinovich
Sysinternals – www.sysinternals.com

:Zone.Identifier:$DATA       26

Zone.identifier is added as a security measure to every executable when downloaded from http or on a network share.

Since this application could be deployed using xcopy, so someone unzipped on a network share to copy the executable and of course windows service is now hung. Although, it is always a bad idea to launch exe but sometime there is no option so watch out for this scenario

Not reproducible Windows Forms Hang OnUserPreferenceChanged – If your application has hung once it will hang again

Problem Description

At one of our customer site, windows forms business application hung for a business user.  This issue  occurred only on one machine out of 100s of users. The way we look at it is, If your application has hung once on any machine that means it will hang again, so I volunteered to look into his machine armed with WinDBG.

Note

Executed command to check for syncblock/locks are for illustration purpose only

1. While debugging for hang/deadlock, the very first thing you need to do is look at the call stack for each thread

2.  If call to Systen.Threading.Monitor.Enter is at the top of the managed call stack and there is a corresponding *WaitFor* at the top of unmanaged call stack which you can get either by running command kb or execute !dumpstack to get managed/unmanaged both, in that case you can suspect deadlock in managed code and resolved it using !synblk command to find out which thread is owning the object

3. If the managed thread call stackdoesn’t have Thread.Monitor that means sycnblock will be pretty useless to find out the cause of deadlock

4. If thread has WaitOne or waiting on a handle in a call stack in that case, you should use dd command to find the handles for kernel handle deadlock or use !do command in managed stack along with !handle to find out which thread owns the handle

Analysis

1. Take a memory dump of the application while it is hung, no live debugging – least convenience to user

2. Open the dump file for analysis, since this is a pure managed app so first we will look at syncblock and then criticalsection locks

0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
—————————–
Total           546
CCW             4
RCW             17
ComClassFactory 0
Free            460
0:000> !locks

CritSec <Unloaded_PI32.dll>+a00a92f at 0a00a930
LockCount          0
RecursionCount     0
OwningThread       0
*** Locked

3. Everything is fine so far, let’s look at all the thread stacks and the waithandles on interesting thread

0:000> ~*e!clrstack
OS Thread Id: 0xcf8 (0)

ESP       EIP
0012e4c4 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0012e4d8 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0012e4ec 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0012e58c 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0012e5c0 7b6f322f System.Windows.Forms.Control.Invoke(System.Delegate)
………………………………………………………………………………………………………….

0012eda4 0167834e MyApp.UI.RunStartup()
0012eff0 7b1c8906 System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)

0012f3ac 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0012f3c0 01676974 MyApp.Program.RunSelectPlanForm()
0012f400 0187b525 MyApp.Program.Main(System.String[])

OS Thread Id: 0xf74 (9)
ESP       EIP

0710f204 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0710f218 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0710f22c 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0710f2cc 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0710f300 7b920717 System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback, System.Object)
0710f34c 7a922833 Microsoft.Win32.SystemEvents.RaiseEvent(Boolean, System.Object, System.Object[])
0710f398 7a923d2f Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(Int32, IntPtr, IntPtr)
0710f3b8 7aa8f184 Microsoft.Win32.SystemEvents.WindowProc(IntPtr, Int32, IntPtr, IntPtr)
0710f670 7b1d8781 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0710f6a0 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0710f6b4 04edf271 MyApp.MyClass.Run()
0710f6e4 792d6c74 System.Threading.ThreadHelper.ThreadStart()

4. Just to illustrate, You can get the handles on thread 0 and 9 from stack trace

0:009> ~0kb
ChildEBP RetAddr  Args to Child
0012e07c 7c90df2c 7c809574 00000002 0012e0a8 ntdll!KiFastSystemCallRet
0012e080 7c809574 00000002 0012e0a8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0012e11c 7e4195f9 00000002 0012e144 00000000 kernel32!WaitForMultipleObjectsEx+0x12c
0012e178 7752ebd6 00000001 0012e478 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0x13e
0012e1a0 77557237 0012e478 000003e8 0012e1cc ole32!CCliModalLoop::BlockFn+0×80

0:009> ~9kb
ChildEBP RetAddr  Args to Child
0710edbc 7c90df2c 7c809574 00000002 0710ede8 ntdll!KiFastSystemCallRet
0710edc0 7c809574 00000002 0710ede8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0710ee5c 7e4195f9 00000002 0710ee84 00000000 kernel32!WaitForMultipleObjectsEx+0x12c
0710eeb8 7752ebd6 00000001 0710f1b8 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0x13e
0710eee0 77557237 0710f1b8 000003e8 0710ef0c ole32!CCliModalLoop::BlockFn+0×80

WaitForMultipleObjects take the array of handles and we know the array size from function parameters which is 2. We can use dd to display memory range since we know the size

0:009> dd 0012e144 L2
0012e144  00000ac0 00000038
0:009> dd 0710ee84 L2
0710ee84  00000200 000007c4

Both of these threads are waiting on different handles.

5. After reviewing the call stack again you will see that Thread 0 is calling Invoke to send a message to a window created on Thread 9 and Thread 9 is calling Invoke to send a message to a window created on Thread 0.  After posting the message, Thread 0 is waiting so message is not getting pumped on thread 0 and the same applies to Thread 9.  When you look at MarshaledInvoke implementation , after posting a message it waits on a event handle.

Why did this occur?

Thread 0 is the main UI thread and Thread 9 is a worker thread creating a splasher screen and running in STA apartment to pump message, below is the code snippet which creates a new thread and message loop for splasher on a worker thread.

SplashThread = new Thread(threadStart);
SplashThread.IsBackground = true;
SplashThread.SetApartmentState(ApartmentState.STA);
SplashThread.Start();
By default, Controls subscribe to UserPreferenceChanged Event and this event can get fired when system wide setting is changed, for example – changing the system settings including desktop background or a theme. It just so happened that business user changed his system settings when this app was running with an active splash screen and the timing was such that controls subscribed to UserPreferenceChanged was in the middle of processing a message on MainUI thread. Splasher thread sends a message to MainUI which is not processed yet and MainUI thread sends a message to splasher window to change status which results in a deadlock.

Please refer to a very good article on UserPreferenceChanged event and other gotcha/deadlock/issues which you can run into with a window created on worker thread