High CPU Usage and Windows Forms application hang with SQLCE database and the SqlCeLockTimeoutException

Problem Description

Windows froms application on a user machine is consuming high cpu resource and application appears to be hung.

Analysis

Usually, the cause of high cpu hang is

1. infinite loop

2. machine is busy

3. High CPU in GC

4. Thread is spinning with sleep time in 100s of milliseconds

We will start with collecting the user mode memory dump of windows forms application. We should always collect more than one dump to analyze hang scenario. I prefer to use adplus script with -r 3 300″ to collect where 3 specifies the number of times ADPlus will run in hang mode and 300 is the interval in seconds between each run.

1. Lets open the memory dump using WinDbg

2. The first command to run to analyze hang is !runaway extension to find out how long each of the threads have been running for

0:000> !runaway 3
User Mode Time
Thread       Time
0:218       0 days 0:08:38.359
8:a50       0 days 0:02:29.968
Kernel Mode Time
Thread       Time
8:a50       0 days 0:03:23.875
0:218       0 days 0:02:06.046

I have listed the interesting threads. Thread # 0 has consumed 2 mins in kernel mode and more than 8 mins in user mode and thread # 8 has consumed nearly equal amount of time around 2:xx in user and kernel mode.

3. Lets look at thread stack of each of thread # 0 and thread # 8

0:000> ~8e!clrstack
OS Thread Id: 0xa50 (8)
ESP       EIP
06d8f604 7c90e4f4 [InlinedCallFrame: 06d8f604] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
06d8f600 7b1d8e48 System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
06d8f69c 7b1d8937 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
06d8f6f0 7b1d8781 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
06d8f720 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
06d8f734 05893609 MyApp.Splasher.ShowThread()

We can ignore thread 8 because this is displaying splasher screen and pumping the message.

0:000> ~0e!clrstack
OS Thread Id: 0×218 (0)
ESP       EIP
0012e4f0 7c90e4f4 [NDirectMethodFrameStandalone: 0012e4f0] System.Data.SqlServerCe.NativeMethods.CompileQueryPlan(IntPtr, System.String, System.Data.SqlServerCe.ResultSetOptions, IntPtr[], IntPtr, Int32, IntPtr ByRef, IntPtr)
0012e518 07d72c57 System.Data.SqlServerCe.SqlCeCommand.CompileQueryPlan()
System.Data.SqlServerCe.SqlCeCommand.ExecuteCommand(System.Data.CommandBehavior, System.String,

In Thread # 0 ExecuteCommand transtions the instruction to Unmanaged SQLCE code to compile query plan(SqlCeCommand.CompileQueryPlan())

We will now look at unmanaged stack

0:000> ~0kb
ChildEBP RetAddr  Args to Child
0012cd18 7c802455 0000007f 00000000 0012cd44 kernel32!SleepEx+0×61
0012cd28 7d57b50e 0000007f 04c105a8 0012cda4 kernel32!Sleep+0xf
0012cd44 7d57ffa0 00001388 00000002 04c105a8 sqlcese35!Session::_WaitForLock+0x5f
0012cd80 7d580b6b 0176a388 01764778 00000001 sqlcese35!Session::_Lock+0×280
0012cdcc 7d5571bb 0012cdf8 0000000c 00000003 sqlcese35!Session::LockRow+0xba
0012ce20 7d56c74e 004ad00c 04c105a8 003ff978 sqlcese35!BaseMover::_SetCurrentBookmark+0×88
0012ce34 7d56cad0 00000000 00000001 04c105a8 sqlcese35!IndexMover::_SetCurrentRow+0×80
0012ceb0 7d56cb52 0000000d 003ffa98 00000001 sqlcese35!IndexMover::_Seek+0×191
0012ced4 7d565e5c 00000000 00000000 04c105a8 sqlcese35!IndexMover::Seek+0x5a
0012cef4 7d57be24 003ff04c 00000000 00000000 sqlcese35!Cursor::Seek+0×88
0012cf24 7d7ba8fa 04bd4788 0180c8ce 0181963c sqlcese35!Session::OpenHistogram+0xcb
0012cf80 7d7b31f2 00000000 0180c8ce 0181963c sqlceqp35!StatMan::LoadHistogram+0×53
0012cfac 7d7b3558 04b9b31c 0180c8a4 0180f874 sqlceqp35!QPIndexSchemaInfo::LoadHistogramHelper+0×57
0012d524 7d79dab5 04b9b31c 0180c8a4 0180c8a4 sqlceqp35!QPIndexSchemaInfo::LoadHistogram+0×62
0012d54c 7d7b8b23 04b9b31c 00024f1c 00000069 sqlceqp35!QPIScanAccessPlan::CostIndexScan+0×56
0012d5fc 7d79dea3 04b9b31c 0012d818 0180c8a4 sqlceqp35!QPIndexList::Matches+0×265
0012d6a8 7d79e339 04b9b31c 0012d818 01802274 sqlceqp35!RelBaseBlockPlan::CreateIScanPlan+0x7d
0012d6f8 7d79f9ff 04b9b31c 0012d818 0012d728 sqlceqp35!RelBaseBlockPlan::CreatePlan+0×130

Let’s understand what’s going on in this call stack

Reader thread calls command.ExecuteReader(CommandBehavior.SingleRow)

ExecuteReader makes a call to ExecuteCommand

ExecuteCommand transtions the instruction to Unmanaged SQLCE code to compile query plan(SqlCeCommand.CompileQueryPlan())

The below steps are getting executed in umanaged

  • Unmanaged code creates the query plan sqlceqp35!RelBaseBlockPlan::CreatePlan
  • Reader Thread’s Query plan has created a query plan
  • Cursor is performing index seek on predicate column = @columnId
  • Index seeker sets the current row during seek
  • sql ce session tries to lock the current row
  • Thread just spins on lock with Sleep time of 127 milliseconds

Since thread sleep time is only 127 milliseconds so that explains the high cpu usage. So why is it spinning on lock forever.

4. Since this is a managed application, we should always look at all the exception objects on managed heap

0:000> !dumpheap -type Exception
Address       MT     Size
………………………………..
173e4aec 07d84d68       76
Statistics:
MT Count TotalSize Class Name
……………………………………………………
07d84d68       32         2432
System.Data.SqlServerCe.SqlCeLockTimeoutException
Total 44 objects

0:000> !pe 173e4aec
Exception object: 173e4aec
Exception type: System.Data.SqlServerCe.SqlCeLockTimeoutException
Message:
InnerException:
StackTrace (generated):
SP IP Function
0012E4A0 07D72D63 System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.CompileQueryPlan()+0x24b
0012E560 07D73D4C System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.ExecuteCommand(System.Data.CommandBehavior, System.String, System.Data.SqlServerCe.ResultSetOptions)+0x23c
0012E5A0 07D73EE6 System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.ExecuteReader(System.Data.CommandBehavior)+0×16
0012E5A4 04A8C52D GlobalPlanning_Sync!GlobalPlanning.Sync.PlanetSyncManager.OnSyncProgress(System.Object, Microsoft.Synchronization.Data.SyncProgressEventArgs)+0x81d

Dumping the exception object confirms the same call stack as thread 0

You can get the error details by dumping the heap for

0:000> !dumpheap -type SqlServerCe.SqlCeError
Address       MT     Size
173e4a74 07d826dc       36
total 64 objects
Statistics:
MT    Count    TotalSize Class Name
……………………………………………………………………………………….
07d826dc       32         1152 System.Data.SqlServerCe.SqlCeError
Total 64 objects

You can get the formatted error message by dumping System.Data.SqlServerCe.SqlCeError.formattedMessage

0:000> !do 173762b0
SQL Server Compact timed out waiting for a lock. The default lock time is 2000ms for devices and 5000ms for desktops. The default lock timeout can be increased in the connection string using the ssce: default lock timeout property. [ Session id = 2,Thread id = 536,Process id = 3564,Table name = __SysObjects,Conflict type = s lock (x blocks),Resource = RID: 1197:12 ]

In above error message, you will notice that Resource = RID which is a row identifier that means a single row in a table is locked. Conflict type = s lock which means that this is in shared lock mode for read access.

This is confirmed by dumping SQLCECommand object on managed heap that select statement is being executed.

Another point to note is, with default isolation level SELECT statement in SQL doesn’t require S lock unless it is in a transaction.

SQL CE compact edition online book says that

“When a statement has waited longer than the LOCK_TIMEOUT setting, the blocked statement is canceled automatically, and the error message SSCE_M_LOCKTIMEOUT, “The system timed out waiting for a lock,” is returned to the application.However, SQL Server Compact 3.5 does not roll back or cancel any transaction that contains the statement. The application must have an error handler that can trap the error message SSCE_M_LOCKTIMEOUT. If an application does not trap the error, it can proceed without knowing that an individual statement within a transaction has been canceled. Errors can occur because statements later in the transaction might depend on the statement that was not executed.”

However, if your database file is fragmented or large its possible that your select query may not run with default time out of 5 seconds in desktop so it will keep on throwing locktimeout exception every 5 seconds for every select query.

There is only one connection object on managed heap and no other threads are executing the SQLCECommand so this is most likely a fragmented database because this issue gets resolved after database is compacted.

However, we should also look at thread stack memory to find any hidden exception to make sure there is no other handled exception resulting in not leaving the spinlock

you can get the Thread Environment Block stack address using !teb command and use dds command to dump words and symbols

0:000> !teb
TEB at 7ffdf000
ExceptionList:        0012cd08
StackBase:            00130000
StackLimit:           00126000

0:000> dds 00126000 00130000

………………………………………………………………………………………….

0012df94  7d58272d sqlcese35!SpinLock::Enter+0xd
0012dfa8  7d55bc0b sqlcese35!AutoSpinLock::AutoSpinLock+0×17
0012dfac  7d55bc48 sqlcese35!AutoSpinLock::Leave+0×12
0012dfb4  7d57c150 sqlcese35!Session::GetSyncSession+0x4c
0012dfc8  7d58ebbd sqlcese35!TrackingUtils::GetSyncSessionInfo+0×19
0012dfcc  7d58ebf4 sqlcese35!TrackingUtils::GetSyncSessionInfo+0×50

……………………………………………………………………………………………….

Only interesting bits are shown above for brevity but there is no exception dispatcher found on raw stack memory so that confirms that no exception has been thrown in this thread call stack

Resolution and Workaround

1. There is no way for you to determine in SQL CE whether a resource is locked before locking it, except to attempt to access the data catch the time out exception, so one workaround is to have a configurable SQL CE lock timeout so that when SQL CE database is fragmented over time and when you get an exception you can increase the timeout value

2.  SQL CE database is file based loaded in process’s address space so  SQL CE database can become fragmented over time and fragmentation causes some serious performance issue. To avoid fragmentation,  you can compact the database when locktimeout exception is thrown

However, if the lock is due to coding error in that case review your code path but in this case there is only one connection object with a command object on one thread so this was a fragmentation issue and locktimeout issue did get resolved after compacting the database.

Share

Debugging compilation failure(cl.exe) using windbg – fatal error C1902: Program database manager mismatch; please check your installation

Problem Description:

while trying to build a win32 console project from visual studio 2008, the error message is “fatal error C1902: Program database manager mismatch; please check your installation”

I decided to use WinDbg to find the root cause.

Steps to Debug

1. Since this fails during compilation, so we have to attach WinDbg to “cl.exe”and since visual studio 2008 is launching cl.exe so you have to auto-attach it (you can directly execute cl.exe)

2. In order to auto-attach a process to debugger, you can use “gflags.exe”(exists in WinDbg installation folder)

3. See the snapshot below on how to use gflags to auto attach to a process cl.exe when launched

Gflags Snapshot

4. Build your project and you will see WinDbg getting launched and attached to cl.exe

5. hit f5 and execute getlasterror command to see the details when stopped on exception
0:000> !gle
LastErrorValue: (Win32) 0 (0) – The operation completed successfully.
LastStatusValue: (NTSTATUS) 0xc0000135 – {Unable To Locate Component} This application has failed to start because %hs was not found. Re-installing the application may fix this problem.

6. Below is the loader message matching the last error description

LDR: LdrGetDllHandle, searching for C:\Program Files\Microsoft Visual Studio 9.0\VC\bin\mspdbsrv.exe from ……
kLDR: LdrpCheckForLoadedDll – Unable To Locate C:\Program Files\Microsoft Visual Studio 9.0\VC\bin\mspdbsrv.exe ……………..

7. Basically, build is failing because cl.exe(compiler) can’t find mspdbsrv.exe for symbols.

You can copy exe in the path(C:\Program Files\Microsoft Visual Studio 9.0\VC\bin\) or change the registry to point to the correct location. It appears that visual studio 2005 upgrade may have caused this issue.

Share

The call to LoadLibrary(sos2) failed, Win32 error 0n14001, “This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem.”

Mike Semikin has reported this issue while executing windbg command .load sos2

The call to LoadLibrary(sos2) failed, Win32 error 0n14001
“This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem.”
Please check your debugger configuration and/or network access.

Resolution
This is because visual studio 2008 is used as a development environment for sos2 and CRT is installed in assembly cache(winSxS) folder so dependent dll will be missing

As an immediate workaround, please install visual c++ 2008 runtime redistributable from Microsoft,  download from http://www.microsoft.com/Downloads/details.aspx?familyid=9B2DA534-3E03-4391-8A4D-074B9F2BC1BF&displaylang=en

Share

Windows Forms – System.OutOfMemoryException: Screen-compatible bitmap cannot be created. The screen bitmap format cannot be determined.

Description

Occasionally, Windows Forms application user will get an error message as shown in below snapshot.
screen bitmap error

Analysis

If you have done win32 development, you will immediately suspect GDI handles. I would like to point out the steps which you can take to troubleshoot and confirm if there is a GDI Leak in a process.

When a user reports this issue which is not reproducible at will, in that case you can ask user to launch task manager and go to view->select columns and check Handles and GDI Handles. Even better, if you can use process explorer.

Things to remember

1. 32 bit Windows OS(XP and above) uses DWORD which is 4 bytes for either user handle or gdi handle. Most significant 16 bits are used for GDI handles so 2^16 = 65536 is a theoretical limit on number of GDI or USER handles a Win32 Process can have.

2. Default value of number of handles configured in a 32 bit operating system are 10k which is configurable in the registry

3. “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows\GDIProcessHandleQuota” has the default value of GDI handles

4. “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows\USERProcessHandleQuota” has the default value of User Handles

5. GDI handles are not just bitmaps, but it can be any device context object for example, Font, Pen, DC etc

I have not dissected GDI Handle yet so I can’t get into too much detail on GDI data structure on Windows XP. However, another point to remember is GDI kernel data structure is stored in a system paged pool which is a physical memory resident so what that means is although the theoretical limit on number of GDI handles is almost 65k per process but that doesn’t mean your process can have 65k GDI handles so before changing the registry you need to think twice and then twice again, does your application really need more than 10k GDI Handles?


Below is the number of GDI Handles as shown in a process explorer’s snapshot taken at customer site when this error occurred

screen bitmap error

As usual, we need to collect the hang dump at the time of this exception


1. I really don’t know how to get the list of gdi handles from a memory dump, there used to be a gdi windbg extension dll from microsoft but that has been discontinued with windows xp.


2. if you look at process explorer snapshot as shown above, you will notice that the number of GDI Handles are 9999 and after that it fails to create the compatible bitmap.


3. Let’s open the hang dump and you can still run !analyze -v command to get the exception record


EXCEPTION_OBJECT: !pe 1a2b14ec
Exception object: 1a2b14ec
Exception type: System.OutOfMemoryException
Message: Screen-compatible bitmap cannot be created. The screen bitmap format cannot be determined.
MANAGED_STACK:
SP               IP                                      Function
0012E46C  7AE32C56
System_Drawing_ni!System.Drawing.BufferedGraphicsContext.bFillBitmapInfo(IntPtr, IntPtr, BITMAPINFO_FLAT ByRef)
…………………………………..
System_Drawing_ni!System.Drawing.BufferedGraphicsContext.Allocate(IntPtr, System.Drawing.Rectangle)
0:000> !ip2md 7AE32C56
MethodDesc: 7adff018
Method Name: System.Drawing.BufferedGraphicsContext.bFillBitmapInfo(IntPtr, IntPtr, BITMAPINFO_FLAT ByRef)
0:000> !dumpil 7adff018
ilAddr = 7aeb0f64
IL_0000: ldsfld System.IntPtr::Zero
IL_0005: stloc.0
IL_0006: ldc.i4.0
IL_0007: stloc.1
.try
{
IL_0008: ldnull
IL_0009: ldarg.1
IL_000a: newobj System.Runtime.InteropServices.HandleRef::.ctor
IL_000f: ldc.i4.1
IL_0010: ldc.i4.1
IL_0011: call System.Drawing.SafeNativeMethods::CreateCompatibleBitmap
IL_0016: stloc.0
IL_0017: ldloc.0
IL_0018: ldsfld System.IntPtr::Zero
IL_001d: call System.IntPtr::op_Equality
IL_0022: brfalse.s IL_0034
IL_0024: ldstr “GraphicsBufferQueryFail”
IL_0029: call System.Drawing.SR::GetString
IL_002e: newobj System.OutOfMemoryException::.ctor

IL_0033: throw


Dump the IL for the corresponding IP, you will notice that it throws System.OutOfMemoryException when Native Method call ( GDI32.dll!CreateCompatibleBitmap) returns IntPtr.Zero.


4. This confirms that we have an issue here with GDI Handles, since this is a managed application so we can find the System.Drawing, Font, Pen and other GDI objects surviving the Garbage Collection.


5. You can just dump GDI objects on managed heap and find out where the objects are rooted at


6.  In this particular case, the culprit was System.Windows.Forms.Control+FontHandleWrapper

Below is the finalizequeue stats

0:000> !finalizequeue
SyncBlocks to be cleaned up: 0
MTA Interfaces to be released: 0
STA Interfaces to be released: 0
———————————-
generation 0 has 34 finalizable objects (19826184->1982620c)
generation 1 has 6 finalizable objects (1982616c->19826184)
generation 2 has 84919 finalizable objects (197d3290->1982616c)
Ready for finalization 0 objects (1982620c->1982620c)

………………………………………………………………………………………………………………….

7b225228     9811       156976 System.Windows.Forms.Control+FontHandleWrapper
7b21e930     9816      1649088 System.Windows.Forms.TextBox
……………………………………………………………………………………………………………………
Total 84959 objects

and the call stack for HFONT gdi handle leak is belowBelow is the call stack to set a new Font object for every TextBox Control

System.Windows.Forms.Control+FontHandleWrapper..ctor(System.Drawing.Font)
System.Windows.Forms.Control.get_FontHandle()
System.Windows.Forms.Control.get_FontHandle()
System.Windows.Forms.Control.SetWindowFont()
System.Windows.Forms.Control.OnHandleCreated
This Windows Forms application maintains a collection of controls and a new font gets created everytime but never gets disposed along with the controls

And the root cause for Control not getting disposed is

022d28f8(System.Windows.Forms.ToolStrip)->
022d2644(MyApp.Controls.UserControl)->
022d6ed4(MyApp.MyPanel)->
022d84d0(System.Windows.Forms.ContextMenuStrip)->
021546f4(System.Windows.Forms.TextBox)->
02154608(System.Windows.Forms.PropertyStore)->

a. Form contains ContextMenuStrip member

b. Each of the created Control has a reference to ContextMenuStrip(Control.ContextMenuStrip = Form.ContextMenuStrip)

c. Even after a Control is removed, it still holds reference to ContextMenuStrip which is a member of Main Form

d. Unless Main Form is disposed, Controls will survive GC although Dispose is called on each of the Control on OnControlRemoved Event

e. This issue can be easily resolved by setting Control.ContextMenuStrip = null when a control is removed

Share

Not reproducible Windows Forms Hang OnUserPreferenceChanged – If your application has hung once it will hang again

Problem Description

At one of our customer site, windows forms business application hung for a business user.  This issue  occurred only on one machine out of 100s of users. The way we look at it is, If your application has hung once on any machine that means it will hang again, so I volunteered to look into his machine armed with WinDBG.

Note

Executed command to check for syncblock/locks are for illustration purpose only

1. While debugging for hang/deadlock, the very first thing you need to do is look at the call stack for each thread

2.  If call to Systen.Threading.Monitor.Enter is at the top of the managed call stack and there is a corresponding *WaitFor* at the top of unmanaged call stack which you can get either by running command kb or execute !dumpstack to get managed/unmanaged both, in that case you can suspect deadlock in managed code and resolved it using !synblk command to find out which thread is owning the object

3. If the managed thread call stackdoesn’t have Thread.Monitor that means sycnblock will be pretty useless to find out the cause of deadlock

4. If thread has WaitOne or waiting on a handle in a call stack in that case, you should use dd command to find the handles for kernel handle deadlock or use !do command in managed stack along with !handle to find out which thread owns the handle

Analysis

1. Take a memory dump of the application while it is hung, no live debugging – least convenience to user

2. Open the dump file for analysis, since this is a pure managed app so first we will look at syncblock and then criticalsection locks

0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
—————————–
Total           546
CCW             4
RCW             17
ComClassFactory 0
Free            460
0:000> !locks

CritSec <Unloaded_PI32.dll>+a00a92f at 0a00a930
LockCount          0
RecursionCount     0
OwningThread       0
*** Locked

3. Everything is fine so far, let’s look at all the thread stacks and the waithandles on interesting thread

0:000> ~*e!clrstack
OS Thread Id: 0xcf8 (0)

ESP       EIP
0012e4c4 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0012e4d8 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0012e4ec 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0012e58c 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0012e5c0 7b6f322f System.Windows.Forms.Control.Invoke(System.Delegate)
………………………………………………………………………………………………………….

0012eda4 0167834e MyApp.UI.RunStartup()
0012eff0 7b1c8906 System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)

0012f3ac 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0012f3c0 01676974 MyApp.Program.RunSelectPlanForm()
0012f400 0187b525 MyApp.Program.Main(System.String[])

OS Thread Id: 0xf74 (9)
ESP       EIP

0710f204 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0710f218 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0710f22c 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0710f2cc 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0710f300 7b920717 System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback, System.Object)
0710f34c 7a922833 Microsoft.Win32.SystemEvents.RaiseEvent(Boolean, System.Object, System.Object[])
0710f398 7a923d2f Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(Int32, IntPtr, IntPtr)
0710f3b8 7aa8f184 Microsoft.Win32.SystemEvents.WindowProc(IntPtr, Int32, IntPtr, IntPtr)
0710f670 7b1d8781 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0710f6a0 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0710f6b4 04edf271 MyApp.MyClass.Run()
0710f6e4 792d6c74 System.Threading.ThreadHelper.ThreadStart()

4. Just to illustrate, You can get the handles on thread 0 and 9 from stack trace

0:009> ~0kb
ChildEBP RetAddr  Args to Child
0012e07c 7c90df2c 7c809574 00000002 0012e0a8 ntdll!KiFastSystemCallRet
0012e080 7c809574 00000002 0012e0a8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0012e11c 7e4195f9 00000002 0012e144 00000000 kernel32!WaitForMultipleObjectsEx+0x12c
0012e178 7752ebd6 00000001 0012e478 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0x13e
0012e1a0 77557237 0012e478 000003e8 0012e1cc ole32!CCliModalLoop::BlockFn+0×80

0:009> ~9kb
ChildEBP RetAddr  Args to Child
0710edbc 7c90df2c 7c809574 00000002 0710ede8 ntdll!KiFastSystemCallRet
0710edc0 7c809574 00000002 0710ede8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0710ee5c 7e4195f9 00000002 0710ee84 00000000 kernel32!WaitForMultipleObjectsEx+0x12c
0710eeb8 7752ebd6 00000001 0710f1b8 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0x13e
0710eee0 77557237 0710f1b8 000003e8 0710ef0c ole32!CCliModalLoop::BlockFn+0×80

WaitForMultipleObjects take the array of handles and we know the array size from function parameters which is 2. We can use dd to display memory range since we know the size

0:009> dd 0012e144 L2
0012e144  00000ac0 00000038
0:009> dd 0710ee84 L2
0710ee84  00000200 000007c4

Both of these threads are waiting on different handles.

5. After reviewing the call stack again you will see that Thread 0 is calling Invoke to send a message to a window created on Thread 9 and Thread 9 is calling Invoke to send a message to a window created on Thread 0.  After posting the message, Thread 0 is waiting so message is not getting pumped on thread 0 and the same applies to Thread 9.  When you look at MarshaledInvoke implementation , after posting a message it waits on a event handle.

Why did this occur?

Thread 0 is the main UI thread and Thread 9 is a worker thread creating a splasher screen and running in STA apartment to pump message, below is the code snippet which creates a new thread and message loop for splasher on a worker thread.

SplashThread = new Thread(threadStart);
SplashThread.IsBackground = true;
SplashThread.SetApartmentState(ApartmentState.STA);
SplashThread.Start();
By default, Controls subscribe to UserPreferenceChanged Event and this event can get fired when system wide setting is changed, for example – changing the system settings including desktop background or a theme. It just so happened that business user changed his system settings when this app was running with an active splash screen and the timing was such that controls subscribed to UserPreferenceChanged was in the middle of processing a message on MainUI thread. Splasher thread sends a message to MainUI which is not processed yet and MainUI thread sends a message to splasher window to change status which results in a deadlock.

Please refer to a very good article on UserPreferenceChanged event and other gotcha/deadlock/issues which you can run into with a window created on worker thread

Share

ASP.NET Worker Process Recycle/.NET Application crash – What not to do in Finalizer – includes asp.net 2.0 debugging lab

Click to Download debugging lab

Debugging Labs

SFTSRC.StockTrader asp.net 2.0 web application will be used to introduce you to a series of debugging labs. We will start with a simple asp.net application which crashes and follow up with memory leak, memory fragmentation, application performance and application hang.

The first lab in series will show you how an application can crash if you are not being careful while implementing finalizer.

Issue Description

Website user gets “Server Application Unavailable error” message from time to time.

ASP.NET Debugging Lab

This lab is prepared based on a production issue resolved at a customer site. Please note that you can download this debugging lab from here. Web project/solution has been created using visual studio 2008.

What will you learn

1. How to debug asp.net application

2. How to identify what is causing asp.net application pool to recycle

3. WinDbg / sos basic commands to analyze a crash dump

Application Architecture

architecture

Breaking down your thought process

1.  Since website user gets to see a generic error handler page, you will need to identify why http request is failing.

2. As shown in the above diagram, website and webservice are hosted on two different servers, so you need to identify the server where request is failing

3. You should look at the http error logs/iis logs and the events logs on each server to identify the issue

4. Logs will also help you determine if there is a network issue

Analysis

After looking at the server logs, we have determined that worker process is getting recycled on a sever hosting Sftsrc Stock Trading WebService. A worker process can get recycled depending on the configuration, low on virtual memory, crash due to unhandled exception etc. Let’s assume that you don’t have any information on what has caused worker process to recycle

Let’s look at the lab which is prepared based on the above issue description.

Lab

We will analyze this issue using Lab01- Download it from here.

1. Once you download the lab, create 2 virtual directory “StockTrader” and “TradeWS”

2. “StockTrader” will be set to folder “\Lab01\SFTSRC.StockTrader”

3. “TradeWS” will be set to folder “\Lab01\SFTSRC.UnmanagedService”

4. Browse to http://localhost/StockTrader/Default.aspx

You will see the website as shown below
debugging lab 01 snapshot

5. Enter Stock Symbol, Quantity, Price and click on Submit Order

6. Once you submit an order, you will see “Server Unavailable Error” page in a few seconds.

7. If you look at the iis log, you will see http 503 error

please visit http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for description on HTTP 1.1 status code

8. Event log will have the error message “asp.net process recycle or stopped unexpectedly.”

9. Attach a debugger to your worker process

10. we will use adplus configuration file to get a crash dump on first chance access violation.

11. download accessviolation.cfg from here

12. go to your windbg folder and run the following command

cscript.exe adplus.vbs -c accessviolation.cfg

13. Browse to http://localhost/StockTrader/Default.aspx again

14. Once you get “server unavailable error”, you should have the dump files created under your windbg folder

15. Please note that adplus configuration file has  <ProcessName>aspnet_wp.exe</ProcessName> by default, if you are using windows server then change it to <ProcessName>w3wp.exe</ProcessName>. You may have multiple dumps created if you have more than one application pool. You can identify the worker process id for the application pool hosting TradeWS and then you can create a dump based on process id.

16. Open the dump file( *****.EXE__1st_chance_AccessViolation__full_*****c.dmp) using windbg

17.  Run following command to load sos dll

0:009> .loadby sos mscorwks

18. You should always run !analyze -v when you are doing a crash dump analysis, most of the time this one command is enough to tell you what has caused the crash

0:009> !analyze -v

Attempt to read from address 00000000

DEFAULT_BUCKET_ID:  NULL_POINTER_READ

PROCESS_NAME:  aspnet_wp.exe

MANAGED_STACK:
0195F948 05CD0848 SFTSRC_UnmanagedService!SFTSRC.UnmanagedService.UnmanagedService.GetTLS()+0×20
0195F950 05CD07E6 SFTSRC_UnmanagedService!SFTSRC.UnmanagedService.UnmanagedService.Finalize()+0x1e

I have omitted most of the output for the sake of brevity.

19. You will notice that application has crashed while executing Finalize

20. On a side note you can look at all the Exception objects on managed heap with the following command

0:009> !dumpheap -type Exception

21. disassemble managed method Finalize using IP

0:009> !u 05cd07e6

05cd07e0 ff15c005cc05    call    dword ptr ds:[5CC05C0h] (SFTSRC.UnmanagedService.UnmanagedService.GetTLS(), mdToken: 06000028)
>>> 05cd07e6 8bce            mov     ecx,esi

· You will notice that Exception is occurring while calling GetTLS, where it tries to assign the return value

22. On a side note, you can use View->Registers window to look at registers for the current thread or use r command

23. disassemble managed method GetTLS using IP

0:009> !u 05cd0848
05cd0841 e81a085b73      call    mscorlib_ni+0x1c1060 (79281060) (System.Threading.Thread.GetData(System.LocalDataStoreSlot), mdToken: 060012f3)
05cd0846 8bc8            mov     ecx,eax
>>> 05cd0848 8b01            mov     eax,dword ptr [ecx]

24. You will notice that this is occurring right while calling to System.Threading.Thread.GetData() and ecx register is null

25. SOS dll has powerful command to look at clr data structure. We are going to look at GetTLS() implementation by dumping the IL(intermediate language) of this method

26. in order to dumpil, you have to first convert instruction pointer IP to Method Descriptor

0:009> !ip2md 05cd0848
MethodDesc: 05cc05b8
Method Name: SFTSRC.UnmanagedService.UnmanagedService.GetTLS()
Class: 05c54628

0:009> !dumpil 05cc05b8
ilAddr = 05c72250
IL_0000: ldstr “ThreadSpecificData”
IL_0005: call System.Threading.Thread::GetNamedDataSlot
IL_000a: stloc.0
IL_000b: ldloc.0
IL_000c: call System.Threading.Thread::GetData
IL_0011: callvirt System.Object::ToString
IL_0016: callvirt System.String::get_Length
IL_001b: ret

27. As shown in above IL, it has crashed while converting System.Threading.Thread::GetData object to String so that means System.Threading.Thread::GetData is returning NULL

Conclusion

Application Pool crashed on finalizer thread while executing Thread.GetData(). If you are not familiar with Thread Local Storage, please read it on MSDN http://msdn.microsoft.com/en-us/library/ms686749.aspx . Thread.GetData get the allocated data from thread slot but the issue here is, thread data slots are unique per thread. No other thread (not even a child thread) can get that data. Please see more details on MSDN http://msdn.microsoft.com/en-us/library/system.threading.thread.getdata.aspx . We all know that Finalizer runs on a separate thread and a finalizer thread wakes up when it the Q is notified.

You will see that in SFTSRC.UnmanagedService.UnmanagedService.CreateTLS() method, we are allocating the data slot which is in a totally different thread that’s why Finalizer thread doesn’t have access to thread specific data and resulting in crash.

You can run the following command to look at all the thread stacks

0:009> ~*e!clrstack

OS Thread Id: 0x11cc (20)
ESP       EIP
05c4ec64 7c90e4f4 [HelperMethodFrame: 05c4ec64] System.GC.WaitForPendingFinalizers()
05c4ecb4 05cd07aa SFTSRC.UnmanagedService.UnmanagedService.CreateTLS()
05c4ecc0 05cd06e8 TradeWebService.OrderService.SendOrder(System.String, UInt32, Int32, System.String, System.String)

You will notice that, while sending the order, we create a TLS and then in order to force GC and Finalizer, CreateTLS() method calls GC.Collect and GC.WaitForPendingFinalizers.

Lesson Learnt

Never access a thread specific data in Finalizer. This lab uses Thread Local Storage to show you the thread specific data but it could be anything. For example, in windows application, thread which creates a handle or control owns it so you can’t access those in finalizer.

Exercise

1. use !tls command to look at Thread Local Storage slot

2. use !teb to get familiar with thread environment block and how to use it with !tls

3. ~ command to display all the threads and get their corresponding TEB

4. Search memory to find which thread has created the slot

Download debugging lab

Share

OutOfMemoryException in System.Drawing.Image, why??

Windows Forms application has been throwing OutOfMemoryException although there is enough virtual memory in the process.  When you have OutOfMemoryException usually callstack is not helpful unless of course you have blocked finalizer thread or some deadlock preventing objects from being gced but in this scenario whenever OutOfMemoryException is thrown it will always point to the same call stack.

at System.Drawing.Image.FromFile(String filename, Boolean useEmbeddedColorManagement)
at System.Drawing.Image.FromFile(String filename)

Let’s look at System.Drawing.Image implementation using Reflector.  Just read the documentation below

Creates an Image from the specified file using embedded color management information in that file.

Exception Condition
OutOfMemoryException The file does not have a valid image format.-or- GDI+ does not support the pixel format of the file.
FileNotFoundException The specified file does not exist.


My first reaction to this is why? Why would you throw OutOfMemoryException if the file is corrupted or GDIPlus doesn’t support the pixel format.

I really don’t know why does System.Drawing library choose to throw OutOfMemoryException exception? This is only going to add to your worries.

Share

Memory Access Violation in SQL Server Compact Edition(CE)

Scenario

Windows Forms Application is throwing first chance memory access violation exception. Windows Forms application implements Application.ThreadException to log any unhandled exceptions in UI Threads. Log file always have the call stack  SqlCeCommand.ExecuteResultSet->SqlCeCommand.CompileQueryPlan->[NativeMethod]CompileQueryPlan(native sql ce dll). This application is using only one thread to execute commands in SQL CE so any multithreading issue is ruled out.

Some Ranting

Windows forms application in question is a pure .net application with no interop layer so the immediate suspicion was on upgraded SQL CE 3.5 SP1. There has been a few memory access violation in the past which apparently got resolved after installing 3.5. That made it very easy to blame it on Microsoft SQL CE dll. However, I do have to agree that Microsoft SQL CE exception handling is not as good as other .NET libraries. You will get “some weird – not making any sense” exception when you have a parameter missing or the column data type in your parametrized sql query.

Steps to Resolution using WinDbg

1. Get a full memory dump and share with microsoft support team

2. Since this is a first chance memory access violation that means you need to get a full dump on first chance exception

3. Create a adplus configuration file as shown below

<ADPlus>
<Settings>
<RunMode>CRASH</RunMode>
<Option>Quiet</Option>
<ProcessName>MyApp.exe</ProcessName>
</Settings>
<Exceptions>
<Option>NoDumpOnFirstChance</Option>
<Config>
<Code>clr;av</Code><!–to get the full dump on clr access violation–>
<Actions1>FullDump</Actions1>
<ReturnAction1>gn</ReturnAction1>
</Config>
</Exceptions>
</ADPlus>

4. run cscript.exe adplus.vbs -c <config file name>

5. Analyze the memory dump with 1st chance access violation using windbg

FAULTING_IP:
kernel32!InterlockedExchange

and if we dump the managed stack and look at other threads nothing unusual and the manged stack is just pointing to SqlCeCommand.CompileQueryPlan

It didn’t make much sense to us and we don’t have enough time to dig into it without private symbols for sql ce native dll.

We may be getting wrong call stack because of Heap corruption, so let’s get another full memory after enabling the full page heap. You can google on full page heap in case you have never used it or for quick reference visit http://msdn.microsoft.com/en-us/library/ms220938(VS.80).aspx

You can enable full page heap with the following command gflags /p /enable myapp.exe /full which adds an entry into your system registry and one should always disable it once done, visit http://msdn.microsoft.com/en-us/library/cc265936.aspx for more information, gflags gets installed with debugging tools for windows.

6. Get a full memory dump after enabling the full page heap

7. Analyze the dump, lets look at the call stack on exception thread . Run sos!dumpstack command without -EE , if you want to look at the managed and unmanaged both at the same time.Exception is thrown while Executing the resultset

0:000> !dumpstack
OS Thread Id: 0x12bc (0)
Current frame: sqlceme35!ME_GetKeyInfo+0×36
ChildEBP RetAddr  Caller,Callee
0012e0f8 79ec6feb mscorwks!DoSpecialUnmanagedCodeDemand+0×65, calling mscorwks!_EH_epilog3
0012e124 08326091 08326091
0012e154 085d935c (MethodDesc 0x857973c +0x1dc System.Data.SqlServerCe.SqlCeDataReader.FillMetaData(System.Data.SqlServerCe.SqlCeCommand)), calling 08588150
0012e1cc 085e38e4 (MethodDesc 0x857a7fc +0×44 System.Data.SqlServerCe.SqlCeCommand.InitializeDataReader(System.Data.SqlServerCe.SqlCeDataReader, Int32)), calling 08588cec
0012e1dc 085e3c6e (MethodDesc 0x857a7b8 +0x15e System.Data.SqlServerCe.SqlCeCommand.ExecuteCommand(System.Data.CommandBehavior, System.String, System.Data.SqlServerCe.ResultSetOptions)), calling 085897d4
0012e218 085e3e8b (MethodDesc 0x857a7a8 +0x2b System.Data.SqlServerCe.SqlCeCommand.ExecuteResultSet(System.Data.SqlServerCe.ResultSetOptions, System.Data.SqlServerCe.SqlCeResultSet)), calling 085897b4

8. Let’s look at all the call stack from all the threads, i will only show the interesting thread for brevity

0:000> ~*e!clrstack

OS Thread Id: 0x12c4 (2)
ESP       EIP
02cffbc8 7c90e4f4 [NDirectMethodFrameStandalone: 02cffbc8] System.Data.SqlServerCe.NativeMethods.CloseStore(IntPtr)
02cffbd8 085dbbf6 System.Data.SqlServerCe.SqlCeConnection.ReleaseNativeInterfaces()
02cffbe4 085db597 System.Data.SqlServerCe.SqlCeConnection.Close(Boolean)
02cffbf0 085dbcdf System.Data.SqlServerCe.SqlCeConnection.Close()
02cffbfc 085d80b3 System.Data.SqlServerCe.SqlCeDataReader.Dispose(Boolean)
02cffc18 085d7f25 System.Data.SqlServerCe.SqlCeDataReader.Finalize()

I see that finalizer thread is disposing the SqlCeConnection object.

9. How many connection object do we have on managed heap?

0:000> !dumpheap -type System.Data.SqlServerCe.SqlCeConnection
Address       MT     Size
0347ac10 085f2de4       96
035bf8c8 085f2de4       96
total 2 objects
Statistics:
MT    Count    TotalSize Class Name
085f2de4        2          192 System.Data.SqlServerCe.SqlCeConnection
Total 2 objects

so hey, we have 2 SqlCeConnection objects on managed heap one of them is getting finalized so nothing to worry.

How about we look at the connection object address on exeption thread and finalizer thread.

0:000> ~2e!clrstack -p
OS Thread Id: 0x12c4 (2)
ESP       EIP
02cffbc8 7c90e4f4 [NDirectMethodFrameStandalone: 02cffbc8] System.Data.SqlServerCe.NativeMethods.CloseStore(IntPtr)
02cffbd8 085dbbf6 System.Data.SqlServerCe.SqlCeConnection.ReleaseNativeInterfaces()
PARAMETERS:
this = 0x0347ac10

dump the command object from thread executing resultset

0:000> !do 0×03805988
Name: System.Data.SqlServerCe.SqlCeCommand
MethodTable: 085f3194
EEClass: 0857561c
Size: 120(0×78) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.Data.SqlServerCe\3.5.1.0__89845dcd8080cc91\System.Data.SqlServerCe.dll)
Fields:
MT    Field   Offset                 Type VT     Attr    Value Name
085f2de4  40000f6       28 …e.SqlCeConnection  0 instance 0347ac10 connection

Guess what, ExecuteResultSet is using the same connection object as the one getting finalized. Now, everything is making sense and of course you are going to get the memory access violation when sql ce is in middle of executing your query in native library. But, why is this connection object getting disposed?

10. find the module name from exception thread

0:000> lmv m MyApp_SqlCe*
start    end        module name
06e10000 0700c000   MyApp_SqlCe   (deferred)

11. Save the module and browse using reflector

0:000> !savemodule 06e10000 c:\temp\myappsqlce.dll
3 sections in file
section 0 – VA=2000, VASize=1f5794, FileAddr=1000, FileSize=1f6000
section 1 – VA=1f8000, VASize=3c0, FileAddr=1f7000, FileSize=1000
section 2 – VA=1fa000, VASize=c, FileAddr=1f8000, FileSize=1000

12. The culprit

SqlCeDataReader reader = command.ExecuteReader(CommandBehavior.CloseConnection);
Basically, at some point before calling ExecuteResultSet, the above line was getting executed on the same connection object with CommandBehavior = CloseConnection. This was disposing the connection object that’s why depending on memory pressure and finalizer queue, memory access violation was thrown randomly.

You may get disposed exception or other managed exception but in this case the timing was such that connection object always ended up in getting finalized while query is getting executed in native sql ce dll.

Share

Debugging memory leak in .NET Remoting Server

Memory leak in .net remoting server

Problem Description

.NET remoting application server crashes from time to time with out of memory exception.

Identifying the issue and the resolution

We have the crash dump. So lets open the crash dump using windbg and look at the managed heap stat

1.

0:000> .load clr20\sos.dll

0:000> !dumpheap -stat

——————————

Statistics:

MT Count TotalSize Class Name

7a774f6c 1 12 System.Net.WebRequest+WebProxyWrapper

7a774d9c 1 12 System.Net.Cache.HttpRequestCacheLevel

…………………………………………………………………………………………………………..

6c27b474 11622400

278937600

System.Threading.ReaderWriterCount

7912d8f8 157772 56244476 System.Object[]

2.

Dump “System.Threading.ReaderWriterCount” objects

0:000> !dumpheap -mt 6c27b474

——————————

Heap 0

Address MT Size

0107cdac 012f3654 24

3.

Find the references to this object

0:000> !gcroot -nostacks 012f3654

DOMAIN(00164F78):HANDLE(Pinned):6f13fc:Root:09070048(System.Object[])->

0107906c(System.Collections.Hashtable)->

45bae0d0(System.Collections.Hashtable+bucket[])->

0145ba30(System.Runtime.Remoting.ServerIdentity)->

012f0558(MemoryLeakApp.ServicesProvider)->

012f20d0(MemoryLeakApp.CacheRepository)->

012f20e0(MemoryLeakApp.Collections.ReaderWriterLockDictionary`2[[System.String, mscorlib],[System.Object, mscorlib]])->

012f20f4(System.Threading.ReaderWriterLockSlim)->

012f2134(System.Object[])->

012f3654(System.Threading.ReaderWriterCount)

Why are these objects rooted in System.Runtime.Remoting.ServerIdentity?

System.Runtime.Remoting.ServerIdentity is an internal class in System.Runtime.Remoting namespace. ServerIdentity derives from Identity and holds the extra server specific information associated with each instance of a remoted server object.

4.

In order to look at MemoryLeakApp.ServicesProvider, you can look at the method implementation with DumpIL command and a method description pointer. I prefer to save the assembly from a full dump and look at the class implementation using reflector. First I will run lm command to list loaded modules and then run sos.dll!savemodule command with the module address to save the assembly.

0:000> lm

start end module name

00400000 00486000 MemoryLeakApp(deferred)

0d070000 0d0b6000 RemotingLibrary (deferred)

0d220000 0d25a000 MemoryLeakApp_ServicesProvider(deferred)

0:000> !savemodule 0d220000 D:\temp\MemoryLeakApp_ServicesProvider.dll

3 sections in file

section 0 – VA=2000, VASize=325fc, FileAddr=200, FileSize=32600

section 1 – VA=36000, VASize=49c, FileAddr=32800, FileSize=600

section 2 – VA=38000, VASize=c, FileAddr=32e00, FileSize=200

5.

Now load “D:\temp\MemoryLeakApp_ServicesProvider.dll” in reflector

Class ServicesProvider : MyLifetimeMarshalByRefObject, IDisposable

{

}

public class MyLifetimeMarshalByRefObject: MarshalByRefObject

{


protected MyLifetimeMarshalByRefObject()

{

}

public override object InitializeLifetimeService()

{

return null;

}

}

MSDN description of InitializeLifetimeService is – “Obtains a lifetime service object to control the lifetime policy for this instance.” and when you override this method to return null that means this object will live forever.

You can read more on lifetime leases on http://msdn.microsoft.com/en-us/library/23bk23zc(VS.71).aspx

Resolution

According to MSDN, Marshal-by-reference objects (MBRs) do not reside in memory forever, whether server-activated Singleton objects or client-activated objects. Instead, unless the type overrides MarshalByRefObject.InitializeLifetimeService to control its own lifetime policies, each MBR has a lifetime that is controlled by a combination of leases, a lease manager, and some number of sponsors.

Using leases to manage the lifetime of remote objects is an alternative approach to reference counting, which can be complex and inefficient over unreliable network connections. Although leases can be configured to extend the lifetime of a remote object longer than is precisely required, the reduction in network traffic devoted to reference counting and pinging clients makes leasing an attractive solution when properly configured for a particular scenario.

You should not override this method to return null but implement lease management to suit your needs.

Share