Archive for the ‘.NET Debugging’ Category

Not reproducible Windows Forms Hang OnUserPreferenceChanged - If your application has hung once it will hang again

Friday, April 3rd, 2009

Problem Description

At one of our customer site, windows forms business application hung for a business user.  This issue  occurred only on one machine out of 100s of users. The way we look at it is, If your application has hung once on any machine that means it will hang again, so I volunteered to look into his machine armed with WinDBG.

Note

Executed command to check for syncblock/locks are for illustration purpose only

1. While debugging for hang/deadlock, the very first thing you need to do is look at the call stack for each thread

2.  If call to Systen.Threading.Monitor.Enter is at the top of the managed call stack and there is a corresponding *WaitFor* at the top of unmanaged call stack which you can get either by running command kb or execute !dumpstack to get managed/unmanaged both, in that case you can suspect deadlock in managed code and resolved it using !synblk command to find out which thread is owning the object

3. If the managed thread call stackdoesn’t have Thread.Monitor that means sycnblock will be pretty useless to find out the cause of deadlock

4. If thread has WaitOne or waiting on a handle in a call stack in that case, you should use dd command to find the handles for kernel handle deadlock or use !do command in managed stack along with !handle to find out which thread owns the handle

Analysis

1. Take a memory dump of the application while it is hung, no live debugging - least convenience to user

2. Open the dump file for analysis, since this is a pure managed app so first we will look at syncblock and then criticalsection locks

0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
—————————–
Total           546
CCW             4
RCW             17
ComClassFactory 0
Free            460
0:000> !locks

CritSec <Unloaded_PI32.dll>+a00a92f at 0a00a930
LockCount          0
RecursionCount     0
OwningThread       0
*** Locked

3. Everything is fine so far, let’s look at all the thread stacks and the waithandles on interesting thread

0:000> ~*e!clrstack
OS Thread Id: 0xcf8 (0)

ESP       EIP
0012e4c4 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0012e4d8 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0012e4ec 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0012e58c 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0012e5c0 7b6f322f System.Windows.Forms.Control.Invoke(System.Delegate)
………………………………………………………………………………………………………….

0012eda4 0167834e MyApp.UI.RunStartup()
0012eff0 7b1c8906 System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)

0012f3ac 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0012f3c0 01676974 MyApp.Program.RunSelectPlanForm()
0012f400 0187b525 MyApp.Program.Main(System.String[])

OS Thread Id: 0xf74 (9)
ESP       EIP

0710f204 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0710f218 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0710f22c 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0710f2cc 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0710f300 7b920717 System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback, System.Object)
0710f34c 7a922833 Microsoft.Win32.SystemEvents.RaiseEvent(Boolean, System.Object, System.Object[])
0710f398 7a923d2f Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(Int32, IntPtr, IntPtr)
0710f3b8 7aa8f184 Microsoft.Win32.SystemEvents.WindowProc(IntPtr, Int32, IntPtr, IntPtr)
0710f670 7b1d8781 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0710f6a0 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0710f6b4 04edf271 MyApp.MyClass.Run()
0710f6e4 792d6c74 System.Threading.ThreadHelper.ThreadStart()

4. Just to illustrate, You can get the handles on thread 0 and 9 from stack trace

0:009> ~0kb
ChildEBP RetAddr  Args to Child
0012e07c 7c90df2c 7c809574 00000002 0012e0a8 ntdll!KiFastSystemCallRet
0012e080 7c809574 00000002 0012e0a8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0012e11c 7e4195f9 00000002 0012e144 00000000 kernel32!WaitForMultipleObjectsEx+0×12c
0012e178 7752ebd6 00000001 0012e478 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0×13e
0012e1a0 77557237 0012e478 000003e8 0012e1cc ole32!CCliModalLoop::BlockFn+0×80

0:009> ~9kb
ChildEBP RetAddr  Args to Child
0710edbc 7c90df2c 7c809574 00000002 0710ede8 ntdll!KiFastSystemCallRet
0710edc0 7c809574 00000002 0710ede8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0710ee5c 7e4195f9 00000002 0710ee84 00000000 kernel32!WaitForMultipleObjectsEx+0×12c
0710eeb8 7752ebd6 00000001 0710f1b8 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0×13e
0710eee0 77557237 0710f1b8 000003e8 0710ef0c ole32!CCliModalLoop::BlockFn+0×80

WaitForMultipleObjects take the array of handles and we know the array size from function parameters which is 2. We can use dd to display memory range since we know the size

0:009> dd 0012e144 L2
0012e144  00000ac0 00000038
0:009> dd 0710ee84 L2
0710ee84  00000200 000007c4

Both of these threads are waiting on different handles.

5. After reviewing the call stack again you will see that Thread 0 is calling Invoke to send a message to a window created on Thread 9 and Thread 9 is calling Invoke to send a message to a window created on Thread 0.  After posting the message, Thread 0 is waiting so message is not getting pumped on thread 0 and the same applies to Thread 9.  When you look at MarshaledInvoke implementation , after posting a message it waits on a event handle.

Why did this occur?

Thread 0 is the main UI thread and Thread 9 is a worker thread creating a splasher screen and running in STA apartment to pump message, below is the code snippet which creates a new thread and message loop for splasher on a worker thread.

SplashThread = new Thread(threadStart);
SplashThread.IsBackground = true;
SplashThread.SetApartmentState(ApartmentState.STA);
SplashThread.Start();
By default, Controls subscribe to UserPreferenceChanged Event and this event can get fired when system wide setting is changed, for example - changing the system settings including desktop background or a theme. It just so happened that business user changed his system settings when this app was running with an active splash screen and the timing was such that controls subscribed to UserPreferenceChanged was in the middle of processing a message on MainUI thread. Splasher thread sends a message to MainUI which is not processed yet and MainUI thread sends a message to splasher window to change status which results in a deadlock.

Please refer to a very good article on UserPreferenceChanged event and other gotcha/deadlock/issues which you can run into with a window created on worker thread

Share/Save/Bookmark

ASP.NET Worker Process Recycle/.NET Application crash - What not to do in Finalizer - includes asp.net 2.0 debugging lab

Monday, March 23rd, 2009

Click to Download debugging lab

Debugging Labs

SFTSRC.StockTrader asp.net 2.0 web application will be used to introduce you to a series of debugging labs. We will start with a simple asp.net application which crashes and follow up with memory leak, memory fragmentation, application performance and application hang.

The first lab in series will show you how an application can crash if you are not being careful while implementing finalizer.

Issue Description

Website user gets “Server Application Unavailable error” message from time to time.

ASP.NET Debugging Lab

This lab is prepared based on a production issue resolved at a customer site. Please note that you can download this debugging lab from here. Web project/solution has been created using visual studio 2008.

What will you learn

1. How to debug asp.net application

2. How to identify what is causing asp.net application pool to recycle

3. WinDbg / sos basic commands to analyze a crash dump

Application Architecture

architecture

Breaking down your thought process

1.  Since website user gets to see a generic error handler page, you will need to identify why http request is failing.

2. As shown in the above diagram, website and webservice are hosted on two different servers, so you need to identify the server where request is failing

3. You should look at the http error logs/iis logs and the events logs on each server to identify the issue

4. Logs will also help you determine if there is a network issue

Analysis

After looking at the server logs, we have determined that worker process is getting recycled on a sever hosting Sftsrc Stock Trading WebService. A worker process can get recycled depending on the configuration, low on virtual memory, crash due to unhandled exception etc. Let’s assume that you don’t have any information on what has caused worker process to recycle

Let’s look at the lab which is prepared based on the above issue description.

Lab

We will analyze this issue using Lab01- Download it from here.

1. Once you download the lab, create 2 virtual directory “StockTrader” and “TradeWS”

2. “StockTrader” will be set to folder “\Lab01\SFTSRC.StockTrader”

3. “TradeWS” will be set to folder “\Lab01\SFTSRC.UnmanagedService”

4. Browse to http://localhost/StockTrader/Default.aspx

You will see the website as shown below
debugging lab 01 snapshot

5. Enter Stock Symbol, Quantity, Price and click on Submit Order

6. Once you submit an order, you will see “Server Unavailable Error” page in a few seconds.

7. If you look at the iis log, you will see http 503 error

please visit http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for description on HTTP 1.1 status code

8. Event log will have the error message “asp.net process recycle or stopped unexpectedly.”

9. Attach a debugger to your worker process

10. we will use adplus configuration file to get a crash dump on first chance access violation.

11. download accessviolation.cfg from here

12. go to your windbg folder and run the following command

cscript.exe adplus.vbs -c accessviolation.cfg

13. Browse to http://localhost/StockTrader/Default.aspx again

14. Once you get “server unavailable error”, you should have the dump files created under your windbg folder

15. Please note that adplus configuration file has  <ProcessName>aspnet_wp.exe</ProcessName> by default, if you are using windows server then change it to <ProcessName>w3wp.exe</ProcessName>. You may have multiple dumps created if you have more than one application pool. You can identify the worker process id for the application pool hosting TradeWS and then you can create a dump based on process id.

16. Open the dump file( *****.EXE__1st_chance_AccessViolation__full_*****c.dmp) using windbg

17.  Run following command to load sos dll

0:009> .loadby sos mscorwks

18. You should always run !analyze -v when you are doing a crash dump analysis, most of the time this one command is enough to tell you what has caused the crash

0:009> !analyze -v

Attempt to read from address 00000000

DEFAULT_BUCKET_ID:  NULL_POINTER_READ

PROCESS_NAME:  aspnet_wp.exe

MANAGED_STACK:
0195F948 05CD0848 SFTSRC_UnmanagedService!SFTSRC.UnmanagedService.UnmanagedService.GetTLS()+0×20
0195F950 05CD07E6 SFTSRC_UnmanagedService!SFTSRC.UnmanagedService.UnmanagedService.Finalize()+0×1e

I have omitted most of the output for the sake of brevity.

19. You will notice that application has crashed while executing Finalize

20. On a side note you can look at all the Exception objects on managed heap with the following command

0:009> !dumpheap -type Exception

21. disassemble managed method Finalize using IP

0:009> !u 05cd07e6

05cd07e0 ff15c005cc05    call    dword ptr ds:[5CC05C0h] (SFTSRC.UnmanagedService.UnmanagedService.GetTLS(), mdToken: 06000028)
>>> 05cd07e6 8bce            mov     ecx,esi

· You will notice that Exception is occurring while calling GetTLS, where it tries to assign the return value

22. On a side note, you can use View->Registers window to look at registers for the current thread or use r command

23. disassemble managed method GetTLS using IP

0:009> !u 05cd0848
05cd0841 e81a085b73      call    mscorlib_ni+0×1c1060 (79281060) (System.Threading.Thread.GetData(System.LocalDataStoreSlot), mdToken: 060012f3)
05cd0846 8bc8            mov     ecx,eax
>>> 05cd0848 8b01            mov     eax,dword ptr [ecx]

24. You will notice that this is occurring right while calling to System.Threading.Thread.GetData() and ecx register is null

25. SOS dll has powerful command to look at clr data structure. We are going to look at GetTLS() implementation by dumping the IL(intermediate language) of this method

26. in order to dumpil, you have to first convert instruction pointer IP to Method Descriptor

0:009> !ip2md 05cd0848
MethodDesc: 05cc05b8
Method Name: SFTSRC.UnmanagedService.UnmanagedService.GetTLS()
Class: 05c54628

0:009> !dumpil 05cc05b8
ilAddr = 05c72250
IL_0000: ldstr “ThreadSpecificData”
IL_0005: call System.Threading.Thread::GetNamedDataSlot
IL_000a: stloc.0
IL_000b: ldloc.0
IL_000c: call System.Threading.Thread::GetData
IL_0011: callvirt System.Object::ToString
IL_0016: callvirt System.String::get_Length
IL_001b: ret

27. As shown in above IL, it has crashed while converting System.Threading.Thread::GetData object to String so that means System.Threading.Thread::GetData is returning NULL

Conclusion

Application Pool crashed on finalizer thread while executing Thread.GetData(). If you are not familiar with Thread Local Storage, please read it on MSDN http://msdn.microsoft.com/en-us/library/ms686749.aspx . Thread.GetData get the allocated data from thread slot but the issue here is, thread data slots are unique per thread. No other thread (not even a child thread) can get that data. Please see more details on MSDN http://msdn.microsoft.com/en-us/library/system.threading.thread.getdata.aspx . We all know that Finalizer runs on a separate thread and a finalizer thread wakes up when it the Q is notified.

You will see that in SFTSRC.UnmanagedService.UnmanagedService.CreateTLS() method, we are allocating the data slot which is in a totally different thread that’s why Finalizer thread doesn’t have access to thread specific data and resulting in crash.

You can run the following command to look at all the thread stacks

0:009> ~*e!clrstack

OS Thread Id: 0×11cc (20)
ESP       EIP
05c4ec64 7c90e4f4 [HelperMethodFrame: 05c4ec64] System.GC.WaitForPendingFinalizers()
05c4ecb4 05cd07aa SFTSRC.UnmanagedService.UnmanagedService.CreateTLS()
05c4ecc0 05cd06e8 TradeWebService.OrderService.SendOrder(System.String, UInt32, Int32, System.String, System.String)

You will notice that, while sending the order, we create a TLS and then in order to force GC and Finalizer, CreateTLS() method calls GC.Collect and GC.WaitForPendingFinalizers.

Lesson Learnt

Never access a thread specific data in Finalizer. This lab uses Thread Local Storage to show you the thread specific data but it could be anything. For example, in windows application, thread which creates a handle or control owns it so you can’t access those in finalizer.

Exercise

1. use !tls command to look at Thread Local Storage slot

2. use !teb to get familiar with thread environment block and how to use it with !tls

3. ~ command to display all the threads and get their corresponding TEB

4. Search memory to find which thread has created the slot

Download debugging lab

Share/Save/Bookmark

.NET Crash/OutofMemoryException/Memory Leak - .NET windows forms and infragistics datagrid and why is System.Drawing.Image object not getting finalized??

Monday, March 2nd, 2009

Issue Description
Windows forms application has crashed with OOM exception. Before application crashes, cpu is almost pegged at 100% for a few minutes

Root Cause Analysis using WinDbg

Collect full memory dump at set intervals

  • You could get a crash dump and analyze the managed heap to find out rooted objects. But, since we have access to the system I prefer to get a dump at set intervals and compare the managed heap statistics because that makes it a little easier to find the objects which are surviving GC over a period of time.
  • We will use ADPlus to automate this task
  • I will run the script to get a full memory dump 4 times every 2 minutes
  • Command to automate this task is “cscript.exe adplus.vbs -hang -pn <myapp.exe> -quiet -r 4 120″
  • First Dump file size is around 800MB which also indicates process’s memory usage at that time
  • Second Dump file size is around 1.2 GB
  • Third Dump file size is around 1.6 GB and a little later application has crashed.

This is a pure .net application, so we are going to jump ahead and look at the managed heap stats, gc handles and the objects in finalize queue. We will use sos2.dll copied under the same folder as windbg executable, we will dump only pinned and strong gchanldes to identify gc handles increasing over the time because these handles could cause memory leak. Please note that,!gcht (gchandles by type) command is only available in our windbg extension sos2.dll. You could use sos.dll!gchandles to dump gchandles but it won’t give you the objects and their stats by type and you will have to figure out yourself probably by looking at the root.
GCHandles Stats from First Dump
0:000> .load sos2
0:000> !gcht -t p

Pinned GC Handle Statistics:
Pinned Handles: 60
Statistics:
………………………………………………….
Total 60 objects
0:000> !gcht -t s
Strong GC Handle Statistics:
Strong Handles: 185
Statistics:
………………………………………………………
Total 185 objects
GCHandles Stats from Second Dump
0:000> !gcht -t p
Pinned GC Handle Statistics:
Pinned Handles: 60
Statistics:
………………………………………………………………
Total 60 objects
0:000> !gcht -t s
Strong GC Handle Statistics:
Strong Handles: 186
Statistics:
……………………………………………………………..
Total 186 objects

Lets move over since we don’t see anything interesting with gchandles, no. of pinned gchandles remain same and strong gc handles count has increased only by one.

  • We will compare finalize queue stats in dumps, I am only including the interesting objects and the interesting comments for the sake of brevity

Finalize Queue in first dump

0:000> !finalizequeue
generation 2 has 9433 finalizable objects (05501508->0550a86c)
Ready for finalization 0 objects (0550af4c->0550af4c)
Statistics:
MT    Count    TotalSize Class Name
7ae3c9f8     1907 45768 System.Drawing.Bitmap
…………………………………………….
Total 9873 objects

Finalize Queue in second dump

0:000> !finalizequeue
generation 2 has 10545 finalizable objects (05501508->0550b9cc)
Ready for finalization 0 objects (0550bdac->0550bdac)
Statistics:
MT    Count    TotalSize Class Name
7ae3c9f8     2951 70824 System.Drawing.Bitmap
…………………………………………….
Total 10793 objects

Aha, Do we see something interesting here???? Of course, numbers of finalizable objects in generation 2 have increased by almost 1000 and on top of that number of objects ready to be finalized is 0. So why are these objects not getting finalized?

  • We have to find out why System.Drawing.Bitmap is not getting finalized.

As shown in above step,  generation 2 has 9433 finalizable objects (05501508->0550a86c).
We have finalizable objects starting from memory address 05501508 and ending at 0550a86c. You don’t want to dumpheap by type(System.Drawing.Bitmap) to look at the roots to this object, you will have to dump too many objects unless you get lucky. The better way is probably to display the memory and get the address of an object. Size of the System.Drawing.Bitmap object is 24 Bytes so we may be able to get the object address by specifying the address range ending with finalize queue @ 0550a86c. We will subtract 24*4 = 96 bytes(60) from 0550a86c which is 550A80C.
First column is the finalize queue address and the rest are the memory addresses of the objects
0:000> dd 550A80C 0550a86c

0550a80c  17b6e074 17b6e11c 17b6e1c4 17b6e26c
…………………………………………………………………………………….
0550a86c  17b76734
0:000> !do 17b6e074

Name: System.Drawing.Bitmap —-> Make sure this is System.Drawing.Bitmap
MethodTable: 7ae3c9f8
EEClass: 7ade4014
Size: 24(0×18) bytes

0:000> !gcroot -nostacks 17b6e704
DOMAIN(001581B0):HANDLE(Strong):ff11f8:Root:01981b64(System.Threading.Thread)->
………………………………………………………………………………………
01d00f54(MyApp.MyForm)->
160875cc(MyApp.Controls.MyControl)->
1618b578(Infragistics.Win.UltraWinGrid.UltraGridRow)->
14f1a7c0(Infragistics.Win.UltraWinGrid.CellsCollection)->
…………………………………………………………………………………….
17b6e674(Infragistics.Win.UltraWinGrid.UltraGridCell)->
17b6e6f4(Infragistics.Win.UltraWinGrid.UnBoundData)->
17b6e704(System.Drawing.Bitmap)

This is rooted in some strong handles so this is not rooted in finalization queue what that means is object is not ready to be finalized yet as we saw in finalizeQ stats. I am hiding the customer data so basically, we have a windows forms containing user control with infragistics UltraGrid and the System.Drawing.Bitmap is being set in a cell.

Let’s look at the sample code
foreach (UltraGridRow row in rows)
{
row.Cells[someindex] =<bitmap object>
}
This is where we have the problem because if there are let’s say 5000 rows then we are creating 5000 bitmap objects and as long as form is alive these objects will never be disposed. System.Drawing.Bitmap uses unmanaged GDIPlus library and this is not a lightweight object that’s why it was crashing with outofmemory exception and only in a particular scenario but this may go un-noticed during test cycle by QA team unless the test case covers this very particular scenario.
Resolution
I am sure there are many ways to fix it but one easy way to fix is create the drawing objects for rows visible in the client area and handle scroll/resize events to set the image and dispose the objects not in use.

Share/Save/Bookmark

Akismet - Spam Blocked
September 2010
M T W T F S S
« Aug    
 12345
6789101112
13141516171819
20212223242526
27282930  
Categories