Dissecting Silverlight and CoreCLR Runtime on Windows OS using WinDbg – Silverlight Plugin and Silverlight .NET runtime explained

Let’s deep dive into silverlight run time and find out how the request is processed
We need fiddler ,  WinDbg  and a browser in Windows OS. We have a simple html page hosted in IIS with a Silverlight control in it as shown below
<html><head>   <title>SilverlightApplication1</title></head>
<body> <div id=”silverlightControlHost”>
<object data=”data:application/x-silverlight-2,” type=”application/x-silverlight-2″>
<param name=”source” value=”ClientBin/SilverlightApplication1.xap”/>
</object>  </div></body></html>

Attach browser to WinDbg, if you are using IE8 then make sure that you are attaching the child process (tabbed browser), use process explorer to identify the child process.
Execute command “.cmdtree <pathname>sld.txt” as shown below; you can download sld.txt from here as shown below

command tree snapshot

Silverlight runtime by default gets installed under <system drive> \Program Files\Microsoft Silverlight\<version #>
As shown below

silverlight runtime installation folder

We will be interested more in native dll which makes up of Silverlight runtime. .NET library is part of Silverlight CoreCLR.
In order to get the order of the loaded modules of Silverlight runtime, we can execute command “sxe ld” or you can double click on “Break On Loaded Modules” from command tree and browse to html page containing Silverlight xap file.

You need to ignore most of the dlls but the one loaded from “C:\Program Files\Microsoft Silverlight\2.0.40115.0” (Silverlight runtime path on my machine). The order of loaded modules is
Npctrl.dll, agcore.dll, coreclr.dll, .net dlls
You can use any disassembler  or ndepends to look at the exported functions or download one from http://www.smidgeonsoft.prohosting.com/pebrowse-pro-file-viewer.html to look at these native dlls

Below is the snapshot for NPCTRL.DLL
NPCTRL.DLL snapshot

NPCTRL.dll is a COM dll which gets loaded in browser when http request is made with silverlight control in it(object tag). Shown below in the fiddler

Browser determines this based on http response since the MIME type in html file is
type=”application/x-silverlight-2″. NPCTRL.dll gets loaded as Silverlight Plug-in.

This occurs before XAP package gets downloaded,

fiddler snapshot

As shown below, you can now clear breakpoints on loaded modules and set breakpoint for each of dll one by one because debugger remembers the last ld settings.

command treesnapshot

You may want to change the symbols path to Silverlight runtime so that you can set breakpoints on all exported methods (see the above snapshot).

NPCTRL.DLL imports followings from AGCORE.DLL

command treesnapshot

NPCTRP.DLL first loads AGCORE.DLL. NPCTRL.dll implements CXcpBrowserHost/CommonBrowserHost and it makes a call to agcore!ObtainCoreServices to set the frameworkcallbacks. NPCTRL.DLL also implements CWindowsServices::CreateCLRRuntimeHost. NPCTRL.dll calls NPCTRL!GetCLRRuntTimeHost to load CORECLR.dll . CORECLR.dll is our Silverlight .NET runtime engine, basically stripped down version of MSCORWKS.dll

coreclr dll

Once CLRRuntime is created, coreclr will create a domain and load the .net assembly as a private copy using normal assembly binding. CORECLR will use PEImage layout to map the view of file same as .net runtime. Below is the call stack when application domain for coreclr is set up

0220f4d4 04a9b399 System.Activator.CreateInstance(System.String, System.String)
0220f4e8 04a9b349 System.AppDomain.CreateInstance(System.String, System.String)
0220f504 04a9b298 System.AppDomain.CreateInstanceAndUnwrap(System.String, System.String)
0220f51c 04a9b1d8 System.AppDomain.CreateDomainManager(System.String, System.String)
0220f560 04a94d6a System.AppDomain.SetDomainManager(System.Security.Policy.Evidence, System.Security.Policy.Evidence, IntPtr, Boolean)
0220f5b0 04a98dfd System.AppDomain.Setup(System.Object)

Once app domain is set up, ManagedRuntimeHost will be created as shown below from the reflector

Managed Runtime Host snapshot

This is when coreclr will also set up System.Windows.Threading.DispatcherSynchronizationContext for Single Threaded Apartment (UI). At this point, it will also create a callback handler in AGCORE.DLL. AGCORE.DLL is the Unmanaged version of Silverlight. AGCORE.DLL implements event listener, Setting the visual root and the callback handler is used for communication between CORECLR and AGCORE.

The main interface for communication between CORECLR and AGCORE is MS.Internal.XcpImports(implemented in System.Windows.DLL) XcpImports is the interop layer importing the exports section in AGCORE.dll

AGCORE.DLL
XCpImports

Once XcpImports.SetFrameworkCallbacksNative is called to set the callback handler, MS.Internal.FrameworkCallbacks will create System.Windows.DependencyObject.

After dependency object is created, Application object will be created as shown below

0220f2e4 0108008d SilverlightApplication1.App..ctor()
0220f850 026b2e90 System.Reflection.ConstructorInfo.Invoke(System.Object[])
0220f860 026b2a98 MS.Internal.TypeProxy.CreateInstance(UInt32)

.net library will be responsible for creating the visual tree and passing the buffer into agcore.dll native code and so on.

when does XAP package get downloaded?

Once NPCTRL.DLL and AGCORE.DLL are loaded, NPCTRL!CommonBrowserHost implements (shown below) npctrl!CommonBrowserHost::OnGotSourceDownloadResponse and this will make a call into agcore!CCoreServices::StartDeploymentTree which in turn calls agcore!CCoreServices::CLR_Startup and finally this will be routed to  npctrl!CWindowsServices::CreateCLRRuntimeHost to load CORECLR.DLL, see below for the callstack

kernel32!LoadLibraryW - Will load CORECLR.DLL
npctrl!GetCLRRuntimeHost
npctrl!CWindowsServices::CreateCLRRuntimeHost
agcore!CCLRHost::Initialize
agcore!CRuntimeHost::Initialize
agcore!CCoreServices::CLR_Startup
agcore!CDeployment::Start
agcore!CCoreServices::StartDeploymentTree
npctrl!CXcpBrowserHost::put_Source
npctrl!CommonBrowserHost::OnGotSourceDownloadResponsetype=”application/x-silverlight-2″

Share

Silverlight with better production debugging support – SOS.dll in coreclr has better extension command support than sos in CLR 2.0

sos.dll installed with Silverlight Runtime has more extension commands compare to mscorwks version of SOS. Although you can load coreclr version of sos with command “.loadby sos coreclr” to make sure that you have the sos loaded from the same location as your coreclr.dll, but you should also get the sos.dll when you are debugging a silverlight app dump from a customer machine because silverlight has more than one runtime version.

Below are the cool/new extension commands in silverlight version of sos to make our life easier, please note that some of these extension commands and dml were supported in .NET framework 1.1 SOS version but not in 2.0

1. DML support, you can turn it on by executing command “.prefer_dml 1″ or you can add “/D” with individual command. DML will definitely save you lot of typing as shown below
dmlsnapshot

This allows you to click on object link to dump the object equivalent to( “!DumpObj /d <address>”)

2. These are the new commands in silverlight version of sos. SOS help documentation has done a great job in explaining the details on each of the command, you can execute “!help <commandname>” to get details.

GCWhere and FindRoots are very handy.

GCWhere command not only outputs the Generation number of an object but also the GC Segment and the address

FindRoots with -gen option allows you to break into your application on the next collection of specified generation number in -gen option to find out the cross generation reference to the object.

—————————–
New commands in Diagnostic Utilities
—————————–
VerifyObj
FindRoots
HeapStat
GCWhere
ListNearObj (lno)
AnalyzeOOM (ao)
!FinalizeQueue [-allReady] [-short]
—————————–
New commands to Examine the GC history   (you will need stress log file, congfigured in registry. You can get more details by executing command “!help DumpLog” )
—————————–
HistInit
HistStats
HistRoot
HistObj
HistObjFind
HistClear

Share

Silverlight Debugging – .loadby sos coreclr Unable to find module ‘coreclr’, Failed to find runtime DLL (mscorwks.dll), 0×80004005 Extension commands need mscorwks.dll

0:000> .loadby sos mscorwks
Unable to find module ‘mscorwks’
0:000> .loadby sos coreclr
Unable to find module ‘coreclr’
0:000> .load C:\Program Files\Microsoft Silverlight\2.0.40115.0\sos.dll
0:000> !clrstack
Failed to find runtime DLL (mscorwks.dll), 0×80004005
Extension commands need mscorwks.dll in order to have something to do.

unable to find mscorwks.dll means there is no clr host but if you do encounter this issue while debugging Silverlight application running in IE or any browser, verify the followings

1. Have you attached WinDbg to the correct browser’s process id?

2. Have you taken the memory dump of the process running silverlight?

If you are using IE8, it runs the tabbed browser as a child process so you will see 2 instances of iexplore.exe running. You can use Process Explorer to identify which process has loaded the coreclr.dll  or agcore.dll(silverlight runtime). See the Snapshot below

Process Explorer snapshot of IE without Silverlight Runtime

IE without Silverlight Runtime

Process Explorer snapshot of IE with Silverlight Runtime

IE with Silverlight Runtime

Share

Caution when using System.IO.FileStream – ask what you need – System.UnauthorizedAccessException: Access to the path is denied

I have come across this issue quite a few times and this issue will be seen more often during deployment in QA/Production machine.

Its not unusual to see the below piece of code
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
XmlReader reader = XmlReader.Create(fs);
XmlSerializer serializer = new XmlSerializer(typeof(T));
Object o = serializer.Deserialize(reader);
}

Many times this will go unnoticed but if you look closely FileStream is using a constructor with FilePath and FileMode alone. System.IO.FileStream implementation of this constructor is

public FileStream(string path, FileMode mode) : this(path, mode, (mode == FileMode.Append) ? FileAccess.Write : FileAccess.ReadWrite, FileShare.Read, 0×1000, FileOptions.None, Path.GetFileName(path), false)
{
}

Default constructor asks for ReadWrite access, so now you see why your application is hosed in production, most of the developers use their system as admin user so of course they have the write access to the requested file. This is not to blame developers who run as admin because I can totally understand the pain.

By default, FileStream needs ReadWrite access that’s why System.UnauthorizedAccessException is thrown because on a production machine, User account under which asp.net worker process runs or a windows service or for that matter any process will not have the write access to a file by default.

Make sure, you ask for what you need. If your application needs only Read access to a file, make sure to specify that in your FileStream constructor. Don’t be GREEDY

using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))

May be I would rather see a Read access by default in System.IO.FileStream implementation rather than ReadWrite.

Share

The call to LoadLibrary(sos2) failed, Win32 error 0n14001, “This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem.”

Mike Semikin has reported this issue while executing windbg command .load sos2

The call to LoadLibrary(sos2) failed, Win32 error 0n14001
“This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem.”
Please check your debugger configuration and/or network access.

Resolution
This is because visual studio 2008 is used as a development environment for sos2 and CRT is installed in assembly cache(winSxS) folder so dependent dll will be missing

As an immediate workaround, please install visual c++ 2008 runtime redistributable from Microsoft,  download from http://www.microsoft.com/Downloads/details.aspx?familyid=9B2DA534-3E03-4391-8A4D-074B9F2BC1BF&displaylang=en

Share

Not reproducible Windows Forms Hang OnUserPreferenceChanged – If your application has hung once it will hang again

Problem Description

At one of our customer site, windows forms business application hung for a business user.  This issue  occurred only on one machine out of 100s of users. The way we look at it is, If your application has hung once on any machine that means it will hang again, so I volunteered to look into his machine armed with WinDBG.

Note

Executed command to check for syncblock/locks are for illustration purpose only

1. While debugging for hang/deadlock, the very first thing you need to do is look at the call stack for each thread

2.  If call to Systen.Threading.Monitor.Enter is at the top of the managed call stack and there is a corresponding *WaitFor* at the top of unmanaged call stack which you can get either by running command kb or execute !dumpstack to get managed/unmanaged both, in that case you can suspect deadlock in managed code and resolved it using !synblk command to find out which thread is owning the object

3. If the managed thread call stackdoesn’t have Thread.Monitor that means sycnblock will be pretty useless to find out the cause of deadlock

4. If thread has WaitOne or waiting on a handle in a call stack in that case, you should use dd command to find the handles for kernel handle deadlock or use !do command in managed stack along with !handle to find out which thread owns the handle

Analysis

1. Take a memory dump of the application while it is hung, no live debugging – least convenience to user

2. Open the dump file for analysis, since this is a pure managed app so first we will look at syncblock and then criticalsection locks

0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
—————————–
Total           546
CCW             4
RCW             17
ComClassFactory 0
Free            460
0:000> !locks

CritSec <Unloaded_PI32.dll>+a00a92f at 0a00a930
LockCount          0
RecursionCount     0
OwningThread       0
*** Locked

3. Everything is fine so far, let’s look at all the thread stacks and the waithandles on interesting thread

0:000> ~*e!clrstack
OS Thread Id: 0xcf8 (0)

ESP       EIP
0012e4c4 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0012e4d8 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0012e4ec 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0012e58c 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0012e5c0 7b6f322f System.Windows.Forms.Control.Invoke(System.Delegate)
………………………………………………………………………………………………………….

0012eda4 0167834e MyApp.UI.RunStartup()
0012eff0 7b1c8906 System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)

0012f3ac 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0012f3c0 01676974 MyApp.Program.RunSelectPlanForm()
0012f400 0187b525 MyApp.Program.Main(System.String[])

OS Thread Id: 0xf74 (9)
ESP       EIP

0710f204 792b6835 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
0710f218 7b6f192f System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
0710f22c 7ba2d0bb System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
0710f2cc 7b6f328c System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
0710f300 7b920717 System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback, System.Object)
0710f34c 7a922833 Microsoft.Win32.SystemEvents.RaiseEvent(Boolean, System.Object, System.Object[])
0710f398 7a923d2f Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(Int32, IntPtr, IntPtr)
0710f3b8 7aa8f184 Microsoft.Win32.SystemEvents.WindowProc(IntPtr, Int32, IntPtr, IntPtr)
0710f670 7b1d8781 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0710f6a0 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0710f6b4 04edf271 MyApp.MyClass.Run()
0710f6e4 792d6c74 System.Threading.ThreadHelper.ThreadStart()

4. Just to illustrate, You can get the handles on thread 0 and 9 from stack trace

0:009> ~0kb
ChildEBP RetAddr  Args to Child
0012e07c 7c90df2c 7c809574 00000002 0012e0a8 ntdll!KiFastSystemCallRet
0012e080 7c809574 00000002 0012e0a8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0012e11c 7e4195f9 00000002 0012e144 00000000 kernel32!WaitForMultipleObjectsEx+0x12c
0012e178 7752ebd6 00000001 0012e478 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0x13e
0012e1a0 77557237 0012e478 000003e8 0012e1cc ole32!CCliModalLoop::BlockFn+0×80

0:009> ~9kb
ChildEBP RetAddr  Args to Child
0710edbc 7c90df2c 7c809574 00000002 0710ede8 ntdll!KiFastSystemCallRet
0710edc0 7c809574 00000002 0710ede8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0710ee5c 7e4195f9 00000002 0710ee84 00000000 kernel32!WaitForMultipleObjectsEx+0x12c
0710eeb8 7752ebd6 00000001 0710f1b8 000003e8 user32!RealMsgWaitForMultipleObjectsEx+0x13e
0710eee0 77557237 0710f1b8 000003e8 0710ef0c ole32!CCliModalLoop::BlockFn+0×80

WaitForMultipleObjects take the array of handles and we know the array size from function parameters which is 2. We can use dd to display memory range since we know the size

0:009> dd 0012e144 L2
0012e144  00000ac0 00000038
0:009> dd 0710ee84 L2
0710ee84  00000200 000007c4

Both of these threads are waiting on different handles.

5. After reviewing the call stack again you will see that Thread 0 is calling Invoke to send a message to a window created on Thread 9 and Thread 9 is calling Invoke to send a message to a window created on Thread 0.  After posting the message, Thread 0 is waiting so message is not getting pumped on thread 0 and the same applies to Thread 9.  When you look at MarshaledInvoke implementation , after posting a message it waits on a event handle.

Why did this occur?

Thread 0 is the main UI thread and Thread 9 is a worker thread creating a splasher screen and running in STA apartment to pump message, below is the code snippet which creates a new thread and message loop for splasher on a worker thread.

SplashThread = new Thread(threadStart);
SplashThread.IsBackground = true;
SplashThread.SetApartmentState(ApartmentState.STA);
SplashThread.Start();
By default, Controls subscribe to UserPreferenceChanged Event and this event can get fired when system wide setting is changed, for example – changing the system settings including desktop background or a theme. It just so happened that business user changed his system settings when this app was running with an active splash screen and the timing was such that controls subscribed to UserPreferenceChanged was in the middle of processing a message on MainUI thread. Splasher thread sends a message to MainUI which is not processed yet and MainUI thread sends a message to splasher window to change status which results in a deadlock.

Please refer to a very good article on UserPreferenceChanged event and other gotcha/deadlock/issues which you can run into with a window created on worker thread

Share

ASP.NET Worker Process Recycle/.NET Application crash – What not to do in Finalizer – includes asp.net 2.0 debugging lab

Click to Download debugging lab

Debugging Labs

SFTSRC.StockTrader asp.net 2.0 web application will be used to introduce you to a series of debugging labs. We will start with a simple asp.net application which crashes and follow up with memory leak, memory fragmentation, application performance and application hang.

The first lab in series will show you how an application can crash if you are not being careful while implementing finalizer.

Issue Description

Website user gets “Server Application Unavailable error” message from time to time.

ASP.NET Debugging Lab

This lab is prepared based on a production issue resolved at a customer site. Please note that you can download this debugging lab from here. Web project/solution has been created using visual studio 2008.

What will you learn

1. How to debug asp.net application

2. How to identify what is causing asp.net application pool to recycle

3. WinDbg / sos basic commands to analyze a crash dump

Application Architecture

architecture

Breaking down your thought process

1.  Since website user gets to see a generic error handler page, you will need to identify why http request is failing.

2. As shown in the above diagram, website and webservice are hosted on two different servers, so you need to identify the server where request is failing

3. You should look at the http error logs/iis logs and the events logs on each server to identify the issue

4. Logs will also help you determine if there is a network issue

Analysis

After looking at the server logs, we have determined that worker process is getting recycled on a sever hosting Sftsrc Stock Trading WebService. A worker process can get recycled depending on the configuration, low on virtual memory, crash due to unhandled exception etc. Let’s assume that you don’t have any information on what has caused worker process to recycle

Let’s look at the lab which is prepared based on the above issue description.

Lab

We will analyze this issue using Lab01- Download it from here.

1. Once you download the lab, create 2 virtual directory “StockTrader” and “TradeWS”

2. “StockTrader” will be set to folder “\Lab01\SFTSRC.StockTrader”

3. “TradeWS” will be set to folder “\Lab01\SFTSRC.UnmanagedService”

4. Browse to http://localhost/StockTrader/Default.aspx

You will see the website as shown below
debugging lab 01 snapshot

5. Enter Stock Symbol, Quantity, Price and click on Submit Order

6. Once you submit an order, you will see “Server Unavailable Error” page in a few seconds.

7. If you look at the iis log, you will see http 503 error

please visit http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for description on HTTP 1.1 status code

8. Event log will have the error message “asp.net process recycle or stopped unexpectedly.”

9. Attach a debugger to your worker process

10. we will use adplus configuration file to get a crash dump on first chance access violation.

11. download accessviolation.cfg from here

12. go to your windbg folder and run the following command

cscript.exe adplus.vbs -c accessviolation.cfg

13. Browse to http://localhost/StockTrader/Default.aspx again

14. Once you get “server unavailable error”, you should have the dump files created under your windbg folder

15. Please note that adplus configuration file has  <ProcessName>aspnet_wp.exe</ProcessName> by default, if you are using windows server then change it to <ProcessName>w3wp.exe</ProcessName>. You may have multiple dumps created if you have more than one application pool. You can identify the worker process id for the application pool hosting TradeWS and then you can create a dump based on process id.

16. Open the dump file( *****.EXE__1st_chance_AccessViolation__full_*****c.dmp) using windbg

17.  Run following command to load sos dll

0:009> .loadby sos mscorwks

18. You should always run !analyze -v when you are doing a crash dump analysis, most of the time this one command is enough to tell you what has caused the crash

0:009> !analyze -v

Attempt to read from address 00000000

DEFAULT_BUCKET_ID:  NULL_POINTER_READ

PROCESS_NAME:  aspnet_wp.exe

MANAGED_STACK:
0195F948 05CD0848 SFTSRC_UnmanagedService!SFTSRC.UnmanagedService.UnmanagedService.GetTLS()+0×20
0195F950 05CD07E6 SFTSRC_UnmanagedService!SFTSRC.UnmanagedService.UnmanagedService.Finalize()+0x1e

I have omitted most of the output for the sake of brevity.

19. You will notice that application has crashed while executing Finalize

20. On a side note you can look at all the Exception objects on managed heap with the following command

0:009> !dumpheap -type Exception

21. disassemble managed method Finalize using IP

0:009> !u 05cd07e6

05cd07e0 ff15c005cc05    call    dword ptr ds:[5CC05C0h] (SFTSRC.UnmanagedService.UnmanagedService.GetTLS(), mdToken: 06000028)
>>> 05cd07e6 8bce            mov     ecx,esi

· You will notice that Exception is occurring while calling GetTLS, where it tries to assign the return value

22. On a side note, you can use View->Registers window to look at registers for the current thread or use r command

23. disassemble managed method GetTLS using IP

0:009> !u 05cd0848
05cd0841 e81a085b73      call    mscorlib_ni+0x1c1060 (79281060) (System.Threading.Thread.GetData(System.LocalDataStoreSlot), mdToken: 060012f3)
05cd0846 8bc8            mov     ecx,eax
>>> 05cd0848 8b01            mov     eax,dword ptr [ecx]

24. You will notice that this is occurring right while calling to System.Threading.Thread.GetData() and ecx register is null

25. SOS dll has powerful command to look at clr data structure. We are going to look at GetTLS() implementation by dumping the IL(intermediate language) of this method

26. in order to dumpil, you have to first convert instruction pointer IP to Method Descriptor

0:009> !ip2md 05cd0848
MethodDesc: 05cc05b8
Method Name: SFTSRC.UnmanagedService.UnmanagedService.GetTLS()
Class: 05c54628

0:009> !dumpil 05cc05b8
ilAddr = 05c72250
IL_0000: ldstr “ThreadSpecificData”
IL_0005: call System.Threading.Thread::GetNamedDataSlot
IL_000a: stloc.0
IL_000b: ldloc.0
IL_000c: call System.Threading.Thread::GetData
IL_0011: callvirt System.Object::ToString
IL_0016: callvirt System.String::get_Length
IL_001b: ret

27. As shown in above IL, it has crashed while converting System.Threading.Thread::GetData object to String so that means System.Threading.Thread::GetData is returning NULL

Conclusion

Application Pool crashed on finalizer thread while executing Thread.GetData(). If you are not familiar with Thread Local Storage, please read it on MSDN http://msdn.microsoft.com/en-us/library/ms686749.aspx . Thread.GetData get the allocated data from thread slot but the issue here is, thread data slots are unique per thread. No other thread (not even a child thread) can get that data. Please see more details on MSDN http://msdn.microsoft.com/en-us/library/system.threading.thread.getdata.aspx . We all know that Finalizer runs on a separate thread and a finalizer thread wakes up when it the Q is notified.

You will see that in SFTSRC.UnmanagedService.UnmanagedService.CreateTLS() method, we are allocating the data slot which is in a totally different thread that’s why Finalizer thread doesn’t have access to thread specific data and resulting in crash.

You can run the following command to look at all the thread stacks

0:009> ~*e!clrstack

OS Thread Id: 0x11cc (20)
ESP       EIP
05c4ec64 7c90e4f4 [HelperMethodFrame: 05c4ec64] System.GC.WaitForPendingFinalizers()
05c4ecb4 05cd07aa SFTSRC.UnmanagedService.UnmanagedService.CreateTLS()
05c4ecc0 05cd06e8 TradeWebService.OrderService.SendOrder(System.String, UInt32, Int32, System.String, System.String)

You will notice that, while sending the order, we create a TLS and then in order to force GC and Finalizer, CreateTLS() method calls GC.Collect and GC.WaitForPendingFinalizers.

Lesson Learnt

Never access a thread specific data in Finalizer. This lab uses Thread Local Storage to show you the thread specific data but it could be anything. For example, in windows application, thread which creates a handle or control owns it so you can’t access those in finalizer.

Exercise

1. use !tls command to look at Thread Local Storage slot

2. use !teb to get familiar with thread environment block and how to use it with !tls

3. ~ command to display all the threads and get their corresponding TEB

4. Search memory to find which thread has created the slot

Download debugging lab

Share

.NET Crash/OutofMemoryException/Memory Leak – .NET windows forms and infragistics datagrid and why is System.Drawing.Image object not getting finalized??

Issue Description
Windows forms application has crashed with OOM exception. Before application crashes, cpu is almost pegged at 100% for a few minutes

Root Cause Analysis using WinDbg

Collect full memory dump at set intervals

  • You could get a crash dump and analyze the managed heap to find out rooted objects. But, since we have access to the system I prefer to get a dump at set intervals and compare the managed heap statistics because that makes it a little easier to find the objects which are surviving GC over a period of time.
  • We will use ADPlus to automate this task
  • I will run the script to get a full memory dump 4 times every 2 minutes
  • Command to automate this task is “cscript.exe adplus.vbs -hang -pn <myapp.exe> -quiet -r 4 120″
  • First Dump file size is around 800MB which also indicates process’s memory usage at that time
  • Second Dump file size is around 1.2 GB
  • Third Dump file size is around 1.6 GB and a little later application has crashed.

This is a pure .net application, so we are going to jump ahead and look at the managed heap stats, gc handles and the objects in finalize queue. We will use sos2.dll copied under the same folder as windbg executable, we will dump only pinned and strong gchanldes to identify gc handles increasing over the time because these handles could cause memory leak. Please note that,!gcht (gchandles by type) command is only available in our windbg extension sos2.dll. You could use sos.dll!gchandles to dump gchandles but it won’t give you the objects and their stats by type and you will have to figure out yourself probably by looking at the root.
GCHandles Stats from First Dump
0:000> .load sos2
0:000> !gcht -t p

Pinned GC Handle Statistics:
Pinned Handles: 60
Statistics:
………………………………………………….
Total 60 objects
0:000> !gcht -t s
Strong GC Handle Statistics:
Strong Handles: 185
Statistics:
………………………………………………………
Total 185 objects
GCHandles Stats from Second Dump
0:000> !gcht -t p
Pinned GC Handle Statistics:
Pinned Handles: 60
Statistics:
………………………………………………………………
Total 60 objects
0:000> !gcht -t s
Strong GC Handle Statistics:
Strong Handles: 186
Statistics:
……………………………………………………………..
Total 186 objects

Lets move over since we don’t see anything interesting with gchandles, no. of pinned gchandles remain same and strong gc handles count has increased only by one.

  • We will compare finalize queue stats in dumps, I am only including the interesting objects and the interesting comments for the sake of brevity

Finalize Queue in first dump

0:000> !finalizequeue
generation 2 has 9433 finalizable objects (05501508->0550a86c)
Ready for finalization 0 objects (0550af4c->0550af4c)
Statistics:
MT    Count    TotalSize Class Name
7ae3c9f8     1907 45768 System.Drawing.Bitmap
…………………………………………….
Total 9873 objects

Finalize Queue in second dump

0:000> !finalizequeue
generation 2 has 10545 finalizable objects (05501508->0550b9cc)
Ready for finalization 0 objects (0550bdac->0550bdac)
Statistics:
MT    Count    TotalSize Class Name
7ae3c9f8     2951 70824 System.Drawing.Bitmap
…………………………………………….
Total 10793 objects

Aha, Do we see something interesting here???? Of course, numbers of finalizable objects in generation 2 have increased by almost 1000 and on top of that number of objects ready to be finalized is 0. So why are these objects not getting finalized?

  • We have to find out why System.Drawing.Bitmap is not getting finalized.

As shown in above step,  generation 2 has 9433 finalizable objects (05501508->0550a86c).
We have finalizable objects starting from memory address 05501508 and ending at 0550a86c. You don’t want to dumpheap by type(System.Drawing.Bitmap) to look at the roots to this object, you will have to dump too many objects unless you get lucky. The better way is probably to display the memory and get the address of an object. Size of the System.Drawing.Bitmap object is 24 Bytes so we may be able to get the object address by specifying the address range ending with finalize queue @ 0550a86c. We will subtract 24*4 = 96 bytes(60) from 0550a86c which is 550A80C.
First column is the finalize queue address and the rest are the memory addresses of the objects
0:000> dd 550A80C 0550a86c

0550a80c  17b6e074 17b6e11c 17b6e1c4 17b6e26c
…………………………………………………………………………………….
0550a86c  17b76734
0:000> !do 17b6e074

Name: System.Drawing.Bitmap —-> Make sure this is System.Drawing.Bitmap
MethodTable: 7ae3c9f8
EEClass: 7ade4014
Size: 24(0×18) bytes

0:000> !gcroot -nostacks 17b6e704
DOMAIN(001581B0):HANDLE(Strong):ff11f8:Root:01981b64(System.Threading.Thread)->
………………………………………………………………………………………
01d00f54(MyApp.MyForm)->
160875cc(MyApp.Controls.MyControl)->
1618b578(Infragistics.Win.UltraWinGrid.UltraGridRow)->
14f1a7c0(Infragistics.Win.UltraWinGrid.CellsCollection)->
…………………………………………………………………………………….
17b6e674(Infragistics.Win.UltraWinGrid.UltraGridCell)->
17b6e6f4(Infragistics.Win.UltraWinGrid.UnBoundData)->
17b6e704(System.Drawing.Bitmap)

This is rooted in some strong handles so this is not rooted in finalization queue what that means is object is not ready to be finalized yet as we saw in finalizeQ stats. I am hiding the customer data so basically, we have a windows forms containing user control with infragistics UltraGrid and the System.Drawing.Bitmap is being set in a cell.

Let’s look at the sample code
foreach (UltraGridRow row in rows)
{
row.Cells[someindex] =<bitmap object>
}
This is where we have the problem because if there are let’s say 5000 rows then we are creating 5000 bitmap objects and as long as form is alive these objects will never be disposed. System.Drawing.Bitmap uses unmanaged GDIPlus library and this is not a lightweight object that’s why it was crashing with outofmemory exception and only in a particular scenario but this may go un-noticed during test cycle by QA team unless the test case covers this very particular scenario.
Resolution
I am sure there are many ways to fix it but one easy way to fix is create the drawing objects for rows visible in the client area and handle scroll/resize events to set the image and dispose the objects not in use.

Share