Interesting WinDbg Extension SOS commands in CLR 4.0/.NET Framework 4.0 CTP, .NET runtime dll renamed and sos commands just got richer

We will review the WinDbg extension SOS.dll in the .NET Framework 4.0 CTP. CLR 4.0 has renamed the runtime DLL from mscorwks.dll to CLR.DLL, which is really helpful.

To load the SOS DLL from the same location as the .NET 4.0 runtime (CLR.DLL), execute the following command:

.loadby sos clr

1.  DML Support – YES, finally.  SOS supported DML in .NET 1.1, but it was gone in CLR 2.0.  Silverlight CoreCLR supports DML, and now .NET Framework 4.0 supports it as well.

Execute the following command to turn on DML for every command, or use the /D option on an individual command

0:003> .prefer_dml 1
DML versions of commands on by default

0:003> !dumpheap /D -type Exception -stat

For people new to WinDbg: why am I so excited about DML support in SOS?
DML Snapshot

If you look at the above snapshot, each MethodTable address is a link that you can click to execute the corresponding command, with no typing needed. Not every command has DML support, but !DumpObj is another important one that does: you can just click on an object address to dump that object from the GC heap.

2. The following additional extension commands are added

Examining code and stacks

!ThreadState

Examining CLR data structures

!DumpSigElem

Diagnostic Utilities

!VerifyObj
!FindRoots
!HeapStat
!GCWhere
!ListNearObj (lno)
!AnalyzeOOM (ao)

Examining the GC history

!HistInit
!HistStats
!HistRoot
!HistObj
!HistObjFind
!HistClear

!ThreadState Command

When you execute the !threads command, you will see output similar to the following

PreEmptive   GC Alloc           Lock
ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
0    1  310 00161438      a020 Enabled  013b4c64:013b5fe8 00159230     1 MTA
2    2  8c4 0016dab0      b220 Enabled  00000000:00000000 00159230     0 MTA (Finalizer)

The first column is the debugger thread ID and the second column (ID) is the managed thread ID. The OSID column is the OS thread ID, which means it will be 0 or some garbage value when the runtime uses fibers.

The State column is a bit field built from the flags shown below (taken from the Shared CLI)

TS_Unknown                = 0x00000000,    // threads are initialized this way

TS_AbortRequested         = 0x00000001,    // Abort the thread
TS_GCSuspendPending       = 0x00000002,    // waiting to get to safe spot for GC
TS_UserSuspendPending     = 0x00000004,    // user suspension at next opportunity
TS_DebugSuspendPending    = 0x00000008,    // Is the debugger suspending threads?
TS_GCOnTransitions        = 0x00000010,    // Force a GC on stub transitions (GCStress only)

TS_SuspendUnstarted       = 0x00400000,    // latch a user suspension on an unstarted thread

TS_ThreadPoolThread       = 0x00800000,    // is this a threadpool thread?
TS_TPWorkerThread         = 0x01000000,    // is this a threadpool worker thread?

TS_Interruptible          = 0x02000000,    // sitting in a Sleep(), Wait(), Join()
TS_Interrupted            = 0x04000000,    // was awakened by an interrupt APC. !!! This can be moved to TSNC

TS_CompletionPortThread   = 0x08000000,    // Completion port thread
………………………………………………………………………..

SOS in CLR 4.0 has the !ThreadState command, which tells you exactly what state a thread is in, given the bit field. The following output shows the thread state bits for a worker thread, a completion port thread and the finalizer thread

0:000> !ThreadState 1009220
Legal to Join
Background
CLR Owns
In Multi Threaded Apartment
Thread Pool Worker Thread
0:000> !ThreadState 800a220
Legal to Join
Background
CoInitialized
In Multi Threaded Apartment
Completion Port Thread
0:000> !ThreadState b220
Legal to Join
Background
CLR Owns
CoInitialized
In Multi Threaded Apartment
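
To make the bit field less mysterious, here is a minimal Python sketch of the kind of decoding !ThreadState performs. It is not the SOS implementation. The first table uses the flag values from the Shared CLI excerpt above, labelled with the names from the !ThreadState help; the second table is an assumption, inferred only by comparing the three sample values (1009220, 800a220, b220) with the output printed for them. TS_ThreadPoolThread is left out because I don't know which label SOS prints for it, so any unlisted bit is reported as unknown.

```python
# Flag values listed in the Shared CLI excerpt, labelled with the matching
# names from the !ThreadState help text.
FLAGS_FROM_EXCERPT = {
    0x00000001: "Thread Abort Requested",
    0x00000002: "GC Suspend Pending",
    0x00000004: "User Suspend Pending",
    0x00000008: "Debug Suspend Pending",
    0x00000010: "GC On Transitions",
    0x00400000: "Suspend Unstarted",
    0x01000000: "Thread Pool Worker Thread",
    0x02000000: "Interruptible",
    0x04000000: "Interrupted",
    0x08000000: "Completion Port Thread",
}

# ASSUMPTION: these values are inferred from the three sample outputs,
# not copied from the CLR sources.
FLAGS_INFERRED_FROM_SAMPLES = {
    0x00000020: "Legal to Join",
    0x00000200: "Background",
    0x00001000: "CLR Owns",
    0x00002000: "CoInitialized",
    0x00008000: "In Multi Threaded Apartment",
}

KNOWN_FLAGS = {**FLAGS_FROM_EXCERPT, **FLAGS_INFERRED_FROM_SAMPLES}


def decode_thread_state(state: int) -> list[str]:
    """Return a label for every set bit, lowest bit first."""
    labels = []
    for bit in range(32):
        mask = 1 << bit
        if state & mask:
            labels.append(KNOWN_FLAGS.get(mask, f"unknown bit 0x{mask:08x}"))
    return labels


if __name__ == "__main__":
    # The three State values used in the !ThreadState examples.
    for value in (0x1009220, 0x800A220, 0xB220):
        print(f"{value:x}: " + ", ".join(decode_thread_state(value)))
```

The State value a020 from the !threads output decodes the same way: it is b220 without the Background and CLR Owns bits.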

Other Important Commands

!FindRoots – This is a very powerful and interesting command, because it lets you break into the debuggee the next time the CLR collects a given generation and then find out what is keeping an object alive.

!GCWhere – tells you the generation number along with the GC heap segment for an object, so you no longer need to map the object address to a GC heap segment by hand or use another extension DLL

!HeapStat – another cool command; it displays statistics for the generational heap, including generation sizes and free space

!AnalyzeOOM – displays detailed information on the last OOM that occurred on an allocation request to the GC heap

I can't do justice to each of these commands with detailed documentation of my own, and the SOS !help documentation has already done a very good job. You can either look at the !help documentation or read below; what follows is copied and pasted from the SOS help

0:020> !help ThreadState
——————————————————————————-
!ThreadState value

The !Threads command outputs, among other things, the state of the thread.
This is a bit field which corresponds to various states the thread is in.
To check the state of the thread, simply pass that bit field from the
output of !Threads into !ThreadState.

Example:
0:003> !Threads
ThreadCount:      2
UnstartedThread:  0
BackgroundThread: 1
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
PreEmptive   GC Alloc           Lock
ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
0    1  250 0019b068      a020 Disabled 02349668:02349fe8 0015def0     0 MTA
2    2  944 001a6020      b220 Enabled  00000000:00000000 0015def0     0 MTA (Finalizer)
0:003> !ThreadState b220
Legal to Join
Background
CLR Owns
CoInitialized
In Multi Threaded Apartment

Possible thread states:
Thread Abort Requested
GC Suspend Pending
User Suspend Pending
Debug Suspend Pending
GC On Transitions
Legal to Join
Yield Requested
Hijacked by the GC
Blocking GC for Stack Overflow
Background
Unstarted
Dead
CLR Owns
CoInitialized
In Single Threaded Apartment
In Multi Threaded Apartment
Reported Dead
Task Reset
Sync Suspended
Debug Will Sync
Stack Crawl Needed
Suspend Unstarted
Aborted
Thread Pool Worker Thread
Interruptible
Interrupted
Completion Port Thread
Abort Initiated
Finalized
Failed to Start
Detached
0:020> !help DumpSigElem
——————————————————————————-
!DumpSigElem <sigaddr> <moduleaddr>

This command dumps a single element of a signature object.  For most circumstances,
you should use !DumpSig to look at individual signature objects, but if you find a
signature that has been corrupted in some manner you can use !DumpSigElem to read out
the valid portions of it.

If we look at a valid signature object for a method we see the following:
0:000> !dumpsig 0x000007fe`ec20879d 0x000007fe`eabd1000
[DEFAULT] [hasThis] Void (Boolean,String,String)

We can look at the individual elements of this object by adding the offsets into the
object which correspond to the return value and parameters:
0:000> !dumpsigelem 0x000007fe`ec20879d+2 0x000007fe`eabd1000
Void
0:000> !dumpsigelem 0x000007fe`ec20879d+3 0x000007fe`eabd1000
Boolean
0:000> !dumpsigelem 0x000007fe`ec20879d+4 0x000007fe`eabd1000
String
0:000> !dumpsigelem 0x000007fe`ec20879d+5 0x000007fe`eabd1000
String

We can do something similar for fields.  Here is the full signature of a field:
0:000> !dumpsig 0x000007fe`eb7fd8cd 0x000007fe`eabd1000
[FIELD] ValueClass System.RuntimeTypeHandle

Using !DumpSigElem we can find the type of the field by adding the offset of it (1) to
the address of the signature:
0:000> !dumpsigelem 0x000007fe`eb7fd8cd+1 0x000007fe`eabd1000
ValueClass System.RuntimeTypeHandle

!DumpSigElem will also work with generics.  Let a function be defined as follows:
public A Test(IEnumerable<B> n)

The elements of this signature can be obtained by adding offsets into the signature
when calling !DumpSigElem:

0:000> !dumpsigelem 00000000`00bc2437+2 000007ff00043178
__Canon
0:000> !dumpsigelem 00000000`00bc2437+4 000007ff00043178
Class System.Collections.Generic.IEnumerable`1<__Canon>

The actual offsets that you should add are determined by the contents of the
signature itself.  By trial and error you should be able to find various elements
of the signature.
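
The offsets in the method example above (+2 for the return type, +3 to +5 for the parameters) are less of a trial-and-error matter once you know the ECMA-335 signature layout: one calling-convention byte, a compressed parameter count, then the return type and the parameter types. The Python sketch below walks such a blob for primitive-only signatures. The blob bytes are an assumption: I reconstructed them from the ECMA-335 encoding of "[DEFAULT] [hasThis] Void (Boolean,String,String)" rather than reading them from the dump, and the decoder deliberately handles only three single-byte element types.

```python
# ECMA-335 element type codes (only the ones needed for this example).
ELEMENT_TYPE_NAMES = {0x01: "Void", 0x02: "Boolean", 0x0E: "String"}

HASTHIS = 0x20   # calling-convention flag: instance method
GENERIC = 0x10   # calling-convention flag: generic method (not handled here)


def decode_simple_method_sig(blob: bytes):
    """Return (has_this, [(offset, type_name), ...]) for a primitive-only signature."""
    calling_convention = blob[0]
    if calling_convention & GENERIC:
        raise ValueError("generic method signatures are not handled in this sketch")

    param_count = blob[1]  # compressed integer; one byte when the value is < 0x80
    if param_count >= 0x80:
        raise ValueError("multi-byte parameter counts are not handled in this sketch")

    elements = []
    # Return type at offset 2, then one byte per parameter.
    for offset in range(2, 2 + 1 + param_count):
        code = blob[offset]
        if code not in ELEMENT_TYPE_NAMES:
            raise ValueError(f"element type 0x{code:02x} is not handled in this sketch")
        elements.append((offset, ELEMENT_TYPE_NAMES[code]))
    return bool(calling_convention & HASTHIS), elements


if __name__ == "__main__":
    # ASSUMPTION: hypothetical bytes for Void (Boolean,String,String) on an instance method.
    blob = bytes([0x20, 0x03, 0x01, 0x02, 0x0E, 0x0E])
    has_this, elements = decode_simple_method_sig(blob)
    print("hasThis:", has_this)
    for offset, name in elements:
        print(f"+{offset}: {name}")
```

Types such as ValueClass or generic instantiations take more than one byte, which is why the offsets in the generic example jump from +2 to +4.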

0:020> !help VerifyObj
——————————————————————————-
!VerifyObj <object address>

!VerifyObj is a diagnostic tool that checks the object that is passed as an
argument for signs of corruption.

0:002> !verifyobj 028000ec
object 0x28000ec does not have valid method table

0:002> !verifyobj 0680017c
object 0x680017c: bad member 00000001 at 06800184

0:020> !help FindRoots
——————————————————————————-
!FindRoots -gen <N> | -gen any | <object address>

The “-gen” form causes the debugger to break in the debuggee on the next
collection of the specified generation.  The effect is reset as soon as the
break occurs, in other words, if you need to break on the next collection you
would need to reissue the command.

The last form of this command is meant to be used after the break caused by the
other forms has occurred.  Now the debuggee is in the right state for
!FindRoots to be able to identify roots for objects from the current condemned
generations.

!FindRoots is a diagnostic command that is meant to answer the following
question:

“I see that GCs are happening, however my objects have still not been
collected. Why? Who is holding onto them?”

The process of answering the question would go something like this:

1. Find out the generation of the object of interest using the !GCWhere
command, say it is gen 1:
!GCWhere <object address>

2. Instruct the runtime to stop the next time it collects that generation using
the !FindRoots command:
!FindRoots -gen 1
g

3. When the next GC starts, and has proceeded past the mark phase a CLR
notification will cause a break in the debugger:
(fd0.ec4): CLR notification exception – code e0444143 (first chance)
CLR notification: GC – end of mark phase.
Condemned generation: 1.

4. Now we can use the !FindRoots <object address> to find out the cross
generational references to the object of interest.  In other words, even if the
object is not referenced by any “proper” root it may still be referenced by an
older object (from an older generation), from a generation that has not yet been
scheduled for collection.  At this point !FindRoots will search those older
generations too, and report those roots.
0:002> !findroots 06808094
older generations::Root:  068012f8(AAA.Test+a)->
06808094(AAA.Test+b)

0:020> !help HeapStat
——————————————————————————-
!HeapStat [-inclUnrooted | -iu]

This command shows the generation sizes for each heap and the total, how much free
space there is in each generation on each heap.  If the -inclUnrooted option is
specified the report will include information about the managed objects from the
GC heap that are not rooted anymore.

Sample output:

0:002> !heapstat
Heap     Gen0         Gen1         Gen2         LOH
Heap0    177904       12           306956       8784
Heap1    159652       12           12           16
Total    337556       24           306968       8800

Free space:                                                 Percentage
Heap0    28           12           12           64          SOH:  0% LOH:  0%
Heap1    104          12           12           16          SOH:  0% LOH:100%
Total    132          24           24           80

0:002> !heapstat -inclUnrooted
Heap     Gen0         Gen1         Gen2         LOH
Heap0    177904       12           306956       8784
Heap1    159652       12           12           16
Total    337556       24           306968       8800

Free space:                                                 Percentage
Heap0    28           12           12           64          SOH:  0% LOH:  0%
Heap1    104          12           12           16          SOH:  0% LOH:100%
Total    132          24           24           80

Unrooted objects:                                           Percentage
Heap0    152212       0            306196       0           SOH: 94% LOH:  0%
Heap1    155704       0            0            0           SOH: 97% LOH:  0%
Total    307916       0            306196       0

The percentage column contains a breakout of free or unrooted bytes to total bytes.
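
As a sanity check on the percentage column, the sample numbers can be reproduced with a few lines of Python. Two things here are my assumptions, inferred only from the sample output: that SOH means Gen0 + Gen1 + Gen2 added together, and that the percentage is truncated rather than rounded (Heap1's unrooted SOH works out to 97.5% and is printed as 97%).

```python
# Rows copied from the sample output: (Gen0, Gen1, Gen2, LOH) in bytes.
SIZES = {"Heap0": (177904, 12, 306956, 8784), "Heap1": (159652, 12, 12, 16)}
FREE = {"Heap0": (28, 12, 12, 64), "Heap1": (104, 12, 12, 16)}
UNROOTED = {"Heap0": (152212, 0, 306196, 0), "Heap1": (155704, 0, 0, 0)}


def percent(part: int, total: int) -> int:
    """Integer percentage, truncated (ASSUMPTION: inferred from the sample)."""
    return part * 100 // total if total else 0


def soh_loh_percent(part_row, size_row):
    """Return (SOH %, LOH %) for one heap; SOH is taken as Gen0 + Gen1 + Gen2."""
    soh = percent(sum(part_row[:3]), sum(size_row[:3]))
    loh = percent(part_row[3], size_row[3])
    return soh, loh


if __name__ == "__main__":
    for heap in SIZES:
        free_soh, free_loh = soh_loh_percent(FREE[heap], SIZES[heap])
        unrooted_soh, unrooted_loh = soh_loh_percent(UNROOTED[heap], SIZES[heap])
        print(f"{heap} free:     SOH {free_soh:3d}% LOH {free_loh:3d}%")
        print(f"{heap} unrooted: SOH {unrooted_soh:3d}% LOH {unrooted_loh:3d}%")
```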

0:020> !help GCWhere
——————————————————————————-
!GCWhere <object address>

!GCWhere displays the location in the GC heap of the argument passed in.

0:002> !GCWhere 02800038
Address  Gen Heap segment  begin    allocated size
02800038 2    0   02800000 02800038 0282b740  12

When the argument lies in the managed heap, but is not a valid *object* address
the “size” is displayed as 0:

0:002> !GCWhere 0280003c
Address  Gen Heap segment  begin    allocated size
0280003c 2    0   02800000 02800038 0282b740  0

0:020> !help ListNearObj
——————————————————————————-
!ListNearObj <object address>

!ListNearObj is a diagnostic tool that displays the object preceeding and
succeeding the address passed in:

The command looks for the address in the GC heap that looks like a valid
beginning of a managed object (based on a valid method table) and the object
following the argument address.

0:002> !ListNearObj 028000ec
Before: 0x28000a4           72 (0x48      ) System.StackOverflowException
After:  0x2800134           72 (0x48      ) System.Threading.ThreadAbortException
Heap local consistency confirmed.

0:002> !ListNearObj 028000f0
Before: 0x28000ec           72 (0x48      ) System.ExecutionEngineException
After:  0x2800134           72 (0x48      ) System.Threading.ThreadAbortException
Heap local consistency confirmed.

The command considers the heap as “locally consistent” if:
prev_obj_addr + prev_obj_size = arg_addr && arg_obj + arg_size = next_obj_addr
OR
prev_obj_addr + prev_obj_size = next_obj_addr

When the condition is not satisfied:

0:002> !lno 028000ec
Before: 0x28000a4           72 (0x48      ) System.StackOverflowException
After:  0x2800134           72 (0x48      ) System.Threading.ThreadAbortException
Heap local consistency not confirmed.
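
The local consistency rule quoted above can be written out as a small Python sketch, fed with the addresses and sizes from the sample output. This only restates the documented condition and is not how SOS implements it. The function name and its parameters are mine, and treating "no valid object at the argument address" as arg_addr=None is an assumption about how the first clause gets skipped.

```python
def locally_consistent(prev_addr, prev_size, next_addr, arg_addr=None, arg_size=None):
    """Check the two clauses of the documented condition.

    arg_addr/arg_size describe the object at the argument address, if there
    is a valid one; leave them as None otherwise.
    """
    if arg_addr is not None and arg_size is not None:
        # prev_obj_addr + prev_obj_size = arg_addr && arg_obj + arg_size = next_obj_addr
        if prev_addr + prev_size == arg_addr and arg_addr + arg_size == next_addr:
            return True
    # OR: prev_obj_addr + prev_obj_size = next_obj_addr
    return prev_addr + prev_size == next_addr


if __name__ == "__main__":
    # !ListNearObj 028000ec with a valid 72-byte object at the argument address.
    print(locally_consistent(0x28000A4, 0x48, 0x2800134, 0x28000EC, 0x48))
    # !ListNearObj 028000f0: the argument is inside the object at 0x28000ec.
    print(locally_consistent(0x28000EC, 0x48, 0x2800134))
    # No valid object between 0x28000a4 and 0x2800134: a 72-byte gap is unexplained.
    print(locally_consistent(0x28000A4, 0x48, 0x2800134))
```

The third call corresponds to the "not confirmed" case: the object before ends at 0x28000ec, the object after starts at 0x2800134, and nothing valid accounts for the bytes in between.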

0:020> !help AnalyzeOOM
——————————————————————————-
!AnalyzeOOM

!AnalyzeOOM displays the info of the last OOM occured on an allocation request to
the GC heap (in Server GC it displays OOM, if any, on each GC heap).

To see the managed exception(s) use the !Threads command which will show you
managed exception(s), if any, on each managed thread. If you do see an
OutOfMemoryException exception you can use the !PrintException command on it.
To get the full callstack use the “kb” command in the debugger for that thread.
For example, to display thread 3's stack use ~3kb.

OOM exceptions could be because of the following reasons:

1) allocation request to GC heap
in which case you will see JIT_New* on the call stack because managed code called new.
2) other runtime allocation failure
for example, failure to expand the finalize queue when GC.ReRegisterForFinalize is
called.
3) some other code you use throws a managed OOM exception
for example, some .NET framework code converts a native OOM exception to managed
and throws it.

The !AnalyzeOOM command aims to help you with investigating 1) which is the most
difficult because it requires some internal info from GC. The only exception is
we don’t support allocating objects larger than 2GB on CLR v2.0 or prior. And this
command will not display any managed OOM because we will throw OOM right away
instead of even trying to allocate it on the GC heap.

There are 2 legitimate scenarios where GC would return OOM to allocation requests -
one is if the process is running out of VM space to reserve a segment; the other
is if the system is running out physical memory (+ page file if you have one) so
GC can not commit memory it needs. You can look at these scenarios by using performance
counters or debugger commands. For example for the former scenario the “!address
-summary” debugger command will show you the largest free region in the VM. For
the latter scenario you can look at the “Memory\% Committed Bytes In Use” see
if you are running low on commit space. One important thing to keep in mind is
when you do this kind of memory analysis it could an aftereffect and doesn’t
completely agree with what this command tells you, in which case the command should
be respected because it truly reflects what happened during GC.

The other cases should be fairly obvious from the callstack.

Sample output:

0:011> !ao
———Heap 2 ———
Managed OOM occured after GC #28 (Requested to allocate 1234 bytes)
Reason: Didn’t have enough memory to commit
Detail: SOH: Didn’t have enough memory to grow the internal GC datastructures (800000 bytes) -
on GC entry available commit space was 500 MB
———Heap 4 ———
Managed OOM occured after GC #12 (Requested to allocate 100000 bytes)
Reason: Didn’t have enough memory to allocate an LOH segment
Detail: LOH: Failed to reserve memory (16777216 bytes)

0:020> !help FAQ
——————————————————————————-
>> Where can I get the right version of SOS for my build?

If you are running version 1.1 or 2.0 of the CLR, SOS.DLL is installed in the
same directory as the main CLR dll (CLR.DLL). Newer versions of the
Windows Debugger provide a command to make it easy to load the right copy of
SOS.DLL:

“.loadby sos clr”

That will load the SOS extension DLL from the same place that CLR.DLL is
loaded in the process. You shouldn’t attempt to use a version of SOS.DLL that
doesn’t match the version of CLR.DLL. You can find the version of
CLR.DLL by running

“lmvm clr”

in the debugger.  Note that if you are running CoreCLR (e.g. Silverlight)
then you should replace “clr” with “coreclr”.

If you are using a dump file created on another machine, it is a little bit
more complex. You need to make sure the mscordacwks.dll file that came with
that install is on your symbol path, and you need to load the corresponding
version of sos.dll (typing .load <full path to sos.dll> rather than using the
.loadby shortcut). Within the Microsoft corpnet, we keep tagged versions
of mscordacwks.dll, with names like mscordacwks_<architecture>_<version>.dll
that the Windows Debugger can load. If you have the correct symbol path to the
binaries for that version of the Runtime, the Windows Debugger will load the
correct mscordacwks.dll file.

>> I have a chicken and egg problem. I want to use SOS commands, but the CLR
isn’t loaded yet. What can I do?

In the debugger at startup you can type:

“sxe clrn”

Let the program run, and it will stop with the notice

“CLR notification: module ‘mscorlib’ loaded”

At this time you can use SOS commands. To turn off spurious notifications,
type:

“sxd clrn”

>> I got the following error message. Now what?

0:000> .loadby sos clr
0:000> !DumpStackObjects
Failed to find runtime DLL (clr.dll), 0x80004005
Extension commands need clr.dll in order to have something to do.
0:000>

This means that the CLR is not loaded yet, or has been unloaded. You need to
wait until your managed program is running in order to use these commands. If
you have just started the program a good way to do this is to type

bp clr!EEStartup “g @$ra”

in the debugger, and let it run. After the function EEStartup is finished,
there will be a minimal managed environment for executing SOS commands.

>> I have a partial memory minidump, and !DumpObj doesn’t work. Why?

In order to run SOS commands, many CLR data structures need to be traversed.
When creating a minidump without full memory, special functions are called at
dump creation time to bring those structures into the minidump, and allow a
minimum set of SOS debugging commands to work. At this time, those commands
that can provide full or partial output are:

CLRStack
Threads
Help
PrintException
EEVersion

For a minidump created with this minimal set of functionality in mind, you
will get an error message when running any other commands. A full memory dump
(obtained with “.dump /ma <filename>” in the Windows Debugger) is often the
best way to debug a managed program at this level.

>> What other tools can I use to find my bug?

Turn on Managed Debugging Assistants. These enable additional runtime diagnostics,
particularly in the area of PInvoke/Interop. Adam Nathan has written some great
information about that:

http://blogs.msdn.com/adam_nathan/

>> Does SOS support DML?

Yes.  SOS respects the .prefer_dml option in the debugger.  If this setting is
turned on, then SOS will output DML by default.  Alternatively, you may leave
it off and add /D to the beginning of a command to get DML based output for it.
Not all SOS commands support DML output.

0:020> !help HistInit
——————————————————————————-
!HistInit

Before running any of the Hist – family commands you need to initialize the SOS
structures from the stress log saved in the debuggee.  This is achieved by the
HistInit command.

Sample output:

0:001> !HistInit
Attempting to read Stress log
STRESS LOG:
facilitiesToLog  = 0xffffffff
levelToLog       = 6
MaxLogSizePerThread = 0x10000 (65536)
MaxTotalLogSize = 0x1000000 (16777216)
CurrentTotalLogChunk = 9
ThreadsWithLogs  = 3
Clock frequency  = 3.392 GHz
Start time         15:26:31
Last message time  15:26:56
Total elapsed time 25.077 sec
……………………………….
—————————- 2407 total entries —————————–

SUCCESS: GCHist structures initialized

0:020> !help HistStats
——————————————————————————-
!HistStats

HistStat provides a number of garbage collection statistics obtained from the
stress log.

Sample output:

0:003> !HistStats
GCCount    Plugs Promotes   Relocs
———————————–
2296        0       35       86
2295        0       34       85
2294        0       34       85

2286        0       32       83
2285        0       32       83
322        0       23       55
0        0        0        0
Root 01e411b8 relocated multiple times in gc 322
Root 01e411bc relocated multiple times in gc 322

Root 01e413f8 relocated multiple times in gc 322
Root 01e413fc relocated multiple times in gc 322

0:020> !help histroot
——————————————————————————-
!HistRoot <root>

The root value obtained from !HistObjFind can be used to track the movement of
an object through the GCs.

HistRoot provides information related to both promotions and relocations of the
root specified as the argument.

0:003> !HistRoot 01e411b8
GCCount    Value       MT Promoted?                Notes
———————————————————
2296 028970d4 5b6c5cd8       yes
2295 028970d4 5b6c5cd8       yes
2294 028970d4 5b6c5cd8       yes
2293 028970d4 5b6c5cd8       yes
2292 028970d4 5b6c5cd8       yes
2291 028970d4 5b6c5cd8       yes
2290 028970d4 5b6c5cd8       yes
2289 028970d4 5b6c5cd8       yes
2288 028970d4 5b6c5cd8       yes
2287 028970d4 5b6c5cd8       yes
2286 028970d4 5b6c5cd8       yes
2285 028970d4 5b6c5cd8       yes
322 028970e8 5b6c5cd8       yes Duplicate promote/relocs

0:020> !help HistObj
——————————————————————————-
!HistObj <obj_address>

This command examines all stress log relocation records and displays the chain
of GC relocations that may have led to the address passed in as an argument.
Conceptually the output is:

GenN    obj_address   root1, root2, root3,
GenN-1  prev_obj_addr root1, root2,
GenN-2  prev_prev_oa  root1, root4,

Sample output:
0:003> !HistObj 028970d4
GCCount   Object                                    Roots
———————————————————
2296 028970d4 00223fc4, 01e411b8,
2295 028970d4 00223fc4, 01e411b8,
2294 028970d4 00223fc4, 01e411b8,
2293 028970d4 00223fc4, 01e411b8,
2292 028970d4 00223fc4, 01e411b8,
2291 028970d4 00223fc4, 01e411b8,
2290 028970d4 00223fc4, 01e411b8,
2289 028970d4 00223fc4, 01e411b8,
2288 028970d4 00223fc4, 01e411b8,
2287 028970d4 00223fc4, 01e411b8,
2286 028970d4 00223fc4, 01e411b8,
2285 028970d4 00223fc4, 01e411b8,
322 028970d4 01e411b8,
0 028970d4

0:020> !help HistObjFind
——————————————————————————-
!HistObjFind <obj_address>

To examine log entries related to an object whose present address is known one
would use this command. The output of this command contains all entries that
reference the object:

0:003> !HistObjFind 028970d4
GCCount   Object                                  Message
———————————————————
2296 028970d4 Promotion for root 01e411b8 (MT = 5b6c5cd8)
2296 028970d4 Relocation NEWVALUE for root 00223fc4
2296 028970d4 Relocation NEWVALUE for root 01e411b8

2295 028970d4 Promotion for root 01e411b8 (MT = 5b6c5cd8)
2295 028970d4 Relocation NEWVALUE for root 00223fc4
2295 028970d4 Relocation NEWVALUE for root 01e411b8

0:020> !help HistClear
——————————————————————————-
!HistClear

This command releases any resources used by the Hist-family of commands.
Generally there’s no need to call this explicitly, as each HistInit will first
cleanup the previous resources.


Visual Studio 2008 memory leak/memory issue on x86 – The operation could not be completed. Not enough storage is available to complete this operation

And no, I don't have a solution for it; probably the only workaround is to make Visual Studio Large Address Aware (3GB switch) on x86.

vs2008 error message

Steps to re-create

1. Download and unzip http://debuggingblog.com/resources/transcripts.zip

2. Open the XML file, close it, and try to load it a second time

3. If you load the XML file using IE8, you will see the following once you close it

——————– State SUMMARY ————————–
TotSize (      KB)   Pct(Tots)  Usage
19e6f000 (  424380) : 20.24%   : MEM_COMMIT
f3b4000 (  249552) : 11.90%   : MEM_FREE
56dcd000 ( 1423156) : 67.86%   : MEM_RESERVE

Almost 1.4 GB of memory allocated in GC segments for the XML file is still reserved even after the XML file is unloaded.
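
If you want to double check the state summary, the KB and percentage columns follow directly from the hex sizes. Here is a short Python sketch using the three rows above; the function name is mine, and rounding to two decimals is an assumption that happens to match the printed values.

```python
# (usage, size in bytes) copied from the TotSize column of the state summary.
ROWS = [
    ("MEM_COMMIT", 0x19E6F000),
    ("MEM_FREE", 0x0F3B4000),
    ("MEM_RESERVE", 0x56DCD000),
]


def summarize(rows):
    """Return (usage, size in KB, percent of total) for each row."""
    total = sum(size for _, size in rows)
    return [(usage, size // 1024, round(100 * size / total, 2)) for usage, size in rows]


if __name__ == "__main__":
    for usage, kb, pct in summarize(ROWS):
        print(f"{usage:<12} {kb:>8} KB {pct:6.2f}%")
    total_kb = sum(size for _, size in ROWS) // 1024
    print(f"total        {total_kb:>8} KB")
```

The three rows add up to 2097088 KB, which is 64 KB short of 2 GB, so together they cover essentially the whole 2 GB range that the summary reports on.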

However, visual studio 2008 is another story

0:000> !eeheap -gc
ephemeral segment allocation context: none
segment    begin allocated     size
01830000 01831000  027ecadc 0x00fbbadc(16497372)
12860000 12861000  137616c4 0x00f006c4(15730372)
………………………………………………………………………………..

We have a bunch of 16 MB GC segments, and most of the objects are in gen 2.

0c55d1ec   739459     85777244 Microsoft.XmlEditor.XmlElement
0c559858  1496448     89786880 Microsoft.XmlEditor.Identifier
001f1918   105303     97472784      Free
793308ec  2369315    387475460 System.String
Total 9375571 objects

We have 90+ MB of free blocks and 380+ MB in System.String. There are 2.36 million string objects, so you don't want to pick through each string object to find its GC root, unless Microsoft or someone else is paying you a dime for every object you dump, in which case a dump a day will make your day for sure.
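
Rather than eyeballing the byte counts, a few lines of Python can turn the !dumpheap -stat rows above into megabytes and average object sizes. This is just a sketch: the parser assumes the four-column layout shown here (MT, Count, TotalSize, Class Name), and the StatRow type and function name are mine.

```python
from typing import NamedTuple


class StatRow(NamedTuple):
    mt: str
    count: int
    total_size: int
    class_name: str


SAMPLE = """\
0c55d1ec   739459     85777244 Microsoft.XmlEditor.XmlElement
0c559858  1496448     89786880 Microsoft.XmlEditor.Identifier
001f1918   105303     97472784      Free
793308ec  2369315    387475460 System.String
Total 9375571 objects
"""


def parse_stat_rows(text: str) -> list[StatRow]:
    """Parse 'MT Count TotalSize ClassName' rows; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue  # e.g. the "Total ... objects" footer
        mt, count, total_size, class_name = parts
        rows.append(StatRow(mt, int(count), int(total_size), class_name.strip()))
    return rows


if __name__ == "__main__":
    for row in parse_stat_rows(SAMPLE):
        megabytes = row.total_size / 1_000_000
        average = row.total_size / row.count
        print(f"{row.class_name:<32} {megabytes:8.1f} MB  avg {average:7.1f} bytes")
```

Run against these rows, it shows that every XmlElement is exactly 116 bytes and every Identifier exactly 60 bytes, so the bulk of the memory really is in the strings.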

0:000> !dumpheap -mt 0c9d4134
Address       MT     Size
018f241c 0c9d4134       68
5f610108 0c9d4134       68
total 2 objects
Statistics:
MT    Count    TotalSize Class Name
0c9d4134        2          136 Microsoft.XmlEditor.XmlDocumentProperties
Total 2 objects
0:000> !objsize 018f241c
sizeof(018f241c) =    507372388 (  0x1e3de364) bytes (Microsoft.XmlEditor.XmlDocumentProperties)
0:000> !objsize 5f610108
sizeof(5f610108) =    507371128 (  0x1e3dde78) bytes (Microsoft.XmlEditor.XmlDocumentProperties)

Did you just see that almost 1 GB of virtual memory is rooted in Microsoft.XmlEditor.XmlDocumentProperties? That's just outrageous. I mean, why would Microsoft Visual Studio take up 1.2 GB of virtual memory to open a 58 MB file, although it does make use of a schema context cache?

0:000> !gcroot -nostacks 018f241c
DOMAIN(001EC570):HANDLE(RefCnt):16d1b20:Root:018f241c(Microsoft.XmlEditor.XmlDocumentProperties)

A GCHandle of type RefCnt is keeping a reference to Microsoft.XmlEditor.XmlDocumentProperties

An OutOfMemoryException is thrown with the following call stack

Exception object: 5ed00a34
Exception type: System.OutOfMemoryException
Message: Insufficient memory to continue the execution of the program.
InnerException: <none>
StackTrace (generated):
SP       IP       Function
0012F5A0 0C97E8B3 Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.NativeMethods.ThrowOnFailure(Int32, Int32[])+0x3b
0012F5AC 0C9E94BB Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.Package.Source.GetText()+0x3c
0012F5DC 0C9E9360 Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.Package.Source.BeginParse()+0x55
0012F644 0C9ECF38 Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.Package.Source.OnIdle(Boolean)+0x80
0012F654 0C9ECE28 Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.Package.LanguageService.OnIdle(Boolean)+0xd8
0012F674 0C9ECCDD Microsoft_XmlEditor!Microsoft.XmlEditor.XmlLanguageService.OnIdle(Boolean)+0x35
0012F684 0C9ECC34 Microsoft_XmlEditor!Microsoft.XmlEditor.Package.FDoIdle(UInt32)+0xc4

Conclusion

I hope this is fixed in Visual Studio 2010; I still need to try it out.


ASP.NET Worker Process Recycle/.NET Application crash – What not to do in Finalizer – includes asp.net 2.0 debugging lab

Click to Download debugging lab

Debugging Labs

SFTSRC.StockTrader, an ASP.NET 2.0 web application, will be used to introduce you to a series of debugging labs. We will start with a simple ASP.NET application that crashes, and follow up with labs on memory leaks, memory fragmentation, application performance and application hangs.

The first lab in the series will show you how an application can crash if you are not careful when implementing a finalizer.

Issue Description

Website user gets “Server Application Unavailable error” message from time to time.

ASP.NET Debugging Lab

This lab is prepared based on a production issue resolved at a customer site. Please note that you can download this debugging lab from here. Web project/solution has been created using visual studio 2008.

What will you learn

1. How to debug asp.net application

2. How to identify what is causing asp.net application pool to recycle

3. WinDbg / sos basic commands to analyze a crash dump

Application Architecture

architecture

Breaking down your thought process

1.  Since the website user sees a generic error handler page, you will need to identify why the HTTP request is failing.

2. As shown in the above diagram, the website and the web service are hosted on two different servers, so you need to identify the server where the request is failing

3. You should look at the HTTP error logs/IIS logs and the event logs on each server to identify the issue

4. The logs will also help you determine whether there is a network issue

Analysis

After looking at the server logs, we have determined that the worker process is getting recycled on the server hosting the Sftsrc Stock Trading WebService. A worker process can get recycled because of its configuration, because it is low on virtual memory, because of a crash due to an unhandled exception, and so on. Let's assume that you don't have any information on what has caused the worker process to recycle

Let’s look at the lab which is prepared based on the above issue description.

Lab

We will analyze this issue using Lab01- Download it from here.

1. Once you download the lab, create two virtual directories, “StockTrader” and “TradeWS”

2. “StockTrader” will be set to folder “\Lab01\SFTSRC.StockTrader”

3. “TradeWS” will be set to folder “\Lab01\SFTSRC.UnmanagedService”

4. Browse to http://localhost/StockTrader/Default.aspx

You will see the website as shown below
debugging lab 01 snapshot

5. Enter Stock Symbol, Quantity, Price and click on Submit Order

6. Once you submit an order, you will see “Server Unavailable Error” page in a few seconds.

7. If you look at the iis log, you will see http 503 error

please visit http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for description on HTTP 1.1 status code

8. Event log will have the error message “asp.net process recycle or stopped unexpectedly.”

9. Attach a debugger to your worker process

10. We will use an adplus configuration file to get a crash dump on a first chance access violation.

11. Download accessviolation.cfg from here

12. Go to your WinDbg folder and run the following command

cscript.exe adplus.vbs -c accessviolation.cfg

13. Browse to http://localhost/StockTrader/Default.aspx again

14. Once you get “server unavailable error”, you should have the dump files created under your windbg folder

15. Please note that the adplus configuration file has  <ProcessName>aspnet_wp.exe</ProcessName> by default; if you are using Windows Server, change it to <ProcessName>w3wp.exe</ProcessName>. You may have multiple dumps created if you have more than one application pool. To avoid that, you can identify the worker process ID for the application pool hosting TradeWS and create a dump based on that process ID.

16. Open the dump file (*****.EXE__1st_chance_AccessViolation__full_*****c.dmp) using WinDbg

17. Run the following command to load the SOS dll

0:009> .loadby sos mscorwks

18. You should always run !analyze -v when you are doing a crash dump analysis. Most of the time this one command is enough to tell you what caused the crash

0:009> !analyze -v

Attempt to read from address 00000000

DEFAULT_BUCKET_ID:  NULL_POINTER_READ

PROCESS_NAME:  aspnet_wp.exe

MANAGED_STACK:
0195F948 05CD0848 SFTSRC_UnmanagedService!SFTSRC.UnmanagedService.UnmanagedService.GetTLS()+0x20
0195F950 05CD07E6 SFTSRC_UnmanagedService!SFTSRC.UnmanagedService.UnmanagedService.Finalize()+0x1e

I have omitted most of the output for the sake of brevity.

19. You will notice that the application crashed while executing Finalize

20. On a side note you can look at all the Exception objects on managed heap with the following command

0:009> !dumpheap -type Exception

21. Disassemble the managed method Finalize using its instruction pointer (IP)

0:009> !u 05cd07e6

05cd07e0 ff15c005cc05    call    dword ptr ds:[5CC05C0h] (SFTSRC.UnmanagedService.UnmanagedService.GetTLS(), mdToken: 06000028)
>>> 05cd07e6 8bce            mov     ecx,esi

· You will notice that the exception occurs during the call to GetTLS: the marked instruction (05cd07e6) is the return address, the instruction Finalize would resume at once GetTLS returns

22. On a side note, you can use View->Registers window to look at registers for the current thread or use r command

23. Disassemble the managed method GetTLS using its IP

0:009> !u 05cd0848
05cd0841 e81a085b73      call    mscorlib_ni+0x1c1060 (79281060) (System.Threading.Thread.GetData(System.LocalDataStoreSlot), mdToken: 060012f3)
05cd0846 8bc8            mov     ecx,eax
>>> 05cd0848 8b01            mov     eax,dword ptr [ecx]

24. You will notice that the faulting instruction comes right after the call to System.Threading.Thread.GetData(): the return value (eax) is copied to ecx, and the read through ecx fails because ecx is null

25. The SOS dll has powerful commands for looking at CLR data structures. We are going to look at the GetTLS() implementation by dumping the IL (intermediate language) of this method

26. In order to use !dumpil, you first have to convert the instruction pointer (IP) to a Method Descriptor

0:009> !ip2md 05cd0848
MethodDesc: 05cc05b8
Method Name: SFTSRC.UnmanagedService.UnmanagedService.GetTLS()
Class: 05c54628

0:009> !dumpil 05cc05b8
ilAddr = 05c72250
IL_0000: ldstr "ThreadSpecificData"
IL_0005: call System.Threading.Thread::GetNamedDataSlot
IL_000a: stloc.0
IL_000b: ldloc.0
IL_000c: call System.Threading.Thread::GetData
IL_0011: callvirt System.Object::ToString
IL_0016: callvirt System.String::get_Length
IL_001b: ret

27. As shown in the above IL, the object returned by System.Threading.Thread::GetData is used directly in a callvirt to System.Object::ToString. The crash at that point means System.Threading.Thread::GetData is returning NULL

Conclusion

The application pool crashed on the finalizer thread while using the result of Thread.GetData(). If you are not familiar with Thread Local Storage, please read about it on MSDN http://msdn.microsoft.com/en-us/library/ms686749.aspx . Thread.GetData gets the data stored in a thread’s data slot, but the issue here is that thread data slots are unique per thread. No other thread (not even a child thread) can get that data. Please see more details on MSDN http://msdn.microsoft.com/en-us/library/system.threading.thread.getdata.aspx . Finalizers run on a separate, dedicated thread, and the finalizer thread wakes up when it is notified that the finalization queue has work.

You will see that the SFTSRC.UnmanagedService.UnmanagedService.CreateTLS() method stores the data in the slot on a totally different thread. That is why the finalizer thread has no access to the thread-specific data: GetData() returns null there, and using that null result causes the crash.
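The per-thread behaviour described above can be illustrated outside the CLR. The following is only an analogy in Python, not the lab's code: `threading.local` stands in for the .NET data slot, and the names `slots`, `create_tls`, `get_tls` and `read_from_other_thread` are hypothetical. Unlike the lab's GetTLS(), this sketch checks for a missing value instead of faulting.

```python
import threading

# Hypothetical stand-in for the per-thread data slot used by the lab.
slots = threading.local()

def create_tls():
    # Runs on the calling (request) thread, like CreateTLS() in the lab.
    slots.data = "ThreadSpecificData value"

def get_tls():
    # Reads the slot of whichever thread is executing this function.
    value = getattr(slots, "data", None)  # None if this thread never stored anything
    # The lab's GetTLS() has no such check, which is why it faults on null.
    return len(str(value)) if value is not None else None

def read_from_other_thread():
    # Plays the role of the finalizer thread: a different thread reads the slot.
    result = []
    worker = threading.Thread(target=lambda: result.append(get_tls()))
    worker.start()
    worker.join()
    return result[0]

create_tls()
same_thread = get_tls()                  # the storing thread sees its data
other_thread = read_from_other_thread()  # another thread sees nothing
print(same_thread, other_thread)
```

The thread that stored the value can read it back, while the second thread finds its own slot empty, which is the situation the finalizer thread is in.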

You can run the following command to look at all the thread stacks

0:009> ~*e!clrstack

OS Thread Id: 0x11cc (20)
ESP       EIP
05c4ec64 7c90e4f4 [HelperMethodFrame: 05c4ec64] System.GC.WaitForPendingFinalizers()
05c4ecb4 05cd07aa SFTSRC.UnmanagedService.UnmanagedService.CreateTLS()
05c4ecc0 05cd06e8 TradeWebService.OrderService.SendOrder(System.String, UInt32, Int32, System.String, System.String)

You will notice that, while sending the order, we create the TLS data and then, in order to force a GC and finalization, the CreateTLS() method calls GC.Collect and GC.WaitForPendingFinalizers.

Lesson Learnt

Never access thread-specific data in a finalizer. This lab uses Thread Local Storage to demonstrate thread-specific data, but it could be anything. For example, in a Windows application the thread that creates a handle or control owns it, so you can’t access those in a finalizer.

Exercise

1. Use the !tls command to look at the Thread Local Storage slots

2. Use !teb to get familiar with the thread environment block and how to use it with !tls

3. Use the ~ command to display all the threads and get their corresponding TEBs

4. Search memory to find which thread has created the slot

Download debugging lab


.NET Crash/OutofMemoryException/Memory Leak – .NET windows forms and infragistics datagrid and why is System.Drawing.Image object not getting finalized??

Issue Description
A Windows Forms application has crashed with an OOM exception. Before the application crashes, the CPU is pegged at almost 100% for a few minutes.

Root Cause Analysis using WinDbg

Collect full memory dump at set intervals

  • You could get a crash dump and analyze the managed heap to find the rooted objects. But since we have access to the system, I prefer to get dumps at set intervals and compare the managed heap statistics, because that makes it a little easier to find the objects that survive GC over a period of time.
  • We will use ADPlus to automate this task
  • I will run the script to take four full memory dumps, one every 2 minutes (120 seconds)
  • The command to automate this task is “cscript.exe adplus.vbs -hang -pn <myapp.exe> -quiet -r 4 120”
  • The first dump file is around 800MB, which also indicates the process’s memory usage at that time
  • The second dump file is around 1.2 GB
  • The third dump file is around 1.6 GB, and a little later the application crashed.
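The dump sizes above already give a rough growth rate. A minimal sketch of that arithmetic, assuming the approximate sizes quoted above (1.2 GB and 1.6 GB written as 1200 MB and 1600 MB) and the 120-second interval; the function name `growth_mb_per_min` is made up for this example:

```python
# Approximate dump sizes in MB, as reported above; dumps were taken 120 seconds apart.
dump_sizes_mb = [800, 1200, 1600]
interval_s = 120

def growth_mb_per_min(sizes_mb, interval_seconds):
    # Difference between consecutive dumps, scaled to MB per minute.
    deltas = [later - earlier for earlier, later in zip(sizes_mb, sizes_mb[1:])]
    return [delta / (interval_seconds / 60) for delta in deltas]

rates = growth_mb_per_min(dump_sizes_mb, interval_s)
print(rates)
```

With these rounded figures the process grows by roughly 200 MB per minute between dumps.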

This is a pure .NET application, so we are going to jump ahead and look at the managed heap stats, the GC handles and the objects in the finalize queue. We will use sos2.dll, copied into the same folder as the WinDbg executable. We will dump only pinned and strong GC handles to identify handles that increase over time, because these handles could cause a memory leak. Please note that the !gcht (gchandles by type) command is only available in our WinDbg extension sos2.dll. You could use !gchandles from sos.dll to dump GC handles, but it won’t give you the objects and their stats by type, and you will have to work that out yourself, probably by looking at the roots.

GCHandles Stats from First Dump

0:000> .load sos2
0:000> !gcht -t p

Pinned GC Handle Statistics:
Pinned Handles: 60
Statistics:
………………………………………………….
Total 60 objects
0:000> !gcht -t s
Strong GC Handle Statistics:
Strong Handles: 185
Statistics:
………………………………………………………
Total 185 objects

GCHandles Stats from Second Dump

0:000> !gcht -t p
Pinned GC Handle Statistics:
Pinned Handles: 60
Statistics:
………………………………………………………………
Total 60 objects
0:000> !gcht -t s
Strong GC Handle Statistics:
Strong Handles: 186
Statistics:
……………………………………………………………..
Total 186 objects

Let’s move on, since we don’t see anything interesting in the GC handles: the number of pinned handles remains the same (60), and the strong handle count has increased only by one (185 to 186).

  • We will compare the finalize queue stats in the dumps. I am only including the interesting objects and comments for the sake of brevity

Finalize Queue in first dump

0:000> !finalizequeue
generation 2 has 9433 finalizable objects (05501508->0550a86c)
Ready for finalization 0 objects (0550af4c->0550af4c)
Statistics:
MT    Count    TotalSize Class Name
7ae3c9f8     1907 45768 System.Drawing.Bitmap
…………………………………………….
Total 9873 objects

Finalize Queue in second dump

0:000> !finalizequeue
generation 2 has 10545 finalizable objects (05501508->0550b9cc)
Ready for finalization 0 objects (0550bdac->0550bdac)
Statistics:
MT    Count    TotalSize Class Name
7ae3c9f8     2951 70824 System.Drawing.Bitmap
…………………………………………….
Total 10793 objects

Aha, do we see something interesting here? Of course: the number of finalizable objects in generation 2 has grown by more than 1,000 (9433 to 10545), nearly all of it System.Drawing.Bitmap (1907 to 2951), and on top of that the number of objects ready to be finalized is 0. So why are these objects not getting finalized?
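Comparing the statistics by eye works for two dumps, but it can also be scripted. The sketch below assumes the row layout shown above (MT, Count, TotalSize, Class Name) and uses only the Bitmap rows from the two dumps; `parse_stats` and `growth` are hypothetical helper names, not part of SOS.

```python
def parse_stats(lines):
    """Parse '!finalizequeue' statistics rows: MT, Count, TotalSize, Class Name."""
    stats = {}
    for line in lines:
        parts = line.split(None, 3)
        # Skip the header and any elided rows; keep rows with numeric count and size.
        if len(parts) == 4 and parts[1].isdigit() and parts[2].isdigit():
            _mt, count, total_size, name = parts
            stats[name] = (int(count), int(total_size))
    return stats

def growth(first, second):
    """Return {class name: (count delta, size delta)} for classes whose count grew."""
    result = {}
    for name, (count2, size2) in second.items():
        count1, size1 = first.get(name, (0, 0))
        if count2 > count1:
            result[name] = (count2 - count1, size2 - size1)
    return result

first_dump = ["MT    Count    TotalSize Class Name",
              "7ae3c9f8     1907 45768 System.Drawing.Bitmap"]
second_dump = ["MT    Count    TotalSize Class Name",
               "7ae3c9f8     2951 70824 System.Drawing.Bitmap"]

delta = growth(parse_stats(first_dump), parse_stats(second_dump))
print(delta)
```

For the two dumps above this reports 1044 additional Bitmap objects, 25056 bytes in total, which matches 24 bytes per managed Bitmap object.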

  • We have to find out why System.Drawing.Bitmap is not getting finalized.

As shown in the above step, generation 2 has 9433 finalizable objects (05501508->0550a86c).
We have finalizable objects starting from memory address 05501508 and ending at 0550a86c. You don’t want to run !dumpheap by type (System.Drawing.Bitmap) and look at the roots of each object; you would have to dump too many objects unless you get lucky. A better way is to display the finalize queue memory itself and pick object addresses from it. The size of a System.Drawing.Bitmap object is 24 bytes, so we may be able to get object addresses by displaying an address range that ends at the end of the finalize queue, 0550a86c. We will subtract 24*4 = 96 bytes (0x60) from 0550a86c, which gives 0550A80C. Each slot in this memory is a 4-byte object pointer, so the range covers 25 slots ending at 0550a86c.
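The start address for the memory display can be double-checked with a few lines of Python. This is a sketch of the arithmetic only; the 4-byte pointer size is an assumption based on the 8-digit addresses of this 32-bit process, and `dd_range` is a made-up helper name.

```python
POINTER_SIZE = 4  # bytes per slot in a 32-bit process (assumption, see above)

def dd_range(queue_end, bytes_back):
    # Start address for the display, and how many pointer-sized slots it covers.
    start = queue_end - bytes_back
    slots = bytes_back // POINTER_SIZE + 1  # the end address itself is displayed too
    return start, slots

queue_end = 0x0550A86C
start, slots = dd_range(queue_end, 24 * 4)
print(f"dd {start:08x} {queue_end:08x}  ({slots} slots)")
```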
In the output below, the first column is the finalize queue address and the remaining columns are the memory addresses of the objects.
0:000> dd 550A80C 0550a86c

0550a80c  17b6e074 17b6e11c 17b6e1c4 17b6e26c
…………………………………………………………………………………….
0550a86c  17b76734
0:000> !do 17b6e074

Name: System.Drawing.Bitmap <--- Make sure this is System.Drawing.Bitmap
MethodTable: 7ae3c9f8
EEClass: 7ade4014
Size: 24(0x18) bytes

0:000> !gcroot -nostacks 17b6e704
DOMAIN(001581B0):HANDLE(Strong):ff11f8:Root:01981b64(System.Threading.Thread)->
………………………………………………………………………………………
01d00f54(MyApp.MyForm)->
160875cc(MyApp.Controls.MyControl)->
1618b578(Infragistics.Win.UltraWinGrid.UltraGridRow)->
14f1a7c0(Infragistics.Win.UltraWinGrid.CellsCollection)->
…………………………………………………………………………………….
17b6e674(Infragistics.Win.UltraWinGrid.UltraGridCell)->
17b6e6f4(Infragistics.Win.UltraWinGrid.UnBoundData)->
17b6e704(System.Drawing.Bitmap)

The Bitmap is rooted through a strong handle, not just through the finalization queue, which means the object is still reachable and not ready to be finalized, as we saw in the finalize queue stats (0 objects ready for finalization). I am hiding the customer data, but basically we have a Windows Forms form containing a user control with an Infragistics UltraGrid, and a System.Drawing.Bitmap is being set in a cell.

Let’s look at the sample code
foreach (UltraGridRow row in rows)
{
    // <bitmap object> is a placeholder: a new System.Drawing.Bitmap is created for every row
    row.Cells[someindex] = <bitmap object>
}
This is where we have the problem: if there are, say, 5000 rows, we create 5000 Bitmap objects, and as long as the form is alive these objects will never be disposed. System.Drawing.Bitmap wraps the unmanaged GDI+ library and is not a lightweight object, which is why the application was crashing with an OutOfMemoryException. It only happens in a particular scenario, so it may go unnoticed during the QA test cycle unless a test case covers that very scenario.

Resolution

I am sure there are many ways to fix it, but one easy way is to create the drawing objects only for the rows visible in the client area, handle the scroll/resize events to set the images, and dispose of the objects that are no longer in use.
