Interesting WinDbg Extension SOS commands in CLR 4.0/.NET Framework 4.0 CTP, .NET runtime dll renamed and sos commands just got richer

We will review the WinDbg Extension SOS.dll in .NET Framework 4.0 CTP. CLR 4.0 has renamed runtime dll from mscorwks.dll to CLR.DLL, that’s really helpful.

loading SOS dll depending on the location of .net 4.0 runtime aka CLR.DLL, execute the following command

.loadby sos clr

1.  DML Support – YES, finally.  SOS supports DML in .NET 1.1 but it was gone in clr 2.0.  Silverlight CoreCLR supports DML and now .NET framework 4.0 supports it as well.

Execute the following command to turn on DMLfor every command or use /D option

0:003> .prefer_dml 1
DML versions of commands on by default

0:003> !dumpheap /D -type Exception -stat

For people new to WinDbg, Why am I so excited about DML support in SOS?
DML Snapshot

If you look at the above snapshot, you have the link for each MethodTable address which you can just click on to execute the command. No need to type, however not every commands will have the DML support but !dumpobject is another important one, you can just click on object address to dump an object from GC Heap.

2. The following additional extension commands are added

Examining code and stacks

!ThreadState

Examining CLR data structures

!DumpSigElem

Diagnostic Utilities

!VerifyObj
!FindRoots
!HeapStat
!GCWhere
!ListNearObj (lno)
!AnalyzeOOM (ao)

Examining the GC history

!HistInit
!HistStats
!HistRoot
!HistObj
!HistObjFind
!HistClear

!ThreadState Command

When you execute !threads command, you will see the similar output as shown below

PreEmptive   GC Alloc           Lock
ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
0    1  310 00161438      a020 Enabled  013b4c64:013b5fe8 00159230     1 MTA
2    2  8c4 0016dab0      b220 Enabled  00000000:00000000 00159230     0 MTA (Finalizer)

First column is your debugger thread id and the second column ID is ManagedThread ID, OSID column is OS thread ID so that means OSID column will be 0 or some garbage when a runtime uses Fiber.

You will see the State column which is a bit flag as shown below(taken from Shared CLI)

TS_Unknown                = 0×00000000,    // threads are initialized this way

TS_AbortRequested         = 0×00000001,    // Abort the thread
TS_GCSuspendPending       = 0×00000002,    // waiting to get to safe spot for GC
TS_UserSuspendPending     = 0×00000004,    // user suspension at next opportunity
TS_DebugSuspendPending    = 0×00000008,    // Is the debugger suspending threads?
TS_GCOnTransitions        = 0×00000010,    // Force a GC on stub transitions (GCStress only)

TS_SuspendUnstarted       = 0×00400000,    // latch a user suspension on an unstarted thread

TS_ThreadPoolThread       = 0×00800000,    // is this a threadpool thread?
TS_TPWorkerThread         = 0×01000000,    // is this a threadpool worker thread?

TS_Interruptible          = 0×02000000,    // sitting in a Sleep(), Wait(), Join()
TS_Interrupted            = 0×04000000,    // was awakened by an interrupt APC. !!! This can be moved to TSNC

TS_CompletionPortThread   = 0×08000000,    // Completion port thread
………………………………………………………………………..

SOS in CLR4.0 has !threadstate command, which tells you exactly the state of the thread given the bit field, the following output shows you the threadstate bit for Worker Thread, Completion Port Thread and Finalizer Thread

0:000> !ThreadState 1009220
Legal to Join
Background
CLR Owns
In Multi Threaded Apartment
Thread Pool Worker Thread
0:000> !ThreadState 800a220
Legal to Join
Background
CoInitialized
In Multi Threaded Apartment
Completion Port Thread
0:000> !ThreadState b220
Legal to Join
Background
CLR Owns
CoInitialized
In Multi Threaded Apartment

Other Important Commands

!findroots – This is a very powerful and interesting command, because it allows you to break into debugee when CLR garbage collect generational objects.

!GCWhere - tells you the generation number along with the GC heap segment, you no longer need to map the object address with the GC heap segment or use any other extension dll

!HeapStat- This is another cool command, this command displays the stat on generational heap including generation sizes

!AnalyzeOOM – displays the detailed informatin on Last System.OutOfMemoryException

I can’t do justice on detailed documentation for each of these commands because SOS !help documentation has done a very good job. You can either look at !help documentation  or read below. I am just copying and pasting from SOS Help documentation

0:020> !help ThreadState
——————————————————————————-
!ThreadState value

The !Threads command outputs, among other things, the state of the thread.
This is a bit field which corresponds to various states the thread is in.
To check the state of the thread, simply pass that bit field from the
output of !Threads into !ThreadState.

Example:
0:003> !Threads
ThreadCount:      2
UnstartedThread:  0
BackgroundThread: 1
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
PreEmptive   GC Alloc           Lock
ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
0    1  250 0019b068      a020 Disabled 02349668:02349fe8 0015def0     0 MTA
2    2  944 001a6020      b220 Enabled  00000000:00000000 0015def0     0 MTA (Finalizer)
0:003> !ThreadState b220
Legal to Join
Background
CLR Owns
CoInitialized
In Multi Threaded Apartment

Possible thread states:
Thread Abort Requested
GC Suspend Pending
User Suspend Pending
Debug Suspend Pending
GC On Transitions
Legal to Join
Yield Requested
Hijacked by the GC
Blocking GC for Stack Overflow
Background
Unstarted
Dead
CLR Owns
CoInitialized
In Single Threaded Apartment
In Multi Threaded Apartment
Reported Dead
Task Reset
Sync Suspended
Debug Will Sync
Stack Crawl Needed
Suspend Unstarted
Aborted
Thread Pool Worker Thread
Interruptible
Interrupted
Completion Port Thread
Abort Initiated
Finalized
Failed to Start
Detached
0:020> !help DumpSigElem
——————————————————————————-
!DumpSigElem <sigaddr> <moduleaddr>

This command dumps a single element of a signature object.  For most circumstances,
you should use !DumpSig to look at individual signature objects, but if you find a
signature that has been corrupted in some manner you can use !DumpSigElem to read out
the valid portions of it.

If we look at a valid signature object for a method we see the following:
0:000> !dumpsig 0x000007fe`ec20879d 0x000007fe`eabd1000
[DEFAULT] [hasThis] Void (Boolean,String,String)

We can look at the individual elements of this object by adding the offsets into the
object which correspond to the return value and parameters:
0:000> !dumpsigelem 0x000007fe`ec20879d+2 0x000007fe`eabd1000
Void
0:000> !dumpsigelem 0x000007fe`ec20879d+3 0x000007fe`eabd1000
Boolean
0:000> !dumpsigelem 0x000007fe`ec20879d+4 0x000007fe`eabd1000
String
0:000> !dumpsigelem 0x000007fe`ec20879d+5 0x000007fe`eabd1000
String

We can do something similar for fields.  Here is the full signature of a field:
0:000> !dumpsig 0x000007fe`eb7fd8cd 0x000007fe`eabd1000
[FIELD] ValueClass System.RuntimeTypeHandle

Using !DumpSigElem we can find the type of the field by adding the offset of it (1) to
the address of the signature:
0:000> !dumpsigelem 0x000007fe`eb7fd8cd+1 0x000007fe`eabd1000
ValueClass System.RuntimeTypeHandle

!DumpSigElem will also work with generics.  Let a function be defined as follows:
public A Test(IEnumerable<B> n)

The elements of this signature can be obtained by adding offsets into the signature
when calling !DumpSigElem:

0:000> !dumpsigelem 00000000`00bc2437+2 000007ff00043178
__Canon
0:000> !dumpsigelem 00000000`00bc2437+4 000007ff00043178
Class System.Collections.Generic.IEnumerable`1<__Canon>

The actual offsets that you should add are determined by the contents of the
signature itself.  By trial and error you should be able to find various elements
of the signature.

0:020> !help VerifyObj
——————————————————————————-
!VerifyObj <object address>

!VerifyObj is a diagnostic tool that checks the object that is passed as an
argument for signs of corruption.

0:002> !verifyobj 028000ec
object 0x28000ec does not have valid method table

0:002> !verifyobj 0680017c
object 0x680017c: bad member 00000001 at 06800184

0:020> !help FindRoots
——————————————————————————-
!FindRoots -gen <N> | -gen any | <object address>

The “-gen” form causes the debugger to break in the debuggee on the next
collection of the specified generation.  The effect is reset as soon as the
break occurs, in other words, if you need to break on the next collection you
would need to reissue the command.

The last form of this command is meant to be used after the break caused by the
other forms has occurred.  Now the debuggee is in the right state for
!FindRoots to be able to identify roots for objects from the current condemned
generations.

!FindRoots is a diagnostic command that is meant to answer the following
question:

“I see that GCs are happening, however my objects have still not been
collected. Why? Who is holding onto them?”

The process of answering the question would go something like this:

1. Find out the generation of the object of interest using the !GCWhere
command, say it is gen 1:
!GCWhere <object address>

2. Instruct the runtime to stop the next time it collects that generation using
the !FindRoots command:
!FindRoots -gen 1
g

3. When the next GC starts, and has proceeded past the mark phase a CLR
notification will cause a break in the debugger:
(fd0.ec4): CLR notification exception – code e0444143 (first chance)
CLR notification: GC – end of mark phase.
Condemned generation: 1.

4. Now we can use the !FindRoots <object address> to find out the cross
generational references to the object of interest.  In other words, even if the
object is not referenced by any “proper” root it may still be referenced by an
older object (from an older generation), from a generation that has not yet been
scheduled for collection.  At this point !FindRoots will search those older
generations too, and report those roots.
0:002> !findroots 06808094
older generations::Root:  068012f8(AAA.Test+a)->
06808094(AAA.Test+b)

0:020> !help HeapStat
——————————————————————————-
!HeapStat [-inclUnrooted | -iu]

This command shows the generation sizes for each heap and the total, how much free
space there is in each generation on each heap.  If the -inclUnrooted option is
specified the report will include information about the managed objects from the
GC heap that are not rooted anymore.

Sample output:

0:002> !heapstat
Heap     Gen0         Gen1         Gen2         LOH
Heap0    177904       12           306956       8784
Heap1    159652       12           12           16
Total    337556       24           306968       8800

Free space:                                                 Percentage
Heap0    28           12           12           64          SOH:  0% LOH:  0%
Heap1    104          12           12           16          SOH:  0% LOH:100%
Total    132          24           24           80

0:002> !heapstat -inclUnrooted
Heap     Gen0         Gen1         Gen2         LOH
Heap0    177904       12           306956       8784
Heap1    159652       12           12           16
Total    337556       24           306968       8800

Free space:                                                 Percentage
Heap0    28           12           12           64          SOH:  0% LOH:  0%
Heap1    104          12           12           16          SOH:  0% LOH:100%
Total    132          24           24           80

Unrooted objects:                                           Percentage
Heap0    152212       0            306196       0           SOH: 94% LOH:  0%
Heap1    155704       0            0            0           SOH: 97% LOH:  0%
Total    307916       0            306196       0

The percentage column contains a breakout of free or unrooted bytes to total bytes.

0:020> !help GCWhere
——————————————————————————-
!GCWhere <object address>

!GCWhere displays the location in the GC heap of the argument passed in.

0:002> !GCWhere 02800038
Address  Gen Heap segment  begin    allocated size
02800038 2    0   02800000 02800038 0282b740  12

When the argument lies in the managed heap, but is not a valid *object* address
the “size” is displayed as 0:

0:002> !GCWhere 0280003c
Address  Gen Heap segment  begin    allocated size
0280003c 2    0   02800000 02800038 0282b740  0

0:020> !help ListNearObj
——————————————————————————-
!ListNearObj <object address>

!ListNearObj is a diagnostic tool that displays the object preceeding and
succeeding the address passed in:

The command looks for the address in the GC heap that looks like a valid
beginning of a managed object (based on a valid method table) and the object
following the argument address.

0:002> !ListNearObj 028000ec
Before: 0x28000a4           72 (0×48      ) System.StackOverflowException
After:  0×2800134           72 (0×48      ) System.Threading.ThreadAbortException
Heap local consistency confirmed.

0:002> !ListNearObj 028000f0
Before: 0x28000ec           72 (0×48      ) System.ExecutionEngineException
After:  0×2800134           72 (0×48      ) System.Threading.ThreadAbortException
Heap local consistency confirmed.

The command considers the heap as “locally consistent” if:
prev_obj_addr + prev_obj_size = arg_addr && arg_obj + arg_size = next_obj_addr
OR
prev_obj_addr + prev_obj_size = next_obj_addr

When the condition is not satisfied:

0:002> !lno 028000ec
Before: 0x28000a4           72 (0×48      ) System.StackOverflowException
After:  0×2800134           72 (0×48      ) System.Threading.ThreadAbortException
Heap local consistency not confirmed.

0:020> !help AnalyzeOOM
——————————————————————————-
!AnalyzeOOM

!AnalyzeOOM displays the info of the last OOM occured on an allocation request to
the GC heap (in Server GC it displays OOM, if any, on each GC heap).

To see the managed exception(s) use the !Threads command which will show you
managed exception(s), if any, on each managed thread. If you do see an
OutOfMemoryException exception you can use the !PrintException command on it.
To get the full callstack use the “kb” command in the debugger for that thread.
For example, to display thread 3′s stack use ~3kb.

OOM exceptions could be because of the following reasons:

1) allocation request to GC heap
in which case you will see JIT_New* on the call stack because managed code called new.
2) other runtime allocation failure
for example, failure to expand the finalize queue when GC.ReRegisterForFinalize is
called.
3) some other code you use throws a managed OOM exception
for example, some .NET framework code converts a native OOM exception to managed
and throws it.

The !AnalyzeOOM command aims to help you with investigating 1) which is the most
difficult because it requires some internal info from GC. The only exception is
we don’t support allocating objects larger than 2GB on CLR v2.0 or prior. And this
command will not display any managed OOM because we will throw OOM right away
instead of even trying to allocate it on the GC heap.

There are 2 legitimate scenarios where GC would return OOM to allocation requests -
one is if the process is running out of VM space to reserve a segment; the other
is if the system is running out physical memory (+ page file if you have one) so
GC can not commit memory it needs. You can look at these scenarios by using performance
counters or debugger commands. For example for the former scenario the “!address
-summary” debugger command will show you the largest free region in the VM. For
the latter scenario you can look at the “Memory\% Committed Bytes In Use” see
if you are running low on commit space. One important thing to keep in mind is
when you do this kind of memory analysis it could an aftereffect and doesn’t
completely agree with what this command tells you, in which case the command should
be respected because it truly reflects what happened during GC.

The other cases should be fairly obvious from the callstack.

Sample output:

0:011> !ao
———Heap 2 ———
Managed OOM occured after GC #28 (Requested to allocate 1234 bytes)
Reason: Didn’t have enough memory to commit
Detail: SOH: Didn’t have enough memory to grow the internal GC datastructures (800000 bytes) -
on GC entry available commit space was 500 MB
———Heap 4 ———
Managed OOM occured after GC #12 (Requested to allocate 100000 bytes)
Reason: Didn’t have enough memory to allocate an LOH segment
Detail: LOH: Failed to reserve memory (16777216 bytes)

0:020> !help FAQ
——————————————————————————-
>> Where can I get the right version of SOS for my build?

If you are running version 1.1 or 2.0 of the CLR, SOS.DLL is installed in the
same directory as the main CLR dll (CLR.DLL). Newer versions of the
Windows Debugger provide a command to make it easy to load the right copy of
SOS.DLL:

“.loadby sos clr”

That will load the SOS extension DLL from the same place that CLR.DLL is
loaded in the process. You shouldn’t attempt to use a version of SOS.DLL that
doesn’t match the version of CLR.DLL. You can find the version of
CLR.DLL by running

“lmvm clr”

in the debugger.  Note that if you are running CoreCLR (e.g. Silverlight)
then you should replace “clr” with “coreclr”.

If you are using a dump file created on another machine, it is a little bit
more complex. You need to make sure the mscordacwks.dll file that came with
that install is on your symbol path, and you need to load the corresponding
version of sos.dll (typing .load <full path to sos.dll> rather than using the
.loadby shortcut). Within the Microsoft corpnet, we keep tagged versions
of mscordacwks.dll, with names like mscordacwks_<architecture>_<version>.dll
that the Windows Debugger can load. If you have the correct symbol path to the
binaries for that version of the Runtime, the Windows Debugger will load the
correct mscordacwks.dll file.

>> I have a chicken and egg problem. I want to use SOS commands, but the CLR
isn’t loaded yet. What can I do?

In the debugger at startup you can type:

“sxe clrn”

Let the program run, and it will stop with the notice

“CLR notification: module ‘mscorlib’ loaded”

At this time you can use SOS commands. To turn off spurious notifications,
type:

“sxd clrn”

>> I got the following error message. Now what?

0:000> .loadby sos clr
0:000> !DumpStackObjects
Failed to find runtime DLL (clr.dll), 0×80004005
Extension commands need clr.dll in order to have something to do.
0:000>

This means that the CLR is not loaded yet, or has been unloaded. You need to
wait until your managed program is running in order to use these commands. If
you have just started the program a good way to do this is to type

bp clr!EEStartup “g @$ra”

in the debugger, and let it run. After the function EEStartup is finished,
there will be a minimal managed environment for executing SOS commands.

>> I have a partial memory minidump, and !DumpObj doesn’t work. Why?

In order to run SOS commands, many CLR data structures need to be traversed.
When creating a minidump without full memory, special functions are called at
dump creation time to bring those structures into the minidump, and allow a
minimum set of SOS debugging commands to work. At this time, those commands
that can provide full or partial output are:

CLRStack
Threads
Help
PrintException
EEVersion

For a minidump created with this minimal set of functionality in mind, you
will get an error message when running any other commands. A full memory dump
(obtained with “.dump /ma <filename>” in the Windows Debugger) is often the
best way to debug a managed program at this level.

>> What other tools can I use to find my bug?

Turn on Managed Debugging Assistants. These enable additional runtime diagnostics,
particularly in the area of PInvoke/Interop. Adam Nathan has written some great
information about that:

http://blogs.msdn.com/adam_nathan/

>> Does SOS support DML?

Yes.  SOS respects the .prefer_dml option in the debugger.  If this setting is
turned on, then SOS will output DML by default.  Alternatively, you may leave
it off and add /D to the beginning of a command to get DML based output for it.
Not all SOS commands support DML output.

0:020> !help HistInit
——————————————————————————-
!HistInit

Before running any of the Hist – family commands you need to initialize the SOS
structures from the stress log saved in the debuggee.  This is achieved by the
HistInit command.

Sample output:

0:001> !HistInit
Attempting to read Stress log
STRESS LOG:
facilitiesToLog  = 0xffffffff
levelToLog       = 6
MaxLogSizePerThread = 0×10000 (65536)
MaxTotalLogSize = 0×1000000 (16777216)
CurrentTotalLogChunk = 9
ThreadsWithLogs  = 3
Clock frequency  = 3.392 GHz
Start time         15:26:31
Last message time  15:26:56
Total elapsed time 25.077 sec
……………………………….
—————————- 2407 total entries —————————–

SUCCESS: GCHist structures initialized

0:020> !help HistStats
——————————————————————————-
!HistStats

HistStat provides a number of garbage collection statistics obtained from the
stress log.

Sample output:

0:003> !HistStats
GCCount    Plugs Promotes   Relocs
———————————–
2296        0       35       86
2295        0       34       85
2294        0       34       85

2286        0       32       83
2285        0       32       83
322        0       23       55
0        0        0        0
Root 01e411b8 relocated multiple times in gc 322
Root 01e411bc relocated multiple times in gc 322

Root 01e413f8 relocated multiple times in gc 322
Root 01e413fc relocated multiple times in gc 322

0:020> !help histroot
——————————————————————————-
!HistRoot <root>

The root value obtained from !HistObjFind can be used to track the movement of
an object through the GCs.

HistRoot provides information related to both promotions and relocations of the
root specified as the argument.

0:003> !HistRoot 01e411b8
GCCount    Value       MT Promoted?                Notes
———————————————————
2296 028970d4 5b6c5cd8       yes
2295 028970d4 5b6c5cd8       yes
2294 028970d4 5b6c5cd8       yes
2293 028970d4 5b6c5cd8       yes
2292 028970d4 5b6c5cd8       yes
2291 028970d4 5b6c5cd8       yes
2290 028970d4 5b6c5cd8       yes
2289 028970d4 5b6c5cd8       yes
2288 028970d4 5b6c5cd8       yes
2287 028970d4 5b6c5cd8       yes
2286 028970d4 5b6c5cd8       yes
2285 028970d4 5b6c5cd8       yes
322 028970e8 5b6c5cd8       yes Duplicate promote/relocs

0:020> !help HistObj
——————————————————————————-
!HistObj <obj_address>

This command examines all stress log relocation records and displays the chain
of GC relocations that may have led to the address passed in as an argument.
Conceptually the output is:

GenN    obj_address   root1, root2, root3,
GenN-1  prev_obj_addr root1, root2,
GenN-2  prev_prev_oa  root1, root4,

Sample output:
0:003> !HistObj 028970d4
GCCount   Object                                    Roots
———————————————————
2296 028970d4 00223fc4, 01e411b8,
2295 028970d4 00223fc4, 01e411b8,
2294 028970d4 00223fc4, 01e411b8,
2293 028970d4 00223fc4, 01e411b8,
2292 028970d4 00223fc4, 01e411b8,
2291 028970d4 00223fc4, 01e411b8,
2290 028970d4 00223fc4, 01e411b8,
2289 028970d4 00223fc4, 01e411b8,
2288 028970d4 00223fc4, 01e411b8,
2287 028970d4 00223fc4, 01e411b8,
2286 028970d4 00223fc4, 01e411b8,
2285 028970d4 00223fc4, 01e411b8,
322 028970d4 01e411b8,
0 028970d4

0:020> !help HistObjFind
——————————————————————————-
!HistObjFind <obj_address>

To examine log entries related to an object whose present address is known one
would use this command. The output of this command contains all entries that
reference the object:

0:003> !HistObjFind 028970d4
GCCount   Object                                  Message
———————————————————
2296 028970d4 Promotion for root 01e411b8 (MT = 5b6c5cd8)
2296 028970d4 Relocation NEWVALUE for root 00223fc4
2296 028970d4 Relocation NEWVALUE for root 01e411b8

2295 028970d4 Promotion for root 01e411b8 (MT = 5b6c5cd8)
2295 028970d4 Relocation NEWVALUE for root 00223fc4
2295 028970d4 Relocation NEWVALUE for root 01e411b8

0:020> !help HistClear
——————————————————————————-
!HistClear

This command releases any resources used by the Hist-family of commands.
Generally there’s no need to call this explicitly, as each HistInit will first
cleanup the previous resources.

Your email address will not be published. Required fields are marked *

*