Visual Studio 2010 or WinDbg takes longer than usual to load debug symbols.

Do you dislike it when applications that have been working fine for weeks or months suddenly begin to run very slowly? I do too, especially when they are applications I use all day, every day.

At any given time I usually have Visual Studio and/or WinDbg running. Today I noticed that Visual Studio 2010 was consuming more CPU cycles than usual and was loading debug symbols for a release-mode postmortem dump very, very slowly. The first thing I did was open the same dump in WinDbg, and guess what? WinDbg behaved exactly the same way. It consumed approximately 30 to 40 percent of CPU cycles during .reload /f /v, and it was taking its sweet time.

On my machine, I use the _NT_SYMBOL_PATH environment variable to control the path to my local symbol cache and remote symbol servers; it applies to every debugger I use. I did come across several blogs that recommend against using this variable, but it isn’t the root cause of the problem I experienced today. After all, my debuggers were working fine last week (Friday) but not today (Monday).
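For readers who have not looked inside this variable: its value is a semicolon-separated list of elements, and a symbol-server element uses asterisks to separate a local cache from the remote server. Here is a minimal Python sketch that splits such a value apart. The cache directory c:\symbols is a made-up example, and the function name is mine; this is only an illustration of the format, not how the debuggers parse it internally.

```python
def parse_symbol_path(value):
    """Split an _NT_SYMBOL_PATH-style string into its elements."""
    elements = []
    for raw in value.split(";"):
        raw = raw.strip()
        if not raw:
            continue  # ignore empty entries such as a trailing ';'
        parts = raw.split("*")
        kind = parts[0].lower()
        if kind in ("srv", "cache"):
            # srv*<local cache>*<remote server> or cache*<local dir>
            elements.append({"kind": kind, "locations": parts[1:]})
        else:
            # anything else is treated as a plain directory
            elements.append({"kind": "dir", "locations": [raw]})
    return elements


# Hypothetical value: local cache in c:\symbols, Microsoft public symbol server.
example = r"srv*c:\symbols*http://msdl.microsoft.com/download/symbols"
```

Calling parse_symbol_path(example) returns one "srv" element whose locations are the local cache followed by the remote server, which is the lookup order described above.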

So, what changed?

While I didn’t change anything, my machine is a member of a corporate domain so it is quite possible that “something” was changed on my machine for me.

Troubleshooting approach: simultaneous sniffer and process monitor capture on my machine while reloading symbols in VS 2010 or WinDbg.

Observations:

I noticed delays of 3 to 7 seconds originating from my machine between each request to a remote symbol server (in this case, the Microsoft public symbol server). Meaning: WinDbg or VS 2010 was blocked or busy doing something else before it even got to the point of asking the remote symbol server for a file that is missing from my cache.

[Screenshot: sniffer capture]

I also noticed excessive registry activity surrounding the registry keys shown below.


WinDbg exhibits a memory leak when you debug postmortem managed dumps

Howdy there!
My name is Olegas and I’ll be blogging here from time to time. Prashant has a very nice collection already and I’ll be adding my 2 cents every once in a while.

Recently I’ve come across an interesting behavior in WinDbg and I decided to look into it a bit further.

The scenario:
• You are debugging a managed dump using WinDbg 6.11.0001.404 on a 32-bit platform.
• You are trying to dump hundreds of managed objects to inspect their properties. You use a script similar to, but not limited to:
.foreach (MyObj {!dumpheap -MT 00687320 -short}) {!do MyObj}

The observed behavior:
• After dumping a few dozen objects, WinDbg begins to report
<Note: this object has an invalid CLASS field>
Invalid object
• Your 32-bit machine begins to respond very slowly and you notice excessive paging.
• Perfmon shows behavior within WinDbg that is consistent with a memory leak
[Screenshot: Perfmon graph]


ASP.NET App Slow Response and Application Pool/AppDomain Recycle, Event message: Application is shutting down. Reason: Unknown – Windows Server 2003

Scenario
From time to time, the ASP.NET application responds very slowly on Windows Server 2003.

Rants and the resolution

After turning on recycle events, the message logged in the application event log was “Event message: Application is shutting down. Reason: Unknown”. The slow responses always coincided with this message in the application event log, which confirmed that the application pool was terminating, so it is no wonder the ASP.NET response is slow from time to time.

However, the only missing piece was why, since the Reason is Unknown :-) . This application pool is configured as a web garden with 6 worker processes, so we decided to attach a debugger to 2 of the worker processes on the production box.

If you are just starting out with debugging or have not read John Robbins’ book on debugging, I would like to stress the following points about using a debugger in a production environment:

1. By default, ADPlus writes the call stack on first-chance exceptions. Walking the call stack also triggers symbol loading, and the two together cause a performance hit while a debugger is attached. The last thing you want in a production environment is a performance hit caused by the debugger.

2. Don’t just use the ADPlus script to attach a debugger to the worker process by name, because it will attach the debugger to every worker process on your production server, causing a further performance hit.

3. Don’t use DebugDiag in a production environment unless you really have a good reason for it.


LoadLibrary failed, Win32 error 0n193 “%1 is not a valid Win32 application.” Please check your debugger configuration and/or network access.

0:000> .loadby sos coreclr
The call to LoadLibrary(c:\Program Files (x86)\Microsoft Silverlight\3.0.40624.0\sos) failed, Win32 error 0n193
“%1 is not a valid Win32 application.”
Please check your debugger configuration and/or network access.

Make sure you are not using the 64-bit version of WinDbg. Silverlight is not 64-bit yet, so even if the browser is running on a 64-bit OS, the SOS DLL for Silverlight CoreCLR will fail to load in 64-bit WinDbg. Analyze your dump with the x86 version of WinDbg. I have both the 32-bit and 64-bit versions of WinDbg installed on my 64-bit Vista OS, although I still prefer XP, or maybe Windows 7 from now on.
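Win32 error 193 is ERROR_BAD_EXE_FORMAT, which is what LoadLibrary returns when the bitness of the DLL does not match the bitness of the process loading it. If you are not sure which flavor a given sos.dll is, you can read the Machine field from its PE header. The following is a small Python sketch of that check; the function name is mine and it only recognizes the two machine types relevant here.

```python
import struct

# Machine values from the PE/COFF file header.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}


def pe_machine(data):
    """Return the target architecture recorded in a PE image's header bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 16-bit Machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, hex(machine))
```

To use it on a file, read the first few kilobytes in binary mode and pass them in, for example pe_machine(open(path, "rb").read(4096)), where path is wherever your copy of sos.dll lives. An "x86" result means the DLL will only load in 32-bit WinDbg.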

ProcDump Sysinternals tool – really, really helpful to create a memory dump based on CPU usage

As described in the Sysinternals documentation (http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx):

ProcDump is a command-line utility whose primary purpose is monitoring an application for CPU spikes and generating crash dumps during a spike that an administrator or developer can use to determine the cause of the spike. ProcDump also includes hung window monitoring (using the same definition of a window hang that Windows and Task Manager use) and unhandled exception monitoring. It also can serve as a general process dump utility that you can embed in other scripts.

You don’t need to write your own utility to create a memory dump by monitoring a performance counter. Don’t forget the “-ma” switch to dump full memory (especially for a .NET app), because by default ProcDump only dumps thread and handle information.

This is really helpful for getting a memory dump based on CPU usage, and in most cases we could probably get the memory dump without using ADPlus.

The syntax to dump full memory for a given process ID is

procdump -ma <process id>

The syntax to dump full memory for a given process ID once CPU usage reaches 80% (the threshold) is

procdump -ma -c 80 <process id>
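Since ProcDump can be embedded in other scripts, here is a rough Python sketch of doing so. It only uses the -ma and -c switches mentioned above and places the options before the process ID, which is the order ProcDump’s usage text shows. The function names are mine, and the sketch assumes procdump is on the PATH.

```python
import subprocess


def build_procdump_args(pid, cpu_threshold=None, exe="procdump"):
    """Build a ProcDump command line that writes a full-memory dump."""
    args = [exe, "-ma"]                      # -ma: dump full process memory
    if cpu_threshold is not None:
        args += ["-c", str(cpu_threshold)]   # -c: CPU usage threshold in percent
    args.append(str(pid))                    # process ID goes after the options
    return args


def capture_dump(pid, cpu_threshold=None):
    """Run ProcDump and return its exit code (assumes procdump is on PATH)."""
    return subprocess.run(build_procdump_args(pid, cpu_threshold)).returncode
```

For example, build_procdump_args(1234, cpu_threshold=80) produces the same command as the second syntax line above with 1234 as the process ID.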

WinDbg meta-command tip to display all the extension commands exported by WinDbg extension

Usually, a WinDbg extension has a !help command for looking up the commands it supports. However, some commands may be undocumented, or there may be no documentation at all. In that case you can use Depends or any disassembler to look at the export section.

But with the .extmatch command, you can do the same right in the debugger, as shown below.

Below is the command executed to display all the extension commands supported by the loaded SOS in CLR 4.0

Remember the space between sos and *

0:020> .extmatch /D /e c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos *
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.AnalyzeOOM
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.BPMD
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.CLRStack
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.COMState
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.ClrStack
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpArray
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpAssembly
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpClass
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpDomain
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpHeap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpIL
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpLog
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpMD
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpMT
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpModule
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpObj
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpRuntimeTypes
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpSig
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpSigElem
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpStack
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpStackObjects
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.DumpVC
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Dumplog
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Dumpruntimetypes
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.EEHeap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.EEStack
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.EEVersion
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.EHInfo
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Ehinfo
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.FinalizeQueue
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.FindAppDomain
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.FindRoots
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Findappdomain
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.GCHandleLeaks
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.GCHandleleaks
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.GCHandles
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.GCHeapStat
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.GCInfo
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.GCRoot
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.GCWhere
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.GcHeapStat
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.GcWhere
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Gchandleleaks
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.HandleCLRN
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.HeapStat
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Help
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.HistClear
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.HistInit
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.HistObj
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.HistObjFind
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.HistRoot
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.HistStats
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.IP2MD
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.ListNearObj
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.MinidumpMode
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Minidumpmode
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Name2EE
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.ObjSize
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.PrintException
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Printexception
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.ProcInfo
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.RCWCleanupList
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Rcwcleanuplist
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.SOSFlush
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.SaveModule
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.StopOnException
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Stoponexception
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.SyncBlk
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.ThreadPool
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.ThreadState
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Threads
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Token2EE
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.TraverseHeap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Traverseheap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.U
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.VMMap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.VMStat
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.VerifyHeap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.VerifyObj
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.VerifyStackTrace
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.Verifyheap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.WatsonBuckets
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.analyzeoom
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.ao
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.bpmd
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.clrstack
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.comstate
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.da
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.do
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dso
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumparray
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpassembly
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpclass
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpdomain
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpheap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpil
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumplog
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpmd
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpmodule
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpmt
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpobj
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpruntimetypes
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpsig
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpsigelem
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpstack
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpstackobjects
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.dumpvc
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.eeheap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.eestack
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.eeversion
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.ehinfo
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.finalizequeue
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.findappdomain
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.findroots
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.fq
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.gchandleleaks
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.gchandles
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.gcheapstat
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.gcinfo
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.gcroot
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.gcwhere
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.heapstat
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.help
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.histclear
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.histinit
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.histobj
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.histobjfind
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.histroot
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.histstats
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.hof
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.ip2md
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.listnearobj
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.lno
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.minidumpmode
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.name2ee
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.objsize
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.pe
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.printexception
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.procinfo
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.rcwcleanuplist
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.savemodule
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.soe
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.sosflush
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.stoponexception
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.syncblk
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.t
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.threadpool
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.threads
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.threadstate
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.token2ee
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.tp
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.traverseheap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.u
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.verifyheap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.verifyobj
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.vh
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.vmmap
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.vmstat
!c:\WINDOWS\Microsoft.NET\Framework\v4.0.20506\sos.vo
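Many of the exports above are the same command in different letter case (CLRStack, ClrStack, clrstack). If you save the .extmatch output to a text file, a few lines of Python can collapse it to the distinct commands. This is just a convenience sketch; the function name is mine.

```python
def unique_commands(extmatch_lines):
    """Group exported names that differ only by letter case."""
    groups = {}
    for line in extmatch_lines:
        line = line.strip()
        if not line.startswith("!"):
            continue  # skip the prompt line and blank lines
        # The export name is the text after the last dot in the line.
        name = line.rsplit(".", 1)[-1]
        groups.setdefault(name.lower(), []).append(name)
    return groups
```

The keys of the returned dictionary are the distinct commands, and each value lists the spellings under which SOS exports that command.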

Interesting WinDbg Extension SOS commands in CLR 4.0/.NET Framework 4.0 CTP, .NET runtime dll renamed and sos commands just got richer

We will review the WinDbg extension SOS.dll in the .NET Framework 4.0 CTP. CLR 4.0 has renamed the runtime DLL from mscorwks.dll to CLR.DLL, which is really helpful.

To load the SOS DLL from the location of the .NET 4.0 runtime (aka CLR.DLL), execute the following command

.loadby sos clr

1.  DML Support – YES, finally.  SOS supported DML in .NET 1.1, but it was gone in CLR 2.0.  Silverlight CoreCLR supports DML, and now .NET Framework 4.0 supports it as well.

Execute the following command to turn on DML for every command, or use the /D option

0:003> .prefer_dml 1
DML versions of commands on by default

0:003> !dumpheap /D -type Exception -stat

For people new to WinDbg: why am I so excited about DML support in SOS?
[Screenshot: DML Snapshot]

If you look at the above snapshot, each MethodTable address is a link that you can simply click to execute the command; no typing needed. Not every command has DML support, but !dumpobj is another important one: you can just click an object address to dump that object from the GC heap.

2. The following additional extension commands are added

Examining code and stacks

!ThreadState

Examining CLR data structures

!DumpSigElem

Diagnostic Utilities

!VerifyObj
!FindRoots
!HeapStat
!GCWhere
!ListNearObj (lno)
!AnalyzeOOM (ao)

Examining the GC history

!HistInit
!HistStats
!HistRoot
!HistObj
!HistObjFind
!HistClear

!ThreadState Command

When you execute the !threads command, you will see output similar to that shown below

                                   PreEmptive   GC Alloc                Lock
       ID  OSID ThreadOBJ    State GC           Context       Domain   Count APT Exception
   0    1   310 00161438      a020 Enabled  013b4c64:013b5fe8 00159230     1 MTA
   2    2   8c4 0016dab0      b220 Enabled  00000000:00000000 00159230     0 MTA (Finalizer)

The first column is your debugger thread ID, the second column (ID) is the managed thread ID, and the OSID column is the OS thread ID, which means the OSID column will be 0 or some garbage when the runtime uses fibers.

The State column is a bit field whose flags are shown below (taken from the Shared CLI)

TS_Unknown                = 0x00000000,    // threads are initialized this way

TS_AbortRequested         = 0x00000001,    // Abort the thread
TS_GCSuspendPending       = 0x00000002,    // waiting to get to safe spot for GC
TS_UserSuspendPending     = 0x00000004,    // user suspension at next opportunity
TS_DebugSuspendPending    = 0x00000008,    // Is the debugger suspending threads?
TS_GCOnTransitions        = 0x00000010,    // Force a GC on stub transitions (GCStress only)

TS_SuspendUnstarted       = 0x00400000,    // latch a user suspension on an unstarted thread

TS_ThreadPoolThread       = 0x00800000,    // is this a threadpool thread?
TS_TPWorkerThread         = 0x01000000,    // is this a threadpool worker thread?

TS_Interruptible          = 0x02000000,    // sitting in a Sleep(), Wait(), Join()
TS_Interrupted            = 0x04000000,    // was awakened by an interrupt APC. !!! This can be moved to TSNC

TS_CompletionPortThread   = 0x08000000,    // Completion port thread
………………………………………………………………………..

SOS in CLR 4.0 has the !threadstate command, which tells you exactly what state a thread is in, given the bit field. The following output shows the thread state bits for a worker thread, a completion port thread, and the finalizer thread

0:000> !ThreadState 1009220
Legal to Join
Background
CLR Owns
In Multi Threaded Apartment
Thread Pool Worker Thread
0:000> !ThreadState 800a220
Legal to Join
Background
CoInitialized
In Multi Threaded Apartment
Completion Port Thread
0:000> !ThreadState b220
Legal to Join
Background
CLR Owns
CoInitialized
In Multi Threaded Apartment
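To make the bit-field idea concrete, here is a small Python sketch that tests a State value against the Shared CLI flags listed earlier. It covers only those listed flags, so the bits that belong to the elided part of the enum are returned as a remainder rather than named. This is my own illustration of the technique, not the SOS implementation.

```python
# Only the flags quoted from the Shared CLI listing; the rest of the enum is elided there.
LISTED_FLAGS = {
    "TS_AbortRequested":       0x00000001,
    "TS_GCSuspendPending":     0x00000002,
    "TS_UserSuspendPending":   0x00000004,
    "TS_DebugSuspendPending":  0x00000008,
    "TS_GCOnTransitions":      0x00000010,
    "TS_SuspendUnstarted":     0x00400000,
    "TS_ThreadPoolThread":     0x00800000,
    "TS_TPWorkerThread":       0x01000000,
    "TS_Interruptible":        0x02000000,
    "TS_Interrupted":          0x04000000,
    "TS_CompletionPortThread": 0x08000000,
}


def decode_state(state):
    """Return (names of listed flags that are set, bits not covered by the listing)."""
    names = [name for name, bit in LISTED_FLAGS.items() if state & bit]
    remaining = state
    for bit in LISTED_FLAGS.values():
        remaining &= ~bit
    return names, remaining
```

For the worker thread value 0x1009220 this yields TS_TPWorkerThread plus a remainder of 0x9220, and for 0x800a220 it yields TS_CompletionPortThread plus 0xa220, which lines up with the last line of each !ThreadState output above.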

Other Important Commands

!findroots – This is a very powerful and interesting command because it allows you to break into the debuggee when the CLR garbage-collects a specified generation.

!GCWhere - tells you the generation number along with the GC heap segment, so you no longer need to map the object address to the GC heap segment yourself or use another extension DLL

!HeapStat - another cool command; it displays statistics on the generational heap, including generation sizes

!AnalyzeOOM – displays detailed information on the last System.OutOfMemoryException

I can’t do justice to each of these commands with detailed documentation of my own, because the SOS !help documentation already does a very good job. You can either look at the !help documentation or read below; I am just copying and pasting from the SOS help documentation

0:020> !help ThreadState
-------------------------------------------------------------------------------
!ThreadState value

The !Threads command outputs, among other things, the state of the thread.
This is a bit field which corresponds to various states the thread is in.
To check the state of the thread, simply pass that bit field from the
output of !Threads into !ThreadState.

Example:
0:003> !Threads
ThreadCount:      2
UnstartedThread:  0
BackgroundThread: 1
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                   PreEmptive   GC Alloc                Lock
       ID  OSID ThreadOBJ    State GC           Context       Domain   Count APT Exception
   0    1   250 0019b068      a020 Disabled 02349668:02349fe8 0015def0     0 MTA
   2    2   944 001a6020      b220 Enabled  00000000:00000000 0015def0     0 MTA (Finalizer)
0:003> !ThreadState b220
Legal to Join
Background
CLR Owns
CoInitialized
In Multi Threaded Apartment

Possible thread states:
Thread Abort Requested
GC Suspend Pending
User Suspend Pending
Debug Suspend Pending
GC On Transitions
Legal to Join
Yield Requested
Hijacked by the GC
Blocking GC for Stack Overflow
Background
Unstarted
Dead
CLR Owns
CoInitialized
In Single Threaded Apartment
In Multi Threaded Apartment
Reported Dead
Task Reset
Sync Suspended
Debug Will Sync
Stack Crawl Needed
Suspend Unstarted
Aborted
Thread Pool Worker Thread
Interruptible
Interrupted
Completion Port Thread
Abort Initiated
Finalized
Failed to Start
Detached
0:020> !help DumpSigElem
-------------------------------------------------------------------------------
!DumpSigElem <sigaddr> <moduleaddr>

This command dumps a single element of a signature object.  For most circumstances,
you should use !DumpSig to look at individual signature objects, but if you find a
signature that has been corrupted in some manner you can use !DumpSigElem to read out
the valid portions of it.

If we look at a valid signature object for a method we see the following:
0:000> !dumpsig 0x000007fe`ec20879d 0x000007fe`eabd1000
[DEFAULT] [hasThis] Void (Boolean,String,String)

We can look at the individual elements of this object by adding the offsets into the
object which correspond to the return value and parameters:
0:000> !dumpsigelem 0x000007fe`ec20879d+2 0x000007fe`eabd1000
Void
0:000> !dumpsigelem 0x000007fe`ec20879d+3 0x000007fe`eabd1000
Boolean
0:000> !dumpsigelem 0x000007fe`ec20879d+4 0x000007fe`eabd1000
String
0:000> !dumpsigelem 0x000007fe`ec20879d+5 0x000007fe`eabd1000
String

We can do something similar for fields.  Here is the full signature of a field:
0:000> !dumpsig 0x000007fe`eb7fd8cd 0x000007fe`eabd1000
[FIELD] ValueClass System.RuntimeTypeHandle

Using !DumpSigElem we can find the type of the field by adding the offset of it (1) to
the address of the signature:
0:000> !dumpsigelem 0x000007fe`eb7fd8cd+1 0x000007fe`eabd1000
ValueClass System.RuntimeTypeHandle

!DumpSigElem will also work with generics.  Let a function be defined as follows:
public A Test(IEnumerable<B> n)

The elements of this signature can be obtained by adding offsets into the signature
when calling !DumpSigElem:

0:000> !dumpsigelem 00000000`00bc2437+2 000007ff00043178
__Canon
0:000> !dumpsigelem 00000000`00bc2437+4 000007ff00043178
Class System.Collections.Generic.IEnumerable`1<__Canon>

The actual offsets that you should add are determined by the contents of the
signature itself.  By trial and error you should be able to find various elements
of the signature.

0:020> !help VerifyObj
-------------------------------------------------------------------------------
!VerifyObj <object address>

!VerifyObj is a diagnostic tool that checks the object that is passed as an
argument for signs of corruption.

0:002> !verifyobj 028000ec
object 0x28000ec does not have valid method table

0:002> !verifyobj 0680017c
object 0x680017c: bad member 00000001 at 06800184

0:020> !help FindRoots
-------------------------------------------------------------------------------
!FindRoots -gen <N> | -gen any | <object address>

The “-gen” form causes the debugger to break in the debuggee on the next
collection of the specified generation.  The effect is reset as soon as the
break occurs, in other words, if you need to break on the next collection you
would need to reissue the command.

The last form of this command is meant to be used after the break caused by the
other forms has occurred.  Now the debuggee is in the right state for
!FindRoots to be able to identify roots for objects from the current condemned
generations.

!FindRoots is a diagnostic command that is meant to answer the following
question:

“I see that GCs are happening, however my objects have still not been
collected. Why? Who is holding onto them?”

The process of answering the question would go something like this:

1. Find out the generation of the object of interest using the !GCWhere
command, say it is gen 1:
!GCWhere <object address>

2. Instruct the runtime to stop the next time it collects that generation using
the !FindRoots command:
!FindRoots -gen 1
g

3. When the next GC starts, and has proceeded past the mark phase a CLR
notification will cause a break in the debugger:
(fd0.ec4): CLR notification exception - code e0444143 (first chance)
CLR notification: GC - end of mark phase.
Condemned generation: 1.

4. Now we can use the !FindRoots <object address> to find out the cross
generational references to the object of interest.  In other words, even if the
object is not referenced by any “proper” root it may still be referenced by an
older object (from an older generation), from a generation that has not yet been
scheduled for collection.  At this point !FindRoots will search those older
generations too, and report those roots.
0:002> !findroots 06808094
older generations::Root:  068012f8(AAA.Test+a)->
06808094(AAA.Test+b)

0:020> !help HeapStat
-------------------------------------------------------------------------------
!HeapStat [-inclUnrooted | -iu]

This command shows the generation sizes for each heap and the total, how much free
space there is in each generation on each heap.  If the -inclUnrooted option is
specified the report will include information about the managed objects from the
GC heap that are not rooted anymore.

Sample output:

0:002> !heapstat
Heap     Gen0         Gen1         Gen2         LOH
Heap0    177904       12           306956       8784
Heap1    159652       12           12           16
Total    337556       24           306968       8800

Free space:                                                 Percentage
Heap0    28           12           12           64          SOH:  0% LOH:  0%
Heap1    104          12           12           16          SOH:  0% LOH:100%
Total    132          24           24           80

0:002> !heapstat -inclUnrooted
Heap     Gen0         Gen1         Gen2         LOH
Heap0    177904       12           306956       8784
Heap1    159652       12           12           16
Total    337556       24           306968       8800

Free space:                                                 Percentage
Heap0    28           12           12           64          SOH:  0% LOH:  0%
Heap1    104          12           12           16          SOH:  0% LOH:100%
Total    132          24           24           80

Unrooted objects:                                           Percentage
Heap0    152212       0            306196       0           SOH: 94% LOH:  0%
Heap1    155704       0            0            0           SOH: 97% LOH:  0%
Total    307916       0            306196       0

The percentage column contains a breakout of free or unrooted bytes to total bytes.
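The percentage column in the sample output can be reproduced by hand. Below is a short Python sketch using the Heap0 and Heap1 numbers from the sample. Treating SOH as Gen0 + Gen1 + Gen2 and truncating to a whole percent are my inferences from the sample figures (94% and 97%), not something the help text states.

```python
def percentage(part_bytes, total_bytes):
    """Whole-number percentage of part to total, truncated (assumed from the sample)."""
    if total_bytes == 0:
        return 0
    return part_bytes * 100 // total_bytes


# Heap0 from the sample: SOH is taken as Gen0 + Gen1 + Gen2.
heap0_soh_total = 177904 + 12 + 306956
heap0_soh_unrooted = 152212 + 0 + 306196

# Heap1 from the sample.
heap1_soh_total = 159652 + 12 + 12
heap1_soh_unrooted = 155704 + 0 + 0

heap0_pct = percentage(heap0_soh_unrooted, heap0_soh_total)
heap1_pct = percentage(heap1_soh_unrooted, heap1_soh_total)
```

With these inputs heap0_pct and heap1_pct come out as 94 and 97, matching the SOH figures in the “Unrooted objects” rows, and percentage(16, 16) gives the 100% shown for free LOH space on Heap1.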

0:020> !help GCWhere
-------------------------------------------------------------------------------
!GCWhere <object address>

!GCWhere displays the location in the GC heap of the argument passed in.

0:002> !GCWhere 02800038
Address  Gen Heap segment  begin    allocated size
02800038 2    0   02800000 02800038 0282b740  12

When the argument lies in the managed heap, but is not a valid *object* address
the “size” is displayed as 0:

0:002> !GCWhere 0280003c
Address  Gen Heap segment  begin    allocated size
0280003c 2    0   02800000 02800038 0282b740  0

0:020> !help ListNearObj
-------------------------------------------------------------------------------
!ListNearObj <object address>

!ListNearObj is a diagnostic tool that displays the object preceeding and
succeeding the address passed in:

The command looks for the address in the GC heap that looks like a valid
beginning of a managed object (based on a valid method table) and the object
following the argument address.

0:002> !ListNearObj 028000ec
Before: 0x28000a4           72 (0x48      ) System.StackOverflowException
After:  0x2800134           72 (0x48      ) System.Threading.ThreadAbortException
Heap local consistency confirmed.

0:002> !ListNearObj 028000f0
Before: 0x28000ec           72 (0x48      ) System.ExecutionEngineException
After:  0x2800134           72 (0x48      ) System.Threading.ThreadAbortException
Heap local consistency confirmed.

The command considers the heap as “locally consistent” if:
prev_obj_addr + prev_obj_size = arg_addr && arg_obj + arg_size = next_obj_addr
OR
prev_obj_addr + prev_obj_size = next_obj_addr

When the condition is not satisfied:

0:002> !lno 028000ec
Before: 0x28000a4           72 (0x48      ) System.StackOverflowException
After:  0x2800134           72 (0x48      ) System.Threading.ThreadAbortException
Heap local consistency not confirmed.
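The “locally consistent” condition quoted in the help text is simple address arithmetic, and it can be written out as a small Python sketch. The function and parameter names are mine; the sizes are the 72-byte (0x48) objects from the sample output.

```python
def locally_consistent(prev_addr, prev_size, next_addr, arg_addr=None, arg_size=None):
    """Check the two alternatives of the local consistency condition.

    Either prev ends where the argument object starts and the argument object
    ends where next starts, or prev ends exactly where next starts.
    """
    if arg_addr is not None and arg_size is not None:
        if prev_addr + prev_size == arg_addr and arg_addr + arg_size == next_addr:
            return True
    return prev_addr + prev_size == next_addr
```

For the first sample, locally_consistent(0x28000a4, 0x48, 0x2800134, 0x28000ec, 0x48) holds through the first alternative. For the second sample, where 028000f0 is not the start of an object, locally_consistent(0x28000ec, 0x48, 0x2800134) holds through the second.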

0:020> !help AnalyzeOOM
-------------------------------------------------------------------------------
!AnalyzeOOM

!AnalyzeOOM displays the info of the last OOM occured on an allocation request to
the GC heap (in Server GC it displays OOM, if any, on each GC heap).

To see the managed exception(s) use the !Threads command which will show you
managed exception(s), if any, on each managed thread. If you do see an
OutOfMemoryException exception you can use the !PrintException command on it.
To get the full callstack use the “kb” command in the debugger for that thread.
For example, to display thread 3’s stack use ~3kb.

OOM exceptions could be because of the following reasons:

1) allocation request to GC heap
in which case you will see JIT_New* on the call stack because managed code called new.
2) other runtime allocation failure
for example, failure to expand the finalize queue when GC.ReRegisterForFinalize is
called.
3) some other code you use throws a managed OOM exception
for example, some .NET framework code converts a native OOM exception to managed
and throws it.

The !AnalyzeOOM command aims to help you with investigating 1) which is the most
difficult because it requires some internal info from GC. The only exception is
we don’t support allocating objects larger than 2GB on CLR v2.0 or prior. And this
command will not display any managed OOM because we will throw OOM right away
instead of even trying to allocate it on the GC heap.

There are 2 legitimate scenarios where GC would return OOM to allocation requests -
one is if the process is running out of VM space to reserve a segment; the other
is if the system is running out physical memory (+ page file if you have one) so
GC can not commit memory it needs. You can look at these scenarios by using performance
counters or debugger commands. For example for the former scenario the “!address
-summary” debugger command will show you the largest free region in the VM. For
the latter scenario you can look at the “Memory\% Committed Bytes In Use” see
if you are running low on commit space. One important thing to keep in mind is
when you do this kind of memory analysis it could an aftereffect and doesn’t
completely agree with what this command tells you, in which case the command should
be respected because it truly reflects what happened during GC.

The other cases should be fairly obvious from the callstack.

Sample output:

0:011> !ao
———Heap 2 ———
Managed OOM occured after GC #28 (Requested to allocate 1234 bytes)
Reason: Didn’t have enough memory to commit
Detail: SOH: Didn’t have enough memory to grow the internal GC datastructures (800000 bytes) -
on GC entry available commit space was 500 MB
———Heap 4 ———
Managed OOM occured after GC #12 (Requested to allocate 100000 bytes)
Reason: Didn’t have enough memory to allocate an LOH segment
Detail: LOH: Failed to reserve memory (16777216 bytes)

0:020> !help FAQ
——————————————————————————-
>> Where can I get the right version of SOS for my build?

If you are running version 1.1 or 2.0 of the CLR, SOS.DLL is installed in the
same directory as the main CLR dll (CLR.DLL). Newer versions of the
Windows Debugger provide a command to make it easy to load the right copy of
SOS.DLL:

".loadby sos clr"

That will load the SOS extension DLL from the same place that CLR.DLL is
loaded in the process. You shouldn’t attempt to use a version of SOS.DLL that
doesn’t match the version of CLR.DLL. You can find the version of
CLR.DLL by running

"lmvm clr"

in the debugger.  Note that if you are running CoreCLR (e.g. Silverlight)
then you should replace “clr” with “coreclr”.

If you are using a dump file created on another machine, it is a little bit
more complex. You need to make sure the mscordacwks.dll file that came with
that install is on your symbol path, and you need to load the corresponding
version of sos.dll (typing .load <full path to sos.dll> rather than using the
.loadby shortcut). Within the Microsoft corpnet, we keep tagged versions
of mscordacwks.dll, with names like mscordacwks_<architecture>_<version>.dll
that the Windows Debugger can load. If you have the correct symbol path to the
binaries for that version of the Runtime, the Windows Debugger will load the
correct mscordacwks.dll file.

>> I have a chicken and egg problem. I want to use SOS commands, but the CLR
isn’t loaded yet. What can I do?

In the debugger at startup you can type:

"sxe clrn"

Let the program run, and it will stop with the notice

“CLR notification: module ‘mscorlib’ loaded”

At this time you can use SOS commands. To turn off spurious notifications,
type:

"sxd clrn"

>> I got the following error message. Now what?

0:000> .loadby sos clr
0:000> !DumpStackObjects
Failed to find runtime DLL (clr.dll), 0x80004005
Extension commands need clr.dll in order to have something to do.
0:000>

This means that the CLR is not loaded yet, or has been unloaded. You need to
wait until your managed program is running in order to use these commands. If
you have just started the program a good way to do this is to type

bp clr!EEStartup "g @$ra"

in the debugger, and let it run. After the function EEStartup is finished,
there will be a minimal managed environment for executing SOS commands.

>> I have a partial memory minidump, and !DumpObj doesn’t work. Why?

In order to run SOS commands, many CLR data structures need to be traversed.
When creating a minidump without full memory, special functions are called at
dump creation time to bring those structures into the minidump, and allow a
minimum set of SOS debugging commands to work. At this time, those commands
that can provide full or partial output are:

CLRStack
Threads
Help
PrintException
EEVersion

For a minidump created with this minimal set of functionality in mind, you
will get an error message when running any other commands. A full memory dump
(obtained with “.dump /ma <filename>” in the Windows Debugger) is often the
best way to debug a managed program at this level.

>> What other tools can I use to find my bug?

Turn on Managed Debugging Assistants. These enable additional runtime diagnostics,
particularly in the area of PInvoke/Interop. Adam Nathan has written some great
information about that:

http://blogs.msdn.com/adam_nathan/

>> Does SOS support DML?

Yes.  SOS respects the .prefer_dml option in the debugger.  If this setting is
turned on, then SOS will output DML by default.  Alternatively, you may leave
it off and add /D to the beginning of a command to get DML based output for it.
Not all SOS commands support DML output.

0:020> !help HistInit
——————————————————————————-
!HistInit

Before running any of the Hist – family commands you need to initialize the SOS
structures from the stress log saved in the debuggee.  This is achieved by the
HistInit command.

Sample output:

0:001> !HistInit
Attempting to read Stress log
STRESS LOG:
facilitiesToLog  = 0xffffffff
levelToLog       = 6
MaxLogSizePerThread = 0x10000 (65536)
MaxTotalLogSize = 0x1000000 (16777216)
CurrentTotalLogChunk = 9
ThreadsWithLogs  = 3
Clock frequency  = 3.392 GHz
Start time         15:26:31
Last message time  15:26:56
Total elapsed time 25.077 sec
……………………………….
—————————- 2407 total entries —————————–

SUCCESS: GCHist structures initialized

0:020> !help HistStats
——————————————————————————-
!HistStats

HistStat provides a number of garbage collection statistics obtained from the
stress log.

Sample output:

0:003> !HistStats
GCCount    Plugs Promotes   Relocs
———————————–
2296        0       35       86
2295        0       34       85
2294        0       34       85

2286        0       32       83
2285        0       32       83
322        0       23       55
0        0        0        0
Root 01e411b8 relocated multiple times in gc 322
Root 01e411bc relocated multiple times in gc 322

Root 01e413f8 relocated multiple times in gc 322
Root 01e413fc relocated multiple times in gc 322

0:020> !help histroot
——————————————————————————-
!HistRoot <root>

The root value obtained from !HistObjFind can be used to track the movement of
an object through the GCs.

HistRoot provides information related to both promotions and relocations of the
root specified as the argument.

0:003> !HistRoot 01e411b8
GCCount    Value       MT Promoted?                Notes
———————————————————
2296 028970d4 5b6c5cd8       yes
2295 028970d4 5b6c5cd8       yes
2294 028970d4 5b6c5cd8       yes
2293 028970d4 5b6c5cd8       yes
2292 028970d4 5b6c5cd8       yes
2291 028970d4 5b6c5cd8       yes
2290 028970d4 5b6c5cd8       yes
2289 028970d4 5b6c5cd8       yes
2288 028970d4 5b6c5cd8       yes
2287 028970d4 5b6c5cd8       yes
2286 028970d4 5b6c5cd8       yes
2285 028970d4 5b6c5cd8       yes
322 028970e8 5b6c5cd8       yes Duplicate promote/relocs

0:020> !help HistObj
——————————————————————————-
!HistObj <obj_address>

This command examines all stress log relocation records and displays the chain
of GC relocations that may have led to the address passed in as an argument.
Conceptually the output is:

GenN    obj_address   root1, root2, root3,
GenN-1  prev_obj_addr root1, root2,
GenN-2  prev_prev_oa  root1, root4,

Sample output:
0:003> !HistObj 028970d4
GCCount   Object                                    Roots
———————————————————
2296 028970d4 00223fc4, 01e411b8,
2295 028970d4 00223fc4, 01e411b8,
2294 028970d4 00223fc4, 01e411b8,
2293 028970d4 00223fc4, 01e411b8,
2292 028970d4 00223fc4, 01e411b8,
2291 028970d4 00223fc4, 01e411b8,
2290 028970d4 00223fc4, 01e411b8,
2289 028970d4 00223fc4, 01e411b8,
2288 028970d4 00223fc4, 01e411b8,
2287 028970d4 00223fc4, 01e411b8,
2286 028970d4 00223fc4, 01e411b8,
2285 028970d4 00223fc4, 01e411b8,
322 028970d4 01e411b8,
0 028970d4

0:020> !help HistObjFind
——————————————————————————-
!HistObjFind <obj_address>

To examine log entries related to an object whose present address is known one
would use this command. The output of this command contains all entries that
reference the object:

0:003> !HistObjFind 028970d4
GCCount   Object                                  Message
———————————————————
2296 028970d4 Promotion for root 01e411b8 (MT = 5b6c5cd8)
2296 028970d4 Relocation NEWVALUE for root 00223fc4
2296 028970d4 Relocation NEWVALUE for root 01e411b8

2295 028970d4 Promotion for root 01e411b8 (MT = 5b6c5cd8)
2295 028970d4 Relocation NEWVALUE for root 00223fc4
2295 028970d4 Relocation NEWVALUE for root 01e411b8

0:020> !help HistClear
——————————————————————————-
!HistClear

This command releases any resources used by the Hist-family of commands.
Generally there’s no need to call this explicitly, as each HistInit will first
cleanup the previous resources.

Visual Studio 2008 memory leak/memory issue on x86 – “The operation could not be completed. Not enough storage is available to complete this operation”

No, I don’t have a solution for it. Probably the only workaround is to make Visual Studio large address aware (the 3GB switch) on x86.

vs2008 error message

Steps to re-create

1. Download and unzip http://debuggingblog.com/resources/transcripts.zip

2. Open the XML file, close it, and try to load it a second time.

3. If you load the XML file using IE8, you will see the following once you close it:

——————– State SUMMARY ————————–
TotSize (      KB)   Pct(Tots)  Usage
19e6f000 (  424380) : 20.24%   : MEM_COMMIT
f3b4000 (  249552) : 11.90%   : MEM_FREE
56dcd000 ( 1423156) : 67.86%   : MEM_RESERVE

Almost 1.4 GB of memory allocated in GC segments for the XML file is still reserved even after the XML file is unloaded.
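
As a sanity check on the summary above, the hex sizes can be converted to KB and the percentages recomputed. This is a small Python sketch that parses just these three lines; the regular expression and the names are my own and assume the layout shown above.

```python
import re

SUMMARY = """\
19e6f000 (  424380) : 20.24%   : MEM_COMMIT
f3b4000 (  249552) : 11.90%   : MEM_FREE
56dcd000 ( 1423156) : 67.86%   : MEM_RESERVE
"""

# hex size, size in KB, percentage, state name
LINE = re.compile(r"^\s*([0-9a-f]+)\s*\(\s*(\d+)\)\s*:\s*([\d.]+)%\s*:\s*(\w+)", re.I)


def parse_state_summary(text):
    """Map each memory state to its size in bytes (taken from the hex column)."""
    rows = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rows[m.group(4)] = int(m.group(1), 16)
    return rows


rows = parse_state_summary(SUMMARY)
total = sum(rows.values())
for state, size in rows.items():
    print(f"{state:12} {size // 1024:>8} KB  {100 * size / total:5.2f}%")
```

The three states add up to just under 2 GB, which is the whole user-mode address space of a 32-bit process without the 3GB switch.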

However, Visual Studio 2008 is another story:

0:000> !eeheap -gc
ephemeral segment allocation context: none
segment    begin allocated     size
01830000 01831000  027ecadc 0x00fbbadc(16497372)
12860000 12861000  137616c4 0x00f006c4(15730372)
………………………………………………………………………………..

We have a bunch of 16 MB GC segments, and most of the objects are in gen 2.

0c55d1ec   739459     85777244 Microsoft.XmlEditor.XmlElement
0c559858  1496448     89786880 Microsoft.XmlEditor.Identifier
001f1918   105303     97472784      Free
793308ec  2369315    387475460 System.String
Total 9375571 objects

We have 90+ MB of free blocks and 380+ MB in System.String. There are 2.36 million string objects, so you don’t want to pick through each string object to find its GC root unless Microsoft or someone else is paying you a dime per object. Then again, a dump a day will make your day for sure.
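
The totals quoted above are easy to recompute from the statistics lines. Here is a minimal Python sketch that assumes the four-column layout shown (MT, count, total size, class name); the helper name is hypothetical.

```python
STATS = """\
0c55d1ec   739459     85777244 Microsoft.XmlEditor.XmlElement
0c559858  1496448     89786880 Microsoft.XmlEditor.Identifier
001f1918   105303     97472784      Free
793308ec  2369315    387475460 System.String
"""

MIB = 1024 * 1024


def parse_heap_stats(text):
    """Return (class name, object count, total bytes) for each statistics line."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) == 4:
            _mt, count, total, name = parts
            rows.append((name.strip(), int(count), int(total)))
    return rows


# Largest consumers first, with the average object size as a rough hint.
for name, count, total in sorted(parse_heap_stats(STATS), key=lambda r: -r[2]):
    print(f"{name:32} {count:>9} objects {total / MIB:7.1f} MiB {total / count:7.1f} bytes/object")
```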

0:000> !dumpheap -mt 0c9d4134
Address       MT     Size
018f241c 0c9d4134       68
5f610108 0c9d4134       68
total 2 objects
Statistics:
MT    Count    TotalSize Class Name
0c9d4134        2          136 Microsoft.XmlEditor.XmlDocumentProperties
Total 2 objects
0:000> !objsize 018f241c
sizeof(018f241c) =    507372388 (  0x1e3de364) bytes (Microsoft.XmlEditor.XmlDocumentProperties)
0:000> !objsize 5f610108
sizeof(5f610108) =    507371128 (  0x1e3dde78) bytes (Microsoft.XmlEditor.XmlDocumentProperties)

Did you just see that? Almost 1 GB of virtual memory is rooted in two Microsoft.XmlEditor.XmlDocumentProperties objects (about 507 MB each). That’s just outrageous. Why would Visual Studio take up 1.2 GB of virtual memory to open a 58 MB file, even if it does make use of a schema context cache?

0:000> !gcroot -nostacks 018f241c
DOMAIN(001EC570):HANDLE(RefCnt):16d1b20:Root:018f241c(Microsoft.XmlEditor.XmlDocumentProperties)

A GCHandle of type RefCnt is keeping a reference to Microsoft.XmlEditor.XmlDocumentProperties.

An OutOfMemoryException is thrown with the following call stack:

Exception object: 5ed00a34
Exception type: System.OutOfMemoryException
Message: Insufficient memory to continue the execution of the program.
InnerException: <none>
StackTrace (generated):
SP       IP       Function
0012F5A0 0C97E8B3 Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.NativeMethods.ThrowOnFailure(Int32, Int32[])+0x3b
0012F5AC 0C9E94BB Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.Package.Source.GetText()+0x3c
0012F5DC 0C9E9360 Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.Package.Source.BeginParse()+0x55
0012F644 0C9ECF38 Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.Package.Source.OnIdle(Boolean)+0x80
0012F654 0C9ECE28 Microsoft_VisualStudio_Package_LanguageService_9_0!Microsoft.VisualStudio.Package.LanguageService.OnIdle(Boolean)+0xd8
0012F674 0C9ECCDD Microsoft_XmlEditor!Microsoft.XmlEditor.XmlLanguageService.OnIdle(Boolean)+0x35
0012F684 0C9ECC34 Microsoft_XmlEditor!Microsoft.XmlEditor.Package.FDoIdle(UInt32)+0xc4

Conclusion

I hope this is fixed in Visual Studio 2010; I still need to try it out.

Silverlight app not working as expected in Firefox; IE shows the error message “Unhandled Error in Silverlight 2 Application – Element is already the child of another element”

Problem Description

This exception is thrown in a Silverlight 2 App while trying to show and hide System.Windows.Controls.TabItem in System.Windows.Controls.TabControl

Steps to Recreate

Browse to http://debuggingblog.com/sl/project1/default.html and click the “Show Angelina” button 3 times.

Debugging Silverlight app using WinDbg

We will analyze a memory dump of IE taken on System.InvalidOperationException.

1. We have the following two InvalidOperationException objects on the managed heap:

0:007> !dumpheap -type System.InvalidOperationException
Address       MT     Size
1013c79c 0f1ff824       72
1015addc 0f1ff824       72
total 2 objects
Statistics:
MT    Count    TotalSize Class Name
0f1ff824        2          144 System.InvalidOperationException
Total 2 objects

2. Let’s get the stack trace at the time this exception occurred:

0:007> !pe 1013c79c
Exception object: 1013c79c
Exception type:   System.InvalidOperationException
Message:          Element is already the child of another element.
InnerException:   <none>
StackTrace (generated):
SP       IP       Function
021FF640 0F40F06A !MS.Internal.XcpImports.CheckHResult(UInt32)+0x32
021FF64C 0E3A4CE9 !MS.Internal.XcpImports.SetValue(MS.Internal.INativeCoreTypeWrapper, System.Windows.DependencyProperty, System.Windows.DependencyObject)+0xa9
021FF680 0E3A4118 !MS.Internal.XcpImports.SetValue(MS.Internal.INativeCoreTypeWrapper, System.Windows.DependencyProperty, System.Object)+0x100
021FF71C 0E3A3FBC !System.Windows.DependencyObject.SetObjectValueToCore(System.Windows.DependencyProperty, System.Object)+0x1c4
021FF760 0E3A1CBB !System.Windows.DependencyObject.SetValueInternal(System.Windows.DependencyProperty, System.Object, Boolean, Boolean, Boolean)+0x503
021FF834 0E3A17A1 !System.Windows.DependencyObject.SetValueInternal(System.Windows.DependencyProperty, System.Object)+0x21
021FF848 0E3A1763 !System.Windows.DependencyObject.SetValue(System.Windows.DependencyProperty, System.Object)+0x1b
021FF85C 0E3D68D7 !System.Windows.Controls.ContentControl.set_Content(System.Object)+0x37
021FF870 0E3212CF !SilverlightApplication1.Page.ShowContent()+0x57
021FF880 0E3208BE !SilverlightApplication1.Page.ShowHideAngelina_Click(System.Object, System.Windows.RoutedEventArgs)+0x76
021FF898 0E4099B5 !System.Windows.Controls.Primitives.ButtonBase.OnClick()+0x5d
021FF8B0 0E40993F !System.Windows.Controls.Button.OnClick()+0x47
021FF8C0 0E40986D !System.Windows.Controls.Primitives.ButtonBase.OnMouseLeftButtonUp(System.Windows.Input.MouseButtonEventArgs)+0x85
021FF8D0 0E4097D1 !System.Windows.Controls.Control.OnMouseLeftButtonUp(System.Windows.Controls.Control, System.EventArgs)+0x41
021FF8E0 0E36C887 !MS.Internal.JoltHelper.FireEvent(IntPtr, IntPtr, Int32, System.String)+0x1b7

3. This exception occurred while setting the Content of a control, so let’s find out what this content is and who holds this UIElement. Let’s get the IL of the method SilverlightApplication1.Page.ShowContent():

0:007> !dumpil 0f224060
ilAddr = 0f4607f8
IL_0000: ldarg.0
IL_0001: newobj System.Windows.Controls.TabItem::.ctor  // A new TabItem object is created
IL_0006: stfld SilverlightApplication1.Page::item
IL_000b: ldarg.0
IL_000c: ldfld SilverlightApplication1.Page::item
IL_0011: ldstr "Angelina Jolie"
IL_0016: callvirt System.Windows.Controls.TabItem::set_Header
IL_001b: ldarg.0
IL_001c: ldfld SilverlightApplication1.Page::item  // Get the reference to the TabItem object created earlier
IL_0021: ldarg.0
IL_0022: ldfld SilverlightApplication1.Page::content
IL_0027: callvirt System.Windows.Controls.ContentControl::set_Content  // set_Content is called with the SilverlightApplication1.Page object's data member content
IL_002c: ldarg.0
IL_002d: ldfld SilverlightApplication1.Page::TabList
IL_0032: callvirt System.Windows.Controls.ItemsControl::get_Items
IL_0037: ldarg.0
IL_0038: ldfld SilverlightApplication1.Page::item

We have two interesting objects here: the TabItem and SilverlightApplication1.Page::content.

4. First, get the address of the SilverlightApplication1.Page::content object that is being set on the TabItem, because that is what results in the exception.

0:007> !do 1003b3c0
Name:        SilverlightApplication1.Page
Fields:
MT    Field   Offset                 Type VT     Attr    Value Name
0f42e730  400000d       58 …Windows.UIElement  0 instance 1011c0e8 content

0:007> !dumpheap -mt 0e396648
Address       MT     Size
1003ed3c 0e396648      140
101276a0 0e396648      140
1013c050 0e396648      140
total 3 objects
Statistics:
MT    Count    TotalSize Class Name
0e396648        3          420 System.Windows.Controls.TabItem
Total 3 objects

We have 3 TabItem objects on the heap, and the address of the content object (UIElement) is 1011c0e8.

5. The next step is to find out which TabItem object holds this UIElement and why that TabItem object is still around.

0:007> !dumpheap -mt 0e396648
Address       MT     Size
1003ed3c 0e396648      140
101276a0 0e396648      140
1013c050 0e396648      140
total 3 objects

0:007> !do 101276a0
Name:        System.Windows.Controls.TabItem
MethodTable: 0e396648
EEClass:     0e3bb768
Size:        140(0x8c) bytes
File:        System.Windows.Controls, Version=2.0.5.0, Culture=neutral
Fields:
MT    Field   Offset                 Type VT     Attr    Value Name

0ec444e8  40002b4       34        System.Object  0 instance 1011c0e8 _treeContent

0:007> !gcroot 101276a0
Note: Roots found on stacks may be false positives. Run “!help gcroot” for
more info.
Scan Thread 7 OSTHread 1710
Scan Thread 27 OSTHread 1200
Scan Thread 28 OSTHread 1278
Scan Thread 31 OSTHread 1488

6. The TabItem object at address 101276a0 (MethodTable 0e396648) holds a reference to the content object 1011c0e8 in its _treeContent field. !gcroot finds no roots for it, so this TabItem is eligible for garbage collection but has not been collected yet.

So there is already a TabItem object on the managed heap holding a reference to the same content object that is being assigned to a new TabItem object in SilverlightApplication1.Page.ShowHideAngelina_Click(). That’s why we get System.InvalidOperationException with the error message “Element is already the child of another element”.

7. You can get the address of the MethodDesc for each of the methods in SilverlightApplication1.Page:

0:007> !dumpmt -md 0f224094
————————————–
MethodDesc Table
Entry       MethodDesc      JIT Name
0e320848   0f224038 JIT SilverlightApplication1.Page.ShowHideAngelina_Click(System.Object,
0e321300   0f224058      JIT SilverlightApplication1.Page.HideContent()
0e321278   0f224060      JIT SilverlightApplication1.Page.ShowContent()

Let’s look at the implementation of ShowHideAngelina_Click

0:007> !dumpil 0f224038
ilAddr = 0f46071c
IL_0012: ldstr "Show"
IL_0017: callvirt System.String::Contains
IL_001c: brfalse.s IL_003f
IL_001e: ldloc.0
IL_0039: call SilverlightApplication1.Page::ShowContent
IL_003e: ret
IL_003f: ldloc.0
IL_004a: ldarg.0
IL_004b: call SilverlightApplication1.Page::HideContent
IL_0050: ret

This method checks the text of the button and calls ShowContent if the text contains “Show”; otherwise it calls HideContent. So let’s look at the HideContent implementation:

0:007> !dumpil 0f224058
ilAddr = 0f4607d6
IL_0000: ldarg.0
IL_0001: ldfld SilverlightApplication1.Page::item
IL_0006: brfalse.s IL_001f
IL_0008: ldarg.0
IL_0009: ldfld SilverlightApplication1.Page::TabList
IL_000e: callvirt System.Windows.Controls.ItemsControl::get_Items
IL_0013: ldarg.0
IL_0014: ldfld SilverlightApplication1.Page::item
IL_0019: callvirt class [System.Windows]System.Windows.PresentationFrameworkCollection`1::Remove
IL_001e: pop
IL_001f: ret

HideContent calls TabList.get_Items and removes the TabItem from the System.Windows.Controls.TabControl.

Resolution

The HideContent() method removes the TabItem from TabControl.Items, but until the removed TabItem is garbage collected it still holds a reference to the content UIElement. You can quickly fix this issue by assigning Content = null when the item is removed, as shown below in HideContent.

// Old, buggy implementation

private void HideContent()
{
    if (item != null)
        this.TabList.Items.Remove(item);
}

// New implementation: make sure Content is set to null

private void HideContent()
{
    if (item != null)
    {
        item.Content = null; // release the content so it can be parented again
        this.TabList.Items.Remove(item);
    }
}
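
To see why clearing Content matters, here is a toy model in Python of the “one parent per element” rule. It is only a sketch: Element and ContentHost are hypothetical stand-ins, not Silverlight types, and RuntimeError stands in for InvalidOperationException.

```python
class Element:
    """A visual element that may have at most one parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None


class ContentHost:
    """Stand-in for a content control such as TabItem."""

    def __init__(self, name):
        self.name = name
        self._content = None

    @property
    def content(self):
        return self._content

    @content.setter
    def content(self, element):
        # Refuse an element that is still parented by some other host.
        if element is not None and element.parent not in (None, self):
            raise RuntimeError("Element is already the child of another element.")
        if self._content is not None:
            self._content.parent = None  # detach the old child
        self._content = element
        if element is not None:
            element.parent = self


picture = Element("content")
first = ContentHost("first TabItem")
first.content = picture  # "Show"

# Buggy hide: the first tab is dropped from the tab list but keeps its child,
# so giving the same element to a new tab fails.
second = ContentHost("second TabItem")
try:
    second.content = picture
except RuntimeError as exc:
    print(exc)

# Fixed hide: clear the content first, then the element can be reused.
first.content = None
second.content = picture
```

The model fails on reuse for the same reason the dump shows: the old host never let go of its child, whether or not anything else still references the old host.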

High CPU Usage and Windows Forms application hang with SQLCE database and the SqlCeLockTimeoutException

Problem Description

A Windows Forms application on a user’s machine is consuming a lot of CPU, and the application appears to be hung.

Analysis

Usually, the cause of a high-CPU hang is one of the following:

1. Infinite loop

2. Machine is busy

3. High CPU in GC

4. Thread is spinning with a sleep time in the hundreds of milliseconds

We will start by collecting user-mode memory dumps of the Windows Forms application. We should always collect more than one dump to analyze a hang scenario. I prefer to use the ADPlus script with “-r 3 300”, where 3 specifies the number of times ADPlus will run in hang mode and 300 is the interval in seconds between each run.

1. Let’s open the memory dump using WinDbg.

2. The first command to run when analyzing a hang is the !runaway extension, which shows how much CPU time each thread has consumed:

0:000> !runaway 3
User Mode Time
Thread       Time
0:218       0 days 0:08:38.359
8:a50       0 days 0:02:29.968
Kernel Mode Time
Thread       Time
8:a50       0 days 0:03:23.875
0:218       0 days 0:02:06.046

I have listed only the interesting threads. Thread 0 has consumed about 2 minutes in kernel mode and more than 8 minutes in user mode. Thread 8 has consumed about 2.5 minutes in user mode and almost 3.5 minutes in kernel mode.
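
When there are many threads, it helps to add up user and kernel time per thread. The Python sketch below parses the !runaway lines shown above; the regular expression assumes exactly this “N days H:MM:SS.mmm” layout and the function name is my own.

```python
import re
from collections import defaultdict

RUNAWAY = """\
User Mode Time
Thread       Time
0:218       0 days 0:08:38.359
8:a50       0 days 0:02:29.968
Kernel Mode Time
Thread       Time
8:a50       0 days 0:03:23.875
0:218       0 days 0:02:06.046
"""

# debugger thread number : OS thread id (hex), then the elapsed CPU time
ROW = re.compile(r"^\s*(\d+):([0-9a-f]+)\s+(\d+) days (\d+):(\d+):(\d+\.\d+)", re.I)


def total_cpu_seconds(text):
    """Sum user-mode and kernel-mode CPU seconds per debugger thread number."""
    totals = defaultdict(float)
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            thread = int(m.group(1))
            days, hours, minutes = (int(m.group(i)) for i in (3, 4, 5))
            seconds = float(m.group(6))
            totals[thread] += ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return dict(totals)


for thread, seconds in sorted(total_cpu_seconds(RUNAWAY).items(), key=lambda kv: -kv[1]):
    print(f"thread {thread}: {seconds:.3f} s of CPU time")
```

Comparing these totals across the dumps taken 300 seconds apart shows which thread is still burning CPU and which one accumulated its time earlier.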

3. Let’s look at the stacks of thread 0 and thread 8:

0:000> ~8e!clrstack
OS Thread Id: 0xa50 (8)
ESP       EIP
06d8f604 7c90e4f4 [InlinedCallFrame: 06d8f604] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
06d8f600 7b1d8e48 System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
06d8f69c 7b1d8937 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
06d8f6f0 7b1d8781 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
06d8f720 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
06d8f734 05893609 MyApp.Splasher.ShowThread()

We can ignore thread 8 because it is displaying the splash screen and pumping messages.

0:000> ~0e!clrstack
OS Thread Id: 0x218 (0)
ESP       EIP
0012e4f0 7c90e4f4 [NDirectMethodFrameStandalone: 0012e4f0] System.Data.SqlServerCe.NativeMethods.CompileQueryPlan(IntPtr, System.String, System.Data.SqlServerCe.ResultSetOptions, IntPtr[], IntPtr, Int32, IntPtr ByRef, IntPtr)
0012e518 07d72c57 System.Data.SqlServerCe.SqlCeCommand.CompileQueryPlan()
System.Data.SqlServerCe.SqlCeCommand.ExecuteCommand(System.Data.CommandBehavior, System.String,

In thread 0, ExecuteCommand transitions into unmanaged SQL CE code to compile the query plan (SqlCeCommand.CompileQueryPlan()).

We will now look at the unmanaged stack:

0:000> ~0kb
ChildEBP RetAddr  Args to Child
0012cd18 7c802455 0000007f 00000000 0012cd44 kernel32!SleepEx+0x61
0012cd28 7d57b50e 0000007f 04c105a8 0012cda4 kernel32!Sleep+0xf
0012cd44 7d57ffa0 00001388 00000002 04c105a8 sqlcese35!Session::_WaitForLock+0x5f
0012cd80 7d580b6b 0176a388 01764778 00000001 sqlcese35!Session::_Lock+0x280
0012cdcc 7d5571bb 0012cdf8 0000000c 00000003 sqlcese35!Session::LockRow+0xba
0012ce20 7d56c74e 004ad00c 04c105a8 003ff978 sqlcese35!BaseMover::_SetCurrentBookmark+0x88
0012ce34 7d56cad0 00000000 00000001 04c105a8 sqlcese35!IndexMover::_SetCurrentRow+0x80
0012ceb0 7d56cb52 0000000d 003ffa98 00000001 sqlcese35!IndexMover::_Seek+0x191
0012ced4 7d565e5c 00000000 00000000 04c105a8 sqlcese35!IndexMover::Seek+0x5a
0012cef4 7d57be24 003ff04c 00000000 00000000 sqlcese35!Cursor::Seek+0x88
0012cf24 7d7ba8fa 04bd4788 0180c8ce 0181963c sqlcese35!Session::OpenHistogram+0xcb
0012cf80 7d7b31f2 00000000 0180c8ce 0181963c sqlceqp35!StatMan::LoadHistogram+0x53
0012cfac 7d7b3558 04b9b31c 0180c8a4 0180f874 sqlceqp35!QPIndexSchemaInfo::LoadHistogramHelper+0x57
0012d524 7d79dab5 04b9b31c 0180c8a4 0180c8a4 sqlceqp35!QPIndexSchemaInfo::LoadHistogram+0x62
0012d54c 7d7b8b23 04b9b31c 00024f1c 00000069 sqlceqp35!QPIScanAccessPlan::CostIndexScan+0x56
0012d5fc 7d79dea3 04b9b31c 0012d818 0180c8a4 sqlceqp35!QPIndexList::Matches+0x265
0012d6a8 7d79e339 04b9b31c 0012d818 01802274 sqlceqp35!RelBaseBlockPlan::CreateIScanPlan+0x7d
0012d6f8 7d79f9ff 04b9b31c 0012d818 0012d728 sqlceqp35!RelBaseBlockPlan::CreatePlan+0x130

Let’s understand what’s going on in this call stack:

The reader thread calls command.ExecuteReader(CommandBehavior.SingleRow).

ExecuteReader makes a call to ExecuteCommand.

ExecuteCommand transitions into unmanaged SQL CE code to compile the query plan (SqlCeCommand.CompileQueryPlan()).

The steps below are executed in unmanaged code:

  • Unmanaged code creates the query plan (sqlceqp35!RelBaseBlockPlan::CreatePlan)
  • While costing an index scan for the reader thread’s query, the query processor loads the index histogram
  • The cursor performs an index seek on the predicate column = @columnId
  • The index seeker sets the current row during the seek
  • The SQL CE session tries to lock the current row
  • The thread spins on the lock with a sleep time of 127 milliseconds (0x7f)

A sleep time of only 127 milliseconds explains the high CPU usage. So why is the thread spinning on the lock forever?
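
The wait pattern suggested by the stack can be sketched as a polling loop. This is my own Python model, not the actual sqlcese35 implementation: the Sleep argument 0x7f is 127 ms, and I am assuming that the first argument shown for _WaitForLock, 0x1388 (5000 decimal), is the lock timeout in milliseconds because it matches the desktop default.

```python
def wait_for_lock(try_acquire, timeout_ms=0x1388, sleep_ms=0x7F, sleep=lambda ms: None):
    """Poll try_acquire() until it succeeds or timeout_ms has been slept away.

    Returns (acquired, attempts). The sleep callable is injected so the
    example runs instantly; a real loop would block for sleep_ms each time.
    """
    waited_ms = 0
    attempts = 0
    while True:
        attempts += 1
        if try_acquire():
            return True, attempts
        if waited_ms >= timeout_ms:
            return False, attempts  # the caller would report a lock timeout
        sleep(sleep_ms)
        waited_ms += sleep_ms


# A lock that never becomes free: count the wake-ups before giving up.
naps = []
result = wait_for_lock(lambda: False, sleep=naps.append)
print(result, len(naps))
```

Under these assumptions one timed-out wait wakes the thread about 40 times, and every retried query repeats the whole cycle.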

4. Since this is a managed application, we should always look at all the exception objects on managed heap

0:000> !dumpheap -type Exception
Address       MT     Size
………………………………..
173e4aec 07d84d68       76
Statistics:
MT Count TotalSize Class Name
……………………………………………………
07d84d68       32         2432 System.Data.SqlServerCe.SqlCeLockTimeoutException
Total 44 objects

0:000> !pe 173e4aec
Exception object: 173e4aec
Exception type: System.Data.SqlServerCe.SqlCeLockTimeoutException
Message:
InnerException:
StackTrace (generated):
SP IP Function
0012E4A0 07D72D63 System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.CompileQueryPlan()+0x24b
0012E560 07D73D4C System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.ExecuteCommand(System.Data.CommandBehavior, System.String, System.Data.SqlServerCe.ResultSetOptions)+0x23c
0012E5A0 07D73EE6 System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.ExecuteReader(System.Data.CommandBehavior)+0x16
0012E5A4 04A8C52D GlobalPlanning_Sync!GlobalPlanning.Sync.PlanetSyncManager.OnSyncProgress(System.Object, Microsoft.Synchronization.Data.SyncProgressEventArgs)+0x81d

Dumping the exception object confirms the same call stack as thread 0.

You can get the error details by dumping the heap for SqlCeError objects:

0:000> !dumpheap -type SqlServerCe.SqlCeError
Address       MT     Size
173e4a74 07d826dc       36
total 64 objects
Statistics:
MT    Count    TotalSize Class Name
……………………………………………………………………………………….
07d826dc       32         1152 System.Data.SqlServerCe.SqlCeError
Total 64 objects

You can get the formatted error message by dumping System.Data.SqlServerCe.SqlCeError.formattedMessage

0:000> !do 173762b0
SQL Server Compact timed out waiting for a lock. The default lock time is 2000ms for devices and 5000ms for desktops. The default lock timeout can be increased in the connection string using the ssce: default lock timeout property. [ Session id = 2,Thread id = 536,Process id = 3564,Table name = __SysObjects,Conflict type = s lock (x blocks),Resource = RID: 1197:12 ]

In the above error message, Resource = RID is a row identifier, which means a single row in a table (here __SysObjects) is locked. Conflict type = s lock (x blocks) means the statement requested a shared (S) lock for read access and was blocked by an exclusive (X) lock.
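
The bracketed detail in the message is a simple comma-separated list of key = value pairs, so it is easy to pull apart. A small Python sketch (the helper name is hypothetical) also shows that the thread id in the message, 536, is 0x218 in hex, the OS thread id of thread 0 in the dump.

```python
DETAIL = ("[ Session id = 2,Thread id = 536,Process id = 3564,"
          "Table name = __SysObjects,Conflict type = s lock (x blocks),"
          "Resource = RID: 1197:12 ]")


def parse_lock_detail(text):
    """Split the bracketed 'key = value' list of a lock timeout message."""
    inner = text.strip().strip("[]").strip()
    fields = {}
    for part in inner.split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


fields = parse_lock_detail(DETAIL)
print(fields["Table name"], "|", fields["Conflict type"], "|", fields["Resource"])
print(hex(int(fields["Thread id"])))
```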

Dumping the SqlCeCommand object on the managed heap confirms that a SELECT statement is being executed.

Another point to note: with the default isolation level, a SELECT statement doesn’t require an S lock unless it is in a transaction.

SQL Server Compact Books Online says:

“When a statement has waited longer than the LOCK_TIMEOUT setting, the blocked statement is canceled automatically, and the error message SSCE_M_LOCKTIMEOUT, “The system timed out waiting for a lock,” is returned to the application. However, SQL Server Compact 3.5 does not roll back or cancel any transaction that contains the statement. The application must have an error handler that can trap the error message SSCE_M_LOCKTIMEOUT. If an application does not trap the error, it can proceed without knowing that an individual statement within a transaction has been canceled. Errors can occur because statements later in the transaction might depend on the statement that was not executed.”

However, if your database file is fragmented or large, it is possible that your SELECT query cannot complete within the default timeout of 5 seconds on the desktop, so it will keep throwing the lock timeout exception every 5 seconds for every SELECT query.

There is only one connection object on the managed heap and no other threads are executing a SqlCeCommand, so this is most likely a fragmented database; the issue was resolved after the database was compacted.

However, we should also look at the thread’s raw stack memory for any hidden exception, to make sure no other handled exception is keeping the thread from leaving the spinlock.

You can get the stack addresses from the Thread Environment Block using the !teb command, and use the dds command to dump the stack as double words with symbols:

0:000> !teb
TEB at 7ffdf000
ExceptionList:        0012cd08
StackBase:            00130000
StackLimit:           00126000

0:000> dds 00126000 00130000

………………………………………………………………………………………….

0012df94  7d58272d sqlcese35!SpinLock::Enter+0xd
0012dfa8  7d55bc0b sqlcese35!AutoSpinLock::AutoSpinLock+0x17
0012dfac  7d55bc48 sqlcese35!AutoSpinLock::Leave+0x12
0012dfb4  7d57c150 sqlcese35!Session::GetSyncSession+0x4c
0012dfc8  7d58ebbd sqlcese35!TrackingUtils::GetSyncSessionInfo+0x19
0012dfcc  7d58ebf4 sqlcese35!TrackingUtils::GetSyncSessionInfo+0x50

……………………………………………………………………………………………….

Only the interesting bits are shown above for brevity. No exception dispatcher is found in the raw stack memory, which confirms that no exception has been thrown in this thread’s call stack.
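
Reading a full dds listing by eye is tedious, so a small filter helps. The Python sketch below scans saved dds output for symbols that usually indicate exception dispatch. The marker list is my own choice (ntdll!KiUserExceptionDispatcher, ntdll!RtlDispatchException, kernel32!RaiseException and the C++ throw helper) and is not exhaustive.

```python
DDS = """\
0012df94  7d58272d sqlcese35!SpinLock::Enter+0xd
0012dfa8  7d55bc0b sqlcese35!AutoSpinLock::AutoSpinLock+0x17
0012dfac  7d55bc48 sqlcese35!AutoSpinLock::Leave+0x12
0012dfb4  7d57c150 sqlcese35!Session::GetSyncSession+0x4c
0012dfc8  7d58ebbd sqlcese35!TrackingUtils::GetSyncSessionInfo+0x19
0012dfcc  7d58ebf4 sqlcese35!TrackingUtils::GetSyncSessionInfo+0x50
"""

MARKERS = ("KiUserExceptionDispatcher", "RtlDispatchException",
           "RaiseException", "CxxThrowException")


def find_exception_frames(dds_text, markers=MARKERS):
    """Return the dds lines whose symbol column mentions an exception marker."""
    hits = []
    for line in dds_text.splitlines():
        parts = line.split(None, 2)  # stack address, value, symbol
        if len(parts) == 3 and any(marker in parts[2] for marker in markers):
            hits.append(line)
    return hits


print(find_exception_frames(DDS) or "no exception dispatch symbols found")
```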

Resolution and Workaround

1. There is no way in SQL CE to determine whether a resource is locked before locking it, except to attempt to access the data and catch the timeout exception. One workaround is to make the SQL CE lock timeout configurable, so that when the database becomes fragmented over time and you get this exception, you can increase the timeout value.

2. A SQL CE database is file based and loaded into the process’s address space. It can become fragmented over time, and fragmentation causes serious performance issues. To avoid fragmentation, you can compact the database when the lock timeout exception is thrown.

However, if the lock is due to a coding error, review your code path. In this case there was only one connection object with a command object on one thread, so this was a fragmentation issue, and the lock timeout issue was resolved after compacting the database.
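
The two workarounds can be combined into a retry policy. The Python sketch below is only a model of that idea: LockTimeout stands in for SqlCeLockTimeoutException, and execute and compact are hypothetical callables that you would implement with your own data access code. The timeout values are examples, starting from the 5000 ms desktop default.

```python
class LockTimeout(Exception):
    """Stand-in for SqlCeLockTimeoutException."""


def run_with_lock_retry(execute, compact, timeouts_ms=(5000, 10000, 20000)):
    """Run execute(timeout_ms) with growing lock timeouts.

    After the first timeout the database is compacted once, then the query
    is retried with the next, larger timeout. timeouts_ms must not be empty.
    """
    last_error = None
    for attempt, timeout_ms in enumerate(timeouts_ms):
        try:
            return execute(timeout_ms)
        except LockTimeout as exc:
            last_error = exc
            if attempt == 0:
                compact()  # defragment once before retrying
    raise last_error


# Fake query that only succeeds once the timeout is at least 10 seconds.
calls = []


def fake_execute(timeout_ms):
    calls.append(timeout_ms)
    if timeout_ms < 10000:
        raise LockTimeout("timed out waiting for a lock")
    return "row"


compactions = []
print(run_with_lock_retry(fake_execute, lambda: compactions.append(1)))
```

Giving up after a bounded number of attempts matters here: retrying forever with the same timeout is what produced the hang in the first place.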