High CPU Usage and Windows Forms application hang with SQLCE database and the SqlCeLockTimeoutException

Problem Description

Windows froms application on a user machine is consuming high cpu resource and application appears to be hung.

Analysis

Usually, the cause of high cpu hang is

1. infinite loop

2. machine is busy

3. High CPU in GC

4. Thread is spinning with sleep time in 100s of milliseconds

We will start with collecting the user mode memory dump of windows forms application. We should always collect more than one dump to analyze hang scenario. I prefer to use adplus script with -r 3 300″ to collect where 3 specifies the number of times ADPlus will run in hang mode and 300 is the interval in seconds between each run.

1. Lets open the memory dump using WinDbg

2. The first command to run to analyze hang is !runaway extension to find out how long each of the threads have been running for

0:000> !runaway 3
User Mode Time
Thread       Time
0:218       0 days 0:08:38.359
8:a50       0 days 0:02:29.968
Kernel Mode Time
Thread       Time
8:a50       0 days 0:03:23.875
0:218       0 days 0:02:06.046

I have listed the interesting threads. Thread # 0 has consumed 2 mins in kernel mode and more than 8 mins in user mode and thread # 8 has consumed nearly equal amount of time around 2:xx in user and kernel mode.

3. Lets look at thread stack of each of thread # 0 and thread # 8

0:000> ~8e!clrstack
OS Thread Id: 0xa50 (8)
ESP       EIP
06d8f604 7c90e4f4 [InlinedCallFrame: 06d8f604] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
06d8f600 7b1d8e48 System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
06d8f69c 7b1d8937 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
06d8f6f0 7b1d8781 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
06d8f720 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
06d8f734 05893609 MyApp.Splasher.ShowThread()

We can ignore thread 8 because this is displaying splasher screen and pumping the message.

0:000> ~0e!clrstack
OS Thread Id: 0×218 (0)
ESP       EIP
0012e4f0 7c90e4f4 [NDirectMethodFrameStandalone: 0012e4f0] System.Data.SqlServerCe.NativeMethods.CompileQueryPlan(IntPtr, System.String, System.Data.SqlServerCe.ResultSetOptions, IntPtr[], IntPtr, Int32, IntPtr ByRef, IntPtr)
0012e518 07d72c57 System.Data.SqlServerCe.SqlCeCommand.CompileQueryPlan()
System.Data.SqlServerCe.SqlCeCommand.ExecuteCommand(System.Data.CommandBehavior, System.String,

In Thread # 0 ExecuteCommand transtions the instruction to Unmanaged SQLCE code to compile query plan(SqlCeCommand.CompileQueryPlan())

We will now look at unmanaged stack

0:000> ~0kb
ChildEBP RetAddr  Args to Child
0012cd18 7c802455 0000007f 00000000 0012cd44 kernel32!SleepEx+0×61
0012cd28 7d57b50e 0000007f 04c105a8 0012cda4 kernel32!Sleep+0xf
0012cd44 7d57ffa0 00001388 00000002 04c105a8 sqlcese35!Session::_WaitForLock+0x5f
0012cd80 7d580b6b 0176a388 01764778 00000001 sqlcese35!Session::_Lock+0×280
0012cdcc 7d5571bb 0012cdf8 0000000c 00000003 sqlcese35!Session::LockRow+0xba
0012ce20 7d56c74e 004ad00c 04c105a8 003ff978 sqlcese35!BaseMover::_SetCurrentBookmark+0×88
0012ce34 7d56cad0 00000000 00000001 04c105a8 sqlcese35!IndexMover::_SetCurrentRow+0×80
0012ceb0 7d56cb52 0000000d 003ffa98 00000001 sqlcese35!IndexMover::_Seek+0×191
0012ced4 7d565e5c 00000000 00000000 04c105a8 sqlcese35!IndexMover::Seek+0x5a
0012cef4 7d57be24 003ff04c 00000000 00000000 sqlcese35!Cursor::Seek+0×88
0012cf24 7d7ba8fa 04bd4788 0180c8ce 0181963c sqlcese35!Session::OpenHistogram+0xcb
0012cf80 7d7b31f2 00000000 0180c8ce 0181963c sqlceqp35!StatMan::LoadHistogram+0×53
0012cfac 7d7b3558 04b9b31c 0180c8a4 0180f874 sqlceqp35!QPIndexSchemaInfo::LoadHistogramHelper+0×57
0012d524 7d79dab5 04b9b31c 0180c8a4 0180c8a4 sqlceqp35!QPIndexSchemaInfo::LoadHistogram+0×62
0012d54c 7d7b8b23 04b9b31c 00024f1c 00000069 sqlceqp35!QPIScanAccessPlan::CostIndexScan+0×56
0012d5fc 7d79dea3 04b9b31c 0012d818 0180c8a4 sqlceqp35!QPIndexList::Matches+0×265
0012d6a8 7d79e339 04b9b31c 0012d818 01802274 sqlceqp35!RelBaseBlockPlan::CreateIScanPlan+0x7d
0012d6f8 7d79f9ff 04b9b31c 0012d818 0012d728 sqlceqp35!RelBaseBlockPlan::CreatePlan+0×130

Let’s understand what’s going on in this call stack

Reader thread calls command.ExecuteReader(CommandBehavior.SingleRow)

ExecuteReader makes a call to ExecuteCommand

ExecuteCommand transtions the instruction to Unmanaged SQLCE code to compile query plan(SqlCeCommand.CompileQueryPlan())

The below steps are getting executed in umanaged

  • Unmanaged code creates the query plan sqlceqp35!RelBaseBlockPlan::CreatePlan
  • Reader Thread’s Query plan has created a query plan
  • Cursor is performing index seek on predicate column = @columnId
  • Index seeker sets the current row during seek
  • sql ce session tries to lock the current row
  • Thread just spins on lock with Sleep time of 127 milliseconds

Since thread sleep time is only 127 milliseconds so that explains the high cpu usage. So why is it spinning on lock forever.

4. Since this is a managed application, we should always look at all the exception objects on managed heap

0:000> !dumpheap -type Exception
Address       MT     Size
………………………………..
173e4aec 07d84d68       76
Statistics:
MT Count TotalSize Class Name
……………………………………………………
07d84d68       32         2432
System.Data.SqlServerCe.SqlCeLockTimeoutException
Total 44 objects

0:000> !pe 173e4aec
Exception object: 173e4aec
Exception type: System.Data.SqlServerCe.SqlCeLockTimeoutException
Message:
InnerException:
StackTrace (generated):
SP IP Function
0012E4A0 07D72D63 System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.CompileQueryPlan()+0x24b
0012E560 07D73D4C System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.ExecuteCommand(System.Data.CommandBehavior, System.String, System.Data.SqlServerCe.ResultSetOptions)+0x23c
0012E5A0 07D73EE6 System_Data_SqlServerCe_ni!System.Data.SqlServerCe.SqlCeCommand.ExecuteReader(System.Data.CommandBehavior)+0×16
0012E5A4 04A8C52D GlobalPlanning_Sync!GlobalPlanning.Sync.PlanetSyncManager.OnSyncProgress(System.Object, Microsoft.Synchronization.Data.SyncProgressEventArgs)+0x81d

Dumping the exception object confirms the same call stack as thread 0

You can get the error details by dumping the heap for

0:000> !dumpheap -type SqlServerCe.SqlCeError
Address       MT     Size
173e4a74 07d826dc       36
total 64 objects
Statistics:
MT    Count    TotalSize Class Name
……………………………………………………………………………………….
07d826dc       32         1152 System.Data.SqlServerCe.SqlCeError
Total 64 objects

You can get the formatted error message by dumping System.Data.SqlServerCe.SqlCeError.formattedMessage

0:000> !do 173762b0
SQL Server Compact timed out waiting for a lock. The default lock time is 2000ms for devices and 5000ms for desktops. The default lock timeout can be increased in the connection string using the ssce: default lock timeout property. [ Session id = 2,Thread id = 536,Process id = 3564,Table name = __SysObjects,Conflict type = s lock (x blocks),Resource = RID: 1197:12 ]

In above error message, you will notice that Resource = RID which is a row identifier that means a single row in a table is locked. Conflict type = s lock which means that this is in shared lock mode for read access.

This is confirmed by dumping SQLCECommand object on managed heap that select statement is being executed.

Another point to note is, with default isolation level SELECT statement in SQL doesn’t require S lock unless it is in a transaction.

SQL CE compact edition online book says that

“When a statement has waited longer than the LOCK_TIMEOUT setting, the blocked statement is canceled automatically, and the error message SSCE_M_LOCKTIMEOUT, “The system timed out waiting for a lock,” is returned to the application.However, SQL Server Compact 3.5 does not roll back or cancel any transaction that contains the statement. The application must have an error handler that can trap the error message SSCE_M_LOCKTIMEOUT. If an application does not trap the error, it can proceed without knowing that an individual statement within a transaction has been canceled. Errors can occur because statements later in the transaction might depend on the statement that was not executed.”

However, if your database file is fragmented or large its possible that your select query may not run with default time out of 5 seconds in desktop so it will keep on throwing locktimeout exception every 5 seconds for every select query.

There is only one connection object on managed heap and no other threads are executing the SQLCECommand so this is most likely a fragmented database because this issue gets resolved after database is compacted.

However, we should also look at thread stack memory to find any hidden exception to make sure there is no other handled exception resulting in not leaving the spinlock

you can get the Thread Environment Block stack address using !teb command and use dds command to dump words and symbols

0:000> !teb
TEB at 7ffdf000
ExceptionList:        0012cd08
StackBase:            00130000
StackLimit:           00126000

0:000> dds 00126000 00130000

………………………………………………………………………………………….

0012df94  7d58272d sqlcese35!SpinLock::Enter+0xd
0012dfa8  7d55bc0b sqlcese35!AutoSpinLock::AutoSpinLock+0×17
0012dfac  7d55bc48 sqlcese35!AutoSpinLock::Leave+0×12
0012dfb4  7d57c150 sqlcese35!Session::GetSyncSession+0x4c
0012dfc8  7d58ebbd sqlcese35!TrackingUtils::GetSyncSessionInfo+0×19
0012dfcc  7d58ebf4 sqlcese35!TrackingUtils::GetSyncSessionInfo+0×50

……………………………………………………………………………………………….

Only interesting bits are shown above for brevity but there is no exception dispatcher found on raw stack memory so that confirms that no exception has been thrown in this thread call stack

Resolution and Workaround

1. There is no way for you to determine in SQL CE whether a resource is locked before locking it, except to attempt to access the data catch the time out exception, so one workaround is to have a configurable SQL CE lock timeout so that when SQL CE database is fragmented over time and when you get an exception you can increase the timeout value

2.  SQL CE database is file based loaded in process’s address space so  SQL CE database can become fragmented over time and fragmentation causes some serious performance issue. To avoid fragmentation,  you can compact the database when locktimeout exception is thrown

However, if the lock is due to coding error in that case review your code path but in this case there is only one connection object with a command object on one thread so this was a fragmentation issue and locktimeout issue did get resolved after compacting the database.