A curious case of cookie threading issue

Threading is hard. It’s hard to get right. It’s hard to avoid race condition. Even with experienced developers, it’s not always a given (trust me, I’ve been there).

This time, the problem comes from a report that a customer constantly has high CPU situation, on all instances. Memory dumps were taken and I was able to take a look. As always, high CPU can be result of several causes, most likely thread deadlocks. For educational purposes, let’s take this memory dump step by step.

First steps are as with routine Windbg – open it. use Ctrl + D to start debugging a memory dump, then you would need to run .loadby sos clr to load the clr runtime

If you are debugging a memory dump that is captured on Azure, .loadby sos clr will not work with this error (if your Windows is installed to C:\ drive)

0:000> .loadby sos clr
The call to LoadLibrary(D:\Windows\Microsoft.NET\Framework64\v4.0.30319\sos.dll) failed, Win32 error 0n126
    "The specified module could not be found."
Please check your debugger configuration and/or network access.

Simply fix it by copy the path to sos.dll, replacing D with C, and rerun

.load C:\Windows\Microsoft.NET\Framework64\v4.0.30319\sos.dll

That should fix it.

Next step would be checking the CPU situation by .threadpool command

0:000> !threadpool
CPU utilization: 98%
Worker Thread: Total: 160 Running: 6 Idle: 134 MaxLimit: 32767 MinLimit: 140
Work Request in Queue: 0
Number of Timers: 2
Completion Port Thread:Total: 6 Free: 4 MaxFree: 8 CurrentLimit: 6 MaxLimit: 1000 MinLimit: 4

This confirms a high CPU situation. It’s however worth noting this CPU is for the entire instance, not necessarily for the process (w3wp) only, but it’s highly likely for an Azure App Service instance that the w3wp is the problem here.

The next step would be checking if there is any long running thread

0:000> !runaway
 User Mode Time
  Thread       Time
   39:1e34     2 days 11:44:59.546
   40:219c     2 days 8:08:56.765
   41:2198     1 days 4:53:46.687
   17:1480     0 days 0:19:38.171
   20:1334     0 days 0:14:57.718
   18:1798     0 days 0:12:53.625
   19:12f8     0 days 0:12:05.015
   31:a9c      0 days 0:03:39.093
   27:1494     0 days 0:01:35.406
   36:1e14     0 days 0:01:28.890
    5:1124     0 days 0:00:38.578
    6:1560     0 days 0:00:37.218
    3:ff8      0 days 0:00:34.984

Normally, the first few threads in the list are not that interesting, they are usually timers that usually run from the start. But let’s leave no stone unturned, shall we? Let’s switch to the longest running thread

0:000> ~39s
00007ff9`297d9f59 458bf7          mov     r14d,r15d

Hmm, it does not look like a listener. Let’s see what it actually is with !clrstack

0:039> !clrstack
OS Thread Id: 0x1e34 (39)
        Child SP               IP Call Site
000000740273cff0 00007ff9297d9f59 System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Remove(System.__Canon)
000000740273d080 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
000000740273d0c0 00007ff9314de98d Abck.Web.Features.User.UserContext.get_CountryCurrency()
000000740273d100 00007ff93154367e Abck.Web.Features.Cart.Services.CountryCurrencyService.GetCountrySettingThreeAlpha(Abck.Web.Features.User.UserContext)
000000740273d140 00007ff93153b6dd Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29.MoveNext()
000000740273d1f0 00007ff93153b54a System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29, Abck.Web]](d__29 ByRef)
000000740273d2a0 00007ff93153b492 Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder.GetProductPriceViewModel(System.String, Abck.Web.Features.User.UserContext)

That looks interesting. If we move to the second longest running thread, it looks oddly familiar

0:040> !clrstack
OS Thread Id: 0x219c (40)
        Child SP               IP Call Site
00000074be23d060 00007ff926c82f2e System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Contains(System.__Canon)
00000074be23d0d0 00007ff9307aebcd System.Web.HttpCookieCollection.EnsureKeyValidated(System.String, System.String)
00000074be23d110 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
00000074be23d150 00007ff9314de98d Abck.Web.Features.User.UserContext.get_CountryCurrency()
00000074be23d190 00007ff93154367e Abck.Web.Features.Cart.Services.CountryCurrencyService.GetCountrySettingThreeAlpha(Abck.Web.Features.User.UserContext)
00000074be23d1d0 00007ff93153b6dd Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29.MoveNext()

And the third longest running thread

0:041> !clrstack
OS Thread Id: 0x2198 (41)
        Child SP               IP Call Site
00000078597bd8d0 00007ff9297d9f59 System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Remove(System.__Canon)
00000078597bd960 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
00000078597bd9a0 00007ff9314de98d Abck.Web.Features.User.UserContext.get_CountryCurrency()
00000078597bd9e0 00007ff93154367e Abck.Web.Features.Cart.Services.CountryCurrencyService.GetCountrySettingThreeAlpha(Abck.Web.Features.User.UserContext)
00000078597bda20 00007ff93153b6dd Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29.MoveNext()
00000078597bdad0 00007ff93153b54a System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29, Abck.Web]](d__29 ByRef)

Any guess on what happened?

It looks like what we have a clear case of infinite loop. There were multiple threads trying to get the same cookie, and under the neath, the HttpCookieCollection.Get is not threadsafe, it tries to access an underlying HashSet<T> without properly locks. While one thread tried to remove items from the hashset, another tried to read it, and messed up the internal hash table. All three threads are then doomed in infinite loop, keep running and use up all CPU resources.

The fix in this case would be as simple as a lock around the code to get the CountryCurrency, but it’s best to be careful. Again, when fixing the problem of lack of lock, it’s easy to run into the problem of too much of lock.

Delete the undelete-able folder – Windows 7, 8, 10

If you are using Windows 7, 8, or 10, sometime there are some folders appear to be super stubborn, and refuse to go away. No matter how much you try, it just can’t be deleted. It’s undelete-able. Any attempts would result in this dreadful dialog box:

“Are you f*cking kidding me? This is my computer and I have the the absolute control over everything” – You said. Naturally, you would go to open the Security tab in Properties dialog to grant the permissions to everyone.

Strangely, the tab is empty! In other words, there is no one granted permission to nothing.

What? Why? How?

The truth is quite shocking: the folder might be already deleted – but there are processes still referring to it – and by that they keep the handle to the folder open. When that happens, you can’t really do anything to the folder without terminating the process first.

How to find the process then?

Just use Process Explorer, a free SysInternals tool from Microsoft.  Download, extract and run the .exe file (procexp.exe if you are running x86 version, or procexp64.exe if you are running 64 bit version of Windows) Click on Find/Find Handle or Dll, or use Ctrl+F shortcut. In this case we’re looking for typings, which is a fairly unique string. You can of course search by the path if the folder name is quite generic (i.e. there might be a lot of them). Wait for a little bit for this to show up. We got you the sneaky process(es):

Now you know which process is holding the handle to the folder, it’s as easy to fix it. Double click on the process and Process Explorer even let you know the parent process. In most of the cases you can have a peaceful exit and everything will be fixed. In worst cases, a forceful close might be needed.


Episerver Commerce performance optimization – part 2

Or lock or no lock – that’s the question.

This is the second part of the series on how can you improve the performance of Episerver Commerce site – or more precisely, to avoid the deadlocks and 100% CPU usage. This is not Commerce specific actually, and you can apply the knowledge and techniques here for a normal CMS site as well.

It’s a common and well-known best practice to store the slow-to-retrieve data in cache. These days memory is cheap – not free – but cheap. Yet it is still much faster than the fastest PCIe SSD in the market (if your site is running on traditional HDD, it’s not even close). And having objects in cache means you won’t have to open the connection to SQL Server, wait for it to read the data and send back to you – which all cost time. And if the object you need is a complex one, for example a Catalog content, you will also save the time needed to construct the object. Even if it’s fast, it is still not instantaneous, and it will cost you both memory and CPU cycles. All in all – caching is the right way to go. But how to get it right?

One common mistake for to have no lock when you load the data for the first time and insert it into cache.

Continue reading “Episerver Commerce performance optimization – part 2”