A curious case of cookie threading issue

Threading is hard. It’s hard to get right. It’s hard to avoid race condition. Even with experienced developers, it’s not always a given (trust me, I’ve been there).

This time, the problem comes from a report that a customer constantly has high CPU situation, on all instances. Memory dumps were taken and I was able to take a look. As always, high CPU can be result of several causes, most likely thread deadlocks. For educational purposes, let’s take this memory dump step by step.

First steps are as with routine Windbg – open it. use Ctrl + D to start debugging a memory dump, then you would need to run .loadby sos clr to load the clr runtime

If you are debugging a memory dump that is captured on Azure, .loadby sos clr will not work with this error (if your Windows is installed to C:\ drive)

0:000> .loadby sos clr
The call to LoadLibrary(D:\Windows\Microsoft.NET\Framework64\v4.0.30319\sos.dll) failed, Win32 error 0n126
    "The specified module could not be found."
Please check your debugger configuration and/or network access.

Simply fix it by copy the path to sos.dll, replacing D with C, and rerun

.load C:\Windows\Microsoft.NET\Framework64\v4.0.30319\sos.dll

That should fix it.

Next step would be checking the CPU situation by .threadpool command

0:000> !threadpool
CPU utilization: 98%
Worker Thread: Total: 160 Running: 6 Idle: 134 MaxLimit: 32767 MinLimit: 140
Work Request in Queue: 0
--------------------------------------
Number of Timers: 2
--------------------------------------
Completion Port Thread:Total: 6 Free: 4 MaxFree: 8 CurrentLimit: 6 MaxLimit: 1000 MinLimit: 4

This confirms a high CPU situation. It’s however worth noting this CPU is for the entire instance, not necessarily for the process (w3wp) only, but it’s highly likely for an Azure App Service instance that the w3wp is the problem here.

The next step would be checking if there is any long running thread

0:000> !runaway
 User Mode Time
  Thread       Time
   39:1e34     2 days 11:44:59.546
   40:219c     2 days 8:08:56.765
   41:2198     1 days 4:53:46.687
   17:1480     0 days 0:19:38.171
   20:1334     0 days 0:14:57.718
   18:1798     0 days 0:12:53.625
   19:12f8     0 days 0:12:05.015
   31:a9c      0 days 0:03:39.093
   27:1494     0 days 0:01:35.406
   36:1e14     0 days 0:01:28.890
    5:1124     0 days 0:00:38.578
    6:1560     0 days 0:00:37.218
    3:ff8      0 days 0:00:34.984

Normally, the first few threads in the list are not that interesting, they are usually timers that usually run from the start. But let’s leave no stone unturned, shall we? Let’s switch to the longest running thread

0:000> ~39s
00007ff9`297d9f59 458bf7          mov     r14d,r15d

Hmm, it does not look like a listener. Let’s see what it actually is with !clrstack

0:039> !clrstack
OS Thread Id: 0x1e34 (39)
        Child SP               IP Call Site
000000740273cff0 00007ff9297d9f59 System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Remove(System.__Canon)
000000740273d080 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
000000740273d0c0 00007ff9314de98d Abck.Web.Features.User.UserContext.get_CountryCurrency()
000000740273d100 00007ff93154367e Abck.Web.Features.Cart.Services.CountryCurrencyService.GetCountrySettingThreeAlpha(Abck.Web.Features.User.UserContext)
000000740273d140 00007ff93153b6dd Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29.MoveNext()
000000740273d1f0 00007ff93153b54a System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29, Abck.Web]](d__29 ByRef)
000000740273d2a0 00007ff93153b492 Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder.GetProductPriceViewModel(System.String, Abck.Web.Features.User.UserContext)

That looks interesting. If we move to the second longest running thread, it looks oddly familiar

0:040> !clrstack
OS Thread Id: 0x219c (40)
        Child SP               IP Call Site
00000074be23d060 00007ff926c82f2e System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Contains(System.__Canon)
00000074be23d0d0 00007ff9307aebcd System.Web.HttpCookieCollection.EnsureKeyValidated(System.String, System.String)
00000074be23d110 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
00000074be23d150 00007ff9314de98d Abck.Web.Features.User.UserContext.get_CountryCurrency()
00000074be23d190 00007ff93154367e Abck.Web.Features.Cart.Services.CountryCurrencyService.GetCountrySettingThreeAlpha(Abck.Web.Features.User.UserContext)
00000074be23d1d0 00007ff93153b6dd Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29.MoveNext()

And the third longest running thread

0:041> !clrstack
OS Thread Id: 0x2198 (41)
        Child SP               IP Call Site
00000078597bd8d0 00007ff9297d9f59 System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Remove(System.__Canon)
00000078597bd960 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
00000078597bd9a0 00007ff9314de98d Abck.Web.Features.User.UserContext.get_CountryCurrency()
00000078597bd9e0 00007ff93154367e Abck.Web.Features.Cart.Services.CountryCurrencyService.GetCountrySettingThreeAlpha(Abck.Web.Features.User.UserContext)
00000078597bda20 00007ff93153b6dd Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29.MoveNext()
00000078597bdad0 00007ff93153b54a System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29, Abck.Web]](d__29 ByRef)

Any guess on what happened?

It looks like what we have a clear case of infinite loop. There were multiple threads trying to get the same cookie, and under the neath, the HttpCookieCollection.Get is not threadsafe, it tries to access an underlying HashSet<T> without properly locks. While one thread tried to remove items from the hashset, another tried to read it, and messed up the internal hash table. All three threads are then doomed in infinite loop, keep running and use up all CPU resources.

The fix in this case would be as simple as a lock around the code to get the CountryCurrency, but it’s best to be careful. Again, when fixing the problem of lack of lock, it’s easy to run into the problem of too much of lock.

Potential performance issue with Maxmind.db

From time to time, I have to dig into some customers’ profiler traces to figure out why their site is slow (yeah, if you follow me, you’d know that’s kind of my main job). There are multiple issues that can eat your website performance for breakfast, from loading too much content, to unmaintained database indexes. While my blog does not cover everything, I think you can get a good grasp of what mistakes to avoid.

But sometimes the problem might come from a 3rd party library/framework. It’s not new, as we have seen it with A curious case of memory dump diagnostic: How Stackify can cause troubles to your site – Quan Mai’s blog (vimvq1987.com). The problem with those types of issues is that they are usually overlooked.

The library we’ll be investigating today would be Maxmind.db. To be honest, I’ve never used it my own, but it seems to be a very popular choice to geography-map the visitors. It’s usually used by Optimizely sites for that purpose, using VisitorGroup (which is why it came under my radar).

For several sites that use it, it seems more often than not stuck in this stack

It’s essentially to think that CreateActivator is doing something heavy here (evidently with the LambdaCompiler.Compile part. A peek from decompiling actually shows that yes, it’s heavy. I’m not quite sure I can post the decompiled code here without violating any agreement (I did, in fact, accepted no agreement at this point), but it’s quite straightforward code: TypeActivatorCreator uses reflection to get the constructors of the Type passed to it, to sees if there is any constructor decorated with MaxMind.Db.Constructor attribute, then prepares some parameters, and creates an LambdaExpression that would create an instance of that Type, using found constructor (which is a good thing because a compiled expression would perform much better than just a reflection call).

(I’m using Mindmax.db 2.0.0, for the record)

The code is straightforward, but it is also slow – as any code which involves reflection and lambda compilation would be. The essential step would be to cache any result of this. This is actually a very good place to cache. The number of types are fixed during runtime (except for very edge cases where you dynamically create new types), so you won’t have to worry about cache invalidation. The cache would significantly improve the performance of above code.

And in TypeActivatorCreator there is a cache for it. It is a simple ConcurrentDictionary<Type, TypeActivator> , which would return an TypeActivator if the Type was requested before, or create one and cache it, it it hasn’t been. As I said, this is a very good place to add cache to this.

There is a cache for that, which is good. However, the very important tidbit here is that the dictionary is not static. That means, the cache only works, if the class is registered as Singleton (by itself, or by another class down the dependency chain), meaning, only one of the instance is created and shared between thread (which is why the ConcurrentDictionary part is important).

But except it’s not.

When I look at a memory dump that collected for a customer that is using Maxmind.db, this is what I got:

0:000> !dumpheap -stat -type TypeAcivatorCreator
Statistics:
MT Count TotalSize Class Name
00007ffa920f67e0 1 24 MaxMind.Db.TypeAcivatorCreator+<>c
00007ffa920f6500 147 3528 MaxMind.Db.TypeAcivatorCreator
Total 148 objects

So there were 147 instances of TypeAcivatorCreator. Note that this is only the number of existing instances. There might be other instances that were disposed and garbaged by CLR.

Now it’s clear why it has been performing bad. For supposedly every request, a new instance of TypeActivatorCreator is created, and therefore its internal cache is simply empty (it is just newly created, too). Therefore each of request will go through the expensive path of CreateActivator, and performance suffers.

The obvious fix here is to make the dictionary static, or making the TypeActivatorCreator class Singleton. I don’t have the full source code of Mindmax.Db to determine which is better, but I’d be leaning toward the former.

Moral of the story:

  • Caching is very, very important, especially when you are dealing with reflection and lambda compilation
  • You can get it right 99%, but the 1% left could still destroy performance.

Update:

I reached out to Maxmind.db regarding this issue on November 9th, 2021

About 6h later they replied with this

I was at first confused, then somewhat disappointed. It is a small thing to fix to improve overall performance, rather than relying on/expecting customers to do what you say in documentation. But well, let’s just say we have different opinions.

A curious case of memory dump diagnostic: How Stackify can cause troubles to your site

It’s been a while since my last blog post, and this time it will be fun time with windbg. One of DXC customers has been having problem with instances of their site hang and restart. We at Episerver takes the availability and performance of DXC sites very seriously, therefore I was asked to help diagnosing the problem. It’s something I’d like to do and to learn anyway, so game is on.

The golden standard for looking at the kind of those problems is still the tried and trusted Windbg. With the help from the Managed Services Engineer, I was able to access several memory dumps from the site. Most of them were taken when the site stalled, which is usually the most useful moment because it should contain the most critical piece of information why the stall happened.

Continue reading “A curious case of memory dump diagnostic: How Stackify can cause troubles to your site”

Useful T-SQL snippets for development and troubleshooting

This post is more of a note-to-self. These are the useful T-SQL statements which can be incredibly useful in development and troubleshooting

SET STATISTICS IO ON

Turn on the IO statistics for statements run after that until set to OFF explicitly. We then switch to Messages tab to see how many IO operations were done on each table.

SET STATISTICS TIME ON

Find out about the statements were executed: which statements, its texts, how many reads (logical), how many time was spent on CPU and how many time was spent total

Continue reading “Useful T-SQL snippets for development and troubleshooting”

Exploring Large Object Heap with WinDBG

This is the second part of http://vimvq1987.com/debug-net-memory-dump-windbg-crash-course-part-1/, – which is far from complete. In this post, we will explore the Large Object Heap (LOH) of a .NET application with WinDBG

Why LOH? It’s a special heap contains the memory objects which are more than 85000 bytes in size – which, previously, never compacted (that was changed with .NET 4.5 when you have an option to compact LOH, but beware of the consequences).

If you know about the generation garbage collection in CLR, you already know that when an object is no longer used, its memory will be claimed back for later use. GC do more than that, by trying to compact the “free” memory – so it’ll move (via copy and delete) the survived objects next to each other, therefore you’ll be a big, continuous free memory. This helps improving performance in upcoming allocations, but only until a point. If the objects are too big, then the costs of moving the objects around might outweigh the benefits. Microsoft did a bunch of performance test and they decided that 85000 bytes is the break point, where the costs are bigger than the performance gain. Then we have LOH, where the memory is almost never compacted.

Continue reading “Exploring Large Object Heap with WinDBG”

Debug .NET memory dump with WinDBG – crash course. Part 1

If you ask me what had I been doing the last two weeks – then the answer is I was pulling my hairs.  A customer had a problem with their site as the memory hiked up after catalog imports and stayed there “forever” – and in the end it slowed the site down. I jumped in and almost regretted that decision – had to spent days  messing around with WinDBG and memory dumps. In the end – I found the problem and it was fixed. A lot of hairs were loss in progress, but I learned something about WinDBG – and that’s what I’m sharing today.

WinDBG is probably the most famous tool for debugging stuffs on Windows. Out of the box, it only works with native applications, aka assembly and such – but lucky for us, there are plenty of extensions to allow it to work with .NET application. The “standard” SOS and more advanced extension SOSEX. SOS is included in WinDBG, while you can download SOSEX from here (for 64 bit) or here (for 32 bit) . Download the zip file and extract the dll somewhere.

WinDBG comes with the Windows SDK, not the standard .NET framework, so you’ll probably need to install it separately from here

Continue reading “Debug .NET memory dump with WinDBG – crash course. Part 1”