A curious case of cookie threading issue – part 2

A while back, I wrote this post A curious case of cookie threading issue – Quan Mai’s blog (vimvq1987.com) as an example of how to use WinDbg to diagnose a race condition (with results in an infinite loop). Naturally that issue needs to be fixed. A starting point to fix a race condition is to put locks whenever you change the data. But it’s not that simple. Too much locking would definitely hurt your application performance, and increase the chance of you running into another nasty problem (namely, deadlock). So the art of locking is “lagom” – Swedish word for “just right”, not too much, not too little.

ReaderWriteLockSlim comes to rescue. You only need to lock when writing, not locking. So this for reading

            readerWriterLockSlim.EnterReadLock();
            try
            {
                return httpRequest.Cookies[CookieKey];
            }
            finally
            {
                readerWriterLockSlim.ExitReadLock();
            }

And this for writing

            readerWriterLockSlim.EnterWriteLock();
            try
            {
                httpRequest.Cookies.Remove(CookieKey);               
            }
            finally
            {
                readerWriterLockSlim.ExitWriteLock();
            }

Would be enough, right?

No. The issue runs deeper than that.

If we go back to the stacktrace of the issue

0:039> !clrstack
OS Thread Id: 0x1e34 (39)
        Child SP               IP Call Site
000000740273cff0 00007ff9297d9f59 System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Remove(System.__Canon)
000000740273d080 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
000000740273d0c0 00007ff9314de98d 

and look closer. Something is not right. If you do not spot the issue – it’s fine, I missed it too. The problem is that we have a HashSet.Remove inside a HttpCookieCollection.Get. Uh oh.

If we look at source code of HttpCookieCollection, inside Get, it calls to EnsureKeyValidated

if (cookie != null) {
                EnsureKeyValidated(name, cookie.Value);
            }

referencesource/HttpCookieCollection.cs at master · microsoft/referencesource · GitHub

which itself calls to

            _keysAwaitingValidation.Remove(key);

referencesource/HttpCookieCollection.cs at master · microsoft/referencesource · GitHub

_keysAwaitingValidation is a HashSet<string> . That explains the Remove we saw. And that explains why ReaderWriterLockSlim is not enough – changes are made within the supposedly read only action as well.

The only valid solution here is to lock both read and write actions on HttpCookieCollection. However, as HttpCookieCollection is per request, so our lock object should only be per request as well (we certainly do not want every thread to be locked when we get cookie on a request).

Moral of the story:

  • Look closer. There might always be something underneath. Close to the truth is still not the truth.
  • Never assume that a Get method is thread-safe. The implementation can do plenty of unexpected things under the hood.
  • You might ask why this happen, as HttpCookieCollection is per request. Turned out that there is a code that use ThreadPool.QueueUserWorkItem to queue tasks which share the same HttpContext object. As we learned, that’s the recipe for disaster. Think twice (or thrice) before sharing an object that is not thread-safe between threads.

Delete a content – directly from database

Before we even start, I would reiterate that manipulating data directly should be avoided unless absolutely necessary, it should be used as the last resort, and should be proceeded with cautions – always back up first and test your queries on development database first before running it in production. And if the situation dictates that you have to run the query, better do it with the 4 eyes principle – having a colleague double check it for you. When it comes to production database, nothing is too careful.

Now back to the question, if you absolutely have to delete a content, you should do like this

exec editDeletePage @pageId = 123, @ForceDelete = 1

It is basically what Content Clouds (i.e. CMS) does under the hood, without the cache validation on the application layer of course.

So the moral of the story – do everything with API if you can. If you absolutely have to, use the built-in stored procedures – they are tested vigorously and should have minimal issues/bugs, and should take care of everything, data-wise for you. Only write your own query if there is no SP that can be used.

Update: Initially I mentioned Tomas’ post in this, and that gave impression his way is incorrect. I should have written better. My apologies to Tomas

A curious case of cookie threading issue

Threading is hard. It’s hard to get right. It’s hard to avoid race condition. Even with experienced developers, it’s not always a given (trust me, I’ve been there).

This time, the problem comes from a report that a customer constantly has high CPU situation, on all instances. Memory dumps were taken and I was able to take a look. As always, high CPU can be result of several causes, most likely thread deadlocks. For educational purposes, let’s take this memory dump step by step.

First steps are as with routine Windbg – open it. use Ctrl + D to start debugging a memory dump, then you would need to run .loadby sos clr to load the clr runtime

If you are debugging a memory dump that is captured on Azure, .loadby sos clr will not work with this error (if your Windows is installed to C:\ drive)

0:000> .loadby sos clr
The call to LoadLibrary(D:\Windows\Microsoft.NET\Framework64\v4.0.30319\sos.dll) failed, Win32 error 0n126
    "The specified module could not be found."
Please check your debugger configuration and/or network access.

Simply fix it by copy the path to sos.dll, replacing D with C, and rerun

.load C:\Windows\Microsoft.NET\Framework64\v4.0.30319\sos.dll

That should fix it.

Next step would be checking the CPU situation by .threadpool command

0:000> !threadpool
CPU utilization: 98%
Worker Thread: Total: 160 Running: 6 Idle: 134 MaxLimit: 32767 MinLimit: 140
Work Request in Queue: 0
--------------------------------------
Number of Timers: 2
--------------------------------------
Completion Port Thread:Total: 6 Free: 4 MaxFree: 8 CurrentLimit: 6 MaxLimit: 1000 MinLimit: 4

This confirms a high CPU situation. It’s however worth noting this CPU is for the entire instance, not necessarily for the process (w3wp) only, but it’s highly likely for an Azure App Service instance that the w3wp is the problem here.

The next step would be checking if there is any long running thread

0:000> !runaway
 User Mode Time
  Thread       Time
   39:1e34     2 days 11:44:59.546
   40:219c     2 days 8:08:56.765
   41:2198     1 days 4:53:46.687
   17:1480     0 days 0:19:38.171
   20:1334     0 days 0:14:57.718
   18:1798     0 days 0:12:53.625
   19:12f8     0 days 0:12:05.015
   31:a9c      0 days 0:03:39.093
   27:1494     0 days 0:01:35.406
   36:1e14     0 days 0:01:28.890
    5:1124     0 days 0:00:38.578
    6:1560     0 days 0:00:37.218
    3:ff8      0 days 0:00:34.984

Normally, the first few threads in the list are not that interesting, they are usually just timers that run from the application start up. But let’s leave no stone unturned, shall we? Let’s switch to the longest running thread

0:000> ~39s
00007ff9`297d9f59 458bf7          mov     r14d,r15d

Hmm, it does not look like a listener. Let’s see what it actually is with !clrstack

0:039> !clrstack
OS Thread Id: 0x1e34 (39)
        Child SP               IP Call Site
000000740273cff0 00007ff9297d9f59 System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Remove(System.__Canon)
000000740273d080 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
000000740273d0c0 00007ff9314de98d Abck.Web.Features.User.UserContext.get_CountryCurrency()
000000740273d100 00007ff93154367e Abck.Web.Features.Cart.Services.CountryCurrencyService.GetCountrySettingThreeAlpha(Abck.Web.Features.User.UserContext)
000000740273d140 00007ff93153b6dd Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29.MoveNext()
000000740273d1f0 00007ff93153b54a System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29, Abck.Web]](d__29 ByRef)
000000740273d2a0 00007ff93153b492 Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder.GetProductPriceViewModel(System.String, Abck.Web.Features.User.UserContext)

That looks interesting. If we move to the second longest running thread, it looks oddly familiar

0:040> !clrstack
OS Thread Id: 0x219c (40)
        Child SP               IP Call Site
00000074be23d060 00007ff926c82f2e System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Contains(System.__Canon)
00000074be23d0d0 00007ff9307aebcd System.Web.HttpCookieCollection.EnsureKeyValidated(System.String, System.String)
00000074be23d110 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
00000074be23d150 00007ff9314de98d Abck.Web.Features.User.UserContext.get_CountryCurrency()
00000074be23d190 00007ff93154367e Abck.Web.Features.Cart.Services.CountryCurrencyService.GetCountrySettingThreeAlpha(Abck.Web.Features.User.UserContext)
00000074be23d1d0 00007ff93153b6dd Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29.MoveNext()

And the third longest running thread

0:041> !clrstack
OS Thread Id: 0x2198 (41)
        Child SP               IP Call Site
00000078597bd8d0 00007ff9297d9f59 System.Collections.Generic.HashSet`1[[System.__Canon, mscorlib]].Remove(System.__Canon)
00000078597bd960 00007ff92a8eb4e3 System.Web.HttpCookieCollection.Get(System.String)
00000078597bd9a0 00007ff9314de98d Abck.Web.Features.User.UserContext.get_CountryCurrency()
00000078597bd9e0 00007ff93154367e Abck.Web.Features.Cart.Services.CountryCurrencyService.GetCountrySettingThreeAlpha(Abck.Web.Features.User.UserContext)
00000078597bda20 00007ff93153b6dd Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29.MoveNext()
00000078597bdad0 00007ff93153b54a System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[Abck.Web.Features.Product.ViewModels.Builders.CoreProductViewModelBuilder+d__29, Abck.Web]](d__29 ByRef)

Any guess on what happened?

It looks like what we have a clear case of infinite loop. There were multiple threads trying to get the same cookie, and under the neath, the HttpCookieCollection.Get is not threadsafe, it tries to access an underlying HashSet<T> without properly locks. While one thread tried to remove items from the hashset, another tried to read it, and messed up the internal hash table. All three threads are then doomed in infinite loops, keep running and use up all CPU resources.

The fix in this case would be as simple as a lock around the code to get the CountryCurrency, but it’s best to be careful. Again, when fixing the problem of lack of lock, it’s easy to run into the problem of too much of lock.

Storing 100.000 prices per SKU – part 1

One of the questions I have received, from time to time, is that how to store a lot of prices per SKU in Optimizely (B2C) Commerce Cloud. While this is usually a perfect candidate for Optimizely B2B Commerce, there are many customers invested in B2C and want to make the best out of it. Is it possible?

It’s important to understand the pricing system of Optimizely Commerce (which is, written in detail in my book – shameless plug). But in short:

  • There are two price systems, IPriceService and IPriceDetailService
  • One is handling prices in batch – i.e. prices per SKU (IPriceService), and one is handling prices per individual price (IPriceDetailService)
  • Both are cached in latest version (cache for IPriceDetailService was added in late 13.x version)

With that in mind, it would be very problematic if you use IPriceService for such high number of prices per SKU, because each time you save a price, you save a lot of prices at once (same as loading prices). This is how the default IPriceService implementation saves prices of a SKU

create procedure dbo.ecf_Pricing_SetCatalogEntryPrices
    @CatalogKeys udttCatalogKey readonly,
    @PriceValues udttCatalogEntryPrice readonly
as
begin
    begin try
        declare @initialTranCount int = @@TRANCOUNT
        if @initialTranCount = 0 begin transaction

        delete pv
        from @CatalogKeys ck
        join dbo.PriceGroup pg on ck.CatalogEntryCode = pg.CatalogEntryCode
        join dbo.PriceValue pv on pg.PriceGroupId = pv.PriceGroupId

        merge into dbo.PriceGroup tgt
        using (select distinct CatalogEntryCode, MarketId, CurrencyCode, PriceTypeId, PriceCode from @PriceValues) src
        on (    tgt.CatalogEntryCode = src.CatalogEntryCode
            and tgt.MarketId = src.MarketId
            and tgt.CurrencyCode = src.CurrencyCode
            and tgt.PriceTypeId = src.PriceTypeId
            and tgt.PriceCode = src.PriceCode)
        when matched then update set Modified = GETUTCDATE()
        when not matched then insert (Created, Modified, CatalogEntryCode, MarketId, CurrencyCode, PriceTypeId, PriceCode)
            values (GETUTCDATE(), GETUTCDATE(), src.CatalogEntryCode, src.MarketId, src.CurrencyCode, src.PriceTypeId, src.PriceCode);

        insert into dbo.PriceValue (PriceGroupId, ValidFrom, ValidUntil, MinQuantity, MaxQuantity, UnitPrice)
        select pg.PriceGroupId, src.ValidFrom, src.ValidUntil, src.MinQuantity, src.MaxQuantity, src.UnitPrice
        from @PriceValues src
        left outer join PriceGroup pg
            on  src.CatalogEntryCode = pg.CatalogEntryCode
            and src.MarketId = pg.MarketId
            and src.CurrencyCode = pg.CurrencyCode
            and src.PriceTypeId = pg.PriceTypeId
            and src.PriceCode = pg.PriceCode

        delete tgt
        from dbo.PriceGroup tgt
        join @CatalogKeys ck on tgt.CatalogEntryCode = ck.CatalogEntryCode
        left join dbo.PriceValue pv on pv.PriceGroupId = tgt.PriceGroupId
        where pv.PriceGroupId is null

        if @initialTranCount = 0 commit transaction
    end try
    begin catch
        declare @msg nvarchar(4000), @severity int, @state int
        select @msg = ERROR_MESSAGE(), @severity = ERROR_SEVERITY(), @state = ERROR_STATE()
        if @initialTranCount = 0 rollback transaction
        raiserror(@msg, @severity, @state)
    end catch
end

If you have experience with SQL (which you probably should), you will see that it’s a deletion of rows in PriceValue that have CatalogEntryCode same as , then a merge, then a deletion of left over rows. To make matters worse, IPriceService system stores data on 3 tables: PriceValue, PriceGroup and PriceType. Imagine doing that with a few dozen of thousands rows.

Even if you change just one price, all prices of that specific SKU will be touched. It’d be fine if you have like ten prices, but if you have ten thousands prices, it’ll be a huge waste.

Not just that. To save one price, you would still need to load all prices of that specific SKU. That’s two layers of waste: the read operations at database layer, and then on application, a lot of price objects will need to be constructed, and then you need to recreate a datatable to send all the data back to the database to do the expensive operation above.

And wait, because the prices saved to IPriceService needs to be synchronized to IPriceDetailService (however, you can disable this). Prices that were changed (which is, all of them) need to be replicated to another table.

So in short, IPriceService was not designed to handle many prices per SKU. If you have less than a few hundred prices per SKU (on average), it’s fine. But if you have more than 1000 prices per SKU, it’s time to look at other options.

Fix Crunchyroll app crash on playing Android

For quite a while, my Crunchyroll app on my OnePlus 8 Pro just crashes. It opens OK, but everytime I click play on everything, it crashes. Freezes for a few seconds, and a pop up appears for me to force close it.

I have tried:

  • Remove and reinstall
  • Remove app cache
  • Uninstall/Reinstall the Android Webview

None helped. Meanwhile, Crunchyroll is working fine on my iPhone 11, so it’s not their service. There’s also no update that could have triggered it.

Then I noticed my HBO Max app started doing the same. So it might have something to do with my phone. When I was almost ready to give up, I decided to try the final option – wipe the phone cache partition (I would not go as far as factory reset it, too much work!)

It worked!

This is the steps for my phone, but yours should be the same:Turn off your device.

Press and hold the Volume down and Power button at the same time.

When your phone logo appears, release the Power button, but continue holding the Volume down button.

At the Password screen, enter the Password or PIN number that you use to unlock your device. When done, tap Ok.

Tap English.

Tap Wipe data and cache.

Tap Wipe cache.

Tap Yes to continue.

Tap Reboot to restart the phone.

How to get the thumbnail preview for Neptune 2(s)

This is a feature that is only available for Cura. To make it easier to select which file to print on Elegoo Neptune 2 (and 2s), you can save your gcode files in form of TFT format, so the slicer inserts a thumbnail to the gcode, and your printer can display it.

Open the Marketplace by the button on the top right of cura

which would allow you to find Mks wifi plugin

Accept the license to install this plugin, then restart cura for it to take effect. Then you will need to activate it by selecting Menu => Settings => Printer => Manage Printer

Then select MKS Wifi Plugin to activate

Switch to Preview settings to turn on preview

If you are using Elegoo cura, they bundled Mks wifi plugin by default. but there is virtually no reason to use Elegoo cura. It’s based on Cura 4.8 which is very outdated (released on November 2020). The only reason you should install Elegoo Cura is that you can copy the start and end code and settings for your Neptune (it’s still not support by Cura), and that’s it.

Go download the latest version of Cura at Ultimaker Cura: Powerful, easy-to-use 3D printing software.

Another alternative, even simpler, and without MKS plugin is to use the post processing script. Menu => Extensions => Post Processing => Modify G-Code

Choose Add a script then select Create Thumbnail. By default, the thumbnail size is 32×32 which is way too small. I select 128×128 instead.

Now you will have a small icon next to Slice button. Clicking on it will open the Post Processing Plugin window. Note that you can see how many scripts you added (For me it’s only 1)

Slice as usual and copy your gcode files to microsd. Next time you select something to print, you will be able to see the preview of it

Left is sliced with Cura, right is sliced with Super Slicer

RedirectToAction is dead, long live RedirectToContent

In .NET 4.8/CMS 11.x and earlier, this is very commonly used to redirect an action

return RedirectToAction("Index", new{ node = contentLink });

Which will redirect the user to

public TResult Index(CheckoutPage currentPage );

and you will get the currentPage parameter set with content you specified by the contentLink.

However in .NET 5 once redirected currentPage will be null. It’s due to how .NET 5 handle the routing. The correct way is use this

return RedirectToContent(currentPage.ContentLink, "Index");

There is a action you can use – RedirectToContent. Note that the order of parameter is reserved – you pass in the content link to the content first, then the name of the action.

And that’s how’s it done in .NET 5/CMS 12.

Where to store big collection data

No, I do not mean that big, big data (in size of terabytes or more). It’s big collection, like when you have a List<string> and it has more than a few hundreds of items. Where to store it?

Naturally, you would want to store that data as a property of a content. it’s convenient and it just works, so you definitely can. But the actual question is: should you?

It’s as simple as this

public virtual IList<String> MyBigProperty {get;set;}

But under the hood, it’s more than just … that. Let’s ignore UI for a moment (rendering such long list is a bad UX no matters how you look at it, but you can simply ignore rendering that property by appropriate attributes), and focus on the backend aspects of it.

List<T> properties are serialized as a long strings, and save at once to database. If you have a big property in your content, this will happen every time you load your content:

  • The data must be read from database, then transferred through the network
  • The data must be parsed to create an array (the underlying data structure of List<T>. The original string is tossed away.
  • Now you have a big array that you might not use every time. it’s just there taking your previous LOH (as with the original string)

Same thing happens when you actually save that property

  • The data must be serialized as string, the List<T> is now tossed away
  • The data then must be transferred through the network
  • The data then saved to database. Even though it is a very long string and you changed, maybe 10 characters, it’s completely rewritten. Due to its size, there might be multiple page writes needed.

As you can see, it can create a lot of waste, especially if you rarely use that property. To make the matter worse, due to the size of the property, it means they are taking up space in LOH (large objects heap).

And imagine if you have such properties in each and every of your content. The waste is multiplied, and your site is now at risk of some frequent Gen 2 Garbage collection. Nobody likes visiting a website that freezes (if not crashes) once every 30 minutes.

Then when to store such big collection data?

The obvious answer is … somewhere else. Without other inputs, it’s hard to give you some concrete suggestions, but how’s about a normalized custom table? You have the key as the content reference, and the other column is each value of the list. Just an idea. Then you only load the data when you absolutely need it. More work, yes, but it’s the better way to do it.

Just a reminder that whatever you do, just stay away from DDS – Dynamic Data Store. It’s the worst option of all. Just, don’t 🙂

Tales of arise – super quick review

Tales of Arise Characters and Party Members

The good

The character designs are pretty good. With a lot of JPRG character designs are basically fan service, to a point it’s even ridiculous to look at (Looking at you, Xenoblade Chronicles 2), the characters from Tales of Arise are pretty great (saved some small parts, like, um, uh, the back of Kisara).

The game looks good – it is not breaking any record, but gorgeous in its own rights.

The bad

Tales has always been a budget series, and while Tales of arise might have a bigger budget than previous entries, it is still not an AAA production. And that shows in places. You see pretty repetitive enemies from place to place. A different color and a new name, and you’re done. This also clear that the game has more loading screens than it should, even with the current gen (Xbox Series X and PS5) versions. This is somewhat understandable as they share same design with previous gen consoles, but it was quite sad to see the limitations.

The voice acting is a hit or miss, i.e. inconsistent. Some actors sounds great and convincing, some, not so much.

Repetitive enemies designs. New area = almost same enemies with different colors.

Performance on boss battles – especially after Dohalim suffer badly. This was not covered by Digital Foundry analysis here, but trust me, I’ve seen it with my own eyes (And suffers from it)

The Ugly

It is one of the worst, if not the worst game when it comes to stability. And I played Witcher 3: The Wild Hunt at release on PS4, and XCOM 2 both on PS4 and Xbox Series X, that says something about it. I played (or rather, am playing) the game on Xbox Series X, and it crashes in almost every session. It does not work with Quick Resume (i.e. it should be disabled).

Mini annoyances

There are upgraded versions of equipment, but if a character is equipping that base version, it can’t be upgraded! You have to unequip that then craft the upgrade. Why?

Getting being delete variant codes

I got the question here Content Events and Service API | Optimizely Developer Community , and CHKN contacted Optimizely developer support service which is the good/right thing to do. However it could be beneficial for a wider audience, hence this blogpost.

If you are developing new Optimizely Commerce Cloud now, you should be using the latest version (CMS 12/ Commerce 14), or at least Commerce 13.31 if you have to stay with .NET 4.8. You then could use the IContentEvents to listen to any events that might be fired from Service API.

However, if you are using older versions, you might be limited to lower level, non content event through EventContext. It works, but with a catch: there is no EntryDeleting event in EventContext. At this point I’m not entirely sure why, probably just an overlook. But it’s not impossible to work around that issue.

As I suggested in the post, EntryUpdating is like a blanket event – every change to the entries goes through there. The sender is a CatalogEntryDto, which should contain information about the being deleted entries.

        private void Instance_EntryUpdating(object sender, Mediachase.Commerce.Catalog.Events.EntryEventArgs e)
        {
            var dto = sender as CatalogEntryDto;
            if (dto != null)
            {
                var rows = dto.CatalogEntry.Where(x => x.RowState == DataRowState.Deleted);
                var deletingCodes = rows.Select(x => (string)x["Code", DataRowVersion.Original]);
                //do stuffs with codes being deleted.
            }
        }

And then to listen to the event in one of your IInitializationModule

EventContext.Instance.EntryUpdating += Instance_EntryUpdating;

However, there is a caveat here: as EntryUpdating happens before the entry deletion, it is possible that the change did not go through (i.e. the changes were to be reverted). It is unlikely, but it’s a possibility. You might accept that, or you can:

  • Store the id and code in a “Deleting entry” dictionary
  • Listen to EntryDeleted event and match from DeletedEntryEventArgs parameter (which contains EntryId ) to get the deleted Code, and continue from there.