.NET developers have been transitioning from synchronous APIs to asynchronous ones for a while now. The move was boosted a lot by the async/await keywords in C# 5.0, but we are now in a dangerous middle ground: there are as many synchronous APIs as there are async ones. Mixing them requires the ability to call async APIs from a synchronous context, and vice versa. Calling synchronous APIs from an async context is simple – you can fire up a task and let it do the work. Calling async APIs from a sync context is much more complicated. And that is where AsyncHelper comes into play.
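To illustrate the easy direction, a minimal sketch could look like this (LoadReportFromDisk is a made-up synchronous API, purely for illustration):

using System.IO;
using System.Threading.Tasks;

public class ReportLoader
{
    // The "easy" direction: offload the blocking, synchronous call to the thread pool and await it.
    public async Task<string> LoadReportAsync(string path)
    {
        return await Task.Run(() => LoadReportFromDisk(path));
    }

    // A made-up synchronous API, included only to make the example complete.
    private string LoadReportFromDisk(string path) => File.ReadAllText(path);
}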
AsyncHelper is commonly used to run async code in a synchronous context. It is a simple helper class with two methods to run async APIs synchronously:
using System;
using System.Globalization;
using System.Threading;
using System.Threading.Tasks;

internal static class AsyncHelper
{
    private static readonly TaskFactory _myTaskFactory = new TaskFactory(CancellationToken.None,
        TaskCreationOptions.None, TaskContinuationOptions.None, TaskScheduler.Default);

    public static TResult RunSync<TResult>(Func<Task<TResult>> func)
    {
        var cultureUi = CultureInfo.CurrentUICulture;
        var culture = CultureInfo.CurrentCulture;
        return _myTaskFactory.StartNew(() =>
        {
            Thread.CurrentThread.CurrentCulture = culture;
            Thread.CurrentThread.CurrentUICulture = cultureUi;
            return func();
        }).Unwrap().GetAwaiter().GetResult();
    }

    public static void RunSync(Func<Task> func)
    {
        var cultureUi = CultureInfo.CurrentUICulture;
        var culture = CultureInfo.CurrentCulture;
        _myTaskFactory.StartNew(() =>
        {
            Thread.CurrentThread.CurrentCulture = culture;
            Thread.CurrentThread.CurrentUICulture = cultureUi;
            return func();
        }).Unwrap().GetAwaiter().GetResult();
    }
}
There are slight variants of it, with and without setting CurrentCulture and CurrentUICulture, but the main part is the same: spawn a new task to run the async delegate, then block and get the result using Unwrap().GetAwaiter().GetResult().
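For context, this is how it typically ends up being called – a synchronous member that needs a result from an async API (CustomerService, Customer and GetCustomerAsync are made-up names, just to illustrate the call site):

using System;
using System.Threading.Tasks;

public class Customer
{
    public Guid Id { get; set; }
}

public class CustomerService
{
    // A synchronous method forced to consume an async API – the typical AsyncHelper call site.
    public Customer GetCustomer(Guid id)
        => AsyncHelper.RunSync(() => GetCustomerAsync(id));

    // Made-up async API, included only to make the example complete.
    private Task<Customer> GetCustomerAsync(Guid id)
        => Task.FromResult(new Customer { Id = id });
}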
One of the reasons it is so popular is that people think it was written by Microsoft, so it must be safe to use. That is not quite true: the class was introduced as an internal class in AspNetIdentity (AspNetIdentity/src/Microsoft.AspNet.Identity.Core/AsyncHelper.cs at main · aspnet/AspNetIdentity (github.com)). That means Microsoft teams can use it when they think it is the right choice; it is not the default recommendation for running async tasks in a synchronous context.
Unfortunately I’ve seen a fair share of threads stuck in the AsyncHelper.RunSync stack trace, most likely victims of a deadlock.
Async/sync is a complex topic and even experienced developers make mistakes. There is no simple way to just run async code in a sync context, and AsyncHelper is certainly not it. It is a simple, convenient way, but it is not guaranteed to be the correct one for your use case. I see it as a shortcut that solves some problems but creates bigger ones down the road.
Just because you can, doesn’t mean you should. That applies to AsyncHelper perfectly.
Recently I helped to chase down a ghost (and you might be surprised to know that I, for the most part, spend hours being a ghostbuster – it can be fun, sometimes). A customer reported a weird issue: a visitor goes to their website, has everything correct in the cart, including the discount, only to have the discount disappear at checkout. That would be a fairly easy thing to debug and fix if not for the fact that the problem is random in nature. It happened only once in a while – but, on average, daily. It could not be reproduced locally, or reproduced consistently on production, so any fix was based on guesswork.
After a lot of dry code reading and then log reading, it turned out that the missing discount was really a problem of a missing cache. Once in a while, the cache that contains the promotion list is returned empty, resulting in no discount being applied to the order.
But why?
After a few guesses, it eventually came to me that the problem is with the caching using Dictionary – more specifically, campaigns are loaded and cached using a Dictionary with IMarket as the key. That would be fine, and highly efficient, if not for the fact that the default implementation of IMarket is not suitable as a dictionary key. It implements neither IComparable<T> nor IEquatable<T>, which means the only case in which two IMarket instances are equal is when they are the very same instance. Even if all their properties are equal in value, they will not be considered equal.
This is a short program that demonstrates the problem. You can expect it to write “False” to the output console.
using System;
using System.Collections.Generic;

public class Program
{
    private static Dictionary<AClass, int> dict = new Dictionary<AClass, int>();

    public static void Main()
    {
        dict.Add(new AClass("abc", 1), 1);
        dict.Add(new AClass("xyz", 2), 2);
        Console.WriteLine(dict.ContainsKey(new AClass("abc", 1)));
    }
}

public class AClass
{
    public AClass(string a, int b)
    {
        AString = a;
        AnInt = b;
    }

    public string AString { get; set; }
    public int AnInt { get; set; }
}
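For comparison, this is roughly what it takes for such a class to work as a dictionary key – a sketch only, not the actual IMarket/MarketId implementation. With value-based Equals and GetHashCode, ContainsKey returns True for a different instance with the same values:

using System;

public class AClass : IEquatable<AClass>
{
    public AClass(string a, int b)
    {
        AString = a;
        AnInt = b;
    }

    public string AString { get; }
    public int AnInt { get; }

    // Value-based equality: two instances with the same values are equal...
    public bool Equals(AClass other)
        => other != null && AString == other.AString && AnInt == other.AnInt;

    public override bool Equals(object obj) => Equals(obj as AClass);

    // ...and, crucially for Dictionary, they also produce the same hash code.
    public override int GetHashCode() => HashCode.Combine(AString, AnInt);
}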
The question that arises is: if the key does not match and an empty list of campaigns is returned, why does this only happen sometimes? The answer is that the IMarket instances are themselves cached, by default for 5 minutes. So for the problem to occur, the campaign cache must be loaded into memory just before the cache of IMarket instances expires (after which new instances are created). Once the new IMarket instances are loaded, the campaign cache must then be accessed again before it expires itself (default 30 seconds). The timing needs to be “right”, which makes this problem elusive and hard to catch in normal testing – both automated and manual.
Time for some blaming and finger pointing. When I fix something I usually try to check the history of the code to understand the original idea and intention. Was there a reason, or just an oversight? And most importantly:
Who wrote such code?
Me, about 7 months ago.
Uh oh.
The fix was simple enough. Instead of IMarket, we can change the key to MarketId, which implements both IEquatable<T> and IComparable<T>. It does not matter if you have two different instances of MarketId – as long as they have the same value, they are equal.
A workaround was sent to the customer to test, and after a week or so they reported that the problem was gone. The official fix is in Commerce 14.31, which was released yesterday https://nuget.optimizely.com/package/?id=EPiServer.Commerce.Core&v=14.31.0 , so you are – as always – highly recommended to upgrade.
Lessons learned:
Pick your dictionary keys carefully. A key type should implement IEquatable<T> and IComparable<T> – properly, I might add. In general, a struct is a better choice than a class, if you can use one.
No matter how “experienced” you think you are, you are still a human being and can make mistakes. It’s important to have someone check your work from time to time, spotting the problems you couldn’t.
A colleague asked me yesterday – how do we migrate properties of catalog content? There is, unfortunately, no official way to do it. There are several unofficial ways, however. Today we will explore the one I personally recommend – for its safety and backward compatibility.
Let’s say we have a FashionProduct with an MSRP property of type Money, and we now want to change it to Decimal. There are some hacky ways to do this, but all of them require direct database manipulation, which we should avoid if possible.
First we will need this piece of code. It was “stolen” from a colleague of mine and has been used countless times. You probably want to bookmark it, as it will likely be useful in the future (I should probably do so myself, as I have to find it every time I need it). It is a snippet to traverse the catalog structure, filtered by the content type you want.
public virtual IEnumerable<T> GetEntriesRecursive<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : EntryContentBase
{
    foreach (var nodeContent in LoadChildrenBatched<NodeContent>(parentLink, defaultCulture))
    {
        foreach (var entry in GetEntriesRecursive<T>(nodeContent.ContentLink, defaultCulture))
        {
            yield return entry;
        }
    }

    foreach (var entry in LoadChildrenBatched<T>(parentLink, defaultCulture))
    {
        yield return entry;
    }
}

private IEnumerable<T> LoadChildrenBatched<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : IContent
{
    var start = 0;

    while (true)
    {
        var batch = _contentLoader.GetChildren<T>(parentLink, defaultCulture, start, 50);
        if (!batch.Any())
        {
            yield break;
        }

        foreach (var content in batch)
        {
            // Don't include linked products to avoid including them multiple times when traversing the catalog
            if (!parentLink.CompareToIgnoreWorkID(content.ParentLink))
            {
                continue;
            }

            yield return content;
        }

        start += 50;
    }
}
To make sure we don’t load too much content at once, the batch size is set to 50, but that is of course configurable (up to you)!
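As a quick usage sketch – collecting every FashionProduct in the catalog, starting from the catalog root (_referenceConverter is assumed to be an injected ReferenceConverter, and "en" is just an example culture):

// Traverse the whole catalog and collect every FashionProduct.
var defaultCulture = CultureInfo.GetCultureInfo("en");
var catalogRoot = _referenceConverter.GetRootLink();
var allFashionProducts = GetEntriesRecursive<FashionProduct>(catalogRoot, defaultCulture);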
Now the fun part, where the actual work happens. Once we have the content, we need to migrate the data, which can be as simple as this:
private void MigrateProperty<T>(IEnumerable<T> contents) where T : EntryContentBase
{
    var batch = new List<T>();

    foreach (var content in contents)
    {
        var writableClone = content.CreateWritableClone<T>();
        Transform(writableClone);
        batch.Add(writableClone);
    }

    _contentRepository.Publish(batch, PublishAction.SyncDraft);
}
In the Transform method you can do whatever you want with the property value. If you just want to rename the property, it does nothing except assign the value to the new property. Or, in the case mentioned at the beginning, converting Money to Decimal is an easy task (Money is the less precise version of Decimal). Note that if you convert between data types, for example from double to int, there is potential data loss – but you are probably aware of that already.
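As a sketch of what Transform could look like for the Money-to-Decimal case – here I pretend MigrateProperty runs with T = FashionProduct, and I assume MSRP is the existing nullable Money property while MsrpDecimal is the new decimal property; adjust names and types to your own model:

// Copy the amount of the Money value into the new decimal property.
// The currency part is dropped – which is exactly the data we accept losing here.
private void Transform(FashionProduct product)
{
    product.MsrpDecimal = product.MSRP?.Amount ?? 0m;
}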
The final step is to publish the change. For performance reasons, it is probably best to use the Publish extension method of IContentRepository and save multiple contents in one batch – maybe of size 50 or 100. That skips things like creating new versions, for optimal performance. You can read about it here: New simple batch saving API for Commerce | Optimizely Developer C
The remaining question is where to put it. In a perfect world, I’d say in a migration step (i.e. a class that implements IMigrationStep), so you ensure that your data is properly migrated before anything else runs – for example, new code that accesses the new property, or indexing of your content after migration. But if you have a sizeable catalog, this will take time, and it might not be a good idea to let your users wait for it to complete. In that case, it makes sense to do this in a scheduled job and, when it completes, make the switch.
Migrating properties is not an easy or quick task, but it can be done with relative ease. It also reminds us about modeling – try to get it right from the beginning so we don’t have to migrate. In the end, the fastest code is the code that does not need to run!
If you are using Optimizely Customized Commerce, the common wisdom is that you should be using the serializable cart mode. It’s not perfect (we discussed the drawbacks in, um, my book), but generally it has performance benefits. If, for any reason, you need to use the legacy cart mode, there is a switch you can use – IFeatureSwitch – to change between those modes.
It is important to remember that IFeatureSwitch only switches between the two modes; it does not migrate any carts between them. There is a one-time cart migration job, but that’s it.
To let IOrderRepository use the correct cart system, there is an ICartProvider, which will be either CartProvider or SerializableCartProvider. The problem is that this happens much earlier in the pipeline than IInitializationModule. In fact, it is determined in IConfigurableModule.ConfigureContainer, which means before any IInitializationModule.Initialize. Even if we call DisableFeatures in another ConfigureContainer, there is no guarantee that our code will be called before the framework code (the order of ConfigureContainer execution is non-deterministic).
Fortunately, we can do that inside Startup.Configure. Due to how the feature switch data is structured, it’s not as simple as adding a setting in appsettings.json, but it can be done easily in code:
services.Configure<ApplicationOptions>(x =>
{
    x.Features[SerializedCarts.FeatureSerializedCarts] = new ApplicationFeature
    {
        Feature = "SerializedCarts",
        State = FeatureState.Disabled,
        Type = "Mediachase.Commerce.Core.Features.SerializedCarts, Mediachase.Commerce"
    };
});
Of course, this is a workaround. The feature switch should work as documented, and it will be fixed in the future.
While we always want our database queries to be fast, that is not doable all the time. Sometimes we need to run slow queries, and we need to tell the underlying framework that this query can take some time to complete and that we’re fine with it. Otherwise, it will try to terminate the query after 30 seconds (the default timeout limit).
There is a difference between connection timeout and command timeout. Connection timeout is how long .NET will try to connect to the database before giving up. Command timeout is how long .NET will let a command execute before giving up.
Things are a bit complicated when it comes to command timeout with .NET 5 and up. With later versions of Microsoft.Data.SqlClient, it is possible to set the command timeout directly in the connection string. It is indeed a simple way to do it, but it comes with a caveat.
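To make the distinction between the two timeouts concrete, here is a small sketch using Microsoft.Data.SqlClient directly – Connect Timeout in the connection string controls connecting, while CommandTimeout on the command (or the newer Command Timeout connection string keyword) controls how long a single query may run. The query itself is just an example:

using Microsoft.Data.SqlClient;

public static class TimeoutExample
{
    public static int CountEntries(string server, string database)
    {
        // Connection timeout: how long to wait while establishing the connection (15 seconds here).
        var connectionString =
            $"Server={server};Database={database};Integrated Security=true;Connect Timeout=15";

        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var command = new SqlCommand("SELECT COUNT(*) FROM dbo.CatalogEntry", connection);
        // Command timeout: how long this particular query may run before being terminated (120 seconds here).
        command.CommandTimeout = 120;

        return (int)command.ExecuteScalar();
    }
}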
The command timeout setting in the connection string is not recognized by Entity Framework/Entity Framework Core, which will throw an exception if you try to use a connection string containing it. EF has its own way to set the command timeout per DbContext, but it does not accept the setting via the connection string – it throws “Keyword not supported: ‘command timeout'” if such a setting is present.
The workaround is to configure the command timeout for the EcfSqlConnection connection string, and use a different connection string without the command timeout just for Entity Framework.
However, that comes with another caveat: a command timeout in the connection string applies to all database queries. As we discussed in the previous post above, doing so is not without drawbacks – it hides slow queries rather than letting them fail. A failed query might not bring down your website, but an overloaded database likely will.
In Commerce 14.15.6, which should be released shortly, we introduce a new setting, SqlDataProviderOptions.CommandTimeout, which lets you set the command timeout for queries that use SqlDataProvider – most notably the MetaDataPlus system, like orders.
The important advantage of the new setting is that you can set it on the fly. If you know that some operation will be slow, you can raise the command timeout just for that operation, then set it back to the default value. In most cases, you can leave it at the default (30 seconds) and optimize at the application level (reduce batch sizes, for example) or at the database layer (rewrite the query, add or optimize indexes, etc.). But sometimes you know the query will be slow and you acknowledge that – this is the tool for those cases.
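A hedged sketch of how that on-the-fly usage might look – how you obtain SqlDataProviderOptions (constructor injection here) and whether CommandTimeout is a TimeSpan or a number of seconds should be checked against the version you are on; ExportAllOrders is a made-up slow operation:

// Temporarily raise the command timeout for a known-slow operation, then restore the original value.
// _sqlDataProviderOptions is assumed to be an injected SqlDataProviderOptions instance.
public void RunSlowOrderExport()
{
    var original = _sqlDataProviderOptions.CommandTimeout;
    try
    {
        // Assuming a TimeSpan-based setting; use the equivalent number of seconds if it is numeric.
        _sqlDataProviderOptions.CommandTimeout = TimeSpan.FromMinutes(5);
        ExportAllOrders();
    }
    finally
    {
        _sqlDataProviderOptions.CommandTimeout = original;
    }
}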
If you have been using Optimizely Customized Commerce, you probably know that, by default, a wish list is just a cart with a special name. Can you guess the name? Surprise, surprise, it’s “Wishlist”. It’s been there since forever, from the early days of Mediachase, and was then carried over to the new serializable cart. I have been “fine” with it – i.e. I accepted the approach unconsciously. But very recently I realized there are several problems with it.
So why is it not a very good idea?
First of all, it shares the same table as normal carts. To search for abandoned carts, you have to skip the carts named “Wishlist”. Because there are only a few cart names and they are not evenly distributed, you will have a hard time filtering carts by name.
But there is more. As most customers are using the serializable cart mode now, ever-growing wish lists pose another problem – each operation on the wish list, adding or removing an item, results in a big write to the SerializableCart table. If you have just a few items it might be fine, but a simple test on Commerce shows that with only 9 items in a wish list, the Data column is already more than 2700 characters. And wish lists are meant to be kept forever – they only grow in size.
And because wish lists are carts, they have to follow the same format, even though a lot of it is redundant or unnecessary for a wish list.
The biggest benefit of the default wish list implementation – and I think it trumps all the disadvantages listed above – is that it’s built in. You can start using it with almost no additional effort: get a cart with the predefined name and you are good to go. Building a different wish list definitely costs time and resources, a luxury not everyone can afford.
For that reason, I have started building a wish list service in my free time. I plan to make it open source when the time is right, but we’ll see about that.
Moral of the story
It is critical to take a step back from time to time and think about what you have done. Things might make less sense when you see them from a different perspective.
Hi again everybody. New day – new thing to write about. Today we will talk about memory allocation and the effect it has on your website performance. With .NET, memory allocations are usually overlooked because the CLR handles them for you. Except in the rare cases where you deal with unmanaged resources and have to be conscious about releasing them yourself, it’s usually a fire-and-forget approach.
The truth is, it is more complicated than that. The more objects you create, the more memory you need, and the more time the CLR needs to clean up after you. While you might have written code that executes blazingly fast in your benchmarks, in reality your website might still struggle to perform well in the long run – because of garbage collection. Occasional GC is not a concern – it’s the nature of the .NET CLR – but frequent GC, especially Gen 2 GC, is definitely something you should look into and fix, because it negatively affects your website performance.
The follow-up question: how do you fix that?
Of course, the first step is always to measure the memory allocations of your app. Locally you can use something like JetBrains dotMemory to profile your website, but that has a big caveat – you can’t really mimic the actual traffic to your website. Sure, it is very helpful for profiling something like a scheduled job, but it is less than optimal for seeing how your website performs in reality. For that we need another tool, and I’ve found nothing better than the Application Insights Profiler trace on Azure. It samples your website periodically, taking ETL (event trace log) files over 220 seconds. (Note: depending on your .NET version, you might download a .diagsession or a .netperf.zip file from Application Insights, but they are essentially the same inside – zipped .ETL.) Those files are extremely informative; they contain a whole load of information, which might be overwhelming if you’re new, but take small steps and you’ll get there.
Note that once extracted, an ETL file can be very big – often in the 1GB-or-more range. PerfView has to go through all that event log, so it is extremely memory hungry as well, especially if you open multiple ETL files at once. My PerfView kept crashing on my 16GB RAM machine (I had several Visual Studio instances open); that was solved when I switched to 32GB RAM.
The first step is to confirm the allocation problem with GCStats (this is one of the extreme ones, but it does happen):
Two main things to look at: Total Allocs, i.e. the total size of objects allocated, and the time spent in garbage collection. They are naturally closely related, but not always – total allocation might not be high while GC time is, for example in the case of large object allocations (we will talk about that in a later post). For the purpose of memory allocation analysis, this is where you should look:
What you find in there might surprise you. And that’s the purpose of this series: to point out possible unexpected allocations that are easy – or fairly easy – to fix.
In this first post, we will talk about a somewhat popular feature – Injected<T>.
We all know that in Optimizely Content/Commerce, the preferred way of dependency injection is constructor injection, i.e. if your class has a dependency on a certain type, that dependency should be declared as a parameter of the constructor. That’s nice and all, but not always possible. For example, you might have a static class (used for extension methods), so no constructor is available. Or, in some rare cases, you can’t add a new parameter to the constructor because it would be a breaking change.
Adding Injected<T> as a hidden dependency in your class at least works – so can you just forget about it?
Not quite!
This is how the use of Injected<T> results in allocations of StructureMap objects – yes, every time you call Injected<T>.Service, the whole dependency tree must be built again.
And that’s not everything: during that process, other objects need to be created as well. You can right click on a path and select “Include item”. The allocations below are for anything created by module episerver.framework episerver.framework!EPiServer.ServiceLocation.Injected`1[System.__Canon].get_Service(), i.e. all object allocations related to Injected<T>:
You can expand further to see which Injected<T>(s) have the most allocations, and are therefore the ones that should be fixed.
How can one fix an Injected<T> then? The best fix is to make it a constructor dependency, but that might not always be possible. An alternative fix is to use ServiceLocator.GetInstance, and to make the resulting field static if possible. That way you won’t have to call Injected<T>.Service every time you need the instance.
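A sketch of that alternative fix – the service and extension method names (IPromotionLoader, GetBestPrice) are made up; the point is resolving once, lazily, instead of going through Injected<T>.Service on every call:

using System;
using EPiServer.ServiceLocation;

// Placeholder service, standing in for whatever Injected<T> was wrapping.
public interface IPromotionLoader
{
    decimal ApplyBestPromotion(decimal price);
}

public static class PriceExtensions
{
    // Resolved once, on first use, and reused afterwards.
    private static readonly Lazy<IPromotionLoader> _promotionLoader =
        new Lazy<IPromotionLoader>(() => ServiceLocator.Current.GetInstance<IPromotionLoader>());

    public static decimal GetBestPrice(this decimal price)
        => _promotionLoader.Value.ApplyBestPromotion(price);
}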
There are cases where you indeed need a new instance every time; then the fix might be more complicated, and you might want to check whether you need the whole dependency tree or just a data object.
Moral of the story
Performance can’t be guessed, it must be measured
Injected<T> is not your good friend. You can use it if you have no other choice, but definitely avoid it in hot paths.
With the Find-backed IEntrySearchService from the previous post, we can now put SearchProvider to rest. There are, however, parts of the framework that still rely on SearchManager, and it expects a configured, working SearchProvider. The full search index job and the incremental search index job are two examples. To make sure we don’t break the system, we might want to give SearchManager something to chew on – a do-nothing SearchProvider, that is!
And we need a DoNothingSearchProvider
public class DoNothingSearchProvider : SearchProvider
{
    public override string QueryBuilderType => GetType().ToString();

    public override void Close(string applicationName, string scope) { }

    public override void Commit(string applicationName) { }

    public override void Index(string applicationName, string scope, ISearchDocument document) { }

    public override int Remove(string applicationName, string scope, string key, string value)
    {
        return 42;
    }

    public override void RemoveAll(string applicationName, string scope) { }

    public override ISearchResults Search(string applicationName, ISearchCriteria criteria)
    {
        return new SearchResults(new SearchDocuments(), new CatalogEntrySearchCriteria());
    }
}
And a DoNothingIndexBuilder
public class DoNothingIndexBuilder : ISearchIndexBuilder
{
    public SearchManager Manager { get; set; }
    public IndexBuilder Indexer { get; set; }

    public event SearchIndexHandler SearchIndexMessage;

    public void BuildIndex(bool rebuild) { }

    public bool UpdateIndex(IEnumerable<int> itemIds) { return true; }
}
What remains is simply to register them in your appsettings.json.
If you have been using Find, you might be surprised to learn that the CSR UI uses the SearchProvider internally. This is a bit unfortunate: you likely are using Find, and that creates unnecessary complexity. For starters, you need to configure a SearchProvider, then you need to index the entries, separately from the Find index. If you install EPiServer.CloudPlatform.Commerce, it will set up the DXPLuceneSearchProvider for you, which is basically a wrapper around LuceneSearchProvider that lets it work on DXP (i.e. Azure storage). But even with that, you have to index your entries anyway. You could use FindSearchProvider, but that just creates another problem – it uses a different index than Find, so you double your index count, and you still have to make sure your content is indexed. Is there a better way – to use the existing Find-indexed content?
Yes, there is
Searches for entries in CSR are done by IEntrySearchService, whose default implementation uses the configured SearchProvider internally. Fortunately for us, as with most things in Commerce, we can create our own implementation and inject it. That comes with a caveat – IEntrySearchService is marked with a BETA remark, so prepare for breaking changes without prior notice. However, it has not changed much since its inception (funny thing: when I checked its history, I was the one who created it 6 years ago, in 2017. Feeling old now), and if it does change, it would be quite easy to adapt.
IEntrySearchService is a simple interface with just one method:
It is a bit weird to return an IEnumerable<int> (what was I thinking?), but it was likely created as scaffolding around SearchManager.Search, which returns an IEnumerable<int>, and it was never updated. Anyway, an implementation using Find should look like this:
Note that I am not an expert on Find, especially on NestedFilterExpression, so my FilterPriceAvailableForCurrency might be wrong. Feel free to correct it; the code is not copyrighted and is provided as-is.
As always, you need to register this implementation for IEntrySearchService. You can add the registration anywhere you like, as long as it is after .AddCommerce.
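As a minimal sketch of that registration in Startup.ConfigureServices (FindEntrySearchService being whatever you named your Find-backed implementation; the lifetime is up to you):

services.AddCommerce();
// Registered after AddCommerce so our Find-backed implementation replaces the default one.
services.AddTransient<IEntrySearchService, FindEntrySearchService>();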
It’s not a secret: I love optimizing things. In a sense, I am both an Optimizer (literally) and an optimizer. And today we will go back to basics – optimizing a tricky SQL query.
The query in question is this particular stored procedure, ecf_CatalogNode_GetAllChildNodes, which is used to get all child nodes of specific nodes. It is used, among other things, to find all entries that are direct or indirect children of specific nodes. Why, you might ask? Because when you change the URL segment of a node, you want to make sure that all entries under that node have their indexed objects refreshed.
Let’s take a look at the stored procedure; this is what it looks like:
CREATE PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
@catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
WITH all_node_relations AS
(
SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
WHERE ParentNodeId > 0
UNION
SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
),
hierarchy AS
(
SELECT
n.CatalogNodeId,
'|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM @catalogNodeIds n
UNION ALL
SELECT
children.ChildNodeId AS CatalogNodeId,
parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM hierarchy parent
JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
)
SELECT CatalogNodeId FROM hierarchy
END
I previously wrote about the relations between entities in the Commerce catalog here: Commerce relation(ship), a story – Quan Mai’s blog (vimvq1987.com). Relations between nodes can be a bit complicated – a node has one true parent defined in the CatalogNode table, and then other “linked” nodes in CatalogNodeRelation. So to find all children – and grandchildren – of a node, you need to look at both.
Getting the children of a node from CatalogNode or CatalogNodeRelation is simple, but things become more complicated when you have to get grandchildren, then great-grandchildren, and so on. For that, a CTE needs to be used recursively. But then another problem arises – there is a chance, small but real, that the data was added in an incorrect way, so a circular reference is possible: A is a parent of B, which is a parent of C, which is itself a parent of A. To stop the SP from running forever, a check needs to be added to cut any circular reference short.
This brings back memories, as the first ever support case I worked on at Optimizely (then Episerver) was a circular reference one. The site would crash whenever someone visited the catalog management in Commerce Manager. That was around June 2012 (feeling old now?). My “boss” at the time involuntarily volunteered me for the case. See what you made me do, boss.
Now that you can grasp the basics of what the SP does, let’s get back to the original problem: it’s slow to run, especially with a big catalog and a complex node structure. As always, to optimize anything you first need to find the bottleneck – time to fire up SQL Server Management Studio and turn on the Actual Execution Plan.
I decided to go with 66, the “root” catalog node. This query yields around 18k rows.
Mind you, this is on my machine with a pretty powerful CPU (AMD Ryzen 7 5800X, 8 cores/16 threads) and a very fast NVMe PCIe SSD (Western Digital Black SN850 2TB). If this were executed on an Azure SQL database, for example, a timeout would be almost guaranteed. So the execution times should only be compared relative to each other.
If we look at the execution plan, it is quite obvious where the bottleneck is: a scan on the CatalogNode table is heavy (it read 79M rows in that operation). As suggested by Anders in Timeout when deleting CatalogNodes from a large catalog (optimizely.com), adding a non clustered index on the ParentNodeId column improves it quite a lot. And indeed it does – the execution time is reduced to 5 seconds.
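The index itself is nothing fancy – something along these lines (the index name is mine, pick whatever fits your naming convention):

-- Non clustered index on ParentNodeId, so the recursive CTE can seek instead of scanning CatalogNode
CREATE NONCLUSTERED INDEX IX_CatalogNode_ParentNodeId
    ON dbo.CatalogNode (ParentNodeId);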
And the number of rows read on CatalogNode is reduced to just 17k.
This is of course a very nice improvement. But the customer reported that it is not enough and the SP still times out, i.e. further optimization is needed.
Naturally, the next step is to see if we can skip the circular reference check. It was added as a safety measure against bad data. Arguably it should not be there – the check should be performed at data modification time – but it is there for historical reasons and we can’t just remove it, not trivially. Still, let’s try it out of curiosity.
The modified query looks like this (basically just commenting out any code related to CyclePrevention):
ALTER PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
@catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
WITH all_node_relations AS
(
SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
WHERE ParentNodeId > 0
UNION
SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
),
hierarchy AS
(
SELECT
n.CatalogNodeId
--, '|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM @catalogNodeIds n
UNION ALL
SELECT
children.ChildNodeId AS CatalogNodeId
--, parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM hierarchy parent
JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
--WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
)
SELECT CatalogNodeId FROM hierarchy
END
And the improvement is quite impressive (more than I expected): the query completes almost instantly (less than 1s), and the reads on CatalogNodeRelation are significantly reduced.
A word of warning here: execution plans can’t simply be compared as-is. If I run the two versions side by side, it gives a quite misleading comparison.
Even though the top one (without the circular reference check) is much faster than the original (the bottom one), SQL Server estimates that the first is slower (almost 2x slower than the second). So the execution plan should be used to see what has been done and what the likely bottleneck inside a query is; it should not be used to compare queries against each other. In most cases, comparing statistics using SET STATISTICS IO ON is the better way to compare.
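For completeness, that comparison looks like this – run each version of the stored procedure with statistics on and compare the logical reads reported in the Messages tab (66 being the “root” node used above):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @nodes udttCatalogNodeList;
INSERT INTO @nodes (CatalogNodeId) VALUES (66);

EXEC dbo.ecf_CatalogNode_GetAllChildNodes @nodes;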
If not for the fact that we would be changing the behavior of the stored procedure, I would be happy with this approach. The chance of running into a circular reference is small, but it is not zero. As we said, we could in theory guard the relations during insert/update, but that would be too big a change to start with. This is one of the constraints of working at framework level – we have to step carefully to not break anything. A breaking change is bad, but data corruption is simply unacceptable. I spent a few hours (probably more than I should have) trying to optimize the circular reference check, but no better solution was found.
The next approach is – as you can guess – to get rid of the Clustered Index Scan on the CatalogNodeRelation table. The solution is quite simple: a non clustered index on ParentNodeId should be enough.
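Again, just a plain non clustered index (the name is up to you; you could also INCLUDE ChildNodeId to make it covering):

-- Non clustered index so child lookups by ParentNodeId no longer scan CatalogNodeRelation
CREATE NONCLUSTERED INDEX IX_CatalogNodeRelation_ParentNodeId
    ON dbo.CatalogNodeRelation (ParentNodeId);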
Great success. The performance is comparable with the “no circular reference check” approach.
And adding an index is a non-breaking change (albeit in some cases it can cause a performance regression, like in A curious case of SQL execution plan – Quan Mai’s blog (vimvq1987.com), but that is rare; also, in this case the cardinality of ParentNodeId is most likely quite well distributed).
That is all for today. Hopefully you learned a thing or two about optimizing queries in your daily work.