Fatal flaw with geta-notfoundhandler

First of all, I’d like to make this a farewell post to this blog. The VM that is hosting this blog – which is generously sponsored by my employer, is being decommissioned. I have looked at some alternatives to move my blog to, but none is interesting in term of time/effort/cost. With all the things going on in my life, setting up a new blog is not of priority (that’d be, surprise, surprise, tomatoes these days). We had a good run, and I hope this blog has been useful to you, one way for another. And I hope to see you again, some days.

Back to business. The topic of today is a flaw that quite fatal of geta-notfoundhandler, that is opened source at GitHub – Geta/geta-notfoundhandler: The popular NotFound handler for ASP.NET Core and Optimizely, enabling better control over your 404 page in addition to allowing redirects for old URLs that no longer works.. We have cases when a customer’s instance hit very high CPU, essentially hanging, and the only course of action is to restart the instance. Memory dumps taken at the time all point out to the notfoundhandler, specifically, CustomRedirectCollection.This issue has been brought to me by a colleague a couple of months ago. As he treated it as lowkey, we took a quick look into it, had some good ideas, but we never really got to the bottom of it. I let it slip because it was not critically urgent/important (and you know that is when something will not be addressed).

Another colleague brought up it again recently, and while I was suffering a bad headache from a prolonged flu, I decided to get this over with. Before jumping to the conclusion and the fix, let’s start with the symptoms and analysis.

As mentioned above, in the memory dumps taken, we have seen one or two threads stuck in this stacktrace

00007E69D57E67F0 00007e6c3bded5c3 Geta.NotFoundHandler.Core.Redirects.CustomRedirectCollection.CreateSubSegmentRedirect(System.ReadOnlySpan`1, System.ReadOnlySpan`1, Geta.NotFoundHandler.Core.Redirects.CustomRedirect, System.ReadOnlySpan`1)
00007E69D57E6880 00007e6c3a47ea43 Geta.NotFoundHandler.Core.Redirects.CustomRedirectCollection.FindInternal(System.String)
00007E69D57E6920 00007e6c3c1e2be5 Geta.NotFoundHandler.Core.Redirects.CustomRedirectCollection.Find(System.Uri)
00007E69D57E6950 00007e6c3c1e2a19 Geta.NotFoundHandler.Core.RequestHandler.HandleRequest(System.Uri, System.Uri, Geta.NotFoundHandler.Core.Redirects.CustomRedirect ByRef)
00007E69D57E6990 00007e6c3bd326f7 Geta.NotFoundHandler.Core.RequestHandler.Handle(Microsoft.AspNetCore.Http.HttpContext)
00007E69D57E69E0 00007e6c3bc9f9ba Geta.NotFoundHandler.Infrastructure.Initialization.NotFoundHandlerMiddleware+d__2.MoveNext()
00007E69D57E6A40 00007e6c3bce9418 System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Geta.NotFoundHandler.Infrastructure.Initialization.NotFoundHandlerMiddleware+d__2, Geta.NotFoundHandler]](d__2 ByRef) [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs @ 38]
00007E69D57E6A90 00007e6c3bce9360 Geta.NotFoundHandler.Infrastructure.Initialization.NotFoundHandlerMiddleware.InvokeAsync(Microsoft.AspNetCore.Http.HttpContext, Geta.NotFoundHandler.Core.RequestHandler)

if there are a lot of threads stuck in same stacktrace, you can think about lock contention – i.e. a lot of threads just waiting for some lock to be released. But if there are only a few threads in a case of high CPU, that suggests an endless loop, some do while or while code that never exits properly.

One quite infamous case of the endless loop is that when you mess with Dictionary in a concurrent scenario – adding to it while other threads are reading. That’s when your foreach will never end.

But where?

Luckily for us the library is open source so it’s much easier to try some dry code reading and guess where the problem is. And of course we found a while loop

        // Note: Guard against infinite buildup of redirects
        while (appendSegment.UrlPathMatch(oldPath))
        {
            appendSegment = appendSegment[oldPath.Length..];
        }

The only remaining question is what value of oldPath would trigger that endless loop. Lazy as I am, I turned to CoPilot to have it analyze the code to see what values that can be potentially the culprit. Blah blah, CoPilot says something like /abc/abc/abc/ could. But it does not. (We are still a long way from losing our jobs to AI, folks)

That’s the stop of my first engagement. I almost forgot about it, until another colleague asked me the same question, this time more serious. I was not very productive anyway due to headache, so let’s fix this problem once and for all.

So with a new memory dump I dived again into the problem. After identifying the offendeing thread, let’s run !clrstack -p to see if we can figure out the url that triggers the issue. This really raised my eyebrow

0:026> !do 0x00007e69dcd8bb60
Name:        Geta.NotFoundHandler.Core.Redirects.CustomRedirect
MethodTable: 00007e6c35be3ff8
EEClass:     00007e6c35bd30f0
Tracked Type: false
Size:        72(0x48) bytes
File:        /app/Geta.NotFoundHandler.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007e6c2ff0bac0  4000059       20       System.Boolean  1 instance                0 <WildCardSkipAppend>k__BackingField
00007e6c2ffbd2e0  400005a        8        System.String  0 instance 00007e6ad7fff3a0 _oldUrl
00007e6c2ffbd2e0  400005b       10        System.String  0 instance 00007e6ad7fff3a0 <NewUrl>k__BackingField
00007e6c2ffa9018  400005c       18         System.Int32  1 instance                0 <State>k__BackingField
00007e6c35bcf8f0  400005d       1c         System.Int32  1 instance              302 <RedirectType>k__BackingField
00007e6c3372f578  400005e       28 ...Private.CoreLib]]  1 instance 00007e69dcd8bb88 <Id>k__BackingField
0:026> !DumpObj /d 00007e6ad7fff3a0
Name:        System.String
MethodTable: 00007e6c2ffbd2e0
EEClass:     00007e6c2ff97b10
Tracked Type: false
Size:        22(0x16) bytes
File:        /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.36/System.Private.CoreLib.dll
String:      
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007e6c2ffa9018  40002ba        8         System.Int32  1 instance                0 _stringLength
00007e6c2ff0e5a8  40002bb        c          System.Char  1 instance                0 _firstChar
00007e6c2ffbd2e0  40002b9       d0        System.String  0   static 00007e6ad7fff3a0 Empty

The oldUrl is empty here. And to my surprise, passing an empty value to while loop above, really tricks it to never exit. A simple program to demonstrate the issue

var appendSegment = ReadOnlySpan<char>.Empty; 
var oldPath = "".AsSpan();
while (appendSegment.UrlPathMatch(oldPath))
{
    appendSegment = appendSegment[oldPath.Length..];
}

internal static class SpanExtensions
{
    public static ReadOnlySpan<char> RemoveTrailingSlash(this ReadOnlySpan<char> chars)
    {
        if (chars.EndsWith("/"))
            return chars[..^1];

        return chars;
    }

    public static bool UrlPathMatch(this ReadOnlySpan<char> path, ReadOnlySpan<char> otherPath)
    {
        otherPath = RemoveTrailingSlash(otherPath);

        if (path.Length < otherPath.Length)
            return false;

        for (var i = 0; i < otherPath.Length; i++)
        {
            var currentChar = char.ToLowerInvariant(path[i]);
            var otherChar = char.ToLowerInvariant(otherPath[i]);

            if (!currentChar.Equals(otherChar))
                return false;
        }

        if (path.Length == otherPath.Length)
            return true;

        return path[otherPath.Length] == '/';
    }
}

The problem is that there was a safeguard for null value, but not Empty. So, to trigger the issue, you will need to have 1. a custom redirect rule that has empty oldUrl, 2. a url that does not match any other rules defined.

Once you reached the empty rule, it’s the death sentence. The while loop never exits and it will eat all CPU resource until the instance is restart.

The fix in this case is to check the oldUrl for both null and Empty. If you are using the package in your project, this is the change you need

geta-notfoundhandler/src/Geta.NotFoundHandler/Core/Redirects/CustomRedirectCollection.cs at master · quanmaiepi/geta-notfoundhandler

Maybe someone can take this and contribute to the main repo for everyone, when I’m busy attending my tomatoes

Shorten your cache keys, please

In a recent customer engagement, I have looked into a customer where their memory usage is abnormally high. Among other findings, I think one was not very well known. But as they say, small details can make a big difference – and that small detail in today’s post is the cache key.

Let’s put it to test. This is what I asked Copilot to scaffold it, and then just some small adjustments. The test is to add 10.000 items to cache, and then read each entry 10 times. One test with a very short prefix, and one with a long (but not very long) one.

using System;
using System.Runtime.Caching;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace MemoryCacheBenchmarkDemo
{
    // Use MemoryDiagnoser to capture memory allocation metrics during benchmarking.
    [MemoryDiagnoser]
    public class MemoryCacheBenchmark
    {
        private const int Iterations = 10_000;
        private const int ReadIterations = 10;

        [Benchmark(Description = "MemoryCache with Short Keys")]
        public void ShortKeysBenchmark()
        {
            using (var cache = new MemoryCache("ShortKeysCache"))
            {
                const string Prefix = "K";
                // Insertion phase using short keys (e.g., "K0", "K1", ...)
                for (int i = 0; i < Iterations; i++)
                {
                    string key = Prefix + i;
                    cache.Add(key, i, DateTimeOffset.UtcNow.AddMinutes(5));
                }

                // Retrieval phase for short keys.
                
                for (int j = 0; j < Iterations; j++)
                {
                    int sum = 0;
                    for (int i = 0; i < ReadIterations; i++)
                    {
                        string key = Prefix + i;
                        if (cache.Get(key) is int value)
                        {
                            sum += value;
                        }
                    }
                    // Use the result to prevent dead code elimination.
                    if (sum == 0)
                    {
                        throw new Exception("Unexpected sum for short keys.");
                    }
                }
            }
        }

        [Benchmark(Description = "MemoryCache with Long Keys")]
        public void LongKeysBenchmark()
        {
            using (var cache = new MemoryCache("LongKeysCache"))
            {
                const string Prefix = "ThisIsAVeryLongCacheKeyPrefix_WhichAddsExtraCharacters_IsThisLongEnoughIAmNotSure";
                // Insertion phase using long keys.
                // Example: "ThisIsAVeryLongCacheKeyPrefix_WhichAddsExtraCharacters_0", etc.
                for (int i = 0; i < Iterations; i++)
                {
                    string key = Prefix + i;
                    cache.Add(key, i, DateTimeOffset.UtcNow.AddMinutes(5));
                }

                // Retrieval phase for long keys.
                for (int j = 0; j < Iterations; j++)
                {
                    int sum = 0;
                    for (int i = 0; i < ReadIterations; i++)
                    {
                        string key = Prefix + i;
                        if (cache.Get(key) is int value)
                        {
                            sum += value;
                        }
                    }
                    // Use the result to prevent dead code elimination.
                    if (sum == 0)
                    {
                        throw new Exception("Unexpected sum for short keys.");
                    }
                }
            }
        }
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            // Executes all benchmarks in MemoryCacheBenchmark.
            BenchmarkRunner.Run<MemoryCacheBenchmark>();
        }
    }
}

And the difference. Not only the short key is faster, you end up with considerably less allocations (and therefore, Garbage collection)

The only requirement for cache key is that it’s unique. It might make sense to code it in a way that if you have to ever look into memory dumps (I wish you never have to, but it’s painful yet fun experience) – you know what cache entry is this. For example in Commerce we have this prefix for order cache key:

EP:EC:OS:

And that’s it. EP is short hand for EPiServer (so we know it’s us), EC is for eCommerce, and OS is for Order System. (I know, it’s been that way for a very long time for historical reasons, and nobody bothers to change it)

So next time you adding some cache to your class, make sure to use the shortest cache key as possible. It’s not micro optimization. If you know it’s better, why not?

How to: set access right to folders

Today I stumped upon this question Solution for Handling File Upload Permissions in Episerver CMS 12, and there is a simple solution for that

Using the EditSecurity.aspx comes in handy but it is not very future proof as it is now removed in CMS 12 with its siblings WebForms part. However, we can easily make up for that by using the APIs – which is meant to remain for a long time. Even better, this could be set up to be done automatedly and repeatedly. The little secret is IContentSecurityRepository which is what EditSecurity.aspx used under the hood.

To set the access right of a content, this is what you need

        var contentlink = new ContentReference(100);
        var content = _contentRepository.Get<IContent>(contentlink);
        if (content is IContentSecurable securable)
        {
            var descriptor = securable.GetContentSecurityDescriptor();
            //You might need to add this if your content is not Catalog
            descriptor = (ContentAccessControlList)descriptor.CreateWriteableClone();
            descriptor.Clear();
            descriptor.AddEntry(new AccessControlEntry("Everyone", AccessLevel.Read, SecurityEntityType.Role));
            descriptor.AddEntry(new AccessControlEntry("Author", AccessLevel.Read | AccessLevel.Create | AccessLevel.Edit | AccessLevel.Delete | AccessLevel.Publish, SecurityEntityType.Role));
//any other access rights that you need to set
            _contentSecurityRepository.Save(contentlink, descriptor, SecuritySaveType.Replace);
        }

The first two lines are from for demonstration, and not very “automated”. You might have some hard coded value there or might have to work your magic to find the right content. (Just a kind note, if you want your code to work consistently between sites, make sure to use ContentGuid instead ContentId which is changed depending on the database).

The rest of the code is quite self-explanatory. You check if your content implements IContentSecurable which is required to have access rights. Then you clear any existing access rights and add your desirable ones. Finally save it with the SecuritySaveType.Replace to make sure that only access rights you wanted, exist.

This code can be run multiple times (Idempotent). You can add it to a scheduled job, or even better, a startup routine, to make sure that you always have the right access rights of specific content.

AsyncHelper can be considered harmful

.NET developers have been in the transition to move from synchronous APIs to asynchronous API. That was boosted a lot by await/async keyword of C# 5.0, but we are now in a dangerous middle ground: there are as many synchronous APIs as there are async ones. The mix of them requires the ability to call async APIs from a synchronous context, and vice versa. Calling synchronous APIs from an async context is simple – you can fire up a task and let it does the work. Calling async APIs from a sync context is much more complicated. And that is where AsyncHelper comes to the play.

AsyncHelper is a common thing used to run async code in a synchronous context. It is simple helper class with two methods to run async APIs

        public static TResult RunSync<TResult>(Func<Task<TResult>> func)
        {
            var cultureUi = CultureInfo.CurrentUICulture;
            var culture = CultureInfo.CurrentCulture;
            return _myTaskFactory.StartNew(() =>
            {
                Thread.CurrentThread.CurrentCulture = culture;
                Thread.CurrentThread.CurrentUICulture = cultureUi;
                return func();
            }).Unwrap().GetAwaiter().GetResult();
        }

        public static void RunSync(Func<Task> func)
        {
            var cultureUi = CultureInfo.CurrentUICulture;
            var culture = CultureInfo.CurrentCulture;
            _myTaskFactory.StartNew(() =>
            {
                Thread.CurrentThread.CurrentCulture = culture;
                Thread.CurrentThread.CurrentUICulture = cultureUi;
                return func();
            }).Unwrap().GetAwaiter().GetResult();
        }

There are slight variants of it, with and without setting the CurrentCulture and CurrentUICulture, but the main part is still spawning a new Task to run the async task, then blocks and gets the result using Unwrap().GetAwaiter().GetResult();

One of the reason it was so popular was people think it was written by Microsoft so it must be safe to use, but it is actually not true: the class is introduced as an internal class by AspNetIdentity AspNetIdentity/src/Microsoft.AspNet.Identity.Core/AsyncHelper.cs at main · aspnet/AspNetIdentity (github.com) .That means Microsoft teams can use it when they think it’s the right choice to do, it’s not the default recommendation to run async tasks in a synchronous context.

Unfortunately I’ve seen a fair share of threads stuck in AsyncHelper.RunSync stacktrace, likely have fallen victims of a deadlock situation.

    756A477F9790	    75ABD117CF16	[HelperMethodFrame_1OBJ] (System.Threading.Monitor.ObjWait)
    756A477F98C0	    75AB62F11BF9	System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
    756A477F9970	    75AB671E0529	System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
    756A477F99D0	    75AB671E0060	System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken)
    756A477F9A40	    75AB676068B8	System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task, System.Threading.Tasks.ConfigureAwaitOptions)
    756A477F9A60	    75AB661E4FE7	System.Runtime.CompilerServices.TaskAwaiter`1[[System.__Canon, System.Private.CoreLib]].GetResult()

An further explanation of why this is bad can be read here

c# – Is Task.Result the same as .GetAwaiter.GetResult()? – Stack Overflow

Async/sync is a complex topic and even experienced developers make mistake. There is no simple way to just run async code in a sync context. AsyncHelper is absolutely not. It is simple, convenient way, but does not guarantee to be correct thing in your use case. I see it as a shortcut to solve some problems but create bigger ones down the path.

Just because you can. doesn’t mean you should. That applies to AsyncHelper perfectly

Solving the mystery of high memory usage

Sometimes, my work is easy, the problem could be resolved with one look (when I’m lucky enough to look at where it needs to be looked, just like this one Varchar can be harmful to your performance – Quan Mai’s blog (vimvq1987.com) ). Sometimes, it is hard. Can’t count number of times that I stared blankly at the screen, and decided I’d better take a nap, roast a batch of coffee, or take a walk (that is lying, however, I don’t walk), because I’m out of idea and this is going nowhere. The life of a software diagnostic engineer is like that, sometimes you are solving the mystery of “what do I need to solve this mystery”. There are usually more dots scattered around in all places, your job is to figure out which dots make senses, which dots do not, and how to connect those that are relevant to solve the problem, and to tell a story.

The story today is about a customer complaining about their scheduled instance on DXP keeps having high memory after running Find indexing job. They have a custom job that was built to optimize performance for their language settings, but the idea is the same – load content, serialize it and send it to the server endpoint for indexing. It is, indeed a memory heavy job, especially when you have a lot of content that needs to be indexed (basically, number of content x number of languages x the complexity of the content). It is normal to have an increase in memory usage during such job – the application (or rather, the runtime, depending on which way you look at it) is doing it job – content needs to be loaded in memory, and if there is available memory it will be a huge waste if it is not used for something useful. And the application will not immediately release that memory, as the content is cached. The memory will only be reclaimed only if the cache expired, or the application has memory pressure (i.e. it asks the operating system for more memory and the OS refuses “there is nothing left”). Even if the cache is expired, the application will not always compact and release the memory back to the OS (LOH etc.)

Now what is problematic is that the customer application retains 25GB of memory for indefinitely. They waited for 24h but the memory usage is still high. The application appears to be fine, it does not crash because of memory issues (like Out of Memory), but it causes confusion s and worries to our customer. Game’s on.

One thing that does not make senses in this case is that even thought they have a custom index job, it is still a scheduled job. And for scheduled jobs, the contents are supposed to have a very short sliding expiration time (default to 1 minute). However, the cache entries in the memory dumps tell a different story. A majority of the cache entries have 12h sliding expiration time. Which does explain – in part at least – why the memory remains high. When you have a longer sliding time, chance is higher that the cache is hit at least once before it expires, which reset the expiration. If you have sufficient hit, the cache will effectively remain in memory forever, until you actively evict it (by editing the content for example)

0000753878028910                        0.77kb          0                           12:00:00                    2/16/2024 5:58:43 AM +00:00    EPPageData:601596:en__CatalogContent
0000753878029DC0                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1345603:es-pr__CatalogContent
00007538781C7F48                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1351986:es-pr__CatalogContent
00007538781C8058                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1346230:es-pr__CatalogContent
00007538781C8168                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1351988:es-pr__CatalogContent
00007538786FA8E8                        0.77kb          0                           12:00:00                    2/16/2024 8:14:53 AM +00:00    EPPageData:1049433:no__CatalogContent
00007538786FC598                        0.78kb          0                           12:00:00                    2/16/2024 9:32:28 AM +00:00    EPPageData:1088026:es-pr__CatalogContent
00007538786FD9E0                        0.77kb          0                           12:00:00                    2/16/2024 8:14:53 AM +00:00    EPPageData:1049435:no__CatalogContent
0000753878700770                        0.77kb          0                           12:00:00                    2/16/2024 7:52:53 AM +00:00    EPPageData:1029725:da__CatalogContent
0000753878706528                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1351990:es-pr__CatalogContent
0000753878706638                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1350104:es-pr__CatalogContent
00007538787A2F80                        0.77kb          0                           12:00:00                    2/16/2024 8:14:53 AM +00:00    EPPageData:1049439:no__CatalogContent
00007538787A3FD0                        0.77kb          0                           12:00:00                    2/16/2024 7:52:53 AM +00:00    EPPageData:1029729:da__CatalogContent
00007538787A6B48                        0.77kb          0                           12:00:00                    2/16/2024 7:52:53 AM +00:00    EPPageData:1029731:da__CatalogContent
00007538787A74C0                        0.77kb          0                           12:00:00                    2/16/2024 6:21:34 AM +00:00    EPPageData:690644:en__CatalogContent
00007538787A9CC8                        0.78kb          0                           12:00:00                    2/16/2024 5:43:57 AM +00:00    EPPageData:181410:cs-cz__CatalogContent
00007538787ACDD8                        0.82kb          0                           12:00:00                    2/16/2024 2:17:38 PM +00:00    EPPageData:1343746__CatalogContent
00007538787ACFF8                        0.83kb          0                           12:00:00                    2/16/2024 2:17:25 PM +00:00    EPPageData:1343746:en__CatalogContent
00007538787AE658                        0.77kb          0                           12:00:00                    2/16/2024 2:59:37 PM +00:00    EPPageData:1350160:da__CatalogContent
00007538787AE768                        0.77kb          0                           12:00:00                    2/16/2024 2:59:37 PM +00:00    EPPageData:1350162:da__CatalogContent
00007538787AEA98                        0.39kb          0                           00:00:00                    2/16/2024 2:17:38 PM +00:00    EPiAnc:ContentAssetAware1343745__CatalogContent
00007538787AF058                        0.77kb          0                           12:00:00                    2/16/2024 2:59:37 PM +00:00    EPPageData:1347560:da__CatalogContent
00007538787B29A0                        0.77kb          0                           12:00:00                    2/16/2024 2:17:07 PM +00:00    EPPageData:1329806:da__CatalogContent
00007538787B2E68                        0.77kb          0                           12:00:00                    2/16/2024 2:17:07 PM +00:00    EPPageData:1329808:da__CatalogContent
00007538787B31E8                        0.77kb          0                           12:00:00                    2/16/2024 2:17:07 PM +00:00    EPPageData:1329810:da__CatalogContent

It is not what it should be, however, as the default value for sliding expiration timeout of a content loaded by a scheduled job is 1 minute – i.e. it is considered to be load once and be done item. Was it set to 12h by mistake. Nope

Timeout is set to 600.000.000 ticks which is 60 second, which is the default value.

I have been pulling my hairs over this for quite a while. What if the cache entries were not added by the scheduled job, but by some other way not affected by the limitation of scheduled job? In short, we were deceived by customer’s statement regarding Find indexing job. It was merely a victim of same issue. It was resetting the last access to the cache entry but that’s about it.

Time to dig a bit more. While Windbg is extremely powerful, it does not let you know where is the code that load a specific content into cache (not unless you catch it red handed). So the only way to know is to look around and check if there are any suspicious call the IContentLoader.GetItems or IContentLoader.GetChildren . A colleague of mine worked with the customer to obtain their source code, and another deep dive.

Fortunately for us, the customer has a custom built Find indexer we helped to built in a previous problem, and that was shown in the search for GetItems. It struck me that it could be the culprit. The job itself is … fine, however it was given wrong data so it keeps loading content to index.

If my hypothesis is correct, then these things must be true:

The app’s memory usage will raise to 25GB regardless of the indexing job running or not. And it remains there without much fluctuation
There are a lot of row in tblFindIndexQueue

It turned out both of those were correct: there were more than 4 millions of rows in tblFindIndexQueue, and this is the memory consumption of the app over 24 hours

One we figured out the source of content loading, the fix was pretty straightforward. One thing we could do from our side is to shorten caching time of content loaded by the event-driven indexer. You should upgrade to Find 16.2.0 which contains the fix for FIND-12436 which is a nice improvement for memory usage.

Moral of story:

I’m a workaholic. I definitely should not work on weekends, but sometimes I need to because that’s when my mind is clearest
Keep looking. But as always, know when to give up and admit defeat
Take breaks. Long, shorts. Refresh your mind and look at different angles.
The sliding cache expiration time can be quite unexpected. if a content is already in cache with long sliding expiration, then a cache hit (via ISynchronizedObjectInstanceCache.ReadThrough to get that content with short sliding expiration will not change that value, only refresh the last access time, and vice versa)

Performance optimization – the hardcore series – part 3

“In 99% of the cases, premature optimization is the root of all devil”

This quote is usually said to be from Donald Knuth, usually regarded as “father of the analysis of algorithms”. His actual quote is a bit difference

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%.

If you have read my posts, you know that I always ask for measuring your application before diving in optimization. But that’s not all of the story. Without profiling, your optimization effort might be futile. But there are things you can “optimize” right away without any profiling – because – they are easy to do, they make your code simpler, easier to follow, and you can be certain they are faster.

Let’s see if you can spot the potential problematic piece of code from this snippet

public Something GetData()
{
var market = list.FirstOrDefault(x => x.MarketId == GetCurrentMarket().MarketId)
{
//do some stuffs
}

}

If you are writing similar code, don’t be discouraged. It’s easy to overlook the problem – when you call FirstOrDefault, you actually iterate over the list until you find the first matching element. And for each and every of that, GetCurrentMarket() will be called.

Because we can’t be sure when we will find the matching element, it might be the first element, or the last, or it does not exist, or anywhere in between. The median is that GetCurrentMarket will be half called half the size of list

We don’t know if GetCurrentMarket is a very lightweight implementation, or list is a very small set, but we know that if this is in one very hot path, the cost can be (very) significant. These are the allocations made by said GetCurrentMarket

This is a custom implementation of IMarketService – the default implementation is much more lightweight and should not be of concern. Of course, fewer calls are always better – no matter how quick something is.

In this specific example, a simple call to get the current market and store it in a local variable to be used in the scope of the entire method should be enough. You don’t need profiling to make such “optimization” (and as we proved, profiling only confirm our suspect )

Moral of the story

For optimization, less is almost always, more
You definitely should profile before spending any considerable amount optimizing your code. But there are things that can be optimized automatically. Make them your habit.

Performance optimization – the hardcore series – part 2

Earlier we started a new series about performance optimization, here Performance optimization – the hardcore series – part 1 – Quan Mai’s blog (vimvq1987.com) . There are ton of places where things can go wrong. A seasoned developer can, from experience, avoid some obvious performance errors. But as we will soon learn, a small thing can make a huge impact if it is called repeatedly, and a big thing might be OK to use as long as it is called once.

Let’s take this example – how would you think about this snippet – CategoryIds is a list of string converted from ContentReference

            if (CategoryIds.Any(x => new ContentReference(x).ToReferenceWithoutVersion() == contentLink))
            {
                //do stuff
            }

If this is in any “cool” path that run a few hundred times a day, you will be fine. It’s not “elegant”, but it works, and maybe you can get away with it. However, if it is in a hot path that is executed every time a visitor visits a product page in your website, it can create a huge problem.

And can you guess what it is?

new ContentReference(string) is fairly lightweight, but if it is called a lot, this is what happen. This is allocations from the constructor alone, and only within 220 seconds of the trace

A lot of allocations which should have been avoided if CategoryIds was just an IEnumerable<ContentReference> instead of IEnumerable<string>

For comparison, this is how 10.000 and 1000.000 new ContentReference would allocate

Thing is similar if you use .ToReferenceWithoutVersion() to compare to another ContentReference (although to a lesser extend as ToReferenceWithoutVersion would return the same ContentReference if the WorkId is 0, and it use cloning instead of new). The correct way to compare two instances of ContentReference without caring about versions, is to use .Compare with ContentReferenceComparer.IgnoreVersion

Moral of the story

It is not only what you do, but also how you do it
Small things can make big impacts, don’t guess, measure!

Performance optimization – the hardcore series – part 1

Hi again every body. New day – new thing to write about. today we will talk about memory allocation, and effect it has on your website performance. With .NET, memory allocations are usually overlooked because CLR handles that for you. Except in rare cases that you need to handle unmanaged resources, that you have to be conscious about releasing that said resources yourself, it’s usually fire and forget approach.

Truth is, it is more complicated than that. The more objects you created, the more memory you need, and the more time CLR needs to clean it up after you. When you might have written code that is executed blazing fast in your benchmarks, in reality, your website might still struggle to perform well in long run – and that’s because of Garbage collection. Occasional GC is not of a concern – because it’s nature of .NET CLR, but frequent GC, especially Gen 2 GC, is definitely something you should look into and should fix, because it natively affects your website performance.

The follow up question – how do you fix that.

Of course, the first step is always measuring the memory allocations of your app. Locally you can use something like Jetbrains dotMemory to profile your website, but that has a big caveat – you can’t really mimic the actual traffic to your website. Sure, it is very helpful to profile something like a scheduled job, but it is less than optimal to see how your website performs in reality. To do that, we need another tool, and I’ve found nothing better than Application Insights Profiler trace on Azure. It will sample your website periodically, taking ETL ( event trace log) files in 220 seconds (Note, depends on your .NET version, you might download a .diagsession or a .netperf.zip file from Application Insights, but they are essentially the same inside (zipped .ETL)). Those files are extremely informative, they contains whole load of information which might be overwhelming if you’re new, but take small steps, you’ll be there.

To open a ETL file, common tool is Perfview (microsoft/perfview: PerfView is a CPU and memory performance-analysis tool (github.com)). Yes it has certain 2000 look like other analysis tool (remember Windbg), but it is fast, efficient, and gets the job done

Note that once extracted ETL can be very big – in 1GB or more range often. Perfview has to go through all that event log so it’s extremely memory hungry as well, especially if you open multiple ETL files at once. My perfview kept crashing when I had a 16GB RAM machine (I had several Visual Studio instances open), and that was solved when I switched to 32GB RAM

The first step is to confirm the allocation problems with GCStats (this is one of the extreme ones, but it does happen)

Two main things to look into – Total Allocs, i.e. the total size of objects allocated, and then the time spent in Garbage collection. They are naturally closely related, but not always. Total allocation might not be high but time for GC might be – in case of large objects allocation (we will talk about it in a later post). Then for the purpose of memory allocation analysis, this is where you should look at

What you find in there, might surprise you. And that’s the purpose of this series, point out possible unexpected allocations that are easy – or fairly easy – to fix.

In this first post, we will talk about a somewhat popular feature – Injected<T>.

We all know that in Optimizely Content/Commerce, the preferred way of dependency injection is constructor injection. I.e. if your class has a dependency on a certain type, that dependency should be declared as a parameter of the constructor. That’s nice and all, but not always possible. For example you might have a static class (used for extension methods) so no constructor is available. Or in some rare cases, that you can’t added a new parameter to the constructor because it is a breaking change.

Adding Injected<T> as a hidden dependency in your class is at least working, so can you forget about it?

Not quite!

This is how the uses of Injected<T> result in allocation of Structuremap objects – yes every time you call Injected<T>.Service the whole dependency tree must be built again.

And that’s not everything, during that process, other objects need to be created as well. You can right click on a path and select “Include item”. The allocations below are for anything that were created by `module episerver.framework episerver.framework!EPiServer.ServiceLocation.Injected1[System.__Canon].get_Service() i.e. all object allocations, related to Injected<T>

You can expand further to see what Injected<T>(s) have the most allocations, and therefore, are the ones should be fixed.

How can one fix a Injected<T> then? The best fix is to make it constructor dependency, but that might not always be possible. Alternative fix is to use ServiceLocator.GetInstance, but to make that variable static if possible. That way you won’t have to call Injected<T>.Service every time you need the instance.

There are cases that you indeed need a new instance every time, then the fix might be more complicated, and you might want to check if you need the whole dependency tree, or just a data object.

Moral of the story

Performance can’t be guessed, it must be measured
Injected<T> is not your good friend. You can use it if you have no other choice, but definitely avoid it in hot paths.

Delete orphaned assets

I was asked this question: we have about 3TB of assets, any way to clean it up.

These days, storage is cheap, but still not free. and big storage means you need space for back up. and with that, bandwidth and time.

Is there away to clean up things you no longer need?

Yes!

Optimizely Content already has a scheduled job named Remove Abandoned BLOBs, but this job only removes the blobs that have no content associated. I.e. the content is deleted by IContentRepository.Delete but the blob was left behind. The job uses the log to find out which content were deleted, then find those blobs.

How’s about the assets that still have contents associated with them, but not used anywhere? Time to get your hands dirty!

Due to the nature of this task, it is best to make it a scheduled job.

All of the assets are children under the global asset root. By iterating over them, we can check if each of them is being used by another content. If not, we will add them to a list for later delete. Before deleting the content, we will find the blob and then delete it as well. Easy, right?

To get the content recursively we use this little piece of code

        public virtual IEnumerable<T> GetAssetRecursive<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : MediaData
        {
            foreach (var folder in LoadChildrenBatched<ContentFolder>(parentLink, defaultCulture))
            {
                foreach (var entry in GetAssetRecursive<T>(folder.ContentLink, defaultCulture))
                {
                    yield return entry;
                }
            }

            foreach (var entry in LoadChildrenBatched<T>(parentLink, defaultCulture))
            {
                yield return entry;
            }
        }

        private IEnumerable<T> LoadChildrenBatched<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : IContent
        {
            var start = 0;

            while (!_isStopped)
            {
                var batch = _contentRepository.GetChildren<T>(parentLink, defaultCulture, start, 50);
                if (!batch.Any())
                {
                    yield break;
                }
                foreach (var content in batch)
                {
                    // Don't include linked products to avoid including them multiple times when traversing the catalog
                    if (!parentLink.CompareToIgnoreWorkID(content.ParentLink))
                    {
                        continue;
                    }

                    yield return content;
                }
                start += 50;
            }
        }

And we will start from SiteDefinition.Current.GlobalAssetsRoot, and use IContentRepository.GetReferencesToContent to see if it is used in any content (both CMS and Catalog). If not, we add it to a list. Later, we use IPermanentLinkMapper to see if it has any blob associated, and delete that as well

            foreach (var asset in GetAssetRecursive<MediaData>(SiteDefinition.Current.GlobalAssetsRoot, CultureInfo.InvariantCulture))
            {
                totalAsset++;
                if (!_contentRepository.GetReferencesToContent(asset.ContentLink, false).Any())
                {
                    toDelete.Add(asset.ContentLink.ToReferenceWithoutVersion());
                }

                if (toDelete.Count % 50 == 0)
                {
                    var maps = _permanentLinkMapper.Find(toDelete);
                    foreach (var map in maps)
                    {
                        deletedAsset++;
                        _contentRepository.Delete(map.ContentReference, true, EPiServer.Security.AccessLevel.NoAccess);
                        var container = Blob.GetContainerIdentifier(map.Guid);
                        //Probably redundency, can just delete directly
                        var blob = _blobFactory.GetBlob(container);
                        if (blob != null)
                        {
                            _blobFactory.Delete(container);
                        }
                        OnStatusChanged($"Deleting asset with id {map.ContentReference}");
                    }
                    toDelete.Clear();
                }
            }

We need another round of delete after the while loop to clean up the left over (or if we have less than 50 abandoned assets)

And we’re done!

Testing this job is simple – uploading a few assets to your cms and do not use it anywhere, then run the job. it should delete those assets.

Things to improve: we might want to make sure only assets that created more than a certain number of days ago are deleted. This allows editors to upload assets for later uses without having to use them immediately.

The code has been open sourced at vimvq1987/DeleteUnusedAssets: Delete unused assets from an Optimizely/Episerver site (github.com) , and I have uploaded a nuget package to Packages (optimizely.com) to be reviewed.

Delete a content – directly from database

Before we even start, I would reiterate that manipulating data directly should be avoided unless absolutely necessary, it should be used as the last resort, and should be proceeded with cautions – always back up first and test your queries on development database first before running it in production. And if the situation dictates that you have to run the query, better do it with the 4 eyes principle – having a colleague double check it for you. When it comes to production database, nothing is too careful.

Now back to the question, if you absolutely have to delete a content, you should do like this

exec editDeletePage @pageId = 123, @ForceDelete = 1

It is basically what Content Clouds (i.e. CMS) does under the hood, without the cache validation on the application layer of course.

So the moral of the story – do everything with API if you can. If you absolutely have to, use the built-in stored procedures – they are tested vigorously and should have minimal issues/bugs, and should take care of everything, data-wise for you. Only write your own query if there is no SP that can be used.

Update: Initially I mentioned Tomas’ post in this, and that gave impression his way is incorrect. I should have written better. My apologies to Tomas