Performance optimization – the hardcore series – part 3

“In 99% of the cases, premature optimization is the root of all devil”

This quote is usually said to be from Donald Knuth, usually regarded as “father of the analysis of algorithms”. His actual quote is a bit difference

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%.

If you have read my posts, you know that I always ask for measuring your application before diving in optimization. But that’s not all of the story. Without profiling, your optimization effort might be futile. But there are things you can “optimize” right away without any profiling – because – they are easy to do, they make your code simpler, easier to follow, and you can be certain they are faster.

Let’s see if you can spot the potential problematic piece of code from this snippet

public Something GetData()
{
var market = list.FirstOrDefault(x => x.MarketId == GetCurrentMarket().MarketId)
{
//do some stuffs
}

}

If you are writing similar code, don’t be discouraged. It’s easy to overlook the problem – when you call FirstOrDefault, you actually iterate over the list until you find the first matching element. And for each and every of that, GetCurrentMarket() will be called.

Because we can’t be sure when we will find the matching element, it might be the first element, or the last, or it does not exist, or anywhere in between. The median is that GetCurrentMarket will be half called half the size of list

We don’t know if GetCurrentMarket is a very lightweight implementation, or list is a very small set, but we know that if this is in one very hot path, the cost can be (very) significant. These are the allocations made by said GetCurrentMarket

This is a custom implementation of IMarketService – the default implementation is much more lightweight and should not be of concern. Of course, fewer calls are always better – no matter how quick something is.

In this specific example, a simple call to get the current market and store it in a local variable to be used in the scope of the entire method should be enough. You don’t need profiling to make such “optimization” (and as we proved, profiling only confirm our suspect )

Moral of the story

  • For optimization, less is almost always, more
  • You definitely should profile before spending any considerable amount optimizing your code. But there are things that can be optimized automatically. Make them your habit.

Performance optimization – the hardcore series – part 2

Earlier we started a new series about performance optimization, here Performance optimization – the hardcore series – part 1 – Quan Mai’s blog (vimvq1987.com) . There are ton of places where things can go wrong. A seasoned developer can, from experience, avoid some obvious performance errors. But as we will soon learn, a small thing can make a huge impact if it is called repeatedly, and a big thing might be OK to use as long as it is called once.

Let’s take this example – how would you think about this snippet – CategoryIds is a list of string converted from ContentReference

            if (CategoryIds.Any(x => new ContentReference(x).ToReferenceWithoutVersion() == contentLink))
            {
                //do stuff
            }

If this is in any “cool” path that run a few hundred times a day, you will be fine. It’s not “elegant”, but it works, and maybe you can get away with it. However, if it is in a hot path that is executed every time a visitor visits a product page in your website, it can create a huge problem.

And can you guess what it is?

new ContentReference(string) is fairly lightweight, but if it is called a lot, this is what happen. This is allocations from the constructor alone, and only within 220 seconds of the trace

A lot of allocations which should have been avoided if CategoryIds was just an IEnumerable<ContentReference> instead of IEnumerable<string>

For comparison, this is how 10.000 and 1000.000 new ContentReference would allocate

Thing is similar if you use .ToReferenceWithoutVersion() to compare to another ContentReference (although to a lesser extend as ToReferenceWithoutVersion would return the same ContentReference if the WorkId is 0, and it use cloning instead of new). The correct way to compare two instances of ContentReference without caring about versions, is to use .Compare with ContentReferenceComparer.IgnoreVersion

Moral of the story

  • It is not only what you do, but also how you do it
  • Small things can make big impacts, don’t guess, measure!

Performance optimization – the hardcore series – part 1

Hi again every body. New day – new thing to write about. today we will talk about memory allocation, and effect it has on your website performance. With .NET, memory allocations are usually overlooked because CLR handles that for you. Except in rare cases that you need to handle unmanaged resources, that you have to be conscious about releasing that said resources yourself, it’s usually fire and forget approach.

Truth is, it is more complicated than that. The more objects you created, the more memory you need, and the more time CLR needs to clean it up after you. When you might have written code that is executed blazing fast in your benchmarks, in reality, your website might still struggle to perform well in long run – and that’s because of Garbage collection. Occasional GC is not of a concern – because it’s nature of .NET CLR, but frequent GC, especially Gen 2 GC, is definitely something you should look into and should fix, because it natively affects your website performance.

The follow up question – how do you fix that.

Of course, the first step is always measuring the memory allocations of your app. Locally you can use something like Jetbrains dotMemory to profile your website, but that has a big caveat – you can’t really mimic the actual traffic to your website. Sure, it is very helpful to profile something like a scheduled job, but it is less than optimal to see how your website performs in reality. To do that, we need another tool, and I’ve found nothing better than Application Insights Profiler trace on Azure. It will sample your website periodically, taking ETL ( event trace log) files in 220 seconds (Note, depends on your .NET version, you might download a .diagsession or a .netperf.zip file from Application Insights, but they are essentially the same inside (zipped .ETL)). Those files are extremely informative, they contains whole load of information which might be overwhelming if you’re new, but take small steps, you’ll be there.

To open a ETL file, common tool is Perfview (microsoft/perfview: PerfView is a CPU and memory performance-analysis tool (github.com)). Yes it has certain 2000 look like other analysis tool (remember Windbg), but it is fast, efficient, and gets the job done

Note that once extracted ETL can be very big – in 1GB or more range often. Perfview has to go through all that event log so it’s extremely memory hungry as well, especially if you open multiple ETL files at once. My perfview kept crashing when I had a 16GB RAM machine (I had several Visual Studio instances open), and that was solved when I switched to 32GB RAM

The first step is to confirm the allocation problems with GCStats (this is one of the extreme ones, but it does happen)

Two main things to look into – Total Allocs, i.e. the total size of objects allocated, and then the time spent in Garbage collection. They are naturally closely related, but not always. Total allocation might not be high but time for GC might be – in case of large objects allocation (we will talk about it in a later post). Then for the purpose of memory allocation analysis, this is where you should look at

What you find in there, might surprise you. And that’s the purpose of this series, point out possible unexpected allocations that are easy – or fairly easy – to fix.

In this first post, we will talk about a somewhat popular feature – Injected<T>.

We all know that in Optimizely Content/Commerce, the preferred way of dependency injection is constructor injection. I.e. if your class has a dependency on a certain type, that dependency should be declared as a parameter of the constructor. That’s nice and all, but not always possible. For example you might have a static class (used for extension methods) so no constructor is available. Or in some rare cases, that you can’t added a new parameter to the constructor because it is a breaking change.

Adding Injected<T> as a hidden dependency in your class is at least working, so can you forget about it?

Not quite!

This is how the uses of Injected<T> result in allocation of Structuremap objects – yes every time you call Injected<T>.Service the whole dependency tree must be built again.

And that’s not everything, during that process, other objects need to be created as well. You can right click on a path and select “Include item”. The allocations below are for anything that were created by `module episerver.framework episerver.framework!EPiServer.ServiceLocation.Injected1[System.__Canon].get_Service() i.e. all object allocations, related to Injected<T>

You can expand further to see what Injected<T>(s) have the most allocations, and therefore, are the ones should be fixed.

How can one fix a Injected<T> then? The best fix is to make it constructor dependency, but that might not always be possible. Alternative fix is to use ServiceLocator.GetInstance, but to make that variable static if possible. That way you won’t have to call Injected<T>.Service every time you need the instance.

There are cases that you indeed need a new instance every time, then the fix might be more complicated, and you might want to check if you need the whole dependency tree, or just a data object.

Moral of the story

  • Performance can’t be guessed, it must be measured
  • Injected<T> is not your good friend. You can use it if you have no other choice, but definitely avoid it in hot paths.

Delete orphaned assets

I was asked this question: we have about 3TB of assets, any way to clean it up.

These days, storage is cheap, but still not free. and big storage means you need space for back up. and with that, bandwidth and time.

Is there away to clean up things you no longer need?

Yes!

Optimizely Content already has a scheduled job named Remove Abandoned BLOBs, but this job only removes the blobs that have no content associated. I.e. the content is deleted by IContentRepository.Delete but the blob was left behind. The job uses the log to find out which content were deleted, then find those blobs.

How’s about the assets that still have contents associated with them, but not used anywhere? Time to get your hands dirty!

Due to the nature of this task, it is best to make it a scheduled job.

All of the assets are children under the global asset root. By iterating over them, we can check if each of them is being used by another content. If not, we will add them to a list for later delete. Before deleting the content, we will find the blob and then delete it as well. Easy, right?

To get the content recursively we use this little piece of code

        public virtual IEnumerable<T> GetAssetRecursive<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : MediaData
        {
            foreach (var folder in LoadChildrenBatched<ContentFolder>(parentLink, defaultCulture))
            {
                foreach (var entry in GetAssetRecursive<T>(folder.ContentLink, defaultCulture))
                {
                    yield return entry;
                }
            }

            foreach (var entry in LoadChildrenBatched<T>(parentLink, defaultCulture))
            {
                yield return entry;
            }
        }

        private IEnumerable<T> LoadChildrenBatched<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : IContent
        {
            var start = 0;

            while (!_isStopped)
            {
                var batch = _contentRepository.GetChildren<T>(parentLink, defaultCulture, start, 50);
                if (!batch.Any())
                {
                    yield break;
                }
                foreach (var content in batch)
                {
                    // Don't include linked products to avoid including them multiple times when traversing the catalog
                    if (!parentLink.CompareToIgnoreWorkID(content.ParentLink))
                    {
                        continue;
                    }

                    yield return content;
                }
                start += 50;
            }
        }

And we will start from SiteDefinition.Current.GlobalAssetsRoot, and use IContentRepository.GetReferencesToContent to see if it is used in any content (both CMS and Catalog). If not, we add it to a list. Later, we use IPermanentLinkMapper to see if it has any blob associated, and delete that as well

            foreach (var asset in GetAssetRecursive<MediaData>(SiteDefinition.Current.GlobalAssetsRoot, CultureInfo.InvariantCulture))
            {
                totalAsset++;
                if (!_contentRepository.GetReferencesToContent(asset.ContentLink, false).Any())
                {
                    toDelete.Add(asset.ContentLink.ToReferenceWithoutVersion());
                }

                if (toDelete.Count % 50 == 0)
                {
                    var maps = _permanentLinkMapper.Find(toDelete);
                    foreach (var map in maps)
                    {
                        deletedAsset++;
                        _contentRepository.Delete(map.ContentReference, true, EPiServer.Security.AccessLevel.NoAccess);
                        var container = Blob.GetContainerIdentifier(map.Guid);
                        //Probably redundency, can just delete directly
                        var blob = _blobFactory.GetBlob(container);
                        if (blob != null)
                        {
                            _blobFactory.Delete(container);
                        }
                        OnStatusChanged($"Deleting asset with id {map.ContentReference}");
                    }
                    toDelete.Clear();
                }
            }

We need another round of delete after the while loop to clean up the left over (or if we have less than 50 abandoned assets)

And we’re done!

Testing this job is simple – uploading a few assets to your cms and do not use it anywhere, then run the job. it should delete those assets.

Things to improve: we might want to make sure only assets that created more than a certain number of days ago are deleted. This allows editors to upload assets for later uses without having to use them immediately.

The code has been open sourced at vimvq1987/DeleteUnusedAssets: Delete unused assets from an Optimizely/Episerver site (github.com) , and I have uploaded a nuget package to Packages (optimizely.com) to be reviewed.

Delete a content – directly from database

Before we even start, I would reiterate that manipulating data directly should be avoided unless absolutely necessary, it should be used as the last resort, and should be proceeded with cautions – always back up first and test your queries on development database first before running it in production. And if the situation dictates that you have to run the query, better do it with the 4 eyes principle – having a colleague double check it for you. When it comes to production database, nothing is too careful.

Now back to the question, if you absolutely have to delete a content, you should do like this

exec editDeletePage @pageId = 123, @ForceDelete = 1

It is basically what Content Clouds (i.e. CMS) does under the hood, without the cache validation on the application layer of course.

So the moral of the story – do everything with API if you can. If you absolutely have to, use the built-in stored procedures – they are tested vigorously and should have minimal issues/bugs, and should take care of everything, data-wise for you. Only write your own query if there is no SP that can be used.

Update: Initially I mentioned Tomas’ post in this, and that gave impression his way is incorrect. I should have written better. My apologies to Tomas

RedirectToAction is dead, long live RedirectToContent

In .NET 4.8/CMS 11.x and earlier, this is very commonly used to redirect an action

return RedirectToAction("Index", new{ node = contentLink });

Which will redirect the user to

public TResult Index(CheckoutPage currentPage );

and you will get the currentPage parameter set with content you specified by the contentLink.

However in .NET 5 once redirected currentPage will be null. It’s due to how .NET 5 handle the routing. The correct way is use this

return RedirectToContent(currentPage.ContentLink, "Index");

There is a action you can use – RedirectToContent. Note that the order of parameter is reserved – you pass in the content link to the content first, then the name of the action.

And that’s how’s it done in .NET 5/CMS 12.

Where to store big collection data

No, I do not mean that big, big data (in size of terabytes or more). It’s big collection, like when you have a List<string> and it has more than a few hundreds of items. Where to store it?

Naturally, you would want to store that data as a property of a content. it’s convenient and it just works, so you definitely can. But the actual question is: should you?

It’s as simple as this

public virtual IList<String> MyBigProperty {get;set;}

But under the hood, it’s more than just … that. Let’s ignore UI for a moment (rendering such long list is a bad UX no matters how you look at it, but you can simply ignore rendering that property by appropriate attributes), and focus on the backend aspects of it.

List<T> properties are serialized as a long strings, and save at once to database. If you have a big property in your content, this will happen every time you load your content:

  • The data must be read from database, then transferred through the network
  • The data must be parsed to create an array (the underlying data structure of List<T>. The original string is tossed away.
  • Now you have a big array that you might not use every time. it’s just there taking your previous LOH (as with the original string)

Same thing happens when you actually save that property

  • The data must be serialized as string, the List<T> is now tossed away
  • The data then must be transferred through the network
  • The data then saved to database. Even though it is a very long string and you changed, maybe 10 characters, it’s completely rewritten. Due to its size, there might be multiple page writes needed.

As you can see, it can create a lot of waste, especially if you rarely use that property. To make the matter worse, due to the size of the property, it means they are taking up space in LOH (large objects heap).

And imagine if you have such properties in each and every of your content. The waste is multiplied, and your site is now at risk of some frequent Gen 2 Garbage collection. Nobody likes visiting a website that freezes (if not crashes) once every 30 minutes.

Then when to store such big collection data?

The obvious answer is … somewhere else. Without other inputs, it’s hard to give you some concrete suggestions, but how’s about a normalized custom table? You have the key as the content reference, and the other column is each value of the list. Just an idea. Then you only load the data when you absolutely need it. More work, yes, but it’s the better way to do it.

Just a reminder that whatever you do, just stay away from DDS – Dynamic Data Store. It’s the worst option of all. Just, don’t 🙂

The hidden gotcha with IObjectInstanceCache

It’s not a secret that cache is one of the most, if not the most, important factors to their website performance. Yes cache is great, and if you are using Optimizely Content/Commerce Cloud, you should be using ISynchronizedObjectInstanceCache to cache your objects whenever possible.

But caching is not easy. Or rather, the cache invalidation is not easy.

To ensure that you have effective caching strategy, it’s important that you have cache dependencies, i.e. In principles, there are two types of dependencies:

  • Master keys. This is to control a entire cache “segment”. For example, you could have one master key for the prices. If you needs to invalidate the entire price cache, just remove the master key and you’re done.
  • Dependency keys. This is to tell cache system that your cache item depends on this or that object. If this or that object is invalidated, your cache item will be invalidated automatically. This is particularly useful if you do not control this or that object.

ISynchronizedObjectInstanceCache allows you to control the cache dependencies by CacheEvictionPolicy . There are a few ways to construct an instance of CacheEvictionPolicy, from if the cache expiration will be absolute (i.e. it will be invalidated after a fixed amount of time), or sliding (i.e. if it is accessed, its expiration will be renewed), to if your cache will be dependent on one or more master keys, and/or one or more dependency keys, like this

   
        /// <summary>
        /// Initializes a new instance of the <see cref="CacheEvictionPolicy"/> class.
        /// </summary>
        /// <param name="cacheKeys">The dependencies to other cached items, idetified by their keys.</param>
        public CacheEvictionPolicy(IEnumerable<string> cacheKeys)

        /// <summary>
        /// Initializes a new instance of the <see cref="CacheEvictionPolicy"/> class.
        /// </summary>
        /// <param name="cacheKeys">The dependencies to other cached items, idetified by their keys.</param>
        /// <param name="masterKeys">The master keys that we depend upon.</param>
        public CacheEvictionPolicy(IEnumerable<string> cacheKeys, IEnumerable<string> masterKeys)

The constructors that takes master keys and dependency keys look pretty the same, but there is an important difference/caveat here: if there is no dependency key already existing in cache, the cache item you are inserting will be invalidated (i.e. removed from cache) immediately. (For master keys, the framework will automatically add an empty object (if none existed) for you.)

That will be some unpleasant surprise – everything seems to be working fine, no error whatsoever. But, if you look closely, your code seems to be hitting database more than it should. But other than that, your website performance is silently suffering (sometimes, not “silently”)

This is an easy mistake to make – I did once myself (albeit long ago) in an important catalog content cache path. And I saw some very experienced developers made the same mistake as well (At this point you might wonder if the API itself is to blame)

Take away:

  • Make sure you are using the right constructor when you construct an instance of CacheEvictionPolicy . Are you sure that the cache keys you are going to depend on, actually exist?

In newer version of CMS, there would be a warning in log if the cache is invalidated immediately, however, it could be missed, unless you are actively looking for it.

Note that this behavior is the same with ISynchronizedObjectInstanceCache as it extends IObjectInstanceCache.

Potential performance issue with Maxmind.db

From time to time, I have to dig into some customers’ profiler traces to figure out why their site is slow (yeah, if you follow me, you’d know that’s kind of my main job). There are multiple issues that can eat your website performance for breakfast, from loading too much content, to unmaintained database indexes. While my blog does not cover everything, I think you can get a good grasp of what mistakes to avoid.

But sometimes the problem might come from a 3rd party library/framework. It’s not new, as we have seen it with A curious case of memory dump diagnostic: How Stackify can cause troubles to your site – Quan Mai’s blog (vimvq1987.com). The problem with those types of issues is that they are usually overlooked.

The library we’ll be investigating today would be Maxmind.db. To be honest, I’ve never used it my own, but it seems to be a very popular choice to geography-map the visitors. It’s usually used by Optimizely sites for that purpose, using VisitorGroup (which is why it came under my radar).

For several sites that use it, it seems more often than not stuck in this stack

It’s essentially to think that CreateActivator is doing something heavy here (evidently with the LambdaCompiler.Compile part. A peek from decompiling actually shows that yes, it’s heavy. I’m not quite sure I can post the decompiled code here without violating any agreement (I did, in fact, accepted no agreement at this point), but it’s quite straightforward code: TypeActivatorCreator uses reflection to get the constructors of the Type passed to it, to sees if there is any constructor decorated with MaxMind.Db.Constructor attribute, then prepares some parameters, and creates an LambdaExpression that would create an instance of that Type, using found constructor (which is a good thing because a compiled expression would perform much better than just a reflection call).

(I’m using Mindmax.db 2.0.0, for the record)

The code is straightforward, but it is also slow – as any code which involves reflection and lambda compilation would be. The essential step would be to cache any result of this. This is actually a very good place to cache. The number of types are fixed during runtime (except for very edge cases where you dynamically create new types), so you won’t have to worry about cache invalidation. The cache would significantly improve the performance of above code.

And in TypeActivatorCreator there is a cache for it. It is a simple ConcurrentDictionary<Type, TypeActivator> , which would return an TypeActivator if the Type was requested before, or create one and cache it, it it hasn’t been. As I said, this is a very good place to add cache to this.

There is a cache for that, which is good. However, the very important tidbit here is that the dictionary is not static. That means, the cache only works, if the class is registered as Singleton (by itself, or by another class down the dependency chain), meaning, only one of the instance is created and shared between thread (which is why the ConcurrentDictionary part is important).

But except it’s not.

When I look at a memory dump that collected for a customer that is using Maxmind.db, this is what I got:

0:000> !dumpheap -stat -type TypeAcivatorCreator
Statistics:
MT Count TotalSize Class Name
00007ffa920f67e0 1 24 MaxMind.Db.TypeAcivatorCreator+<>c
00007ffa920f6500 147 3528 MaxMind.Db.TypeAcivatorCreator
Total 148 objects

So there were 147 instances of TypeAcivatorCreator. Note that this is only the number of existing instances. There might be other instances that were disposed and garbaged by CLR.

Now it’s clear why it has been performing bad. For supposedly every request, a new instance of TypeActivatorCreator is created, and therefore its internal cache is simply empty (it is just newly created, too). Therefore each of request will go through the expensive path of CreateActivator, and performance suffers.

The obvious fix here is to make the dictionary static, or making the TypeActivatorCreator class Singleton. I don’t have the full source code of Mindmax.Db to determine which is better, but I’d be leaning toward the former.

Moral of the story:

  • Caching is very, very important, especially when you are dealing with reflection and lambda compilation
  • You can get it right 99%, but the 1% left could still destroy performance.

Update:

I reached out to Maxmind.db regarding this issue on November 9th, 2021

About 6h later they replied with this

I was at first confused, then somewhat disappointed. It is a small thing to fix to improve overall performance, rather than relying on/expecting customers to do what you say in documentation. But well, let’s just say we have different opinions.

Don’t share HttpContext between threads

HttpContext is not designed to be thread-safe, which means you can run into nasty things if you share a HttpContext between thread. For example, this is one of the things that can happen: One (or more) thread is stuck in this:

mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindEntry(System.__Canon)+ed
EPiServer.Web.Routing.Segments.Internal.RequestSegmentContext.GetOrResolveContextMode(System.Web.HttpContextBase)+9b
EPiServer.DataAbstraction.Internal.ProjectResolver.IsInEditMode()+17
EPiServer.DataAbstraction.Internal.ProjectResolver.GetCurrentProjects()+17
EPiServer.Core.Internal.ProjectPipeline.Pipe(EPiServer.Core.ContentReference, System.Globalization.CultureInfo, EPiServer.Core.LoaderOptions)+1c
EPiServer.Core.Internal.ProviderPipelineImplementation.GetItem(EPiServer.Core.ContentProvider, EPiServer.Core.ContentReference, EPiServer.Core.LoaderOptions)+109
EPiServer.Core.Internal.DefaultContentLoader.TryGet[[System.__Canon, mscorlib]](EPiServer.Core.ContentReference, EPiServer.Core.LoaderOptions, System.__Canon ByRef)+11e
[[StubHelperFrame]]
EPiServer.Core.Internal.DefaultContentLoader.Get[[System.__Canon, mscorlib]](EPiServer.Core.ContentReference, EPiServer.Core.LoaderOptions)+63

If you don’t know why this is scary, this is any infinite loop, meaning your CPU will be spent 100% in to this Dictionary.FindEntry, unable to do anything else. The only way to solve this problem is to restart the instance.

That is caused by unsafe accessing of a Dictionary – if you have a thread that is enumerating it, and another thread trying to write to it, it is possible to run into a dead end like this.

And HttpContext just happens to have many Dictionary properties and sub-properties. HttpContext.Request.RequestContext.RouteData.DataTokens is one of them (And a reason the code above ended in a disaster), making it vulnerable for this kind of problem. Which is exactly why it is not recommended to share a HttpContext between threads.

Sometimes, you can just set HttpContext.Current to null. Sometimes, you need to take a step back and ask yourself that do you really need to run things in parallel?