The downside of being too fast

Today when I was tracking down some changes, I came across this commit comment

The bugfix for COM-xxxx seems to make the importing
of metafields too fast and causing too many events raised,
potentially flood the event system. This avoids raising the events
unnecessarily.

Who wrote this? Me, almost 5 years ago. I did have a chuckle in my head, for my younger self being brutally honest (I was the one who fixed the original bug, so I was responsible for making it “too fast”.). Because it was sending too many events in a short time, the event system was overwhelmed and some other events (like cache invalidation of other parts) did not get through.

Well, the bug fix makes it even faster by reducing the number of events raised. In the end, everyone was happy.

Moral of the story:

The system can only be as fast as the slowest component. Keep that in mind when you put your effort. In most cases, the slowest part is where you should spend your time.
It is important to validate any optimization is end-to-end. Faster is not always better if some other parts (that part) could not keep up with this part becomes unexpectedly faster.
You can’t predict everything, something will happen in the most unexpected way. Move fast, break things, learn from it, fix it. Rinse and Repeat.

Solving the mystery of high memory usage

Sometimes, my work is easy, the problem could be resolved with one look (when I’m lucky enough to look at where it needs to be looked, just like this one Varchar can be harmful to your performance – Quan Mai’s blog (vimvq1987.com) ). Sometimes, it is hard. Can’t count number of times that I stared blankly at the screen, and decided I’d better take a nap, roast a batch of coffee, or take a walk (that is lying, however, I don’t walk), because I’m out of idea and this is going nowhere. The life of a software diagnostic engineer is like that, sometimes you are solving the mystery of “what do I need to solve this mystery”. There are usually more dots scattered around in all places, your job is to figure out which dots make senses, which dots do not, and how to connect those that are relevant to solve the problem, and to tell a story.

The story today is about a customer complaining about their scheduled instance on DXP keeps having high memory after running Find indexing job. They have a custom job that was built to optimize performance for their language settings, but the idea is the same – load content, serialize it and send it to the server endpoint for indexing. It is, indeed a memory heavy job, especially when you have a lot of content that needs to be indexed (basically, number of content x number of languages x the complexity of the content). It is normal to have an increase in memory usage during such job – the application (or rather, the runtime, depending on which way you look at it) is doing it job – content needs to be loaded in memory, and if there is available memory it will be a huge waste if it is not used for something useful. And the application will not immediately release that memory, as the content is cached. The memory will only be reclaimed only if the cache expired, or the application has memory pressure (i.e. it asks the operating system for more memory and the OS refuses “there is nothing left”). Even if the cache is expired, the application will not always compact and release the memory back to the OS (LOH etc.)

Now what is problematic is that the customer application retains 25GB of memory for indefinitely. They waited for 24h but the memory usage is still high. The application appears to be fine, it does not crash because of memory issues (like Out of Memory), but it causes confusion s and worries to our customer. Game’s on.

One thing that does not make senses in this case is that even thought they have a custom index job, it is still a scheduled job. And for scheduled jobs, the contents are supposed to have a very short sliding expiration time (default to 1 minute). However, the cache entries in the memory dumps tell a different story. A majority of the cache entries have 12h sliding expiration time. Which does explain – in part at least – why the memory remains high. When you have a longer sliding time, chance is higher that the cache is hit at least once before it expires, which reset the expiration. If you have sufficient hit, the cache will effectively remain in memory forever, until you actively evict it (by editing the content for example)

0000753878028910                        0.77kb          0                           12:00:00                    2/16/2024 5:58:43 AM +00:00    EPPageData:601596:en__CatalogContent
0000753878029DC0                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1345603:es-pr__CatalogContent
00007538781C7F48                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1351986:es-pr__CatalogContent
00007538781C8058                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1346230:es-pr__CatalogContent
00007538781C8168                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1351988:es-pr__CatalogContent
00007538786FA8E8                        0.77kb          0                           12:00:00                    2/16/2024 8:14:53 AM +00:00    EPPageData:1049433:no__CatalogContent
00007538786FC598                        0.78kb          0                           12:00:00                    2/16/2024 9:32:28 AM +00:00    EPPageData:1088026:es-pr__CatalogContent
00007538786FD9E0                        0.77kb          0                           12:00:00                    2/16/2024 8:14:53 AM +00:00    EPPageData:1049435:no__CatalogContent
0000753878700770                        0.77kb          0                           12:00:00                    2/16/2024 7:52:53 AM +00:00    EPPageData:1029725:da__CatalogContent
0000753878706528                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1351990:es-pr__CatalogContent
0000753878706638                        0.78kb          0                           12:00:00                    2/16/2024 2:59:39 PM +00:00    EPPageData:1350104:es-pr__CatalogContent
00007538787A2F80                        0.77kb          0                           12:00:00                    2/16/2024 8:14:53 AM +00:00    EPPageData:1049439:no__CatalogContent
00007538787A3FD0                        0.77kb          0                           12:00:00                    2/16/2024 7:52:53 AM +00:00    EPPageData:1029729:da__CatalogContent
00007538787A6B48                        0.77kb          0                           12:00:00                    2/16/2024 7:52:53 AM +00:00    EPPageData:1029731:da__CatalogContent
00007538787A74C0                        0.77kb          0                           12:00:00                    2/16/2024 6:21:34 AM +00:00    EPPageData:690644:en__CatalogContent
00007538787A9CC8                        0.78kb          0                           12:00:00                    2/16/2024 5:43:57 AM +00:00    EPPageData:181410:cs-cz__CatalogContent
00007538787ACDD8                        0.82kb          0                           12:00:00                    2/16/2024 2:17:38 PM +00:00    EPPageData:1343746__CatalogContent
00007538787ACFF8                        0.83kb          0                           12:00:00                    2/16/2024 2:17:25 PM +00:00    EPPageData:1343746:en__CatalogContent
00007538787AE658                        0.77kb          0                           12:00:00                    2/16/2024 2:59:37 PM +00:00    EPPageData:1350160:da__CatalogContent
00007538787AE768                        0.77kb          0                           12:00:00                    2/16/2024 2:59:37 PM +00:00    EPPageData:1350162:da__CatalogContent
00007538787AEA98                        0.39kb          0                           00:00:00                    2/16/2024 2:17:38 PM +00:00    EPiAnc:ContentAssetAware1343745__CatalogContent
00007538787AF058                        0.77kb          0                           12:00:00                    2/16/2024 2:59:37 PM +00:00    EPPageData:1347560:da__CatalogContent
00007538787B29A0                        0.77kb          0                           12:00:00                    2/16/2024 2:17:07 PM +00:00    EPPageData:1329806:da__CatalogContent
00007538787B2E68                        0.77kb          0                           12:00:00                    2/16/2024 2:17:07 PM +00:00    EPPageData:1329808:da__CatalogContent
00007538787B31E8                        0.77kb          0                           12:00:00                    2/16/2024 2:17:07 PM +00:00    EPPageData:1329810:da__CatalogContent

It is not what it should be, however, as the default value for sliding expiration timeout of a content loaded by a scheduled job is 1 minute – i.e. it is considered to be load once and be done item. Was it set to 12h by mistake. Nope

Timeout is set to 600.000.000 ticks which is 60 second, which is the default value.

I have been pulling my hairs over this for quite a while. What if the cache entries were not added by the scheduled job, but by some other way not affected by the limitation of scheduled job? In short, we were deceived by customer’s statement regarding Find indexing job. It was merely a victim of same issue. It was resetting the last access to the cache entry but that’s about it.

Time to dig a bit more. While Windbg is extremely powerful, it does not let you know where is the code that load a specific content into cache (not unless you catch it red handed). So the only way to know is to look around and check if there are any suspicious call the IContentLoader.GetItems or IContentLoader.GetChildren . A colleague of mine worked with the customer to obtain their source code, and another deep dive.

Fortunately for us, the customer has a custom built Find indexer we helped to built in a previous problem, and that was shown in the search for GetItems. It struck me that it could be the culprit. The job itself is … fine, however it was given wrong data so it keeps loading content to index.

If my hypothesis is correct, then these things must be true:

The app’s memory usage will raise to 25GB regardless of the indexing job running or not. And it remains there without much fluctuation
There are a lot of row in tblFindIndexQueue

It turned out both of those were correct: there were more than 4 millions of rows in tblFindIndexQueue, and this is the memory consumption of the app over 24 hours

One we figured out the source of content loading, the fix was pretty straightforward. One thing we could do from our side is to shorten caching time of content loaded by the event-driven indexer. You should upgrade to Find 16.2.0 which contains the fix for FIND-12436 which is a nice improvement for memory usage.

Moral of story:

I’m a workaholic. I definitely should not work on weekends, but sometimes I need to because that’s when my mind is clearest
Keep looking. But as always, know when to give up and admit defeat
Take breaks. Long, shorts. Refresh your mind and look at different angles.
The sliding cache expiration time can be quite unexpected. if a content is already in cache with long sliding expiration, then a cache hit (via ISynchronizedObjectInstanceCache.ReadThrough to get that content with short sliding expiration will not change that value, only refresh the last access time, and vice versa)

Fix your Search & Navigation (Find) indexing job, please

Once upon a time, a colleague asked me to look into a customer database with weird spikes in database log usage. (You might start to wonder why I am always the one who looks into weird things, is there a pattern here)

Upon reviewing the query store, I noticed a very high logical reads related to tblScheduledItem. From past experience, it was likely because of fragmentation of indexes in this table (which has only one clustered). I did a quick look at the table, and confirmed the index indeed has high fragmentation. I suggested to do a rebuild of that index and see what happen. Well, it could have been one of the daily simple quick questions, and I almost forgot about it.

A few days passed, the colleague pinged me again. Apparently they rebuilt it but it does not really help. That raised my eye brows a little bit, so I dug deeper.

To my surprised, the problem was not really fragmentation (it definitely contributed). The problem is that the column has a column of type nvarchar(max) , and it’s for recording the last text from the Execute method of the scheduled job. It was meant for something short like “The job failed successfully”, or “Deleted 12345 versions from 1337 contents”. But because it’s nvarchar(max) it could be very, very long. You can, in theory, store the entire book content of a a few very big libraries in there.

Of course because of you can, does not mean you should. When your column is long, each read from the table will be a burden to SQL Server. And the offending job was nothing less than our S&N indexing job.

In theory, any job could have caused this issue, however it’s much more likely happen with S&N indexing job for a reason – it keeps track of every exception thrown during indexing process, and because it indexes each and every content of your website (except the ones you specifically, explicitly tell it not too), the chance of its running into a recurring issue that affects multiple (reads, a lot) of content is much higher than any built-in job.

I asked, this time, to trim the column, and most importantly, fix any exceptions that might be thrown during the index. I was on my day off when my colleague noticed me that the job is running for 10h without errors, as they fixed it. Curious, so I did check some statistic. Well, let those screenshots speak for themselves:

The query itself went from 16,000ms to a mere 2.27ms. Even better, each call to get the list of scheduled job before results in 3.5GB logical reads. Now? 100KB. A lot of resource saved!

So, make sure your job is not throwing a lot of errors. And fix your S&N indexing job.

P/S: I do think S&N indexing job should have simpler return result. Maybe “Indexed 100.000 content with 1234 errors”, and the exceptions could have been logged. But that’s debatable. For now, you can do your parts!

Varchar can be harmful to your performance

As string is the most common data type in an application, nvarchar and its variant varchar are probably the most common column types in your database. (We almost always use nvarchar because nchar is meant for fixed length columns which we don’t have). The difference is that nvarchar has encoding of UTF-16/USC-2 while varchar has UTF-8

Starting with SQL Server 2012 (11.x), when a Supplementary Character (SC) enabled collation is used, these data types store the full range of Unicode character data and use the UTF-16 character encoding. If a non-SC collation is specified, then these data types store only the subset of character data supported by the UCS-2 character encoding.

But varchar can be harmful in a way that you don’t expect it to. Let’s assume we have this simple table with two columns (forgive naming, I can’t come up with better names)

CREATE TABLE [dbo].[Demo](
	[varcharColumn] [varchar](50) NULL,
	[nvarcharColumn] [nvarchar](50) NULL
)

Each will be inserted with same random values, almost unique. We will add a non clustered index on each of these columns, and as we know, the index should be very efficient on querying based on those values.

Let’s try with out varchar column first. It should work pretty well right. Nope!

SELECT *
  FROM dbo.[Demo]
  where varcharColumn = N'0002T9'

Instead of a highly efficient Index seek, it does an Index scan on the entire table. This is of course not what you want to.

But, why? Good question. You might have noticed that I used N’0002T9′ which is a nvarchar type – which is what .NET would pass to your query if your parameter is of type string. If you look closer to the execution plan, you’ll see that SQL Server has to do a CONVERT_IMPLICIT on each row of this column, effectively invalidates the index.

If we pass ‘0002T9’ without the notion though, it works as it should, this can cause the confusion as it works during development, but once deployed it is much slower

To see the difference we can run the queries side by side. Note that this is for a very simple table with 130k rows. If you have a few millions or more rows, the difference will be even bigger.

(1 row affected)
Table 'Demo'. Scan count 1, logical reads 4, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.


(1 row affected)
Table 'Demo'. Scan count 1, logical reads 422, physical reads 0, page server reads 0, read-ahead reads 14, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

What’s about the vice versa? If we have data as nvarchar(100) but the parameter is passed as varchar ? SQL Server can handle it with ease. It simply converts the parameters to nvarchar and does an index seek, as it should

So moral of the story? Unless you have strong reasons to use varchar (or char ), stick with nvarchar (or nchar ) to avoid complications with data type conversion which can, and will hurt your database performance.

Migrate Catalog content properties

A colleague asked me yesterday – how do we migrate properties of catalog content. There is, unfortunately, no official way to do it. There are several unofficial ways to do it, however. Today we will explore the way I personally recommend – for its safety and backward compatible.

Let’s say we have FashionProduct with a MSRP property with type of Money, now we would want to change it to Decimal . There are a some hacky ways to do this, but all of them require direct database manipulation which we should try to avoid – if possible.

First we will need this piece of code. it was “stolen” from a colleague of mine and has been used for countless times. You probably want to bookmark it as it’ll likely be useful in the future (I should probably do so myself as I have to find it every time I need). It is a snippet to traverse the catalog structure based on the content type you’d want.

public virtual IEnumerable<T> GetEntriesRecursive<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : EntryContentBase
    {
        foreach (var nodeContent in LoadChildrenBatched<NodeContent>(parentLink, defaultCulture))
        {
            foreach (var entry in GetEntriesRecursive<T>(nodeContent.ContentLink, defaultCulture))
            {
                yield return entry;
            }
        }

        foreach (var entry in LoadChildrenBatched<T>(parentLink, defaultCulture))
        {
            yield return entry;
        }
    }

    private IEnumerable<T> LoadChildrenBatched<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : IContent
    {
        var start = 0;

        while (true)
        {
            var batch = _contentLoader.GetChildren<T>(parentLink, defaultCulture, start, 50);
            if (!batch.Any())
            {
                yield break;
            }

            foreach (var content in batch)
            {
                // Don't include linked products to avoid including them multiple times when traversing the catalog
                if (!parentLink.CompareToIgnoreWorkID(content.ParentLink))
                {
                    continue;
                }

                yield return content;
            }
            start += 50;
        }
    }

To make sure we don’t load to many content at once, the batch is set size 50 but that is of course configurable (up to you)!

Now the fun part, where it actually does the work. Once we have the content, we will need to actually migrate the data, it is can be simple as this

private void MigrateProperty<T>(IEnumerable<T> contents) where T: EntryContentBase
{
      var batch = new List<T>();
      foreach(var content in contents)
      {
           var writeableClone = content.CreateWriteableClone<T>();
           Transform(writeableClone);
           batch.Add(writeableClone);
      }
      _contentRepository.Publish(batch, PublishAction.SyncDraft);
}

With the Transform method you can do whatever you want with the property value. As you might just want to rename it – it can do nothing except assign value to the new property. Or in the case we mentioned at the beginning, convert Money to Decimal is an easy task (Money is the less precision version of Decimal). Note that if you convert between data types, for example from double to int , there are potential data loss, but you are probably aware of that already.

The final step is to publish the change. For performance reasons, it is probably the best that you the Publish extension method of IContentRepository and save multiple content in one batch – may of of size 50 or 100. Those will skip things like creating new versions for optimal performance. You can read it about here New simple batch saving API for Commerce | Optimizely Developer C

The remaining question is where to put it. In a perfect world, I’d say in a migration step (i.e. a class that implement IMigrationStep ), so you ensure that your data will be properly migrated before anything else run, for example your new code that access the new property, or indexing of your content after migration. But if you have a sizeable catalog, this will take time and it might not be a good idea to let your users wait for it to complete. For that, it makes senses to do this in a schedule job and when it completes, you make a switch.

Migrating properties is not an easy or quick task, but it can be done with relative ease. It also reminds us about modeling – try to get it right from beginning so we don’t have to migrate. In the end, the fastest code is the code that does not need to be run!

Switching away from serializable cart mode

If you are using Optimizely Customized Commerce, the common wisdom is that you should be using serializable cart mode. It’s not perfect (we discussed the drawbacks in, um, my book), but generally it has performance benefits. But for any reason that you need to use the legacy cart mode, there is a switch you can use – IFeatureSwitch which can be used to change between those modes

It is important to remember that IFeatureSwitch only switch between to modes, it does not migrate any carts between those two. there is a one time cart migration job but that’s it.

To change the cart mode, it is simply as this

GetInstance<IFeatureSwitch>().DisableFeature(SerializedCarts.FeatureSerializedCarts);

However, there is a catch here.

To let IOrderRepository use the correct cart system, there is an ICartProvider which will be either CartProvider or SerializableCartProvider . The problem is that happens much earlier in the pipeline than IInitializationModule. In fact it is determined in IConfigurableModule.ConfigureContainer , which means before any IInitializationModule.Initialize. Even if we call DisableFeatures in another ConfigureContainer, there is no warranty that our code will be called before the framework code (the order of ConfigureContainer execution is indeterministic )

But fortunately, we can do that inside Startup.Configure. Due to how the feature switch data structure, it’s not as simple as adding a setting in appsettings.json, but it can be done easily in code:

            services.Configure<ApplicationOptions>(x =>
            {
                x.Features[SerializedCarts.FeatureSerializedCarts] = new ApplicationFeature
                {
                    Feature = "SerializedCarts",
                    State = FeatureState.Disabled,
                    Type = "Mediachase.Commerce.Core.Features.SerializedCarts, Mediachase.Commerce"
                };
            });

Of course, this is a workaround. The feature switch should be done as documented. It will be fixed in the future.

Command timeout for Commerce 14

While we always want to have fast database queries, it is not doable all the time. Sometimes we need to run slow queries, and we need to tell the underlying framework that this query can take some time to complete, and we’re fine with it. Otherwise, it will try to terminate the query after 30 seconds (the default time out limit)

There is a different between connection timeout and command timeout. Connection timeout is the time .NET will try to connect to the database before giving up. Command timeout is the time .NET will try to execute a command before giving up.

With Commerce 13, we have a setting added in 9.23.1, as we talked here Episerver Commerce commandTimeout configuration – Quan Mai’s blog (vimvq1987.com) , however, in Commerce 14, it’s … different.

Things are a bit complicated when it comes to command timeout with .NET 5 and up. With later versions of Microsoft.Data.SqlClient, it is possible to set command timeout directly using connection string. It is indeed a simple way to do it, but with a caveat.

The new setting is not recognized by Entity Framework/Entity Framework Core, and it will throw exception if you try to access a connection string with command timeout setting. It has another way to set the command timeout itself by each DbContext , but it does not accept the setting via Connection string. It will throw “Keyword not supported: ‘command timeout'” if such setting is present.

The workaround is to configure the command timeout for EcfSqlConnection connection string, and another different connection string without command timeout just for Entity Framework.

However, that’s with a caveat: using command timeout in connection string means that value applies to all database queries. As we discussed in the previous post above, doing so is not without drawbacks – it hides slow queries rather than let it fails. A failed query might not bring down your website, but an overloaded database will likely do.

In Commerce 14.15.6 which should be released shortly, we introduce a new setting SqlDataProviderOptions.CommandTimeout which let you set the command timeout for queries that are using SqlDataProvider – most notably the MetaDataPlus system like orders.

The important advantage of the new setting is that you can set it on the fly. If you know that some operation will be slow, you can set the command timeout to a higher value just for that operation, then set it back to default value. In most cases, you can leave it to default value (30 seconds), and do optimization on application level (reduce batch size for example) or database layer (rewrite the query, adding/optimizing indexes etc.). But sometimes you know the query would be slow and you acknowledge that – this is the tool.

Performance optimization – the hardcore series – part 4

Let’s take a break from the memory allocation, and do some optimization on another aspect, yet as important (if not even more important) – database.

We all know that database queries play an essential part in any serious app. It’s almost a given that if you want your app to perform well, your database queries must also perform well. And for them to perform well, you need things like proper design (normalization, references etc.), properly written queries, and proper indexes. In this post, we will explore how an index can improve query performance, and how can we do it better.

Let’s start with a fairly simple table design

CREATE TABLE [dbo].[UniqueCoupon](
	[Id] [int] identity primary key clustered, 
	[PromotionId] [int] NOT NULL,
	[Code] [nvarchar](10) NOT NULL,
	[ExpiredOn] [datetime] NULL,
	[Redeemed] [bit] NULL
) ON [PRIMARY]

Nothing extraordinary here, pretty common if you ask me. Now for testing purpose, let’s insert 1.000.000 rows into it

INSERT INTO  dbo.[UniqueCoupon] (PromotionId, Code)
SELECT

FLOOR(RAND()*(100)+1),
SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 7)

GO 1000000

We need to query data by the code, so let’s create an user defined type

CREATE TYPE CouponTable AS TABLE (
    Code NVARCHAR(10));

Time to run some query against data, let’s go with this

SELECT Id, PromotionId, Code, ExpiredOn, Redeemed FROM dbo.UniqueCoupons
                                                                    WHERE PromotionId = @PromotionId AND Code in (SELECT Code FROM @Data)

This is the complete query as we need some data

	declare @data CouponTable
	insert into @data 
	select top 10 code from dbo.UniqueCoupon 
	where promotionid = 36

	SELECT Id, PromotionId, Code, ExpiredOn, Redeemed FROM dbo.UniqueCoupon
                                                                    WHERE PromotionId = 36 AND Code in (SELECT Code FROM @Data)

As we learned that execution plan is not a good way to compare performance, let’s use the statistics, our trusted friends

																	set statistics io on
																	set statistics time on

And this is how it takes with our default setting (i.e. no index)

(10 rows affected)
Table '#AEDEED61'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'UniqueCoupon'. Scan count 9, logical reads 7070, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

If you are somewhat experienced with SQL Server, you might guess it would not be exactly happy because of, obviously an index is needed. As we query on PromotionId, it does makes sense to add an index for it, SQL Server does give you that

If we just blindly add the index suggested by SQL Server

(10 rows affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'UniqueCoupon'. Scan count 1, logical reads 53, physical reads 0, page server reads 0, read-ahead reads 5, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table '#A7AA9B2B'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

But can we do better?

If we look at the index, there’s something not very optimized about it – we are query by both PromotionId and Code, so not really makes senses to have Code as included. How’s about we have the index on both PromotionId and Code?

(10 rows affected)
Table 'UniqueCoupon'. Scan count 10, logical reads 30, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table '#A1F9F38F'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Yet we can make it better! From 53 to 30 logical reads might not sound a lot, but if you have thousands of queries every hour, it will be fairly significant.

Prepare yourself for some pleasant surprises – when we eventually applied the change on an actual database, the change was staggering, much more than what we hoped for. The query that were run for 24h in total, every day, now takes less than 10 minutes (yes you read it right, 10 minutes).

At this point you can certainly be happy and move on. But can we do better? For the sake of curiosity ? Yes we do.

SQL Server is rather smart that it knows we are getting the other columns as well, so those will be included in the index, to avoid a key lookup. Let’s see if we can remove that and see how it performs

(10 rows affected)
Table 'UniqueCoupon'. Scan count 10, logical reads 60, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table '#B1996E94'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

So it was indeed worse, a key lookup is performed for every row (SQL Server uses the index to track down the rows and read the other columns from there)

There are two way to get rid of those key lookup – includes the columns in the index itself, or, more dramatic, make the index the clustered. As we can see the data should be accessed by PromotionId and Code, it makes perfect senses.

It is a commonly belief that Identity column should be clustered index – it is unique, it is not null. However, it only makes senses if it is the most heavily accessed column. In this case, Id only serves as an Identity column, it does not need to be the clustered index (although being an unique means it will has a non clustered index for it)

ALTER TABLE [dbo].[UniqueCoupon] DROP CONSTRAINT [PK__UniqueCo__3214EC0744C2FF38] WITH ( ONLINE = OFF )
GO

ALTER TABLE [dbo].[UniqueCoupon] ADD PRIMARY KEY NONCLUSTERED 
(
	[Id] ASC
)

Does this bring dramatically performance change? Unlikely. My test show no improvement in statistic. However, there is one critical impact here: we significantly reduced the size of indexes in the table. (data incoming)

Moral of the story

Indexes are crucial.
You can almost always do better than the auto suggested indexes.
Real test is the only true validation.

Performance optimization – the hardcore series – part 3

“In 99% of the cases, premature optimization is the root of all devil”

This quote is usually said to be from Donald Knuth, usually regarded as “father of the analysis of algorithms”. His actual quote is a bit difference

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%.

If you have read my posts, you know that I always ask for measuring your application before diving in optimization. But that’s not all of the story. Without profiling, your optimization effort might be futile. But there are things you can “optimize” right away without any profiling – because – they are easy to do, they make your code simpler, easier to follow, and you can be certain they are faster.

Let’s see if you can spot the potential problematic piece of code from this snippet

public Something GetData()
{
var market = list.FirstOrDefault(x => x.MarketId == GetCurrentMarket().MarketId)
{
//do some stuffs
}

}

If you are writing similar code, don’t be discouraged. It’s easy to overlook the problem – when you call FirstOrDefault, you actually iterate over the list until you find the first matching element. And for each and every of that, GetCurrentMarket() will be called.

Because we can’t be sure when we will find the matching element, it might be the first element, or the last, or it does not exist, or anywhere in between. The median is that GetCurrentMarket will be half called half the size of list

We don’t know if GetCurrentMarket is a very lightweight implementation, or list is a very small set, but we know that if this is in one very hot path, the cost can be (very) significant. These are the allocations made by said GetCurrentMarket

This is a custom implementation of IMarketService – the default implementation is much more lightweight and should not be of concern. Of course, fewer calls are always better – no matter how quick something is.

In this specific example, a simple call to get the current market and store it in a local variable to be used in the scope of the entire method should be enough. You don’t need profiling to make such “optimization” (and as we proved, profiling only confirm our suspect )

Moral of the story

For optimization, less is almost always, more
You definitely should profile before spending any considerable amount optimizing your code. But there are things that can be optimized automatically. Make them your habit.

Performance optimization – the hardcore series – part 2

Earlier we started a new series about performance optimization, here Performance optimization – the hardcore series – part 1 – Quan Mai’s blog (vimvq1987.com) . There are ton of places where things can go wrong. A seasoned developer can, from experience, avoid some obvious performance errors. But as we will soon learn, a small thing can make a huge impact if it is called repeatedly, and a big thing might be OK to use as long as it is called once.

Let’s take this example – how would you think about this snippet – CategoryIds is a list of string converted from ContentReference

            if (CategoryIds.Any(x => new ContentReference(x).ToReferenceWithoutVersion() == contentLink))
            {
                //do stuff
            }

If this is in any “cool” path that run a few hundred times a day, you will be fine. It’s not “elegant”, but it works, and maybe you can get away with it. However, if it is in a hot path that is executed every time a visitor visits a product page in your website, it can create a huge problem.

And can you guess what it is?

new ContentReference(string) is fairly lightweight, but if it is called a lot, this is what happen. This is allocations from the constructor alone, and only within 220 seconds of the trace

A lot of allocations which should have been avoided if CategoryIds was just an IEnumerable<ContentReference> instead of IEnumerable<string>

For comparison, this is how 10.000 and 1000.000 new ContentReference would allocate

Thing is similar if you use .ToReferenceWithoutVersion() to compare to another ContentReference (although to a lesser extend as ToReferenceWithoutVersion would return the same ContentReference if the WorkId is 0, and it use cloning instead of new). The correct way to compare two instances of ContentReference without caring about versions, is to use .Compare with ContentReferenceComparer.IgnoreVersion

Moral of the story

It is not only what you do, but also how you do it
Small things can make big impacts, don’t guess, measure!