Dynamic data store is slow, (but) you can do better.

If you have been developing with Episerver CMS for a while, you probably know about its embedded “ORM”, called Dynamic Data Store, or DDS for short. It allows you to define strongly typed classes which are mapped to the database for you. You don’t have to create the tables or write stored procedures to insert, query, or delete data. Sounds very convenient, right? The fact is, DDS is used quite frequently and, more often than you might think, misused.

As Joel Spolsky once said, “All non-trivial abstractions, to some degree, are leaky.” An ORM will likely make you forget about the nature of the RDBMS underneath, and that can cause performance problems, sometimes severe ones.

Let me make it clear to you:

DDS is slow, and it is not suitable for big sets of data.

If you want to store a few settings for your website, DDS should be fine. However, if you are thinking about hundreds of items, it is probably worth looking elsewhere. For thousands of items or more, it is a NO.

I spent some time benchmarking DDS to see how bad it is. A simple test is to add 10,000 items to a store, then query for each item, then delete each item, and see how long each step takes.

The item is defined like this; it is just another boring POCO:

internal class ShippingArea : IDynamicData
{
    public Identity Id { get; set; }

    public string PostCode { get; set; }

    public string Area { get; set; }

    public DateTime Expires { get; set; }
}

The store is defined like this:

    public class ShippingAreaStore
    {
        private const string TokenStoreName = "ShippingArea";

        internal virtual ShippingArea CreateNew(string postCode, string area)
        {
            var token = new ShippingArea
            {
                Id = Identity.NewIdentity(),
                PostCode = postCode,
                Area = area,
                Expires = DateTime.UtcNow.AddDays(1)
            };
            GetStore().Save(token);
            return token;
        }

        internal virtual IEnumerable<ShippingArea> LoadAll()
        {
            return GetStore().LoadAll<ShippingArea>();
        }

        internal virtual IEnumerable<ShippingArea> Find(IDictionary<string, object> parameters)
        {
            return GetStore().Find<ShippingArea>(parameters);
        }

        internal virtual void Delete(ShippingArea shippingArea)
        {
            GetStore().Delete(shippingArea);
        }

        internal virtual ShippingArea Get(Identity tokenId)
        {
            return GetStore().Load<ShippingArea>(tokenId);
        }

        private static DynamicDataStore GetStore()
        {
            return DynamicDataStoreFactory.Instance.CreateStore(TokenStoreName, typeof(ShippingArea));
        }
    }

Then I have some quick and dirty code in the Quicksilver ProductController.Index to measure the time. (You will have to forgive some bad coding practices here ;). As usual, Stopwatch should be used for demonstration only, not in production. If you want a good breakdown of your code execution, use a tool like dotTrace. If you want to measure production performance, use a monitoring system like New Relic or Azure Application Insights.)

        var shippingAreaStore = ServiceLocator.Current.GetInstance<ShippingAreaStore>();
        var dictionary = new Dictionary<string, string>();
        for (int i = 0; i < 10000; i++)
        {
            dictionary[RandomString(6)] = RandomString(10);
        }
        var identities = new List<ShippingArea>();
        var sw = new Stopwatch();
        sw.Start();
        foreach (var pair in dictionary)
        {
            shippingAreaStore.CreateNew(pair.Key, pair.Value);
        }
        sw.Stop();
        _logger.Error($"Creating 10000 items took {sw.ElapsedMilliseconds}");
        sw.Restart();
        foreach (var pair in dictionary)
        {
            Dictionary<string, object> parameters = new Dictionary<string, object>();
            parameters.Add("PostCode", pair.Key);
            parameters.Add("Area", pair.Value);
            identities.AddRange(shippingAreaStore.Find(parameters));
        }

        sw.Stop();
        _logger.Error($"Querying 10000 items took {sw.ElapsedMilliseconds}");
        sw.Restart();

        foreach (var id in identities)
        {
            shippingAreaStore.Delete(id);
        }
        sw.Stop();
        _logger.Error($"Deleting 10000 items took {sw.ElapsedMilliseconds}");

Everything is ready. A few tries gave us fairly stable results (times in milliseconds):

2019-12-02 13:33:01,574 Creating 10000 items took 11938

2019-12-02 13:34:59,594 Querying 10000 items took 118009

2019-12-02 13:35:24,728 Deleting 10000 items took 25131

And this is strictly single-threaded. A real site with a lot of traffic, inserting, querying, and deleting at the same time, will certainly perform worse.

Can we do better?

There is a little-known feature of DDS: you can mark a field as indexed by adding the [EPiServerDataIndex] attribute to the property. The new class would look like this:

    [EPiServerDataStore]
    internal class ShippingArea : IDynamicData
    {
        public Identity Id { get; set; }

        [EPiServerDataIndex]
        public string PostCode { get; set; }

        [EPiServerDataIndex]
        public string Area { get; set; }

        public DateTime Expires { get; set; }
    }

If you peek into the database during the test, you can see that the data is now written to the Indexed_String01 and Indexed_String02 columns, instead of String01 and String02 as it was without the attributes. That change gives us a quite drastic improvement:

2019-12-02 15:38:16,376 Creating 10000 items took 7741

2019-12-02 15:38:19,245 Querying 10000 items took 2867

2019-12-02 15:38:44,266 Deleting 10000 items took 25019

Querying benefits greatly from the new index: it no longer has to do a Clustered Index Scan, and can now do a non-clustered Index Seek plus a Key Lookup. Deleting is still equally slow, because the delete is done by a Clustered Index Delete on the Id column, which we already have, and an index on a uniqueidentifier column is not the most effective one.

Before you get too happy with such an improvement, keep in mind that two separate indexes are added, one for Indexed_String01 and one for Indexed_String02. Naturally, we would want a combined index, clustered even, on those columns, but we just can’t.

What if we want to go bare metal, create a table ourselves, and write the queries ourselves? Our repository would look like this:

    public class ShippingAreaStore2
    {
        private readonly IDatabaseExecutor _databaseExecutor;

        public ShippingAreaStore2(IDatabaseExecutor databaseExecutor)
        {
            _databaseExecutor = databaseExecutor;
        }

        /// <summary>
        /// Creates and stores a new shipping area.
        /// </summary>
        /// <param name="postCode">The post code of the shipping area.</param>
        /// <param name="area">The name of the shipping area.</param>
        /// <returns>The newly created shipping area.</returns>
        internal virtual ShippingArea CreateNew(string postCode, string area)
        {
            var token = new ShippingArea
            {
                Id = Identity.NewIdentity(),
                PostCode = postCode,
                Area = area,
                Expires = DateTime.UtcNow.AddDays(1)
            };
            _databaseExecutor.Execute(() =>
            {
                var cmd = _databaseExecutor.CreateCommand();
                cmd.CommandText = "ShippingArea_Add";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Id", token.Id.ExternalId));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", token.PostCode));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", token.Area));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Expires", token.Expires));
                cmd.ExecuteNonQuery();
            });

            return token;
        }

        internal virtual IEnumerable<ShippingArea> Find(IDictionary<string, object> parameters)
        {
            return _databaseExecutor.Execute<IEnumerable<ShippingArea>>(() =>
            {
                var areas = new List<ShippingArea>();
                var cmd = _databaseExecutor.CreateCommand();
                cmd.CommandText = "ShippingArea_Find";
                cmd.CommandType = CommandType.StoredProcedure;
                // Look the values up by key; Values.First()/Last() depend on the dictionary's enumeration order.
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", parameters["PostCode"]));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", parameters["Area"]));
                // Dispose the reader as soon as all rows have been read.
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        areas.Add(new ShippingArea
                        {
                            Id = (Guid)reader["Id"],
                            PostCode = (string)reader["PostCode"],
                            Area = (string)reader["Area"],
                            Expires = (DateTime)reader["Expires"]
                        });
                    }
                }
                return areas;
            });
        }

        /// <summary>
        /// Deletes a shipping area from the store.
        /// </summary>
        /// <param name="area">The shipping area to be deleted.</param>
        internal virtual void Delete(ShippingArea area)
        {
            _databaseExecutor.Execute(() =>
            {
                var cmd = _databaseExecutor.CreateCommand();
                cmd.CommandText = "ShippingArea_Delete";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", area.PostCode));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", area.Area));
                cmd.ExecuteNonQuery();
            });
        }
    }
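
The post does not show the table or the three stored procedures that ShippingAreaStore2 calls. Here is a minimal sketch of what they could look like. Only the procedure names and parameter names come from the repository code above; the table name, the column types and sizes, the index names, and the choice of a clustered index on (PostCode, Area) are all assumptions made for illustration:

```sql
-- Hypothetical schema: column types and sizes are assumptions.
CREATE TABLE dbo.ShippingArea
(
    Id       UNIQUEIDENTIFIER NOT NULL,
    PostCode NVARCHAR(20)     NOT NULL,
    Area     NVARCHAR(100)    NOT NULL,
    Expires  DATETIME         NOT NULL,
    CONSTRAINT PK_ShippingArea PRIMARY KEY NONCLUSTERED (Id)
);
GO

-- Combined clustered index on the two columns used to find and delete rows.
CREATE CLUSTERED INDEX IX_ShippingArea_PostCode_Area
    ON dbo.ShippingArea (PostCode ASC, Area ASC);
GO

CREATE PROCEDURE dbo.ShippingArea_Add
    @Id       UNIQUEIDENTIFIER,
    @PostCode NVARCHAR(20),
    @Area     NVARCHAR(100),
    @Expires  DATETIME
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.ShippingArea (Id, PostCode, Area, Expires)
    VALUES (@Id, @PostCode, @Area, @Expires);
END
GO

CREATE PROCEDURE dbo.ShippingArea_Find
    @PostCode NVARCHAR(20),
    @Area     NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;
    -- Returns the four columns the repository reads from the data reader.
    SELECT Id, PostCode, Area, Expires
    FROM dbo.ShippingArea
    WHERE PostCode = @PostCode AND Area = @Area;
END
GO

CREATE PROCEDURE dbo.ShippingArea_Delete
    @PostCode NVARCHAR(20),
    @Area     NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;
    DELETE FROM dbo.ShippingArea
    WHERE PostCode = @PostCode AND Area = @Area;
END
GO
```

GO is the batch separator used by SSMS and sqlcmd; it is needed here because CREATE PROCEDURE must be the first statement in its batch. With this layout, both the find and the delete can seek directly on the clustered index.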

And those would give us the results:

2019-12-02 16:44:14,785 Creating 10000 items took 2977

2019-12-02 16:44:17,114 Querying 10000 items took 2315

2019-12-02 16:44:20,307 Deleting 10000 items took 3190

Moral of the story? DDS is slow and you should avoid using it if you are working with a fairly big set of data. If you have to use DDS for whatever reason, make sure to at least try to index the columns that you query the most.

And at the end of the day, a hand-crafted custom table plus query beats everything. Remember that you can use tools like Dapper to do most of the work for you.

Fixing ASP.NET Membership performance – part 1

Even though it is not the best identity management system in the .NET world, the ASP.NET Membership provider is still fairly widely used, especially for systems that have been running for quite a long time with a significant number of users: migrating to a better system like ASP.NET Identity does not come cheap. However, being built in the early days of ASP.NET means the Membership provider has numerous significant limitations: besides the “architecture” problems, it also has limited performance. Depending on who you ask, the “maximum” number of customers that the ASP.NET Membership provider can handle ranges from 30,000 to 750,000. That does not sound great. If you start a new project today, you are probably better off with ASP.NET Identity or some other solution, but if your website is using the ASP.NET Membership provider and there is currently no plan to migrate, then read on.

The site I will use for this blog post has around 950,000 registered users, and it is doing great, but that was achieved with some very fine-grained performance tuning and a very high-end Azure subscription.

A performance overview 

I have been using the ASP.NET Membership provider for years, but I had never looked into it from a performance perspective (even though I have done some very nasty digging into its table structure). Now that I have the chance, I realize how bad it is.

It is fairly common in the aspnet_* tables for the indexes to have ApplicationId as the first column. It does not take a database master to know that this is a very ineffective way to create an index: in most cases, you only have one ApplicationId in your website, making those indexes useless when you want to, for example, query by UserId. This is a rookie mistake: a newbie tends to put the columns in the index in the same order as they appear in the table, thinking that SQL Server will just do some magic to rearrange them for the best performance. That is not how SQL Server, or RDBMSs in general, work.

It is OK to be a newbie or to misunderstand some concepts. I had the very same misconception once, and learned my lessons. However, it should not be OK for a framework to make that mistake, and never correct it.

That is the beginning of much bigger problems. Because of the ineffective order of columns, the built-in indexes are almost useless. That makes queries which should be very fast unnecessarily slow, wasting resources and increasing your site’s average response time. This is of course bad news. The good news is that the problem is at the database level, so we can change it for the better. If it were at the application level, our chance of doing that would be close to none.

Missing indexes

If you use Membership.GetUserNameByEmail on your website a lot, you might notice that it is … slow. It leads to this query:

        SELECT  u.UserName
        FROM    dbo.aspnet_Applications a, dbo.aspnet_Users u, dbo.aspnet_Membership m
        WHERE   LOWER(@ApplicationName) = a.LoweredApplicationName AND
                u.ApplicationId = a.ApplicationId    AND
                u.UserId = m.UserId AND
                LOWER(@Email) = m.LoweredEmail

Let’s just ignore the style for now (INNER JOIN would be a much more popular choice), and look into what is actually done here. It joins 3 tables by their keys. The join with aspnet_Applications is fairly simple, because you usually have just one application. The join between aspnet_Users and aspnet_Membership is also simple, because both of them have an index on UserId: clustered on aspnet_Users and non-clustered on aspnet_Membership.

The last condition, the filter on LoweredEmail, is the problematic one. The clustered index on aspnet_Membership looks like this:

CREATE CLUSTERED INDEX [aspnet_Membership_index]
    ON [dbo].[aspnet_Membership]([ApplicationId] ASC, [LoweredEmail] ASC);

Uh oh. Even though this index contains LoweredEmail, it is the worst possible kind of index. By putting the least distinctive column first, it defeats the purpose of the index completely. The query above never filters aspnet_Membership by ApplicationId, so SQL Server cannot seek on LoweredEmail, and every request to get a user name by email address needs to perform a full table scan (oops!).

This is the last thing you want to see in an execution plan, especially with a fairly big table.

It should have been just

CREATE CLUSTERED INDEX [aspnet_Membership_index]
    ON [dbo].[aspnet_Membership]([LoweredEmail] ASC);

which helps SQL Server to use the optimal execution plan.

If you look at the Azure SQL Database recommendations, they suggest creating a non-clustered index on LoweredEmail. That is not technically incorrect, and it still helps. However, keep in mind that each non-clustered index has to “duplicate” the clustered index key in order to identify the rows, so keeping the useless clustered index adds waste and slows things down (even if just a little, because you have to perform more reads to get the same data). On the other hand, if your database is currently performing badly, adding a non-clustered index is a much quicker and safer option. The change to the clustered index should be done with caution, at a low-traffic time.
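
The two options could be applied roughly as follows. This is a sketch, not a tested migration script: the non-clustered index name is hypothetical, and you should check the existing index definitions in your own database before running either statement.

```sql
-- Option 1: the quick, lower-risk fix. The index name is hypothetical.
CREATE NONCLUSTERED INDEX [IX_aspnet_Membership_LoweredEmail]
    ON [dbo].[aspnet_Membership]([LoweredEmail] ASC);
GO

-- Option 2: redefine the existing clustered index in place.
-- DROP_EXISTING = ON rebuilds the index under the same name with the new key.
-- Because the clustering key changes, the non-clustered indexes on the table
-- are rebuilt as well, so run this at a low-traffic time.
CREATE CLUSTERED INDEX [aspnet_Membership_index]
    ON [dbo].[aspnet_Membership]([LoweredEmail] ASC)
    WITH (DROP_EXISTING = ON);
GO
```

You would normally pick one of the two; if you go with option 2, the extra non-clustered index on LoweredEmail becomes redundant.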

I tested the stored procedure on the database above, without any additional index:

Table 'aspnet_Membership'. Scan count 9, logical reads 20101, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 237 ms,  elapsed time = 182 ms.

With the new non-clustered index:


(1 row affected)
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Membership'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 89 ms.

With the new clustered index:

(1 row affected)
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Membership'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 89 ms.

Don’t we have a clear winner?

Speed up catalog routing if you have multiple children under catalog

A normal catalog structure is like this: you have a few high level categories under the catalog, then each high level category has a few lower level categories under it, then each lower level category has its children, so on and so forth until you reach the leaves, the catalog entries.

However, it is not uncommon to have many children (categories and entries) directly under the catalog. Even though that is not something you should do, it happens.

But that is not without drawbacks. You might notice it is slow to route to a product. It might not be visible to the naked eye, but if you use a decent profiler (I personally recommend dotTrace), it can be fairly obvious that your site is not routing optimally.

Why?

To route to a specific catalog content, for example http://commerceref/en/fashion/mens/mens-shirts/p-39101253/, the default router has to figure out which content is mapped to each URL segment. So with the default registration, where the catalog root is the default routing root, we start with the catalog which maps to the first part of the route (fashion). How does it figure out which content to route to for the next part (mens)?

Until recently, what it did was call GetChildren on the catalog ContentReference. Now you can see the problem. Even with a cached result, that is still too much: GetChildren with a big number of children is definitely expensive.

We noticed this behavior thanks to Erik Norberg. An improvement has been made in Commerce 12.10 to make sure that even with a large number of children directly under the catalog, the router performs efficiently.

If you can’t upgrade to 12.10 or later (you should!), there is a workaround that improves the performance. By adding your own implementation of HierarchicalCatalogPartialRouter, you can override how the child content is found, using a more lightweight method (GetBySegment):

    public class CustomHierarchicalCatalogPartialRouter : HierarchicalCatalogPartialRouter
    {
        private readonly IContentLoader _contentLoader;

        public CustomHierarchicalCatalogPartialRouter(Func<ContentReference> routeStartingPoint, CatalogContentBase commerceRoot, bool enableOutgoingSeoUri) : base(routeStartingPoint, commerceRoot, enableOutgoingSeoUri)
        {
            // Resolve the loader here as well; otherwise _contentLoader stays null when this constructor is used.
            _contentLoader = ServiceLocator.Current.GetInstance<IContentLoader>();
        }

        public CustomHierarchicalCatalogPartialRouter(Func<ContentReference> routeStartingPoint, CatalogContentBase commerceRoot, bool supportSeoUri, IContentLoader contentLoader, IRoutingSegmentLoader routingSegmentLoader, IContentVersionRepository contentVersionRepository, IUrlSegmentRouter urlSegmentRouter, IContentLanguageSettingsHandler contentLanguageSettingsHandler, ServiceAccessor<HttpContextBase> httpContextAccessor) : base(routeStartingPoint, commerceRoot, supportSeoUri, contentLoader, routingSegmentLoader, contentVersionRepository, urlSegmentRouter, contentLanguageSettingsHandler, httpContextAccessor)
        {
            _contentLoader = contentLoader;
        }

        // Look up the single child matching the next URL segment instead of loading all children.
        protected override CatalogContentBase FindNextContentInSegmentPair(CatalogContentBase catalogContent, SegmentPair segmentPair, SegmentContext segmentContext, CultureInfo cultureInfo)
        {
            return _contentLoader.GetBySegment(catalogContent.ContentLink, segmentPair.Next, cultureInfo) as CatalogContentBase;
        }
    }

And then, instead of using CatalogRouteHelper.MapDefaultHierarchialRouter, you register your router directly:

            var referenceConverter = ServiceLocator.Current.GetInstance<ReferenceConverter>();
            var contentLoader = ServiceLocator.Current.GetInstance<IContentLoader>();
            var commerceRootContent = contentLoader.Get<CatalogContentBase>(referenceConverter.GetRootLink());
            // Register the custom router, not the base HierarchicalCatalogPartialRouter.
            routes.RegisterPartialRouter(new CustomHierarchicalCatalogPartialRouter(startingPoint, commerceRootContent, enableOutgoingSeoUri));

(ServiceLocator is used here just to make the code easier to understand. You should do this in an IInitializationModule, so use context.Locate.Advanced instead.)

This applies to version 9.2.0 and newer.

Moral of the story:

  • Catalog structure can play a big role when it comes to performance.
  • You should do profiling whenever you can
  • We do that too, and we make sure to include improvements in later versions, so keeping your website up to date is a good way to tune performance.

Commerce batching performance – part 2: Loading prices and inventories

UPDATE: When I looked into it, I realized that I had a lazy-loading collection of entry codes, so each test had to spend time resolving the entry codes from the content links. That actually cost quite a lot of time, causing the performance tests to return incorrect results. That has been corrected and the results are now updated.

In the previous post we talked about how loading orders in batches can actually improve your website performance, and we came to the conclusion that 1000-3000 orders per batch probably yields the best performance.

But orders are not the only thing you need to load on your website. A more common scenario is to load prices and inventories for products. If you are displaying a product listing page, it’s quite common to load prices and inventories for all products on that page. How should they be loaded?

(more…)

Commerce batching performance – part 1: Loading orders

One of the best practices for better performance, not just with Episerver Commerce, is to batch your calls to load data. In theory, if you want to load a lot of data, both extremes are problematic: if you load the records one by one, the overhead of opening the connection and retrieving the data is too much. But if you load all of them at once, it is likely that you will end up with either a timeout exception on the database end or an out-of-memory exception in your application. The better way is, of course, to load them in smaller batches: 10, 20, or 50 records at once, repeating until the end.

That is the theory, but is it really better in practice? And if it is, which size of batch works best? As they usually say, reality is the golden test for theory, so let’s do it.

(more…)

Index or no index, that’s the question

If you do (and you should) care about your Episerver Commerce site performance, you probably know that database access is usually the bottleneck. Letting SQL Server work smoothly and effectively is key to great performance.

We are, of course, very well aware of this fact, and we have spent a considerable amount of time making sure the Commerce database works as fast as it can. Better table schemas, better stored procedures, better indexes: we have done all of that and will continue doing so whenever we have the chance. (And if you find anything that can be improved, you are very welcome to share your findings with us.)

But there are places where the database performance improvement is in your hands.

(more…)

Useful T-SQL snippets for development and troubleshooting

This post is more of a note-to-self. These are T-SQL statements which can be incredibly useful in development and troubleshooting.

SET STATISTICS IO ON

Turns on the IO statistics for the statements run afterwards, until explicitly set to OFF. We then switch to the Messages tab to see how many IO operations were done on each table.

SET STATISTICS TIME ON

Find out about the statements that were executed: which statements, their text, how many (logical) reads, how much time was spent on CPU, and how much time was spent in total.
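
As a small usage sketch, the two SET STATISTICS options above can be wrapped around whatever statement you want to inspect. The query against sys.objects is only a stand-in, chosen because that catalog view exists in every database:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Replace this with the statement or stored procedure call you are investigating.
SELECT COUNT(*) AS ObjectCount
FROM sys.objects;

-- Both settings stay on for the session until switched off explicitly.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The reads and timings are reported as messages rather than as a result set, which is why you look for them in the Messages tab.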

(more…)

Speed up your Catalog incremental indexing

As your products are being constantly updated, you would naturally want them to be properly (and promptly) indexed, as that is crucial to having search results that influence your customers into buying stuff. For example, if you just dropped the prices of your products, you would want those products to appear in the new price segment as soon as possible.

This should be very easy with Find.Commerce, so if you are using Find (which you should), stop reading; there is nothing for you here. Things, however, can be more complicated if you are using the more “traditional” SearchProvider.

(more…)

Mass update catalog entries

This is something you don’t do daily, but you will probably need it one day, so it might come in handy.

Recently we got a question on how to update the code of all entries in the catalog. This is interesting, because even though you don’t update the codes that often (if at all, as the code is what identifies the entries to external systems, such as ERPs or PIMs), it raises the question of how to do mass updates on catalog entries.

    • Update the code directly via a database query. It is supposedly the fastest way to do such a thing. If you have been following my posts closely, you must be familiar with my note regarding how Episerver does not disclose the database schema. I list it here because it’s an option, but not a good one. It easily goes wrong (and causes catastrophes), and you have to deal with versions and cache, which can be hairy to get right. Direct data manipulation should only be used as the last resort when no other option is available.

(more…)

A curious case of SQL Server function

This time, we will talk about ecfVersion_ListFiltered, again.

This stored procedure was previously the subject of several blog posts regarding SQL Server performance optimizations. Just when I thought it was perfect (in terms of performance), I learned something more.

Recently we received a performance report from a customer asking about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11). The job “Publish Delayed Content Versions” started to throw timeout exceptions.

This scheduled job calls ecfVersion_ListFiltered to load the content versions which are in the DelayedPublish status. It looks like this when it reaches SQL Server:

declare @s [udttIdTable]
insert into @s values(6)
exec dbo.ecfVersion_ListFiltered @Statuses = @s, @StartIndex = 0, @MaxRows = 2147483646

This query is known to be slow. The reason is quite obvious – Status contains only 5 or 6 distinct values, so it’s not indexed. SQL Server will have to do a Clustered Index Scan, and if ecfVersion is big enough, it’s inevitably slow.

(more…)