Dynamic data store is slow, (but) you can do better.

If you have been developing with Episerver CMS for a while, you probably know about its embedded “ORM”, called Dynamic Data Store, or DDS for short. It allows you to define strongly typed types which are mapped to database directly to you. You don’t have to create the table(s), don’t have to write stored procedures to insert/query/delete data. Sounds very convenient, right? The fact is, DDS is quite frequently used, and more often than you might think, mis-used.

As Joel Spolsky once said Every abstraction is leaky, an ORM will likely make you forget about the nature of the RDBMS under neath, and that can cause performance problems, sometime severe problems.

Let me make it clear to you

DDS is slow, and it is not suitable for big sets of data.

If you want to store a few settings for your website, DDS should be fine. However, if you are thinking about hundreds of items, it is probably worth looking else. Thousands and more items, then it would be a NO.

I did spend some time trying to bench mark the DDS to see how bad it is. A simple test is to add 10.000 items to a store, then query by each item, then deleted by each item, to see how long does it take

The item is defined like this, this is just another boring POCO:

internal class ShippingArea : IDynamicData
{
    public Identity Id { get; set; }

    public string PostCode { get; set; }

    public string Area { get; set; }

    public DateTime Expires { get; set; }
}

The store is defined like this

    public class ShippingAreaStore
    {
        private const string TokenStoreName = "ShippingArea";

        internal virtual ShippingArea CreateNew(string postCode, string area)
        {
            var token = new ShippingArea
            {
                Id = Identity.NewIdentity(),
                PostCode = postCode,
                Area = area,
                Expires = DateTime.UtcNow.AddDays(1)
            };
            GetStore().Save(token);
            return token;
        }

        internal virtual IEnumerable<ShippingArea> LoadAll()
        {
            return GetStore().LoadAll<ShippingArea>();
        }

        internal virtual IEnumerable<ShippingArea> Find(IDictionary<string, object> parameters)
        {
            return GetStore().Find<ShippingArea>(parameters);
        }

        internal virtual void Delete(ShippingArea shippingArea)
        {
            GetStore().Delete(shippingArea);
        }

        internal virtual ShippingArea Get(Identity tokenId)
        {
            return GetStore().Load<ShippingArea>(tokenId);
        }

        private static DynamicDataStore GetStore()
        {
            return DynamicDataStoreFactory.Instance.CreateStore(TokenStoreName, typeof(ShippingArea));
        }
    }

Then I have some quick and dirty code in QuickSilver ProductController.Index to measure the time (You will have to forgive some bad coding practices here ;). As usual StopWatch should be used on demonstration only, it should not be used in production. If you want a good break down of your code execution, use tools like dotTrace. If you want to measure production performance, use some monitoring system like NewRelic or Azure Application Insights ):

        var shippingAreaStore = ServiceLocator.Current.GetInstance<ShippingAreaStore>();
        var dictionary = new Dictionary<string, string>();
        for (int i = 0; i < 10000; i++)
        {
            dictionary[RandomString(6)] = RandomString(10);
        }
        var identities = new List<ShippingArea>();
        var sw = new Stopwatch();
        sw.Start();
        foreach (var pair in dictionary)
        {
            shippingAreaStore.CreateNew(pair.Key, pair.Value);
        }
        sw.Stop();
        _logger.Error($"Creating 10000 items took {sw.ElapsedMilliseconds}");
        sw.Restart();
        foreach (var pair in dictionary)
        {
            Dictionary<string, object> parameters = new Dictionary<string, object>();
            parameters.Add("PostCode", pair.Key);
            parameters.Add("Area", pair.Value);
            identities.AddRange(shippingAreaStore.Find(parameters));
        }

        sw.Stop();
        _logger.Error($"Querying 10000 items took {sw.ElapsedMilliseconds}");
        sw.Restart();

        foreach (var id in identities)
        {
            shippingAreaStore.Delete(id);
        }
        sw.Stop();
        _logger.Error($"Deleting 10000 items took {sw.ElapsedMilliseconds}");

Everything is ready. So a few tries gave us a fairly stable result:

2019-12-02 13:33:01,574 Creating 10000 items took 11938

2019-12-02 13:34:59,594 Querying 10000 items took 118009

2019-12-02 13:35:24,728 Deleting 10000 items took 25131

And this is strictly single-threaded, the site will certainly perform worse when it comes to real site with a lot of traffic, and then insert-query-delete at the same time, it will certainly be worse.

Can we do better?

There is a little better attribute that many people don’t know about DDS: you can mark a field as indexed, by adding [EPiServerDataIndex] attribute to the properties. The new class would look like this.

    [EPiServerDataStore]
    internal class ShippingArea : IDynamicData
    {
        public Identity Id { get; set; }

        [EPiServerDataIndex]
        public string PostCode { get; set; }

        [EPiServerDataIndex]
        public string Area { get; set; }

        public DateTime Expires { get; set; }
    }

If you peek into the database during the test, you can see that the data is now being written to Indexed_String01 and Indexed_String02 columns, instead of String01 and String02 as without the attributes. Such changes give us quite drastic improvement:

2019-12-02 15:38:16,376 Creating 10000 items took 7741

2019-12-02 15:38:19,245 Querying 10000 items took 2867

2019-12-02 15:38:44,266 Deleting 10000 items took 25019

The querying benefits greatly from the new index, as it no longer has to do a Clustered Index Scan, it can now do a non clustered index seek + Key look up. Deleting is still equally slow, because the delete is done by a Clustered Index delete on the Id column, which we already have, and the index on an Uniqueidentifier column is not the most effective one.

Before you are happy which such improvement, keep in mind that there are two indexes added for Indexed_String01 and Indexed_String02 separately. Naturally, we would want a combination, clustered even, on those columns, but we just can’t.

What if we want to go bare metal and create a table ourselves, write the query ourselves? Our repository would look like this

public class ShippingAreaStore2
    {
        private readonly IDatabaseExecutor _databaseExecutor;

        public ShippingAreaStore2(IDatabaseExecutor databaseExecutor)
        {
            _databaseExecutor = databaseExecutor;
        }

        /// <summary>
        /// Creates and stores a new token.
        /// </summary>
        /// <param name="blobId">The id of the blob for which the token is valid.</param>
        /// <returns>The id of the new token.</returns>
        internal virtual ShippingArea CreateNew(string postCode, string area)
        {
            var token = new ShippingArea
            {
                Id = Identity.NewIdentity(),
                PostCode = postCode,
                Area = area,
                Expires = DateTime.UtcNow.AddDays(1)
            };
            _databaseExecutor.Execute(() =>
            {
                var cmd = _databaseExecutor.CreateCommand();
                cmd.CommandText = "ShippingArea_Add";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Id", token.Id.ExternalId));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", token.PostCode));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", token.Area));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Expires", token.Expires));
                cmd.ExecuteNonQuery();
            });

            return token;
        }

        internal virtual IEnumerable<ShippingArea> Find(IDictionary<string, object> parameters)
        {
            return _databaseExecutor.Execute<IEnumerable<ShippingArea>>(() =>
            {
                var areas = new List<ShippingArea>();
                var cmd = _databaseExecutor.CreateCommand();
                cmd.CommandText = "ShippingArea_Find";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", parameters.Values.First()));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", parameters.Values.Last()));
                var reader = cmd.ExecuteReader();
                while (reader.Read())
                {
                    areas.Add(new ShippingArea
                    {
                        Id = (Guid)reader["Id"],
                        PostCode = (string)reader["PostCode"],
                        Area = (string)reader["Area"],
                        Expires = (DateTime)reader["Expires"]
                    });
                }
                return areas;
            });
        }

        /// <summary>
        /// Deletes a token from the store.
        /// </summary>
        /// <param name="token">The token to be deleted.</param>
        internal virtual void Delete(ShippingArea area)
        {
            _databaseExecutor.Execute(() =>
            {
                var cmd = _databaseExecutor.CreateCommand();
                cmd.CommandText = "ShippingArea_Delete";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", area.PostCode));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", area.Area));
                cmd.ExecuteNonQuery();
            });
        }
    }

And those would give us the results:

2019-12-02 16:44:14,785 Creating 10000 items took 2977

2019-12-02 16:44:17,114 Querying 10000 items took 2315

2019-12-02 16:44:20,307 Deleting 10000 items took 3190

Moral of the story? DDS is slow and you should be avoid using it if you are working with fairly big set of data. If you have to use DDS for whatever reason, make sure to at least try to index the columns that you query the most.

And in the end of the days, hand-crafted custom table + query beats everything. Remember that you can use some tools like Dapper to do most of the works for you.

Useful T-SQL snippets for development and troubleshooting

This post is more of a note-to-self. These are the useful T-SQL statements which can be incredibly useful in development and troubleshooting

SET STATISTICS IO ON

Turn on the IO statistics for statements run after that until set to OFF explicitly. We then switch to Messages tab to see how many IO operations were done on each table.

SET STATISTICS TIME ON

Find out about the statements were executed: which statements, its texts, how many reads (logical), how many time was spent on CPU and how many time was spent total

(more…)

A curious case of SQL Server function

This time, we will talk about ecfVersion_ListFiltered, again.

This stored procedure was previously the subject of several blog posts regarding SQL Server performance optimizations. When I thought it is perfect (in term of performance), I learned something more.

Recently we received a performance report from a customer asking about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11). The job “Publish Delayed Content Versions” starts to throw timeout exceptions.

This scheduled job calls to a ecfVersion_ListFiltered to load the content versions which are in status DelayedPublish, it looks like this when it reaches SQL Server:

declare @s [udttIdTable]
insert into @s values(6)
exec dbo.ecfVersion_ListFiltered @Statuses = @s, @StartIndex = 0, @MaxRows = 2147483646

This query is known to be slow. The reason is quite obvious – Status contains only 5 or 6 distinct values, so it’s not indexed. SQL Server will have to do a Clustered Index Scan, and if ecfVersion is big enough, it’s inevitably slow.

(more…)

Optimizing T-SQL COUNT

This is a continuation of my previous post about paging in SQL Server. When it comes to paging, you would naturally want to know the total number of rows satisfying, so you can display some nice, useful information to your end-users.

You would think, well, it’s just a count, and a simple query like this would be enough:

SELECT COUNT(Id) FROM MySecretTable

There should be nothing to worry about, right? Actually, there is.

Let’s get back to the example in previous post – we have to count the total number of orders in that big table.

SELECT COUNT(ObjectId) FROM OrderGroup_PurchaseOrder

Because ObjectId is the clustered index of OrderGroup_PurchaseOrder, I did expect it to be use that index and be pretty fast. But does it? To my surprises, no.

(more…)

The art of paging

No this is not really “art” – I’m just trying to have a more clickbait title. It’s more about understanding what you have at your disposal and use them for your benefits – in this case – how new SQL statement can drastically improve your performance.

In this blogpost we will look into paging feature of SQL Server. in Commerce we usually work with large set of data – millions of rows are fairly common, and it’s natural to load data by page. There is no point loading thousands, or even millions of rows in one go. First it’s not practical to display all of them. Second you’ll likely end up with an timeout exception and/or an out of memory exception. Even if you are lucky enough to get through, it’s still able to take your SQL Server instance to a knee, and transferring that much data over network will be another bottleneck for your system. So my friends, the best practice for loading data is to do it by batches, and to not load everything at once.

(more…)

Fixing a stored procedure

At Episerver development team, we understand the importance of good performance. Who would not like a lightning fast website? We work hard to ensure the framework is fast, and we seize (almost) every opportunity to make it faster.

You know in Commerce 10.2 we introduced a new cart mode – serializable cart, and it’s proven to bring great performance compared to the “old/traditional” approach. Our own tests showed an improvement of 3-5x times faster. But can it be even faster? Probably yes.

And actually we did some improvements in later versions. In the scope of this blog post, we will just focus into a specific aspect – and to learn a little more about SQL Server performance optimization.

(more…)

Please, rebuild your database indexes, now

I will make it quick and to the point: if you are expecting a lot of customers visiting your site tomorrow (and you should) for Black Friday, you should rebuild your database indexes, now.

On average, it will help you to serve more customers and they will be happier with a more responsive, faster website. On best cases it will help prevent catastrophes.

(more…)

A curious case of SQL execution plan, part 2

Recently I wrote about how to look into, identify and solve the problem with a SQL Server execution plan – as you can read here: http://vimvq1987.com/curious-case-sql-execution-plan/

I have some more time to revisit the query now, and I realized I made a “small” mistake. The “optimized” query is using a Clustered Index Scan

So it’s not as fast as it should be, and it will perform quite poorly in no cache scenario (when the buffer is empty, for example) – it takes about 40s to complete. Yes it’s still better than the original one, both in non cached and cached cases. But it’s not good enough. An index scan, even cached, is not only slower, but also more prone to deadlocks. It’s also worse in best case scenario, when the original one can use the proper index seek.

(more…)

A curious case of SQL execution plan

I said this already, and I will say it again: SQL Server optimizer is smart. I can even go further and say, it’s smarter than you and me (I have no doubt that you are smart, even very, very smart 🙂 ). So most of the cases, you leave it to do whatever it thinks is the best.

But there are cases SQL Server optimizer is fooled by the engine – it gets confused and chooses an sub-optimal plan, because it was given wrong, outdated, or incorrect information. That’s when you need to step in.

Today I face one case like that, as reported here: http://world.episerver.com/forum/developer-forum/Episerver-Commerce/Thread-Container/2017/10/database-timeout-on-productvariant-update/

(more…)

Maintaining your indexes

Indexes are crucial to SQL Server performance. Having the right indexes might make the difference of day and night with your application performance – as I once talked here.

However, even having the right indexes is not everything. You have to keep them healthy. Indexes, as any other kinds of storage, is subjected to fragmentation. SQL Server works best if the index structure is compact and continuous, but with all of the inserts/updates/deletes, it’s inevitable to get fragmented. When the fragmentation grows, it starts affecting the performance of SQL Server: Instead of having to read just one page, it now have to read two, which increases both time and resource needed, and so on and so forth.

(more…)