A curious case of SQL Server indexes

If you have been working with SQL Server, you would likely know the importance of indexes. Without proper indexes, your databases will not be able to utilize it’s true potential and your sites/applications will almost certainly succumb when the data size is big enough, and/or the load is high enough.

Of course you would not want to happen.

But adding a proper index is not an easy task. Finding a good index which is already a difficult task. Making the index works in 99.99% of the case is a real challenge.

It is even more so for a framework. We can’t really control what data is put in the table. We can enforce some rules, yes, but those are just not enough, and we certainly not want to limit the extensibility and flexibility of the system.

A good index has (or needs to have) several characteristics, and two of them are selectivity and distribution. The first one is quite obvious. Adding index to a “Status” column where you likely have 5 or 6 values across 100 thousands of rows is almost always a bad idea. That index does not really help the queries on that column, while introduces more overhead on insert, update and delete. But even if a column has a very good selectivity, it is still tricky when it comes to distribution.

Recently I was asked to help a customer with a performance problem (as usual). They saw a very high CPU usage on SQL Server instance, and ​ecfVersion_ListMatchingSegments is to blame.

This is how this SP looks like

Well, it’s indeed fairly long, but you just need to focus on this statement:

It looks innocent enough, right? Just a filter statement on an indexed column. What can be wrong about it?

It’s not something wrong with the statement, it’s something wrong with the index, or, more precisely, the distribution of the data in the column the index covers. As a matter of fact, SeoUriSegment allows null and empty values.

A mistake. And rather a big one.

The database I looked into has around 180k rows where SeoUriSegment is empty. And that’s disaster waiting to happen. In bad case, SQL Server will cache the plan which uses index scan on SeoUriSegment for every call to ecfVersion_ListMatchingSegments, which is much less effective than an index seek. In worst case, it will use the index seek on every call with SeoUriSegment is empty. 180k index seeks per query would bring your SQL Server to its knees and effectively kill your website performance.

If you recall, I talked about a more or less same problem before, here and here . You might be surprised, it was about the wrong order of join. Now it’s about the wrong kind of execution plan! But it’s all because of same root cause: SQL Server tries to be smart and helpful, by optimizing the query the way it sees the best, only to be fooled by the data (or the distribution of it), and choose a poorly optimized execution plan.

It’s smart, not just enough in every case.

Now a big question still remains: how can the problem be fixed?

Well, we always should fix problem at its root. In this case, the fault is in the data, so we should fix them by adding the real values to the empty/null rows. The best value for SeoUriSegment should be the normalized name, so a product named “Crochet Playsuit” should have SeoUriSegment to be “croched-playsuit”, while “Příliš žluťoučký kůň úpěl ďábelské ódy” should be “prilis-zlutoucky-kun-upel-dabelske-ody”

But fixing the data can take time, and if your site is affecting by the problem, it’s probably better to have an immediate solution, by turning on the RECOMPILE option for ecfVersion_ListMatchingSegments

Now what can we do to prevent future incidents?

The bug is fixed to ensure no empty value value will be added to SeoUriSegment when copying a not published content version. But Episerver is in a difficult position now. We can’t just go ahead and add a non null/empty constraint on that column, because that can (and will) blow up existing implementation. We can forcefully update the column with null/empty value in the upgrade script before enabling such constraint, but that would be a bold move.

Morals of the story:

  • An indexed column should not allow null or empty values. Yes there is no guarantee that the values can’t be duplicated multiple times, but in such case it’s easier to shift the blame to the ones who inserted the data ;).
  • Parameter sniffing is a real problem, big times. Watch out for it.

A curious case of SQL Server function

This time, we will talk about ecfVersion_ListFiltered, again.

This stored procedure was previously the subject of several blog posts regarding SQL Server performance optimizations. When I thought it is perfect (in term of performance), I learned something more.

Recently we received a performance report from a customer asking about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11). The job “Publish Delayed Content Versions” starts to throw timeout exceptions.

This scheduled job calls to a ecfVersion_ListFiltered to load the content versions which are in status DelayedPublish, it looks like this when it reaches SQL Server:

This query is known to be slow. The reason is quite obvious – Status contains only 5 or 6 distinct values, so it’s not indexed. SQL Server will have to do a Clustered Index Scan, and if ecfVersion is big enough, it’s inevitably slow.

Continue reading “A curious case of SQL Server function”

Optimizing T-SQL COUNT

This is a continuation of my previous post about paging in SQL Server. When it comes to paging, you would naturally want to know the total number of rows satisfying, so you can display some nice, useful information to your end-users.

You would think, well, it’s just a count, and a simple query like this would be enough:

There should be nothing to worry about, right? Actually, there is.

Let’s get back to the example in previous post – we have to count the total number of orders in that big table.

Because ObjectId is the clustered index of OrderGroup_PurchaseOrder, I did expect it to be use that index and be pretty fast. But does it? To my surprises, no.

Continue reading “Optimizing T-SQL COUNT”

The art of paging

No this is not really “art” – I’m just trying to have a more clickbait title. It’s more about understanding what you have at your disposal and use them for your benefits – in this case – how new SQL statement can drastically improve your performance.

In this blogpost we will look into paging feature of SQL Server. in Commerce we usually work with large set of data – millions of rows are fairly common, and it’s natural to load data by page. There is no point loading thousands, or even millions of rows in one go. First it’s not practical to display all of them. Second you’ll likely end up with an timeout exception and/or an out of memory exception. Even if you are lucky enough to get through, it’s still able to take your SQL Server instance to a knee, and transferring that much data over network will be another bottleneck for your system. So my friends, the best practice for loading data is to do it by batches, and to not load everything at once.

Continue reading “The art of paging”

Fixing a stored procedure

At Episerver development team, we understand the importance of good performance. Who would not like a lightning fast website? We work hard to ensure the framework is fast, and we seize (almost) every opportunity to make it faster.

You know in Commerce 10.2 we introduced a new cart mode – serializable cart, and it’s proven to bring great performance compared to the “old/traditional” approach. Our own tests showed an improvement of 3-5x times faster. But can it be even faster? Probably yes.

And actually we did some improvements in later versions. In the scope of this blog post, we will just focus into a specific aspect – and to learn a little more about SQL Server performance optimization.

Continue reading “Fixing a stored procedure”

Please, rebuild your database indexes, now

I will make it quick and to the point: if you are expecting a lot of customers visiting your site tomorrow (and you should) for Black Friday, you should rebuild your database indexes, now.

On average, it will help you to serve more customers and they will be happier with a more responsive, faster website. On best cases it will help prevent catastrophes.

Continue reading “Please, rebuild your database indexes, now”

A curious case of SQL execution plan, part 2

Recently I wrote about how to look into, identify and solve the problem with a SQL Server execution plan – as you can read here: https://vimvq1987.com/2017/10/curious-case-sql-execution-plan/

I have some more time to revisit the query now, and I realized I made a “small” mistake. The “optimized” query is using a Clustered Index Scan

So it’s not as fast as it should be, and it will perform quite poorly in no cache scenario (when the buffer is empty, for example) – it takes about 40s to complete. Yes it’s still better than the original one, both in non cached and cached cases. But it’s not good enough. An index scan, even cached, is not only slower, but also more prone to deadlocks. It’s also worse in best case scenario, when the original one can use the proper index seek.

Continue reading “A curious case of SQL execution plan, part 2”

A curious case of SQL execution plan

I said this, and I will repeat it here: SQL Server optimizer is smart, and I can even say that, generally, it’s smarter than you and me (I have no doubt that you are smart, even very, very smart 🙂 ). So most of the cases, you leave it to do whatever it thinks is the best. But there are cases SQL Server optimizer is fooled by the engine – it gets confused and chooses an sub-optimal plan, because it was given wrong, outdated, or incorrect information. That’s when you need to step in.

Today I face one case like that, as reported here: http://world.episerver.com/forum/developer-forum/Episerver-Commerce/Thread-Container/2017/10/database-timeout-on-productvariant-update/

(As a side note, I don’t answer direct questions, nor provide any personal support service (I would charge plenty for that 😉 ). I would appreciate if you go through World forums, or contact Episerver Developer support service. There are several reasons for that, including knowledge sharing, and work item tracking. I can make exceptions when I know the problem is highly urgent and is hurting your business, by jumping into it sooner than I’m expected to/before it’s escalated through several level of supports. But all in all, it should be registered with Deverloper support. We at development team are supposed to be the final line of support, not the front line. )

Continue reading “A curious case of SQL execution plan”

Maintaining your indexes

Indexes are crucial to SQL Server performance. Having the right indexes might make the difference of day and night with your application performance – as I once talked here.

However, even having the right indexes is not everything. You have to keep them healthy. Indexes, as any other kinds of storage, is subjected to fragmentation. SQL Server works best if the index structure is compact and continuous, but with all of the inserts/updates/deletes, it’s inevitable to get fragmented. When the fragmentation grows, it starts affecting the performance of SQL Server: Instead of having to read just one page, it now have to read two, which increases both time and resource needed, and so on and so forth.

Continue reading “Maintaining your indexes”

Import a bacpac to SQL Server

This is more of a note-to-self.

I sometimes have to import a bacpac file from customer’s database (usually from Azure SQL database) to my local machine – . For most of the time it’ll be very easy when the databases are in .bak format, but for .bacpac file it can be pretty complicated.

Sqlpackage.exe is the standard tool to import the .bacpac file, and it can be found with the installation of Visual Studio (for example C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\Common7\IDE\Extensions\Microsoft\SQLDB\DAC\130) or SQL Server ( C:\Program Files (x86)\Microsoft SQL Server\130\DAC\bin ). Latest version should be used because they can support the older formats (.bacpac exported from older SQL Server version), but not the way around (older version might not support .bacpac files exported from newer SQL Server versions)

Continue reading “Import a bacpac to SQL Server”