Mass update catalog entries

This is something you don’t do daily, but you will probably need one day, so it might come in handy.

Recently we got a question on how to update the code of all entries in the catalog. This is interesting, because even thought you don’t update the codes that often (if at all, as the code is the identity to identify the entries with external system, such as ERPs or PIMs), it raises a question on how to do mass update on catalog entries.

    • Update the code directly via database query. It is supposedly the fastest to do such thing. If you have been following my posts closely, you must be familiar with my note regarding how Episerver does not disclose the database schema. I list it here because it’s an option, but not the good one. It easily goes wrong (and cause catastrophes), you have to deal with versions and cache, and those can be hairy to get right. Direct data manipulation should be only used as the last resort when no other option is available.

Continue reading “Mass update catalog entries”

A curious case of SQL Server indexes

If you have been working with SQL Server, you would likely know the importance of indexes. Without proper indexes, your databases will not be able to utilize it’s true potential and your sites/applications will almost certainly succumb when the data size is big enough, and/or the load is high enough.

Of course you would not want to happen.

But adding a proper index is not an easy task. Finding a good index which is already a difficult task. Making the index works in 99.99% of the case is a real challenge.

It is even more so for a framework. We can’t really control what data is put in the table. We can enforce some rules, yes, but those are just not enough, and we certainly not want to limit the extensibility and flexibility of the system.

A good index has (or needs to have) several characteristics, and two of them are selectivity and distribution. The first one is quite obvious. Adding index to a “Status” column where you likely have 5 or 6 values across 100 thousands of rows is almost always a bad idea. That index does not really help the queries on that column, while introduces more overhead on insert, update and delete. But even if a column has a very good selectivity, it is still tricky when it comes to distribution.

Recently I was asked to help a customer with a performance problem (as usual). They saw a very high CPU usage on SQL Server instance, and ​ecfVersion_ListMatchingSegments is to blame.

This is how this SP looks like

Well, it’s indeed fairly long, but you just need to focus on this statement:

It looks innocent enough, right? Just a filter statement on an indexed column. What can be wrong about it?

It’s not something wrong with the statement, it’s something wrong with the index, or, more precisely, the distribution of the data in the column the index covers. As a matter of fact, SeoUriSegment allows null and empty values.

A mistake. And rather a big one.

The database I looked into has around 180k rows where SeoUriSegment is empty. And that’s disaster waiting to happen. In bad case, SQL Server will cache the plan which uses index scan on SeoUriSegment for every call to ecfVersion_ListMatchingSegments, which is much less effective than an index seek. In worst case, it will use the index seek on every call with SeoUriSegment is empty. 180k index seeks per query would bring your SQL Server to its knees and effectively kill your website performance.

If you recall, I talked about a more or less same problem before, here and here . You might be surprised, it was about the wrong order of join. Now it’s about the wrong kind of execution plan! But it’s all because of same root cause: SQL Server tries to be smart and helpful, by optimizing the query the way it sees the best, only to be fooled by the data (or the distribution of it), and choose a poorly optimized execution plan.

It’s smart, not just enough in every case.

Now a big question still remains: how can the problem be fixed?

Well, we always should fix problem at its root. In this case, the fault is in the data, so we should fix them by adding the real values to the empty/null rows. The best value for SeoUriSegment should be the normalized name, so a product named “Crochet Playsuit” should have SeoUriSegment to be “croched-playsuit”, while “Příliš žluťoučký kůň úpěl ďábelské ódy” should be “prilis-zlutoucky-kun-upel-dabelske-ody”

But fixing the data can take time, and if your site is affecting by the problem, it’s probably better to have an immediate solution, by turning on the RECOMPILE option for ecfVersion_ListMatchingSegments

Now what can we do to prevent future incidents?

The bug is fixed to ensure no empty value value will be added to SeoUriSegment when copying a not published content version. But Episerver is in a difficult position now. We can’t just go ahead and add a non null/empty constraint on that column, because that can (and will) blow up existing implementation. We can forcefully update the column with null/empty value in the upgrade script before enabling such constraint, but that would be a bold move.

Morals of the story:

  • An indexed column should not allow null or empty values. Yes there is no guarantee that the values can’t be duplicated multiple times, but in such case it’s easier to shift the blame to the ones who inserted the data ;).
  • Parameter sniffing is a real problem, big times. Watch out for it.

A curious case of SQL Server function

This time, we will talk about ecfVersion_ListFiltered, again.

This stored procedure was previously the subject of several blog posts regarding SQL Server performance optimizations. When I thought it is perfect (in term of performance), I learned something more.

Recently we received a performance report from a customer asking about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11). The job “Publish Delayed Content Versions” starts to throw timeout exceptions.

This scheduled job calls to a ecfVersion_ListFiltered to load the content versions which are in status DelayedPublish, it looks like this when it reaches SQL Server:

This query is known to be slow. The reason is quite obvious – Status contains only 5 or 6 distinct values, so it’s not indexed. SQL Server will have to do a Clustered Index Scan, and if ecfVersion is big enough, it’s inevitably slow.

Continue reading “A curious case of SQL Server function”

Loading carts in a load balancing environment

UPDATE 1: Apparently HttpContext.Current.Request.AnonymousID already uses the cookie internally, so there might be something that makes it stop working. I’ll update when I found out.

Today we received a support ticket as customers seeing corrupted carts data being lost – line items with invalid data, duplicated line items etc. “Corrupted data” is one of the alarming words that we take very seriously, so I decided to jump on it right away.

The setup is a load balancing environment, and the problem only happens with anonymous users. However, it can be “fixed” by turning on the sticky sessions mode. So basically, instead of having sessions on the memory of a server (so sessions on server A can’t be seen by server B, and vice versa), they need a mechanism (can be a database) to share sessions between servers.

Continue reading “Loading carts in a load balancing environment”

Announcing a new book: Episerver Commerce: A problem – solution approach

More than one year ago, I announced that I was working on a book – a first Episerver Commerce book ever. It has been a work in progress until recently – and I am still adding updates here and there. The book has received quite positive feedback (I’m happy to say that everyone is nice enough to not tell me “Your book sucks”. Thanks, everyone). Am I happy with it? Yes, of course, proud even.

But to be completely honest,

I know something was missing.

Continue reading “Announcing a new book: Episerver Commerce: A problem – solution approach”

The art of asking questions

This is the second part of a series about most important skills for developer. The first part, about searching for answer skill, can be read here.

Searching for the answer is usually the fastest way to solve a problem

But searching on Google might not be enough to find you the answers, you might be the first to encounter the problem, or you might be searching for the wrong keyword. Sometimes, you have to ask the questions, hoping that some one, some where does know about the problem, and is kind enough to spend some time reading your questions, and typing the answers.

For free.

Continue reading “The art of asking questions”

Why do games need more Elena (Fisher)

Elena’s a strong, independent woman, with her own motivations and thoughts. She’s indeed attractive (very attractive if you think about Uncharted 4), but in terms of being healthy and fit, not being overly sexy as a “fan service” (Japanese games, I’m looking at you)


She’s not some female characters who are over-confident, and/or over-powered, to the point they don’t even need men. She does not work alone. She works with Nathan Drake to overcome the odds.

Continue reading “Why do games need more Elena (Fisher)”

The most important skill (of a good programmer)

Being programmer(*) is hard.

Being a good programmer is, of course, even harder. Unlike countless other jobs where the daily work is a routine, and being good at your job is to be efficient at that routine, being programmer is all about constantly learning and doing new things. Being a good programmer is about being fast at learning, and doing new things well. The process might stay for a while, but the content of the job is constantly changing. (If you keep doing same content over and over again, you are doing it wrong)

Continue reading “The most important skill (of a good programmer)”

Optimizing T-SQL COUNT

This is a continuation of my previous post about paging in SQL Server. When it comes to paging, you would naturally want to know the total number of rows satisfying, so you can display some nice, useful information to your end-users.

You would think, well, it’s just a count, and a simple query like this would be enough:

There should be nothing to worry about, right? Actually, there is.

Let’s get back to the example in previous post – we have to count the total number of orders in that big table.

Because ObjectId is the clustered index of OrderGroup_PurchaseOrder, I did expect it to be use that index and be pretty fast. But does it? To my surprises, no.

Continue reading “Optimizing T-SQL COUNT”

Beware of unwanted subscriptions

Updated December 23rd 2017: My wife talked with JustFab UK over the phone last week, and they told her to write an email to them. They promised to refund all of the monthly subscriptions, which they did, today. We are of course happy to get our money back, and I think JustFab UK appears to less “scam-y” then they does before , so I updated this post title to reflect that. Still, beware of your unwanted subscriptions.

The original post as below:

Later tonight I was checking my bank statements – to see how much I have spent and how much I still have in my account – well, for the upcoming Holiday seasons, of course.

And this got my attention:

Normally I would assume this is my wife using my credit card buying something. She did that before, and I was unhappy about it, but she didn’t stop doing that. But there was something telling I might have seen this before, it must be some kind of Déjà vu.

So I checked a little further back, and realized there was a similar transaction last month

Continue reading “Beware of unwanted subscriptions”