As string is the most common data type in an application, nvarchar and its variant varchar are probably the most common column types in your database. (We almost always use nvarchar because nchar is meant for fixed length columns which we don’t have). The difference is that nvarchar has encoding of UTF-16/USC-2 while varchar has UTF-8
Starting with SQL Server 2012 (11.x), when a Supplementary Character (SC) enabled collation is used, these data types store the full range of Unicode character data and use the UTF-16 character encoding. If a non-SC collation is specified, then these data types store only the subset of character data supported by the UCS-2 character encoding.
But varchar can be harmful in a way that you don’t expect it to. Let’s assume we have this simple table with two columns (forgive naming, I can’t come up with better names)
Each will be inserted with same random values, almost unique. We will add a non clustered index on each of these columns, and as we know, the index should be very efficient on querying based on those values.
Let’s try with out varchar column first. It should work pretty well right. Nope!
SELECT *
FROM dbo.[Demo]
where varcharColumn = N'0002T9'
Instead of a highly efficient Index seek, it does an Index scan on the entire table. This is of course not what you want to.
But, why? Good question. You might have noticed that I used N’0002T9′ which is a nvarchar type – which is what .NET would pass to your query if your parameter is of type string. If you look closer to the execution plan, you’ll see that SQL Server has to do a CONVERT_IMPLICIT on each row of this column, effectively invalidates the index.
If we pass ‘0002T9’ without the notion though, it works as it should, this can cause the confusion as it works during development, but once deployed it is much slower
To see the difference we can run the queries side by side. Note that this is for a very simple table with 130k rows. If you have a few millions or more rows, the difference will be even bigger.
What’s about the vice versa? If we have data as nvarchar(100) but the parameter is passed as varchar ? SQL Server can handle it with ease. It simply converts the parameters to nvarchar and does an index seek, as it should
So moral of the story? Unless you have strong reasons to use varchar (or char ), stick with nvarchar (or nchar ) to avoid complications with data type conversion which can, and will hurt your database performance.
Let’s take a break from the memory allocation, and do some optimization on another aspect, yet as important (if not even more important) – database.
We all know that database queries play an essential part in any serious app. It’s almost a given that if you want your app to perform well, your database queries must also perform well. And for them to perform well, you need things like proper design (normalization, references etc.), properly written queries, and proper indexes. In this post, we will explore how an index can improve query performance, and how can we do it better.
Let’s start with a fairly simple table design
CREATE TABLE [dbo].[UniqueCoupon](
[Id] [int] identity primary key clustered,
[PromotionId] [int] NOT NULL,
[Code] [nvarchar](10) NOT NULL,
[ExpiredOn] [datetime] NULL,
[Redeemed] [bit] NULL
) ON [PRIMARY]
Nothing extraordinary here, pretty common if you ask me. Now for testing purpose, let’s insert 1.000.000 rows into it
INSERT INTO dbo.[UniqueCoupon] (PromotionId, Code)
SELECT
FLOOR(RAND()*(100)+1),
SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 7)
GO 1000000
We need to query data by the code, so let’s create an user defined type
CREATE TYPE CouponTable AS TABLE (
Code NVARCHAR(10));
Time to run some query against data, let’s go with this
SELECT Id, PromotionId, Code, ExpiredOn, Redeemed FROM dbo.UniqueCoupons
WHERE PromotionId = @PromotionId AND Code in (SELECT Code FROM @Data)
This is the complete query as we need some data
declare @data CouponTable
insert into @data
select top 10 code from dbo.UniqueCoupon
where promotionid = 36
SELECT Id, PromotionId, Code, ExpiredOn, Redeemed FROM dbo.UniqueCoupon
WHERE PromotionId = 36 AND Code in (SELECT Code FROM @Data)
As we learned that execution plan is not a good way to compare performance, let’s use the statistics, our trusted friends
set statistics io on
set statistics time on
And this is how it takes with our default setting (i.e. no index)
If you are somewhat experienced with SQL Server, you might guess it would not be exactly happy because of, obviously an index is needed. As we query on PromotionId, it does makes sense to add an index for it, SQL Server does give you that
If we just blindly add the index suggested by SQL Server
If we look at the index, there’s something not very optimized about it – we are query by both PromotionId and Code, so not really makes senses to have Code as included. How’s about we have the index on both PromotionId and Code?
Yet we can make it better! From 53 to 30 logical reads might not sound a lot, but if you have thousands of queries every hour, it will be fairly significant.
Prepare yourself for some pleasant surprises – when we eventually applied the change on an actual database, the change was staggering, much more than what we hoped for. The query that were run for 24h in total, every day, now takes less than 10 minutes (yes you read it right, 10 minutes).
At this point you can certainly be happy and move on. But can we do better? For the sake of curiosity ? Yes we do.
SQL Server is rather smart that it knows we are getting the other columns as well, so those will be included in the index, to avoid a key lookup. Let’s see if we can remove that and see how it performs
So it was indeed worse, a key lookup is performed for every row (SQL Server uses the index to track down the rows and read the other columns from there)
There are two way to get rid of those key lookup – includes the columns in the index itself, or, more dramatic, make the index the clustered. As we can see the data should be accessed by PromotionId and Code, it makes perfect senses.
It is a commonly belief that Identity column should be clustered index – it is unique, it is not null. However, it only makes senses if it is the most heavily accessed column. In this case, Id only serves as an Identity column, it does not need to be the clustered index (although being an unique means it will has a non clustered index for it)
ALTER TABLE [dbo].[UniqueCoupon] DROP CONSTRAINT [PK__UniqueCo__3214EC0744C2FF38] WITH ( ONLINE = OFF )
GO
ALTER TABLE [dbo].[UniqueCoupon] ADD PRIMARY KEY NONCLUSTERED
(
[Id] ASC
)
Does this bring dramatically performance change? Unlikely. My test show no improvement in statistic. However, there is one critical impact here: we significantly reduced the size of indexes in the table. (data incoming)
Moral of the story
Indexes are crucial.
You can almost always do better than the auto suggested indexes.