Yesterday, I was working on a database that was having some performance
issue. I asked to one of team member that what type of problem they are facing in their database. He told me that some time query takes a lot of time to give result.
Although we have already created indexes but still it takes much time,
looks like there is something else which is degrading the performance of these
queries. I asked him to expand a table and then index related to same in SSMS. He
expanded a table and said see We have non-clustered index for this table on ID
field. But when we try to run any query related to this table, it takes so
much time. He ran a query and it was taking time.
I
told him that it’s not what you are thinking about Non-clustered Index. I
explained him about Non-clustered index using execution plans that how non-clustered
index work. And that’s where I got a topic for my blog post:
Although
there are so many differences between clustered index and Non-Clustered Index
but the point on which I’m going to focus is:
For
a clustered index, the leaf page of the index is the same as the data page of
the table. Therefore, when reading the values of the clustered index key
columns, the database engine can also read the values of other columns without
any navigation from the index row.
But
in case of Non-clustered index, it has only information about the column on
which it is created. So, if a query refers a column which is not part of
non-clustered index then it has to navigate from a non-clustered index row to the
corresponding data row in the clustered index or the heap. This process is
known as Bookmark Lookup or RID lookups.
So
this is all about lookups which makes query slower in case of non-clustered
index. If we have smaller number of rows in result then it will not having much
impact but non-clustered index becomes ineffective as the number of rows
retrieved increases.
Let’s
create the environment then we will examine the same:
Use master
Go
Create database
NCI
GO
use NCI
GO
CREATE TABLE
TEST1 (ID INT IDENTITY ,NAME CHAR(40), AGE INT)
GO
INSERT INTO
TEST1 VALUES ('VIMAL',34)
GO 199000
CREATE NONCLUSTERED
INDEX ID ON
TEST1 (ID)
Now
let’s execute the below query then we will see the number of logical read and execution plan for same: Include the actual execution plan by pressing ctrl+M
SET STATISTICS IO ON -- Will display the number of logical and physical reads
SELECT * FROM TEST1 WHERE ID=5
Number of logical reads: 6
Execution
Plan:
You
can see the output columns of these lookups. Just put your mouse cursor over
this lookup as below:
So
it is clear from the screenshot that the output column from this lookup is Name
and Age (From total of 3 columns). As we have NON-CLUSTERED INDEX on column ID
so it has the information only about ID. For other column it has to navigate to
heap/cluster. So the 50% of total cost is used in just getting the other
columns except ID. i.e. Name and Age. This is just an example as there is 50-50
% cost. But in production you can see the situation where all the cost is taken
by these lookups.
Let’s
create a cluster index on this table then we will see that this lookup gets
changed or not.
CREATE
CLUSTERED INDEX
Index_Age ON TEST1 (Age)
Now
let’s execute the same query again:
SET STATISTICS IO ON -- Will display the number of logical and physical reads
SELECT * FROM TEST1 WHERE ID=5
Let’s
see the execution plan:
From
above screen shot we can see that RID lookup has been changed to Key Lookup. So, it
is clear now that if we do not have any cluster index on table then it will do
RID lookup.
Now run the below query. Please note that now we are only selecting the ID column
SET STATISTICS IO ON -- Will display the number of logical and physical reads
SELECT ID FROM
TEST1 WHERE ID=5
Number of Logical reads: 3
Execution Plan:
Index
seek with no look-up operator. Here we need only ID and this NON-CLUSTERED INDEX
has information about this. So this does not required any look-up.
Resolving Lookups:
There are different
ways in which you can resolve these lookups which are as follows:
Using a Clustered
Index:
You can read about this on MSDN. But
just want to mention here that there may be situation where we have already
clustered index and we cannot change as it’s very critical.
Using a Covering
Index
Using a Covering
Index Using Include
Using an Index Join
You
can go through above methods to resolve lookups. I will post about same in
future post. For now in our example we will use INCLUDE columns in our NON-CLUSTERED
INDEX and then will see the plan for same:
DROP INDEX
[Index_id] ON [dbo].[TEST1]
WITH ( ONLINE = OFF )
GO
USE [NCI]
GO
CREATE NONCLUSTERED
INDEX [Index_id] ON
[dbo].[TEST1]
(
[ID] ASC
)
INCLUDE
( [NAME],
[AGE])
GO
Now
lets execute the first query and see number of logical read and the query plan
:
SET STATISTICS IO ON -- Will display the number of logical and physical reads
SELECT * FROM TEST1 WHERE ID=5
Number
of logical read is reduced to 3 from 6. It was a test environment, in
production these number will be so different.
Execution Plan:
So, this was the answer to my friend who was
asking "why my query is running slow although I’m having Non-Clustered index on
that table". This all was because of lookups.
The lookup step associated with a non-clustered index can make data
retrieval through a non-clustered index very costly. The SQL Server optimizer
takes this into account when generating an execution plan, and if it finds the
overhead cost of using a non-clustered index to be very high, it discards the index
and performs a table scan (or a clustered index scan if the table is stored as
a clustered index).
Therefore,
to improve the effectiveness of a non-clustered index, it makes sense to
analyze the cause of a lookup and consider whether you can avoid it completely
by adding fields to the index key or to the INCLUDE column (or index join) and
creating a covering index.
Thanks For Reading This Blog Post!!!