Ranking
Functions
Ranking
functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive
the same value as other rows.
T-SQL provides the following ranking
functions:
1)
RANK()
2)
DENSE_RANK()
3)
ROW_NUMBER()
4)
NTILE()
Now let’s
first discuss about Rank Function:
The Rank() Function:
It
assigns rank to each record starting with 1. If it encounters two or more
records to have the same ORDER BY <columns> values, it is said to be a
tie and all these records get the same rank.
When
it finds a tie it assigns the same rank to the all the records in the tie, but
still keeps incrementing the record counter so the next record will get a new
rank which would be (previous rank + no of records in the current tie + 1).
Syntax: RANK () OVER ([<partition_by_clause>]
<order_by_clause>)
OVER clause is required in all the ranking functions and
with that you specify the partitioning and ordering of records before the
ranking functions are evaluated. If you don't specify it, you will get an error
similar to "Incorrect syntax near 'ROW_NUMBER', expected 'OVER'." The
Over () clause allows you to combine aggregated and non aggregated data.
PARTITION BY clause is not mandatory and if you don't specify it
all the records of the result-set will be considered as a part of single record
group or a single partition and then ranking functions are applied.
Example:
In below figure we have 4 records in the current tie and the previous rank is
4. So, next Rank will be 4+4+1= 9.
We
have a table Rank_test in our Database. First let’s see all the records from
this table as follows using select statement:
Table
schema : CREATE TABLE Rank_Test (EmpId INT IDENTITY(1,1), NAME VARCHAR(23), salary MONEY);
We have 10 records in this
table.
Now let’s use the Rank()
Function and see the output:
SELECT *, RANK() OVER(ORDER BY salary) AS Rank_Display FROM
dbo.Rank_Test
The Dense_Rank() Function:
The
Dense_Rank() Function is similar to Rank() function except when tied values are
present in result set. The difference between the RANK and DENSE_RANK functions
is in how values are assigned to rows following a tie. The Dense_Rank ()
count ties and uses sequential number sequencing (no skipping, no gaps). Dense ranking
closes the number gap caused by multiple row having the same rank number. Confused?????
Let’s see the
below figure then it will be more clear.
SELECT *, RANK() OVER(ORDER BY salary) AS Rank_Function,
DENSE_RANK()OVER(ORDER BY salary)
AS Dense_Ranking FROM
dbo.Rank_Test
Syntax:
DENSE_RANK () OVER ([<partition_by_clause>]
<order_by_clause>)
From above
figure it is clear that there is no gap in case of dense_rank. But in case of
rank() some rank are skipped because of tie.
The Row_Number() Function:
Unlike Rank ()
and Dense_rank(), the Row_Number() function ignores ties in the result set. Because of this, the numbering sequence values of ROW_NUMBER( )
are all continuous
and unique ( no ties means no skipped numbers and no
repeat values).
Syntax: ROW_NUMBER () OVER ([<partition_by_clause>]
<order_by_clause>)
Let’s take a
view at below figure:
SELECT *, RANK() OVER(ORDER BY salary) AS Rank_Function,
DENSE_RANK()OVER(ORDER BY salary) AS Dense_Ranking,
ROW_NUMBER()OVER(ORDER BY salary)
AS ROW_NUMBERR FROM
dbo.Rank_Test
So, this is
how we can use the function ROW_NUMBER ().
The NTILE( ) Function:
The NTILE ( )
is a SQL function that distributes the rows into a specific number of groups.
It ignores values, including ties. All records are handed in order and assigned
to their respective groups.
For NTILE( )
you must specify a lower number of groups/tiles/pieces than your record count.
If you are tiling 50 records then highest number of groups you can have is 50.
If groups don’t
divide it in equal with NTILE () then the lowest group have fewer members. For
example if you have 10 records and you perform an NTILE (3) then the first
group will have 4 records and the last group will have three records, you can
see this in below figure.
SELECT *, RANK() OVER(ORDER BY salary) AS Rank_Function,
DENSE_RANK()OVER(ORDER BY salary) AS Dense_Ranking,
ROW_NUMBER()OVER(ORDER BY salary) AS ROW_NUMBERR ,
NTILE(3)OVER(ORDER BY salary) AS Tiling_data FROM dbo.Rank_Test
In this figure you have the result set for
all four function numbered as 1, 2 3 and 4.
Now we have completed all four functions.
One common interview questions is asked to
candidate and that is “ How to find the n’th highest salary”. Although there are so many ways you can find it and that are
using TOP or using Sub queries etc. Let’s see how can we solve this problem
using Rankin function.
Suppose we want to know the 5th
highest salary. To do that we will use Dense_Rank() function . Let’s see the
query for same:
SELECT * FROM (SELECT *, DENSE_RANK() OVER(ORDER BY salary)AS Highest_Salary FROM
dbo.Rank_Test)
AS RankS WHERE Highest_Salary=5
You can also create a store procedure for
same. Every time you just need to give the value for n and you will get the
result.
We can create a store procedure as follows:
CREATE PROCEDURE HIGH_SALARY @NUM INT
AS
BEGIN
SELECT * FROM (SELECT *, DENSE_RANK() OVER(ORDER BY salary)AS Highest_Salary
FROM dbo.Rank_Test) AS RankS WHERE
Highest_Salary=@NUM
END
Now Lets execute our store procedure that is HIGH_SALARY. Now we just have to
give the value for n. See the below figure for same.
Thanks For reading this article.
No comments:
Post a Comment