Wednesday, September 25, 2013

Ranking Functions


Ranking Functions
Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows.
T-SQL provides the following ranking functions:
1)    RANK()
2)    DENSE_RANK()
3)    ROW_NUMBER()
4)    NTILE()
Now let’s first discuss about Rank Function:

The Rank() Function:
It assigns rank to each record starting with 1. If it encounters two or more records to have the same ORDER BY <columns> values, it is said to be a tie and all these records get the same rank.

When it finds a tie it assigns the same rank to the all the records in the tie, but still keeps incrementing the record counter so the next record will get a new rank which would be (previous rank + no of records in the current tie + 1).

Syntax: RANK () OVER ([<partition_by_clause>] <order_by_clause>)

OVER clause is required in all the ranking functions and with that you specify the partitioning and ordering of records before the ranking functions are evaluated. If you don't specify it, you will get an error similar to "Incorrect syntax near 'ROW_NUMBER', expected 'OVER'." The Over () clause allows you to combine aggregated and non aggregated data.

PARTITION BY clause is not mandatory and if you don't specify it all the records of the result-set will be considered as a part of single record group or a single partition and then ranking functions are applied.

Example: In below figure we have 4 records in the current tie and the previous rank is 4. So, next Rank will be 4+4+1= 9.  

We have a table Rank_test in our Database. First let’s see all the records from this table as follows using select statement:

Table schema : CREATE TABLE Rank_Test (EmpId INT IDENTITY(1,1), NAME VARCHAR(23), salary MONEY);


We have 10 records in this table.

Now let’s use the Rank() Function and see the output:
SELECT *, RANK() OVER(ORDER BY salary) AS Rank_Display FROM dbo.Rank_Test


The Dense_Rank() Function:
The Dense_Rank() Function is similar to Rank() function except when tied values are present in result set. The difference between the RANK and DENSE_RANK functions is in how values are assigned to rows following a tie. The Dense_Rank () count ties and uses sequential number sequencing (no skipping, no gaps). Dense ranking closes the number gap caused by multiple row having the same rank number.  Confused?????

Let’s see the below figure then it will be more clear.

SELECT *, RANK() OVER(ORDER BY salary) AS Rank_Function,
DENSE_RANK()OVER(ORDER BY salary) AS Dense_Ranking FROM dbo.Rank_Test


Syntax:
DENSE_RANK () OVER ([<partition_by_clause>] <order_by_clause>)
From above figure it is clear that there is no gap in case of dense_rank. But in case of rank() some rank are skipped because of tie.


The Row_Number() Function:
Unlike Rank () and Dense_rank(), the Row_Number() function ignores ties in the result set. Because of this, the numbering sequence values of ROW_NUMBER( ) are all continuous and unique ( no ties means no skipped numbers and no repeat values).

Syntax: ROW_NUMBER () OVER ([<partition_by_clause>] <order_by_clause>)

Let’s take a view at below figure:

SELECT *, RANK() OVER(ORDER BY salary) AS Rank_Function,
DENSE_RANK()OVER(ORDER BY salary) AS Dense_Ranking,
ROW_NUMBER()OVER(ORDER BY salary) AS ROW_NUMBERR FROM dbo.Rank_Test


So, this is how we can use the function ROW_NUMBER ().

  

The NTILE( ) Function:
The NTILE ( ) is a SQL function that distributes the rows into a specific number of groups. It ignores values, including ties. All records are handed in order and assigned to their respective groups.

For NTILE( ) you must specify a lower number of groups/tiles/pieces than your record count. If you are tiling 50 records then highest number of groups you can have is 50.

If groups don’t divide it in equal with NTILE () then the lowest group have fewer members. For example if you have 10 records and you perform an NTILE (3) then the first group will have 4 records and the last group will have three records, you can see this in below figure.

SELECT *, RANK() OVER(ORDER BY salary) AS Rank_Function,
DENSE_RANK()OVER(ORDER BY salary) AS Dense_Ranking,
ROW_NUMBER()OVER(ORDER BY salary) AS ROW_NUMBERR ,
NTILE(3)OVER(ORDER BY salary) AS Tiling_data FROM dbo.Rank_Test


In this figure you have the result set for all four function numbered as 1, 2 3 and 4.
  

Now we have completed all four functions.

One common interview questions is asked to candidate and that is “ How to find the n’th highest salary”. Although there are so many ways you can find it and that are using TOP or using Sub queries etc. Let’s see how can we solve this problem using Rankin function.
Suppose we want to know the 5th highest salary. To do that we will use Dense_Rank() function . Let’s see the query for same:

SELECT * FROM (SELECT *, DENSE_RANK() OVER(ORDER BY salary)AS Highest_Salary FROM dbo.Rank_Test)
AS RankS WHERE Highest_Salary=5

You can also create a store procedure for same. Every time you just need to give the value for n and you will get the result.
We can create a store procedure as follows:

CREATE PROCEDURE HIGH_SALARY @NUM INT
AS
BEGIN
SELECT * FROM (SELECT *, DENSE_RANK() OVER(ORDER BY salary)AS Highest_Salary
FROM dbo.Rank_Test) AS RankS WHERE Highest_Salary=@NUM
END

Now Lets execute our store procedure that is HIGH_SALARY. Now we just have to give the value for n. See the below figure for same.

Thanks For reading this article.

No comments:

Post a Comment