Thursday, February 27, 2014

What Actually Happens Inside SQL Server When a simple select Query is Executed?

Hi All... In this blog I'm going to explain that "What Actually Happens inside SQL Server When a simple select Query is Submitted?"

When you execute any query via SSMS or through application within a second we get the result but do you really know how things work behind the scene?????



Suppose a user want to book a movie ticket. So, First he will see the number of seats available. To do so he will fill the request form to full fill his request.

So from here the STORY begins:

He has filled the request form to see the number of available seats. And click on "OK".


Now, application will communicates with the Database Engine, the application programming interfaces (APIs) exposed by the protocol layer formats the communication using a Microsoft-defined format called a tabular data stream (TDS) packet. The SQL Server Network Interface (SNI) protocol layer on both the server and client computers encapsulates the TDS packet inside a standard communication protocol, such as TCP/IP or Named Pipes or shared memory or VDI.

In simple words his request puted into an envelope. So that No one else can see that request. Security reasons :)

The request goes to SQL server relational engine. Relational engine have 3 main components:
    1)  CMD Parser
    2)  Query Optimizer
    3)  Query Executor

First The cmd parser processes T-SQL language events sent to SQL Server. It checks for proper syntax and spelling of keywords. If you type SEELCT instead of SELECT, then parsing stops and SQL Server returns an error to the query source. After a query is parsed, a binding component performs name resolution to convert the object names into their unique object ID values.Here the table name alias are replaced with actual value.
After the parsing and binding is done, the command is converted into an internal format that can be operated on. This internal format is known as a query tree. If object name can’t be resolved, an error is immediately raised that identifies where the error occurred.

Then it search for execution plan in plan cache if it already exist then it will be reused via Query executor. 

If it does not exist in Plan cache then the query tree will be passed to query optimizer.

The Query Optimizer determines what are the possible ways to execute query which will be best.

In other words, the optimizer figures out how best to implement the request represented by the T-SQL query you submitted. It decides if it can access the data through indexes, what types of joins to use and much more. The decisions made by the optimizer are based on what it calculates to be the cost of a given execution plan, in terms of the required CPU processing and I/O. Hence, this is a cost-based plan.

It compiles the entire command batch and optimizes queries that are optimizable. The query optimization 
and compilation result in an execution plan.

The Query Optimizer also uses pruning heuristics to ensure that optimizing a query doesn’t take longer than required to simply choose a plan and execute it. Suppose If optimizer spends more than a minute to select a best plan then you can think the performance of that query.."More than worst".. :( To avoid this pruning heuristics algorithm is used.

                                                                                        Image Source: Quest Software (Kevin Kline)


Query execution:
Once the optimizer has generated an execution plan, or retrieved one from cache, the action switches to the storage engine, which usually executes the query according to the plan. It will store the newly generated plan in plan cache for future use. The relational engine communicate to storage engine via OLE DB.the storage engine is composed of three main areas:
     Access methods, 
    Locking and transaction services, 
   Utility commands

Access Methods:
Now SQL Server needs to locate data, it calls the access methods code, which sets up and requests scans of data pages and index pages and prepares the OLE DB rowsets to return to the relational engine.

The access methods code contains components to open a table, retrieve qualified data, and update data. It doesn’t actually retrieve the pages; instead, it makes the request to the buffer manager, which ultimately serves up the page in its cache or reads it to cache from disk. When the scan starts, a look-ahead mechanism qualifies the rows or index entries on a page. The retrieving of rows that meet specified criteria is
known as a qualified retrieval. The access methods code is used not only for SELECT statements but also for qualified UPDATE and DELETE statements.

So if the data is not available in data cache then data will brings to disk and we will go ahead. Sql server do not work directly with disk in case of data file, it does all changes in cache. In case of log file it directly write to disk. 

So, finally rowsets will be returns to relational engine and the required data sent back to the user using same network protocol.

So, the four important steps which are performed inside sql server are as follows:

----->----Parse-------->----Bind-->--------Optimize--->-------Execute

Please correct me in case of missing things....Comments and questions are welcome..... :)

Thanks You!!!!!!

References: 
SQL Server Internals
SQL Server Execution Plans










No comments:

Post a Comment