Interview Questions and Answers

What are Indexes in SQl Server? What is their advantages and disadvantages?

Why indexes?

Indexes are used by queries to find data from tables quickly. Indexes are created on tables and views. Index on a table or a view, is very similar to an index that we find in a book.

If you don't have an index in a book, and I ask you to locate a specific chapter in that book, you will have to look at every page starting from the first page of the book.

On, the other hand, if you have the index, you lookup the page number of the chapter in the index, and then directly go to that page number to locate the chapter.

Obviously, the book index is helping to drastically reduce the time it takes to find the chapter.

In a similar way, Table and View indexes, can help the query to find data quickly.

In fact, the existence of the right indexes, can drastically improve the performance of the query. If there is no index to help the query, then the query engine, checks every row in the table from the beginning to the end. This is called as Table Scan. Table scan is bad for performance.

If we can reduce the I/O required to find the information we want, we can effectively give the bottle a bigger neck.

This reduction will also lead to secondary benefits, such as reduced CPU time, waits, cache use and more.

The goal of an index on a database table then is to reduce I/O.

The following are the different types of indexes in SQL Server
1. Clustered
2. Nonclustered
3. Unique
4. Filtered
5. XML
6. Full Text
7. Spatial
8. Columnstore
9. Index with included columns
10. Index on computed columns

Clustered Index:
A clustered index determines the physical order of data in a table. For this reason, a table can have only one clustered index.

Create tblEmployees table using the script below.
CREATE TABLE [tblEmployee]
(
[Id] int Primary Key,
[Name] nvarchar(50),
[Salary] int,
[Gender] nvarchar(10),
[City] nvarchar(50)
)

Note that Id column is marked as primary key. Primary key, constraint create clustered indexes automatically if no clustered index already exists on the table and a nonclustered index is not specified when you create the PRIMARY KEY constraint.

To confirm this, execute sp_helpindex tblEmployee, which will show a unique clustered index created on the Id column.

Now execute the following insert queries. Note that, the values for Id column are not in a sequential order.
Insert into tblEmployee Values(3,'John',4500,'Male','New York')
Insert into tblEmployee Values(1,'Sam',2500,'Male','London')
Insert into tblEmployee Values(4,'Sara',5500,'Female','Tokyo')
Insert into tblEmployee Values(5,'Todd',3100,'Male','Toronto')
Insert into tblEmployee Values(2,'Pam',6500,'Female','Sydney')

Execute the following SELECT query
Select * from tblEmployee

Inspite, of inserting the rows in a random order, when we execute the select query we can see that all the rows in the table are arranged in an ascending order based on the Id column. This is because a clustered index determines the physical order of data in a table, and we have got a clustered index on the Id column.

Because of the fact that, a clustered index dictates the physical storage order of the data in a table, a table can contain only one clustered index. If you take the example of tblEmployee table, the data is already arranged by the Id column, and if we try to create another clustered index on the Name column, the data needs to be rearranged based on the NAME column, which will affect the ordering of rows that's already done based on the ID column.

For this reason, SQL server doesn't allow us to create more than one clustered index per table. The following SQL script, raises an error stating 'Cannot create more than one clustered index on table 'tblEmployee'. Drop the existing clustered index PK__tblEmplo__3214EC0706CD04F7 before creating another.'
Create Clustered Index IX_tblEmployee_Name
ON tblEmployee(Name)

A clustered index is analogous to a telephone directory, where the data is arranged by the last name. We just learnt that, a table can have only one clustered index. However, the index can contain multiple columns (a composite index), like the way a telephone directory is organized by last name and first name.

Let's now create a clustered index on 2 columns. To do this we first have to drop the existing clustered index on the Id column.

Drop index tblEmployee.PK__tblEmplo__3214EC070A9D95DB

When you execute this query, you get an error message stating 'An explicit DROP INDEX is not allowed on index 'tblEmployee.PK__tblEmplo__3214EC070A9D95DB'. It is being used for PRIMARY KEY constraint enforcement.' We will talk about the role of unique index in the next session. To successfully delete the clustered index, right click on the index in the Object explorer window and select DELETE.

Now, execute the following CREATE INDEX query, to create a composite clustered Index on the Gender and Salary columns.
Create Clustered Index IX_tblEmployee_Gender_Salary
ON tblEmployee(Gender DESC, Salary ASC)

Now, if you issue a select query against this table you should see the data physically arranged, FIRST by Gender in descending order and then by Salary in ascending order. The result is shown below.

Non Clustered Index:
A nonclustered index is analogous to an index in a textbook. The data is stored in one place, the index in another place. The index will have pointers to the storage location of the data. Since, the nonclustered index is stored separately from the actual data, a table can have more than one non clustered index, just like how a book can have an index by Chapters at the beginning and another index by common terms at the end.

In the index itself, the data is stored in an ascending or descending order of the index key, which doesn't in any way influence the storage of data in the table.

The following SQL creates a Nonclustered index on the NAME column on tblEmployee table:
Create NonClustered Index IX_tblEmployee_Name
ON tblEmployee(Name)

Difference between Clustered and NonClustered Index:
1. Only one clustered index per table, where as you can have more than one non clustered index
2. Clustered index is faster than a non clustered index, because, the non-clustered index has to refer back to the table, if the selected column is not present in the index.
3. Clustered index determines the storage order of rows in the table, and hence doesn't require additional disk space, but where as a Non Clustered index is stored seperately from the table, additional storage space is required.

Unique index is used to enforce uniqueness of key values in the index. Let's understand this with an example.

Create the Employee table using the script below
CREATE TABLE [tblEmployee]
(
[Id] int Primary Key,
[FirstName] nvarchar(50),
[LastName] nvarchar(50),
[Salary] int,
[Gender] nvarchar(10),
[City] nvarchar(50)
)

Since, we have marked Id column, as the Primary key for this table, a UNIQUE CLUSTERED INDEX gets created on the Id column, with Id as the index key.

We can verify this by executing the sp_helpindex system stored procedure as shown below.
Execute sp_helpindex tblEmployee

Output:

Since, we now have a UNIQUE CLUSTERED INDEX on the Id column, any attempt to duplicate the key values, will throw an error stating 'Violation of PRIMARY KEY constraint 'PK__tblEmplo__3214EC07236943A5'. Cannot insert duplicate key in object dbo.tblEmployee'

Example: The following insert queries will fail
Insert into tblEmployee Values(1,'Mike', 'Sandoz',4500,'Male','New York')
Insert into tblEmployee Values(1,'John', 'Menco',2500,'Male','London')

Now let's try to drop the Unique Clustered index on the Id column. This will raise an error stating - 'An explicit DROP INDEX is not allowed on index tblEmployee.PK__tblEmplo__3214EC07236943A5. It is being used for PRIMARY KEY constraint enforcement.'
Drop index tblEmployee.PK__tblEmplo__3214EC07236943A5

So this error message proves that, SQL server internally, uses the UNIQUE index to enforce the uniqueness of values and primary key.

Expand keys folder in the object explorer window, and you can see a primary key constraint. Now, expand the indexes folder and you should see a unique clustered index. In the object explorer it just shows the 'CLUSTERED' word. To, confirm, this is infact an UNIQUE index, right click and select properties. The properties window, shows the UNIQUE checkbox being selected.

SQL Server allows us to delete this UNIQUE CLUSTERED INDEX from the object explorer. so, Right click on the index, and select DELETE and finally, click OK. Along with the UNIQUE index, the primary key constraint is also deleted.

Now, let's try to insert duplicate values for the ID column. The rows should be accepted, without any primary key violation error.

Insert into tblEmployee Values(1,'Mike', 'Sandoz',4500,'Male','New York')
Insert into tblEmployee Values(1,'John', 'Menco',2500,'Male','London')

So, the UNIQUE index is used to enforce the uniqueness of values and primary key constraint.

UNIQUENESS is a property of an Index, and both CLUSTERED and NON-CLUSTERED indexes can be UNIQUE.

Creating a UNIQUE NON CLUSTERED index on the FirstName and LastName columns.
Create Unique NonClustered Index UIX_tblEmployee_FirstName_LastName
On tblEmployee(FirstName, LastName)

This unique non clustered index, ensures that no 2 entires in the index has the same first and last names. In Part 9, of this video series, we have learnt that, a Unique Constraint, can be used to enforce the uniqueness of values, across one or more columns. There are no major differences between a unique constraint and a unique index.

In fact, when you add a unique constraint, a unique index gets created behind the scenes. To prove this, let's add a unique constraint on the city column of the tblEmployee table.
ALTER TABLE tblEmployee
ADD CONSTRAINT UQ_tblEmployee_City
UNIQUE NONCLUSTERED (City)

At this point, we expect a unique constraint to be created. Refresh and Expand the constraints folder in the object explorer window. The constraint is not present in this folder. Now, refresh and expand the 'indexes' folder. In the indexes folder, you will see a UNIQUE NONCLUSTERED index with name UQ_tblEmployee_City.

Also, executing EXECUTE SP_HELPCONSTRAINT tblEmployee, lists the constraint as a UNIQUE NONCLUSTERED index.

So creating a UNIQUE constraint, actually creates a UNIQUE index. So a UNIQUE index can be created explicitly, using CREATE INDEX statement or indirectly using a UNIQUE constraint. So, when should you be creating a Unique constraint over a unique index.To make our intentions clear, create a unique constraint, when data integrity is the objective. This makes the objective of the index very clear. In either cases, data is validated in the same manner, and the query optimizer does not differentiate between a unique index created by a unique constraint or manually created.

Note:
1. By default, a PRIMARY KEY constraint, creates a unique clustered index, where as a UNIQUE constraint creates a unique nonclustered index. These defaults can be changed if you wish to.

2. A UNIQUE constraint or a UNIQUE index cannot be created on an existing table, if the table contains duplicate values in the key columns. Obviously, to solve this,remove the key columns from the index definition or delete or update the duplicate values.

3. By default, duplicate values are not allowed on key columns, when you have a unique index or constraint. For, example, if I try to insert 10 rows, out of which 5 rows contain duplicates, then all the 10 rows are rejected. However, if I want only the 5 duplicate rows to be rejected and accept the non-duplicate 5 rows, then I can use IGNORE_DUP_KEY option. An example of using IGNORE_DUP_KEY option is shown below.
CREATE UNIQUE INDEX IX_tblEmployee_City
ON tblEmployee(City)
WITH IGNORE_DUP_KEY

Ref:

https://www.sqlshack.com/top-10-questions-answers-sql-server-indexes/

0 Comments