Showing posts from January, 2017
Describe the join operations: What is a self join? Please explain in detail using an example.

Table 1 has 4 records, Table 2 has 2 records and Table 3 has 0 records. What will be the output of a CROSS JOIN?

Describe the join operations: What is a self join? Please explain in detail using an example.

Joins in SQL Server are used to query (retrieve) data from two or more related tables. In general, tables are related to each other using foreign key constraints.

In SQL Server, there are three types of JOINs:
1. CROSS JOIN
2. INNER JOIN 
3. OUTER JOIN 

Outer joins are further divided into 3 types:
1. Left Join or Left Outer Join
2. Right Join or Right Outer Join
3. Full Join or Full Outer Join

Now let's look at each JOIN type, with examples, and the differences between them.
Employee Table (tblEmployee)


Departments Table (tblDepartment)


SQL Script to create tblEmployee and tblDepartment tables

Create table tblDepartment
(
     ID int primary key,
     DepartmentName nvarchar(50),
     Location nvarchar(50),
     DepartmentHead nvarchar(50)
)
Go

Insert into tblDepartment values (1, 'IT', 'London', 'Rick')
Insert into tblDepartment values (2, 'Payroll', 'Delhi', 'Ron')
Insert into tblDepartment values (3, 'HR', 'New York', 'Christie')
Insert into tblDepartment values (4, 'Other Department', 'Sydney', 'Cindrella')
Go

Create table tblEmployee
(
     ID int primary key,
     Name nvarchar(50),
     Gender nvarchar(50),
     Salary int,
     DepartmentId int foreign key references tblDepartment(Id)
)
Go

Insert into tblEmployee values (1, 'Tom', 'Male', 4000, 1)
Insert into tblEmployee values (2, 'Pam', 'Female', 3000, 3)
Insert into tblEmployee values (3, 'John', 'Male', 3500, 1)
Insert into tblEmployee values (4, 'Sam', 'Male', 4500, 2)
Insert into tblEmployee values (5, 'Todd', 'Male', 2800, 2)
Insert into tblEmployee values (6, 'Ben', 'Male', 7000, 1)
Insert into tblEmployee values (7, 'Sara', 'Female', 4800, 3)
Insert into tblEmployee values (8, 'Valarie', 'Female', 5500, 1)
Insert into tblEmployee values (9, 'James', 'Male', 6500, NULL)
Insert into tblEmployee values (10, 'Russell', 'Male', 8800, NULL)
Go


General Formula for Joins
SELECT     ColumnList
FROM       LeftTableName
JOIN_TYPE  RightTableName
ON         JoinCondition

CROSS JOIN
CROSS JOIN produces the Cartesian product of the two tables involved in the join: every row of the left table is paired with every row of the right table. For example, the Employees table has 10 rows and the Departments table has 4 rows, so a cross join between these two tables produces 10 × 4 = 40 rows. A CROSS JOIN does not take an ON clause.

CROSS JOIN Query:
SELECT Name, Gender, Salary, DepartmentName
FROM tblEmployee
CROSS JOIN tblDepartment
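To check the row arithmetic without a SQL Server instance, here is a minimal sketch that loads the tutorial's data into an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in here (an assumption, not the engine this post targets), and only the columns the join needs are kept. The empty table tblEmpty is hypothetical; it illustrates the question at the top of the page: since a cross join's row count is the product of its inputs' row counts, tables with 4, 2 and 0 records cross join to 4 × 2 × 0 = 0 rows.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server; columns trimmed to what the join needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDepartment (ID INTEGER PRIMARY KEY, DepartmentName TEXT)")
conn.execute("CREATE TABLE tblEmployee (ID INTEGER PRIMARY KEY, Name TEXT, DepartmentId INTEGER)")
conn.executemany("INSERT INTO tblDepartment VALUES (?, ?)",
                 [(1, "IT"), (2, "Payroll"), (3, "HR"), (4, "Other Department")])
conn.executemany("INSERT INTO tblEmployee VALUES (?, ?, ?)",
                 [(1, "Tom", 1), (2, "Pam", 3), (3, "John", 1), (4, "Sam", 2),
                  (5, "Todd", 2), (6, "Ben", 1), (7, "Sara", 3), (8, "Valarie", 1),
                  (9, "James", None), (10, "Russell", None)])

# 10 employees x 4 departments = 40 rows; note there is no ON clause.
cross_rows = conn.execute(
    "SELECT Name, DepartmentName FROM tblEmployee CROSS JOIN tblDepartment").fetchall()
print(len(cross_rows))  # 40

# Hypothetical empty third table: multiplying by 0 rows leaves an empty result.
conn.execute("CREATE TABLE tblEmpty (ID INTEGER)")
three_way = conn.execute(
    "SELECT * FROM tblEmployee CROSS JOIN tblDepartment CROSS JOIN tblEmpty").fetchall()
print(len(three_way))  # 0
```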

JOIN or INNER JOIN
Write a query to retrieve Name, Gender, Salary and DepartmentName from the Employees and Departments tables. The output of the query should be as shown below.


SELECT Name, Gender, Salary, DepartmentName
FROM tblEmployee
INNER JOIN tblDepartment
ON tblEmployee.DepartmentId = tblDepartment.Id

OR

SELECT Name, Gender, Salary, DepartmentName
FROM tblEmployee
JOIN tblDepartment
ON tblEmployee.DepartmentId = tblDepartment.Id

Note: JOIN and INNER JOIN mean the same thing. It's better to use INNER JOIN, as it states your intention explicitly.

If you look at the output, we got only 8 rows, but the Employees table has 10 rows. We didn't get the JAMES and RUSSELL records. This is because DepartmentId in the Employees table is NULL for these two employees, and NULL doesn't match any ID in the Departments table.



In summary, INNER JOIN returns only the matching rows from both tables. Non-matching rows are eliminated.
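A minimal check of this behaviour, assuming an in-memory SQLite database (Python's standard sqlite3 module) as a stand-in for SQL Server and keeping only the columns the join needs. The data is copied from the script above; Gender and Salary are omitted.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server; Gender, Salary, Location and
# DepartmentHead are omitted because the join does not use them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDepartment (ID INTEGER PRIMARY KEY, DepartmentName TEXT)")
conn.execute("CREATE TABLE tblEmployee (ID INTEGER PRIMARY KEY, Name TEXT, DepartmentId INTEGER)")
conn.executemany("INSERT INTO tblDepartment VALUES (?, ?)",
                 [(1, "IT"), (2, "Payroll"), (3, "HR"), (4, "Other Department")])
conn.executemany("INSERT INTO tblEmployee VALUES (?, ?, ?)",
                 [(1, "Tom", 1), (2, "Pam", 3), (3, "John", 1), (4, "Sam", 2),
                  (5, "Todd", 2), (6, "Ben", 1), (7, "Sara", 3), (8, "Valarie", 1),
                  (9, "James", None), (10, "Russell", None)])

rows = conn.execute("""
    SELECT Name, DepartmentName
    FROM tblEmployee
    INNER JOIN tblDepartment ON tblEmployee.DepartmentId = tblDepartment.Id
""").fetchall()

# NULL DepartmentId never equals any ID, so James and Russell are dropped.
names = {name for name, _ in rows}
print(len(rows))                             # 8
print("James" in names, "Russell" in names)  # False False
```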

LEFT JOIN or LEFT OUTER JOIN
Now, let's say I want all the rows from the Employees table, including the JAMES and RUSSELL records. I want the output as shown below.


SELECT Name, Gender, Salary, DepartmentName
FROM tblEmployee
LEFT OUTER JOIN tblDepartment
ON tblEmployee.DepartmentId = tblDepartment.Id

OR

SELECT Name, Gender, Salary, DepartmentName
FROM tblEmployee
LEFT JOIN tblDepartment
ON tblEmployee.DepartmentId = tblDepartment.Id

Note: You can use either LEFT JOIN or LEFT OUTER JOIN; the OUTER keyword is optional.

LEFT JOIN returns all the matching rows plus the non-matching rows from the left table. In practice, INNER JOIN and LEFT JOIN are used extensively.
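The sketch below runs the same LEFT JOIN against an in-memory SQLite copy of the data, assuming Python's standard sqlite3 module as a stand-in for SQL Server. The non-matching left rows come back with NULL (None in Python) in the right table's columns.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server; only the columns the join needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDepartment (ID INTEGER PRIMARY KEY, DepartmentName TEXT)")
conn.execute("CREATE TABLE tblEmployee (ID INTEGER PRIMARY KEY, Name TEXT, DepartmentId INTEGER)")
conn.executemany("INSERT INTO tblDepartment VALUES (?, ?)",
                 [(1, "IT"), (2, "Payroll"), (3, "HR"), (4, "Other Department")])
conn.executemany("INSERT INTO tblEmployee VALUES (?, ?, ?)",
                 [(1, "Tom", 1), (2, "Pam", 3), (3, "John", 1), (4, "Sam", 2),
                  (5, "Todd", 2), (6, "Ben", 1), (7, "Sara", 3), (8, "Valarie", 1),
                  (9, "James", None), (10, "Russell", None)])

rows = conn.execute("""
    SELECT Name, DepartmentName
    FROM tblEmployee
    LEFT JOIN tblDepartment ON tblEmployee.DepartmentId = tblDepartment.Id
""").fetchall()

# Every employee is kept; unmatched ones get NULL (None) for DepartmentName.
unmatched = [name for name, dept in rows if dept is None]
print(len(rows), sorted(unmatched))  # 10 ['James', 'Russell']
```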

RIGHT JOIN or RIGHT OUTER JOIN
I want all the rows from the right table. The query output should be as shown below.


SELECT Name, Gender, Salary, DepartmentName
FROM tblEmployee
RIGHT OUTER JOIN tblDepartment
ON tblEmployee.DepartmentId = tblDepartment.Id

OR

SELECT Name, Gender, Salary, DepartmentName
FROM tblEmployee
RIGHT JOIN tblDepartment
ON tblEmployee.DepartmentId = tblDepartment.Id

Note: You can use either RIGHT JOIN or RIGHT OUTER JOIN; the OUTER keyword is optional.

RIGHT JOIN returns all the matching rows plus the non-matching rows from the right table.
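A RIGHT JOIN is the same as a LEFT JOIN with the two tables swapped. SQLite only added RIGHT and FULL OUTER JOIN in version 3.39.0, so this sketch (an in-memory SQLite database through Python's standard sqlite3 module, standing in for SQL Server) uses the swapped LEFT JOIN form, which runs on any version. With this data, every department appears, and 'Other Department' comes back with a NULL employee name.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server; only the columns the join needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDepartment (ID INTEGER PRIMARY KEY, DepartmentName TEXT)")
conn.execute("CREATE TABLE tblEmployee (ID INTEGER PRIMARY KEY, Name TEXT, DepartmentId INTEGER)")
conn.executemany("INSERT INTO tblDepartment VALUES (?, ?)",
                 [(1, "IT"), (2, "Payroll"), (3, "HR"), (4, "Other Department")])
conn.executemany("INSERT INTO tblEmployee VALUES (?, ?, ?)",
                 [(1, "Tom", 1), (2, "Pam", 3), (3, "John", 1), (4, "Sam", 2),
                  (5, "Todd", 2), (6, "Ben", 1), (7, "Sara", 3), (8, "Valarie", 1),
                  (9, "James", None), (10, "Russell", None)])

# Equivalent to: FROM tblEmployee RIGHT JOIN tblDepartment ON ...
rows = conn.execute("""
    SELECT Name, DepartmentName
    FROM tblDepartment
    LEFT JOIN tblEmployee ON tblEmployee.DepartmentId = tblDepartment.Id
""").fetchall()

# 8 matched rows + 1 department with no employees = 9 rows.
empty_depts = [dept for name, dept in rows if name is None]
print(len(rows), empty_depts)  # 9 ['Other Department']
```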

FULL JOIN or FULL OUTER JOIN
I want all the rows from both tables involved in the join. The query output should be as shown below.



SELECT Name, Gender, Salary, DepartmentName
FROM tblEmployee
FULL OUTER JOIN tblDepartment
ON tblEmployee.DepartmentId = tblDepartment.Id

OR

SELECT Name, Gender, Salary, DepartmentName
FROM tblEmployee
FULL JOIN tblDepartment
ON tblEmployee.DepartmentId = tblDepartment.Id

Note: You can use either FULL JOIN or FULL OUTER JOIN; the OUTER keyword is optional.

FULL JOIN returns all rows from both the left and right tables, including the non-matching rows from each side.
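On engines without FULL OUTER JOIN (such as SQLite before 3.39.0), a common workaround is a LEFT JOIN plus, via UNION ALL, the right-table rows that found no match. The sketch below is that emulation, not SQL Server's own FULL JOIN, run against an in-memory SQLite database through Python's standard sqlite3 module. With this data it returns the 8 matches, James and Russell, and 'Other Department'.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server; only the columns the join needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDepartment (ID INTEGER PRIMARY KEY, DepartmentName TEXT)")
conn.execute("CREATE TABLE tblEmployee (ID INTEGER PRIMARY KEY, Name TEXT, DepartmentId INTEGER)")
conn.executemany("INSERT INTO tblDepartment VALUES (?, ?)",
                 [(1, "IT"), (2, "Payroll"), (3, "HR"), (4, "Other Department")])
conn.executemany("INSERT INTO tblEmployee VALUES (?, ?, ?)",
                 [(1, "Tom", 1), (2, "Pam", 3), (3, "John", 1), (4, "Sam", 2),
                  (5, "Todd", 2), (6, "Ben", 1), (7, "Sara", 3), (8, "Valarie", 1),
                  (9, "James", None), (10, "Russell", None)])

# FULL JOIN emulation: all left rows, plus right rows with no match on the left.
rows = conn.execute("""
    SELECT Name, DepartmentName
    FROM tblEmployee
    LEFT JOIN tblDepartment ON tblEmployee.DepartmentId = tblDepartment.Id
    UNION ALL
    SELECT Name, DepartmentName
    FROM tblDepartment
    LEFT JOIN tblEmployee ON tblEmployee.DepartmentId = tblDepartment.Id
    WHERE tblEmployee.ID IS NULL
""").fetchall()

no_dept = sorted(name for name, dept in rows if dept is None)
no_emp = [dept for name, dept in rows if name is None]
print(len(rows), no_dept, no_emp)  # 11 ['James', 'Russell'] ['Other Department']
```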

Joins Summary



SELF JOIN
Joining a table with itself is called a SELF JOIN. A SELF JOIN is not a different type of JOIN; it can be classified under any type of JOIN: INNER, OUTER or CROSS.

Have you ever needed to join a table with itself? Consider the tblEmployee table shown below, which has EmployeeId, Name and ManagerId columns.


Write a query which gives the following result.



Self Join Query:
A MANAGER is also an EMPLOYEE, so both the EMPLOYEE and MANAGER rows are present in the same table. Here we join tblEmployee with itself using different alias names: E for Employee and M for Manager. We use a LEFT JOIN so that rows whose ManagerId is NULL are also returned. You can see in the output that TODD's record is retrieved, but his MANAGER is NULL. If you replace LEFT JOIN with INNER JOIN, you will not get TODD's record.

Select E.Name as Employee, M.Name as Manager
from tblEmployee E
Left Join tblEmployee M
On E.ManagerId = M.EmployeeId


In short, the query above is a LEFT OUTER SELF JOIN. The same technique works with the other join types.

Inner self join on the tblEmployee table:
Select E.Name as Employee, M.Name as Manager
from tblEmployee E
Inner Join tblEmployee M
On E.ManagerId = M.EmployeeId

Cross self join on the tblEmployee table:
Select E.Name as Employee, M.Name as Manager
from tblEmployee E
Cross Join tblEmployee M
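The self-join examples rely on a tblEmployee with EmployeeId, Name and ManagerId columns whose data is not reproduced in this post. The sketch below uses hypothetical rows (only Todd has a NULL ManagerId, consistent with the text) in an in-memory SQLite database through Python's standard sqlite3 module, and compares the LEFT, INNER and CROSS self joins. The aliases E and M are what let the same table appear twice in one FROM clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data for illustration; the post's actual rows are not shown.
conn.execute("CREATE TABLE tblEmployee "
             "(EmployeeId INTEGER PRIMARY KEY, Name TEXT, ManagerId INTEGER)")
conn.executemany("INSERT INTO tblEmployee VALUES (?, ?, ?)",
                 [(1, "Mike", 3), (2, "Rob", 1), (3, "Todd", None),
                  (4, "Ben", 1), (5, "Sam", 1)])

left = conn.execute("""
    SELECT E.Name AS Employee, M.Name AS Manager
    FROM tblEmployee E
    LEFT JOIN tblEmployee M ON E.ManagerId = M.EmployeeId
""").fetchall()
inner = conn.execute("""
    SELECT E.Name AS Employee, M.Name AS Manager
    FROM tblEmployee E
    INNER JOIN tblEmployee M ON E.ManagerId = M.EmployeeId
""").fetchall()
cross = conn.execute("""
    SELECT E.Name AS Employee, M.Name AS Manager
    FROM tblEmployee E
    CROSS JOIN tblEmployee M
""").fetchall()

# LEFT keeps Todd with a NULL (None) manager, INNER drops him,
# and CROSS pairs every employee with every employee (5 x 5).
print(len(left), len(inner), len(cross))  # 5 4 25
print(dict(left)["Todd"])                 # None
```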

What are the 3 ways to get the number of rows in a table?

I found a good article, "SQL Server–HOW-TO: quickly retrieve accurate row count for table" by martijnh1, which gives a good recap of each scenario.

I need to extend this to provide a count based on a specific condition; when I figure that part out, I'll update this answer.

In the meantime, here are the details from the article:

Method 1:

Query:

SELECT COUNT(*) FROM Transactions 

Comments:

Scans the whole table (or its narrowest index), so it is slow on large tables, but the count is exact.
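On counting rows that meet a specific condition: the metadata-based methods below return one stored count per table or partition, so they cannot apply a WHERE filter, and a conditional count has to use COUNT(*) as in Method 1. A minimal sketch, assuming an in-memory SQLite database through Python's standard sqlite3 module and a hypothetical Transactions table with made-up Status and Amount columns. The SQL itself (WHERE, SUM(CASE ...), COUNT(column)) is also valid T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, for illustration only.
conn.execute("CREATE TABLE Transactions (Id INTEGER PRIMARY KEY, Status TEXT, Amount INTEGER)")
conn.executemany("INSERT INTO Transactions (Status, Amount) VALUES (?, ?)",
                 [("Posted", 100), ("Posted", 250), ("Pending", 75),
                  ("Failed", None), ("Posted", 40)])

# Total rows, then rows matching one condition.
total = conn.execute("SELECT COUNT(*) FROM Transactions").fetchone()[0]
posted = conn.execute(
    "SELECT COUNT(*) FROM Transactions WHERE Status = ?", ("Posted",)).fetchone()[0]

# Several conditional counts in a single pass; COUNT(Amount) skips NULLs.
posted2, pending, with_amount = conn.execute("""
    SELECT SUM(CASE WHEN Status = 'Posted' THEN 1 ELSE 0 END),
           SUM(CASE WHEN Status = 'Pending' THEN 1 ELSE 0 END),
           COUNT(Amount)
    FROM Transactions
""").fetchone()

print(total, posted, pending, with_amount)  # 5 3 1 4
```

The single-pass SUM(CASE ...) form is useful when several counts are needed, since it reads the table once instead of once per condition.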

Method 2:

Query:

SELECT CONVERT(bigint, rows) 
FROM sysindexes 
WHERE id = OBJECT_ID('Transactions') 
AND indid < 2 

Comments:

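A fast way to retrieve the row count, but it depends on stored row-count metadata, which can be inaccurate. Note that sysindexes is a deprecated compatibility view.

To correct the stored counts, run DBCC UPDATEUSAGE(Database) WITH COUNT_ROWS, which can take significant time on large tables.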

Method 3:

Query:

SELECT CAST(p.rows AS float)
FROM sys.tables AS tbl
INNER JOIN sys.indexes AS idx
    ON idx.object_id = tbl.object_id AND idx.index_id < 2
INNER JOIN sys.partitions AS p
    ON p.object_id = CAST(tbl.object_id AS int)
    AND p.index_id = idx.index_id
WHERE tbl.name = N'Transactions'
  AND SCHEMA_NAME(tbl.schema_id) = 'dbo'

Comments:

This is how SQL Server Management Studio counts rows (Table Properties > Storage > Row count). Very fast, but still an approximate number of rows.

Method 4:

Query:

SELECT SUM (row_count) 
FROM sys.dm_db_partition_stats 
WHERE object_id=OBJECT_ID('Transactions')    
AND (index_id=0 or index_id=1); 

Comments:

A quick operation (although not as fast as method 2) and, equally important, a reliable one.




Ref:

https://docs.microsoft.com/en-us/archive/blogs/martijnh/sql-serverhow-to-quickly-retrieve-accurate-row-count-for-table