Data Analysis & Business Intelligence: Ombir Rathee
SQL
For
SQL stands for Structured Query Language. It is a standard language designed to query and manage
the data in an RDBMS, i.e. a Relational Database Management System. Microsoft provides T-SQL (Transact-SQL)
as the dialect of SQL used by Microsoft SQL Server. This document focuses mainly on T-SQL. However, most of
the SQL syntax remains the same for other RDBMS software such as Oracle, MySQL, IBM DB2, PostgreSQL, etc.
b. DML: Data Manipulation Language (DML) affects the information stored in the database.
• SELECT • INSERT • UPDATE • DELETE • TRUNCATE • MERGE
c. DCL: Data Control Language (DCL) deals with permissions on various objects.
• GRANT • REVOKE
• CHECK : Specify data values that are acceptable in a Column (Ex: Salary >1000)
• FOREIGN KEY : Used to establish and enforce a link between the data in two tables to control the data that
can be stored in the foreign key table
Ans.
1. FROM
2. JOIN
3. WHERE
4. GROUP BY
5. HAVING
6. SELECT
7. DISTINCT
8. ORDER BY
9. TOP/OFFSET-FETCH
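The order above explains a common surprise: a column alias created in SELECT is not visible in WHERE, but it is visible in ORDER BY. A minimal sketch, assuming an inline table Emp(Dept, Salary) that is purely illustrative; the comment numbers refer to the list above.

```sql
SELECT   Dept, COUNT(*) AS Headcount   -- 6. SELECT: the alias Headcount is created here
FROM     (VALUES ('IT', 5000), ('IT', 900), ('HR', 4000), ('HR', 3000), ('Ops', 2500))
         AS Emp(Dept, Salary)          -- 1. FROM
WHERE    Salary > 1000                 -- 3. WHERE: runs before SELECT, so Headcount is unknown here
GROUP BY Dept                          -- 4. GROUP BY
HAVING   COUNT(*) >= 2                 -- 5. HAVING: filters whole groups
ORDER BY Headcount DESC;               -- 8. ORDER BY: runs after SELECT, so the alias can be used
-- Returns one row: HR, 2
```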
Table A            Table B
ID  Name           ID  Sale
1   John           1   10
1   Peter          1   20
2   Veronica       2   30
5   Natasha        7   40
Ans.
1. INNER JOIN: 5 Rows
2. LEFT JOIN: 6 Rows
3. RIGHT JOIN: 6 Rows
4. FULL JOIN: 7 Rows
5. CROSS JOIN: 16 Rows
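The counts above can be checked with a short query. This is a sketch that rebuilds Table A and Table B inline with VALUES; the CTE names A and B and the column alias RowCnt are chosen only for illustration.

```sql
;WITH A AS (
    SELECT ID, Name
    FROM (VALUES (1, 'John'), (1, 'Peter'), (2, 'Veronica'), (5, 'Natasha')) AS v(ID, Name)
),
B AS (
    SELECT ID, Sale
    FROM (VALUES (1, 10), (1, 20), (2, 30), (7, 40)) AS v(ID, Sale)
)
SELECT 'INNER JOIN' AS JoinType, COUNT(*) AS RowCnt FROM A INNER JOIN B ON A.ID = B.ID  -- 5
UNION ALL
SELECT 'LEFT JOIN',  COUNT(*) FROM A LEFT JOIN  B ON A.ID = B.ID                        -- 6
UNION ALL
SELECT 'RIGHT JOIN', COUNT(*) FROM A RIGHT JOIN B ON A.ID = B.ID                        -- 6
UNION ALL
SELECT 'FULL JOIN',  COUNT(*) FROM A FULL JOIN  B ON A.ID = B.ID                        -- 7
UNION ALL
SELECT 'CROSS JOIN', COUNT(*) FROM A CROSS JOIN B;                                      -- 16
```

ID 1 appears twice on each side, so it contributes 2 x 2 = 4 matched rows; ID 2 contributes 1. The outer joins add one unmatched row each (ID 5 from A, ID 7 from B).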
Both self-contained and correlated subqueries can be single-valued, multi-valued or table-valued.
1. UNION (A ∪ B)
UNION unifies the results of two input queries and eliminates the duplicate rows.
UNION ALL also unifies the results of two input queries but doesn’t remove duplicates. Hence it is
faster than UNION.
2. INTERSECT (A ∩ B)
INTERSECT operator first eliminates duplicate rows from two input queries and then returns the
common rows between two input queries.
3. EXCEPT (A-B)
EXCEPT operator first eliminates duplicate rows from two input queries and then returns the rows that
appear in the first query but not in the second.
The two queries involved in a set operator must have the same number of columns, and corresponding
columns must have compatible data types. Compatible means that the data type with lower precedence must be
implicitly convertible to the data type with higher precedence. The column names in the result set are
determined by the first query. Set operators treat two NULLs as equal. The individual input queries cannot
contain an ORDER BY clause; a single ORDER BY may follow the last query to sort the combined result.
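A small sketch of the set operators, using two inline inputs Q1 and Q2 (illustrative names) that contain a duplicate and a NULL so the rules above are visible in the row counts.

```sql
;WITH Q1 AS (SELECT x FROM (VALUES (1), (1), (2), (NULL)) AS v(x)),
      Q2 AS (SELECT x FROM (VALUES (2), (3), (NULL)) AS v(x))
SELECT 'UNION' AS Op, COUNT(*) AS RowCnt
FROM (SELECT x FROM Q1 UNION     SELECT x FROM Q2) AS r   -- 4 rows: 1, 2, 3, NULL
UNION ALL
SELECT 'UNION ALL', COUNT(*)
FROM (SELECT x FROM Q1 UNION ALL SELECT x FROM Q2) AS r   -- 7 rows: nothing removed
UNION ALL
SELECT 'INTERSECT', COUNT(*)
FROM (SELECT x FROM Q1 INTERSECT SELECT x FROM Q2) AS r   -- 2 rows: 2 and NULL (NULLs compare as equal)
UNION ALL
SELECT 'EXCEPT', COUNT(*)
FROM (SELECT x FROM Q1 EXCEPT    SELECT x FROM Q2) AS r;  -- 1 row: 1 (its duplicate is removed)
```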
TRUNCATE | DELETE
Removes all rows from the table. | Can remove selected rows based on a condition in the WHERE clause.
Minimally logged, so it is faster than DELETE. | Fully logged, so it is slower than TRUNCATE.
Resets the identity column, if present, to its seed value. | Doesn't affect the identity column.
Executed using a table lock. | Executed using row-level locks.
ALTER permission is required to truncate the table. | DELETE permission is required to perform the delete operation.
DDL for Oracle; DML for MS SQL Server. | DML command.
Can be rolled back only inside a transaction, if the transaction is not committed or the session is not closed. Can’t be rolled back in Oracle even if it is in a transaction. | Can be rolled back inside a transaction.
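Two of the differences above (identity reset and rollback inside a transaction) are easy to observe in SQL Server. The following is a sketch using a throwaway temp table #Demo, a name assumed only for this example.

```sql
CREATE TABLE #Demo (ID INT IDENTITY(1, 1), Val VARCHAR(10));
INSERT INTO #Demo (Val) VALUES ('a'), ('b'), ('c');   -- IDs 1, 2, 3

DELETE FROM #Demo;                                    -- identity counter is kept
INSERT INTO #Demo (Val) VALUES ('d');
SELECT ID, Val FROM #Demo;                            -- ID = 4

TRUNCATE TABLE #Demo;                                 -- identity counter goes back to the seed
INSERT INTO #Demo (Val) VALUES ('e');
SELECT ID, Val FROM #Demo;                            -- ID = 1

BEGIN TRANSACTION;
    TRUNCATE TABLE #Demo;                             -- table is empty inside the transaction
ROLLBACK TRANSACTION;
SELECT COUNT(*) AS RowsAfterRollback FROM #Demo;      -- 1: the truncate was undone

DROP TABLE #Demo;
```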
Ques 10. What is the difference between Stored Procedure and Function?
Ans.
Ques 11. What is the difference between Table variable and Temp Table?
Ans.
Ques 12. What is the difference between View and Stored Procedure?
Ans.
View | Stored Procedure
Contains only a SELECT query. | Can contain several statements, IF-ELSE, loops, etc.
Cannot perform modification to any table. | Can perform modification to one or several tables.
a. Composite index: An index that contains more than one column. An index key can include up to 16
columns, as long as it doesn’t exceed the 900-byte limit (SQL Server 2016 and later raise these limits to
32 key columns, and to 1,700 bytes for non-clustered indexes). Both clustered and non-clustered
indexes can be composite indexes.
b. Unique Index: An index that ensures the uniqueness of each value in the indexed column. If the index
is a composite, the uniqueness is enforced across the columns as a whole, not on the individual
columns. For example, if you were to create an index on the FirstName and LastName columns in a
table, the names together must be unique, but the individual names can be duplicated.
c. Covering Index: When all of the columns a query needs are part of the index, it is called a
covering index for that query. It is typically created using the INCLUDE clause, which is available only
on non-clustered indexes and adds non-key columns to the index. This can significantly improve query
performance because the query optimizer can locate all the column values within the index; the table or
clustered index data is not accessed, resulting in fewer disk I/O operations.
Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered
index is referred to as a heap.
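The index types above map onto CREATE INDEX as sketched below. The table dbo.Employee, its columns and the index names are hypothetical and exist only to make the statements concrete.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE dbo.Employee (
    EmpID     INT            NOT NULL,
    FirstName VARCHAR(50)    NOT NULL,
    LastName  VARCHAR(50)    NOT NULL,
    DeptID    INT            NOT NULL,
    Salary    DECIMAL(10, 2) NOT NULL
);

-- Clustered index: the table is now a clustered table, not a heap.
CREATE CLUSTERED INDEX CIX_Employee_EmpID
    ON dbo.Employee (EmpID);

-- Composite and unique: uniqueness applies to the (FirstName, LastName) pair as a whole.
CREATE UNIQUE NONCLUSTERED INDEX UX_Employee_Name
    ON dbo.Employee (FirstName, LastName);

-- Covering index: DeptID is the key, LastName and Salary are non-key included columns.
CREATE NONCLUSTERED INDEX IX_Employee_DeptID
    ON dbo.Employee (DeptID)
    INCLUDE (LastName, Salary);

-- Every column this query needs is in IX_Employee_DeptID, so the index covers it.
SELECT LastName, Salary
FROM dbo.Employee
WHERE DeptID = 10;
```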
Benefits:
1. You can transfer or access subsets of data quickly and efficiently, while maintaining the integrity of a
data collection
2. You can perform maintenance operations on one or more partitions more quickly. The operations are
more efficient because they target only these data subsets, instead of the whole table.
It is mostly intended to aid in maintenance on larger tables and to offer fast ways to load and remove
large amounts of data from a table. Partitioning can enhance query performance, but there is no
guarantee.
Ques 20. What is the difference between PRIMARY KEY and UNIQUE KEY?
Ans.
PRIMARY KEY | UNIQUE KEY
Used to identify a row (record) in a table. | Used to prevent duplicate values in a column (with the exception of a NULL entry).
There can be only one primary key in a table. | There can be more than one unique key in a table.
By default, a primary key creates a clustered index. | By default, a unique key creates a non-clustered index.
Cannot contain NULL values. | Can contain NULL values (in SQL Server, only one NULL per single-column unique key).
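A short sketch of these differences on a temp table #Person (an assumed name). The final query reads the index metadata from tempdb to show which index type each constraint created.

```sql
CREATE TABLE #Person (
    PersonID INT          NOT NULL PRIMARY KEY,   -- clustered index by default
    Email    VARCHAR(100) NULL     UNIQUE         -- non-clustered index by default
);

INSERT INTO #Person VALUES (1, 'a@example.com');
INSERT INTO #Person VALUES (2, NULL);             -- allowed: first NULL in the unique column
-- INSERT INTO #Person VALUES (3, NULL);          -- would fail: a second NULL counts as a duplicate
-- INSERT INTO #Person VALUES (NULL, 'b@example.com');  -- would fail: primary key cannot be NULL

SELECT name, type_desc, is_primary_key, is_unique_constraint
FROM tempdb.sys.indexes
WHERE object_id = OBJECT_ID('tempdb..#Person');

DROP TABLE #Person;
```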
For any feedback please contact: www.linkedin.com/in/ombir
Case sensitivity: If A and a, B and b, etc. are treated in the same way, then the collation is
case-insensitive.
Accent sensitivity: If a and á, o and ó are treated in the same way, then it is accent-insensitive.
Kana sensitivity: When the Japanese kana characters Hiragana and Katakana are treated differently, it is
called kana-sensitive.
Width sensitivity: When a single-byte character (half-width) and the same character represented
as a double-byte character (full-width) are treated differently, then it is width-sensitive.
The COLLATE clause can be applied only to the char, varchar, text, nchar, nvarchar, and ntext data
types. SQL Server allows users to create databases, tables and columns in different collations.
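The effect of case and accent sensitivity can be seen by applying COLLATE to a comparison. This sketch uses the built-in Latin1_General collations; the suffixes CI/CS and AI/AS stand for case-insensitive/sensitive and accent-insensitive/sensitive.

```sql
SELECT
    CASE WHEN N'abc' = N'ABC' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END AS CaseInsensitive,     -- equal
    CASE WHEN N'abc' = N'ABC' COLLATE Latin1_General_CS_AS
         THEN 'equal' ELSE 'different' END AS CaseSensitive,       -- different
    CASE WHEN N'resume' = N'résumé' COLLATE Latin1_General_CI_AI
         THEN 'equal' ELSE 'different' END AS AccentInsensitive,   -- equal
    CASE WHEN N'resume' = N'résumé' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END AS AccentSensitive;     -- different
```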
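Ques 25. Which operator is used for Pattern Matching?
Ans. The LIKE operator is used for pattern matching. It supports the wildcards % (any string of zero or
more characters), _ (any single character), [ ] (any single character in a set or range) and [^] (any
single character not in a set or range).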
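A sketch of the four T-SQL LIKE wildcards on an inline list of names. The data is illustrative and the expected matches assume the default case-insensitive collation.

```sql
;WITH T AS (
    SELECT Name
    FROM (VALUES ('Sam'), ('Samuel'), ('Pam'), ('Tom'), ('Tim')) AS v(Name)
)
SELECT 'Sam%' AS Pattern, Name FROM T WHERE Name LIKE 'Sam%'     -- %   : Sam, Samuel
UNION ALL
SELECT '_am',    Name FROM T WHERE Name LIKE '_am'               -- _   : Sam, Pam
UNION ALL
SELECT 'T[io]m', Name FROM T WHERE Name LIKE 'T[io]m'            -- [ ] : Tom, Tim
UNION ALL
SELECT '[^ST]%', Name FROM T WHERE Name LIKE '[^ST]%';           -- [^] : Pam
```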
Declare Cursor
Open Cursor
Retrieve row from the Cursor
Process the row
Loop until last row
Close Cursor
Deallocate Cursor
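The cursor life cycle above can be sketched as follows. The table variable @Sales and the cursor name SalesCursor are assumptions made for the example; the loop simply sums a column row by row.

```sql
DECLARE @Sales TABLE (ID INT, Amount INT);
INSERT INTO @Sales VALUES (1, 10), (2, 20), (3, 30);

DECLARE @ID INT, @Amount INT, @Total INT = 0;

-- Declare Cursor
DECLARE SalesCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID, Amount FROM @Sales ORDER BY ID;

-- Open Cursor
OPEN SalesCursor;

-- Retrieve the first row from the Cursor
FETCH NEXT FROM SalesCursor INTO @ID, @Amount;

-- Loop until the last row (@@FETCH_STATUS = 0 means the fetch succeeded)
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Total = @Total + @Amount;                   -- Process the row
    FETCH NEXT FROM SalesCursor INTO @ID, @Amount;   -- Retrieve the next row
END;

-- Close and Deallocate Cursor
CLOSE SalesCursor;
DEALLOCATE SalesCursor;

SELECT @Total AS Total;   -- 60
```

For a simple total like this a set-based SUM would normally be preferred; the cursor is used here only to show the steps.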
Ques 28. What is the difference between WHERE and HAVING clause?
Ans.
Both are used to filter the dataset, but WHERE is applied first (on individual rows) and HAVING is applied
at a later stage of query execution (on groups).
WHERE can be used in any SELECT query, while the HAVING clause is only used in SELECT queries that
contain an aggregate function or a GROUP BY clause.
Apart from SELECT queries, you can use the WHERE clause with UPDATE and DELETE statements, but the
HAVING clause can only be used with a SELECT query.
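A minimal sketch of both filters in one query, using an illustrative inline table Emp(Dept, Salary): WHERE removes rows before grouping, HAVING removes groups after aggregation.

```sql
SELECT   Dept, SUM(Salary) AS TotalSalary
FROM     (VALUES ('IT', 5000), ('IT', 900), ('HR', 4000), ('HR', 3000), ('Ops', 2500))
         AS Emp(Dept, Salary)
WHERE    Salary > 1000          -- row filter: drops the IT row with 900
GROUP BY Dept
HAVING   SUM(Salary) > 3000;    -- group filter: drops Ops (2500)
-- Returns IT 5000 and HR 7000
```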
Ques 32. What are the different Ranking Window functions in SQL Server?
Ans.
SYNTAX:
ROW_NUMBER|RANK|DENSE_RANK|NTILE () OVER ([PARTITION BY <partition_column>]
ORDER BY <order_by_column>)
The PARTITION BY clause is optional. If not used, data will be ranked based on a single partition.
1. ROW_NUMBER(): Always generates unique, consecutive values without any gaps, even if there are ties.
2. RANK(): Ranks each row in the result set. Rows with the same value get the same rank, and the
sequence can have gaps after ties.
3. DENSE_RANK(): It also returns the same rank for ties, but doesn’t have any gaps in the sequence.
4. NTILE(): Divides the rows into roughly equal-sized buckets. Suppose you have 20 rows and you specify
NTILE(2). This will give you 2 buckets with 10 rows each. When using NTILE(3), you get 2 buckets with 7
rows and 1 bucket with 6 rows.
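The four functions can be compared side by side on a small inline table with one tie. The names and scores below are made up for illustration; Name is added as a tie-breaker for ROW_NUMBER and NTILE only so their output is deterministic.

```sql
SELECT
    Name,
    Score,
    ROW_NUMBER() OVER (ORDER BY Score DESC, Name) AS RowNum,    -- 1, 2, 3, 4
    RANK()       OVER (ORDER BY Score DESC)       AS Rnk,       -- 1, 2, 2, 4 (gap after the tie)
    DENSE_RANK() OVER (ORDER BY Score DESC)       AS DenseRnk,  -- 1, 2, 2, 3 (no gap)
    NTILE(2)     OVER (ORDER BY Score DESC, Name) AS Bucket     -- 1, 1, 2, 2
FROM (VALUES ('A', 100), ('B', 90), ('C', 90), ('D', 80)) AS S(Name, Score)
ORDER BY Score DESC, Name;
```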
SELECT a, b
FROM (VALUES (1, 2), (3, 4), (5, 6), (7, 8), (9, 10) ) AS MyTable(a, b)
Ans. The CASE expression has two formats. Both allow up to 10 levels of nesting.
1. Simple CASE: It allows only equality operator.
CASE YEAR(OrderDT)
WHEN 2014 THEN 'Year 1'
WHEN 2013 THEN 'Year 2'
WHEN 2012 THEN 'Year 3'
ELSE 'Year 4 and beyond' END AS YearType
2. Searched CASE: It also supports other operators such as >, <, >=, <=, <>
CASE
WHEN YEAR(OrderDT) = 2014 THEN 'Year 1'
WHEN YEAR(OrderDT) = 2013 THEN 'Year 2'
WHEN YEAR(OrderDT) = 2012 THEN 'Year 3'
WHEN YEAR(OrderDT) < 2012 THEN 'Year 4 and beyond'
END AS YearType
Ques 35. How to check the number of rows affected by Last statement?
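In T-SQL the usual tool for this is the @@ROWCOUNT system function, which returns the number of rows affected by the statement that ran immediately before it. A minimal sketch on a table variable @T (an assumed name):

```sql
DECLARE @T TABLE (ID INT);
INSERT INTO @T VALUES (1), (2), (3);

UPDATE @T SET ID = ID + 10 WHERE ID >= 2;

-- Must be read right after the statement of interest; most later statements reset it.
SELECT @@ROWCOUNT AS RowsAffected;   -- 2
```

ROWCOUNT_BIG() returns the same information as a bigint for cases where the count could exceed the int range.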
2. LAST_VALUE(): Returns the last value in an ordered set of values. If the PARTITION BY clause is
specified, it returns the last value in each partition after ordering the partition by the ORDER BY clause.
3. LAG(): Provides access to a row at a given physical offset that comes before the current row. Use this
function in a SELECT statement to compare values in the current row with values in a previous row as
specified by the offset. The default offset is 1 if not specified. If the PARTITION BY clause is specified,
it returns the offset value in each partition after ordering the partition by the ORDER BY clause.
4. LEAD(): Provides access to a row at a given physical offset that comes after the current row. Use this
function in a SELECT statement to compare values in the current row with values in a subsequent row
as specified by the offset. The default offset is 1 if not specified. If the PARTITION BY clause is
specified, it returns the offset value in each partition after ordering the partition by the ORDER BY clause.
5. PERCENT_RANK(): Used to evaluate the relative standing of a value within a query result set or
partition. The values returned by PERCENT_RANK range from 0 to 1 inclusive; the first row in any set
has a PERCENT_RANK of 0.
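A sketch of these functions on an inline table of monthly sales (illustrative data). One assumption worth noting: LAST_VALUE is given an explicit frame here, because with only ORDER BY the default frame ends at the current row and the function would not see the rows after it.

```sql
SELECT
    Mth,
    Sales,
    LAG(Sales)  OVER (ORDER BY Mth) AS PrevSales,    -- NULL on the first row
    LEAD(Sales) OVER (ORDER BY Mth) AS NextSales,    -- NULL on the last row
    LAST_VALUE(Sales) OVER (ORDER BY Mth
                            ROWS BETWEEN UNBOUNDED PRECEDING
                                     AND UNBOUNDED FOLLOWING) AS LastSales,  -- 180 on every row
    PERCENT_RANK() OVER (ORDER BY Sales) AS PctRank  -- 0 for the lowest Sales, 1 for the highest
FROM (VALUES (1, 100), (2, 150), (3, 120), (4, 200), (5, 180)) AS S(Mth, Sales)
ORDER BY Mth;
```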
Ques 3. How can you create a table without a CREATE TABLE statement?
Ans. SELECT * INTO NEWTABLE FROM SOURCETABLE
Ques 4. How can you create an empty table, without any data, without a CREATE TABLE statement?
Ans. SELECT * INTO NEWTABLE FROM SOURCETABLE WHERE 1=2
Ques 6. Write a query to calculate the exact age in years from DOB?
Ans.
DECLARE @DOB DATE='1990-01-06' --YYYY-MM-DD
--8766 = 365.25 * 24 hours in an average year, so the result is a fractional approximation
SELECT DATEDIFF(hour,@DOB,CAST(GETDATE() as Date))/8766.0
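If the age is needed as completed whole years, one common alternative (not the method above) is to count year boundaries with DATEDIFF and subtract one when this year's birthday has not yet occurred. In this sketch @Today is fixed to an arbitrary date so the result is reproducible.

```sql
DECLARE @DOB   DATE = '1990-01-06';
DECLARE @Today DATE = '2024-01-05';   -- assumed date; use CAST(GETDATE() AS DATE) in practice

SELECT DATEDIFF(YEAR, @DOB, @Today)
     - CASE WHEN DATEADD(YEAR, DATEDIFF(YEAR, @DOB, @Today), @DOB) > @Today
            THEN 1       -- birthday in the current year is still ahead
            ELSE 0
       END AS AgeInYears;   -- 33: the 34th birthday is one day away
```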
Ans.
Ans.
SELECT * FROM TblName ORDER BY NEWID()
Ques 9. Write a SQL query to generate Random Number between two numbers?
Ans.
DECLARE @BOTTOM INT = 10
DECLARE @TOP INT = 20
--FLOOR over a range of (@TOP-@BOTTOM+1) gives every integer from 10 to 20 an equal chance
SELECT CAST(FLOOR((@TOP-@BOTTOM+1)* RAND()) +@BOTTOM as INT)
Ques 10. How can a column with a default value be added to an existing table?
Ans.
ALTER TABLE TblName
ADD ColName INT NULL
CONSTRAINT Constraint_Name
DEFAULT (1)
WITH VALUES --fills existing rows with the default; without it a nullable column stays NULL
Ans.
ALTER TABLE TblName
ADD CONSTRAINT SomeName
DEFAULT 'Value' FOR ColumnName
Ans.
b. Temp Table
IF OBJECT_ID('tempdb..#TblName') IS NOT NULL
BEGIN
-- Table Exists
END
UPDATE A
SET ColB = A.ColB + B.ColB, --qualify ColB: it exists in both tables, so a bare ColB is ambiguous
ColC= B.ColC
FROM Table1 AS A
INNER JOIN Table2 AS B
ON A.ColA = B.ColA
Ques 18. Write a query to sort a column in a user-defined order? For example: if a field contains the names
of different colors, then Red should appear first, Blue second, and the rest of the colors in ascending
order of Color. If NULL is present in the column, it should appear last.
Ans.
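One way to get such an order is a CASE expression in ORDER BY that assigns each row a sort bucket, followed by the column itself to order the rows within a bucket. A sketch on an inline list of colors (illustrative data):

```sql
SELECT Color
FROM (VALUES ('Green'), ('Red'), (NULL), ('Blue'), ('Amber')) AS T(Color)
ORDER BY
    CASE
        WHEN Color = 'Red'  THEN 1   -- Red first
        WHEN Color = 'Blue' THEN 2   -- Blue second
        WHEN Color IS NULL  THEN 4   -- NULL last
        ELSE 3                       -- every other color
    END,
    Color;                           -- ascending order inside each bucket
-- Returns: Red, Blue, Amber, Green, NULL
```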
Ques 19. Write a query to find the employee with Nth highest salary?
Ans.
DECLARE @N INT = 3 --N is an illustrative value
;WITH CTE AS
(
SELECT *,
DENSE_RANK() OVER (ORDER BY SALARY DESC) AS RANKING
FROM TblName
)
SELECT * FROM CTE WHERE RANKING = @N
SELECT
A.EMPID,
B.MANID AS [MANAGER OF MANAGER]
FROM Table1 AS A
LEFT JOIN Table1 AS B
ON A.MANID=B.EMPID
Ques 21. Write a query to find the Top N students in each Subject?
Ans.
;WITH CTE AS
(
SELECT *,
DENSE_RANK() OVER(PARTITION BY SUB ORDER BY MARKS DESC) AS RANKING
FROM TblName
)
SELECT * FROM CTE WHERE RANKING<=2 ---Top 2 in each Subject
Ques 22. Write a query to Delete the Duplicate records from a table?
Ans.
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID, AGE, COUNTRY
ORDER BY ( SELECT 0)) As RowNo
FROM TblName
)
DELETE FROM CTE WHERE RowNo > 1
Ques 24. Write a query to join row values with a comma as the delimiter?
Ans.
INPUT                 OUTPUT
ID  Category          ID  Category
1   Apple             1   Apple,Banana,Orange
1   Orange            2   Eraser,Paper,Pencil
1   Banana            3   Chair,Table
3   Chair
3   Table
2   Paper
2   Pencil
2   Eraser
SELECT DISTINCT
ID,
STUFF((SELECT ','+CATEGORY
FROM TblName B WHERE A.ID=B.ID
ORDER BY CATEGORY
FOR XML PATH('')
),1,1,'') AS CATEGORY
FROM TblName AS A
GROUP BY ID
ORDER BY ID
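On SQL Server 2017 and later, the built-in STRING_AGG function is a simpler alternative to the FOR XML PATH technique. The sketch below uses the input rows from this question inline, so it does not depend on TblName.

```sql
SELECT
    ID,
    STRING_AGG(Category, ',') WITHIN GROUP (ORDER BY Category) AS Category
FROM (VALUES (1, 'Apple'), (1, 'Orange'), (1, 'Banana'),
             (3, 'Chair'), (3, 'Table'),
             (2, 'Paper'), (2, 'Pencil'), (2, 'Eraser')) AS T(ID, Category)
GROUP BY ID
ORDER BY ID;
-- 1  Apple,Banana,Orange
-- 2  Eraser,Paper,Pencil
-- 3  Chair,Table
```

Unlike FOR XML PATH, STRING_AGG does not turn characters such as & or < into XML entities.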
Expert Level Questions
Ques 1. Write a query to find the number of days of Gap between a Date range?
Ans:
SELECT
GapStart = DATEADD(DAY,1,[current]),
GapEnd = DATEADD(DAY,-1,[next]),
[Days] = DATEDIFF(day,DATEADD(DAY,1,[Current]),DATEADD(DAY,-1,[Next]))+1
FROM
(
SELECT
[Current] = [DayOnline],
[Next] = LEAD([DayOnline]) OVER (ORDER BY [DayOnline])
FROM TblName
)A
WHERE DATEDIFF(DAY,[Current],[Next]) > 1
Ques 2. Write a query to find the number of days between continuous dates in a Date Range?
Ans:
;WITH GroupedDates AS
(
SELECT UniqueDate = SomeDate,
DateGroup = DATEADD(dd, - ROW_NUMBER() OVER (ORDER BY SomeDate), SomeDate)
FROM TblName
GROUP BY SomeDate
)
SELECT StartDate = MIN(UniqueDate),
EndDate = MAX(UniqueDate),
Days = DATEDIFF(day,MIN(UniqueDate),MAX(UniqueDate))+1
FROM GroupedDates
GROUP BY DateGroup
ORDER BY StartDate
Ques 3. Write a query to create a Pivot with Rows and Columns Total?
Ans:
Input                      Output
ID  Category  Value        ID         A   B   C   Row Total
1   A         16           1          36  12  19  67
1   A         20           2          13  25  15  53
1   B         12           3          17  18  23  58
1   C         19           Col Total  66  55  57  178
2   A         13
2   B         15
2   B         10
2   C         15
3   A         17
3   B         18
3   C         11
3   C         12
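SELECT
[ ] = ISNULL(ID, 'Col Total'), --ID is of Varchar type
A= SUM(A),
B= SUM(B),
C= SUM(C),
[Row Total]= SUM(ISNULL(A,0)+ISNULL(B,0)+ISNULL(C,0)) --a category missing for an ID counts as 0
FROM TblName AS Src
PIVOT
(
SUM(VALUE) FOR CATEGORY IN([A],[B],[C])
) AS Pvt --distinct aliases for the source table and the pivoted result
GROUP BY ROLLUP(ID)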
Ques 4. Write a Query to split the concatenated values with comma as Delimiter into rows?
Ans:
SELECT A.ID,
Split.a.value('.', 'VARCHAR(100)') AS Data
FROM
(
SELECT ID,
CAST ('<M>' + REPLACE(CATEGORY, ',', '</M><M>') + '</M>' AS
XML) AS Data
FROM TblName
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a)
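On SQL Server 2016 and later (database compatibility level 130 or higher), the built-in STRING_SPLIT function is an alternative to the XML technique. A sketch with inline sample rows, which are illustrative:

```sql
SELECT
    T.ID,
    LTRIM(S.value) AS Data          -- value is the output column of STRING_SPLIT
FROM (VALUES (1, 'Apple,Banana,Orange'),
             (2, 'Chair,Table')) AS T(ID, Category)
CROSS APPLY STRING_SPLIT(T.Category, ',') AS S;
-- Returns 5 rows: three for ID 1 and two for ID 2
```

STRING_SPLIT in this form does not guarantee the order of the returned pieces, so add an explicit ORDER BY if the order matters.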