
SQL

The document provides an overview of SQL, including its definition, types of statements, and key concepts such as tables, primary keys, and joins. It covers various levels of SQL knowledge from beginner to advanced, detailing operations like filtering, aggregation, and transaction handling. Additionally, it includes practice questions and explanations of important SQL functions and clauses.

Uploaded by

vishakha chavan

Beginner Level

1. What is SQL?

SQL (Structured Query Language) is a standard language used to interact with databases for tasks such
as querying, inserting, updating, and deleting data.

2. What are the different types of SQL statements?

1. DDL (Data Definition Language): CREATE, ALTER, DROP


2. DML (Data Manipulation Language): SELECT, INSERT, UPDATE, DELETE
3. DCL (Data Control Language): GRANT, REVOKE
4. TCL (Transaction Control Language): COMMIT, ROLLBACK
5. DQL (Data Query Language): SELECT

3. What are tables and fields?

● A table is a collection of rows and columns.
● Fields are columns in a table, defining attributes of the data.

4. How do you fetch data from a database?

Using the SELECT statement:

SELECT column1, column2 FROM table_name;


5. How do you filter data in SQL?

Using the WHERE clause:

SELECT * FROM employees WHERE age > 30;


6. What is the difference between WHERE and HAVING?

● WHERE: Filters rows before grouping.
● HAVING: Filters groups after aggregation.
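The difference can be checked with a minimal sketch run through Python's built-in sqlite3 module. The employees table and its rows are hypothetical sample data, not taken from any real schema.

```python
import sqlite3

# Hypothetical sample data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "HR", 40000),
        ("Bob", "HR", 70000),
        ("Cid", "IT", 60000),
        ("Dee", "IT", 80000),
        ("Eve", "Ops", 30000),
    ],
)

# WHERE removes individual rows first (Eve is dropped before grouping),
# then HAVING keeps only the groups whose average exceeds 60000.
rows = conn.execute(
    """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    WHERE salary > 35000
    GROUP BY department
    HAVING AVG(salary) > 60000
    ORDER BY department
    """
).fetchall()
print(rows)
```

Only the IT group survives both filters: HR averages 55000, and the single Ops row never reaches the grouping step.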

7. What is a primary key?

A unique identifier for each row in a table. It cannot contain NULL.

8. What is a foreign key?

A field that links to the primary key of another table to establish relationships.

9. How do you sort data?

Using ORDER BY:

SELECT * FROM employees ORDER BY salary DESC;


10. How do you remove duplicate rows?

Using DISTINCT:

SELECT DISTINCT department FROM employees;

Intermediate Level
11. What is a join? List its types.

Combines rows from two or more tables based on a related column. Types:

1. INNER JOIN
2. LEFT JOIN (or LEFT OUTER JOIN)
3. RIGHT JOIN (or RIGHT OUTER JOIN)
4. FULL JOIN (or FULL OUTER JOIN)
5. CROSS JOIN

12. How do you perform an inner join?


SELECT employees.name, departments.name
FROM employees
INNER JOIN departments ON employees.department_id = departments.id;
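As a rough illustration of how INNER JOIN differs from LEFT JOIN, here is a small sketch using Python's sqlite3 module. The two tables mirror the names used in the query above, but the rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: Cid has no department, so only LEFT JOIN keeps him.
conn.executescript(
    """
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'HR'), (2, 'IT');
    INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cid', NULL);
    """
)

# INNER JOIN returns only employees with a matching department.
inner_rows = conn.execute(
    """
    SELECT e.name, d.name
    FROM employees AS e
    INNER JOIN departments AS d ON e.department_id = d.id
    ORDER BY e.id
    """
).fetchall()

# LEFT JOIN returns every employee; unmatched department columns are NULL.
left_rows = conn.execute(
    """
    SELECT e.name, d.name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.department_id = d.id
    ORDER BY e.id
    """
).fetchall()

print(inner_rows)
print(left_rows)
```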
13. What is normalization?

The process of organizing data to reduce redundancy and dependency by dividing tables into smaller,
related tables.

14. What are the normal forms in normalization?

1. 1NF: No repeating groups.


2. 2NF: No partial dependency.
3. 3NF: No transitive dependency.
4. BCNF: Boyce-Codd Normal Form.

15. What is denormalization?

The process of combining normalized tables to improve read performance.

16. How do you use GROUP BY?


SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;

17. How do you filter groups in SQL?

Using HAVING:

SELECT department, AVG(salary) AS avg_salary


FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;
18. What is a subquery?

A query nested inside another query. Example:

SELECT name FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
19. What is a view in SQL?

A virtual table based on a query. Example:

CREATE VIEW high_salary_employees AS


SELECT name, salary FROM employees WHERE salary > 50000;
20. How do you update data in SQL?
UPDATE employees SET salary = salary * 1.10 WHERE department = 'HR';

Advanced Level
21. What is an index in SQL?

An index improves the speed of data retrieval but may slow down INSERT and UPDATE operations.

22. What are the types of indexes?

1. Clustered Index: Alters the physical order of the data.


2. Non-clustered Index: Does not affect physical order.

23. How do you create an index?


CREATE INDEX idx_name ON employees (name);
24. What is a stored procedure?

A stored procedure is a precompiled set of SQL statements. Example:

CREATE PROCEDURE GetEmployeeCount


AS
SELECT COUNT(*) FROM employees;

25. What is a trigger?

A trigger executes a block of code automatically in response to certain events on a table (e.g., INSERT,
UPDATE).

26. How do you handle transactions in SQL?

Using COMMIT and ROLLBACK:

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
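The example above shows only the COMMIT path. A minimal sketch of the ROLLBACK path, using Python's sqlite3 module, might look like the following. The CHECK constraint, the starting balances, and the transfer helper are assumptions added for illustration.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN / COMMIT / ROLLBACK by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 50), (2, 0)")


def transfer(amount, src, dst):
    """Hypothetical helper: move money atomically, return True on success."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK, so undo the credit as well.
        conn.execute("ROLLBACK")
        return False


ok_small = transfer(30, 1, 2)   # fits within the balance
ok_large = transfer(100, 1, 2)  # would overdraw account 1
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(ok_small, ok_large, balances)
```

The failed transfer leaves both balances exactly as the first transfer left them, which is the all-or-nothing behaviour a transaction is meant to give.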
27. What is the difference between DELETE and TRUNCATE?

● DELETE: Removes rows; can use WHERE.
● TRUNCATE: Removes all rows; faster, no WHERE.

28. What are window functions?

Functions that operate on a subset of rows related to the current row:

SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
29. What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?

● RANK(): Leaves gaps in rankings.
● DENSE_RANK(): No gaps in rankings.
● ROW_NUMBER(): Assigns unique numbers.
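To see the three functions side by side, here is a short sketch run through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table contents are hypothetical; Bob and Cid are given the same salary so the tie handling is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
# Hypothetical rows with one tie on salary.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 90), ("Bob", 80), ("Cid", 80), ("Dee", 70)],
)

rows = conn.execute(
    """
    SELECT name,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num
    FROM employees
    ORDER BY salary DESC, name
    """
).fetchall()

for row in rows:
    print(row)
```

After the tie, RANK() jumps from 2 to 4, DENSE_RANK() continues with 3, and ROW_NUMBER() never repeats a value.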

30. How do you optimize SQL queries?

1. Use proper indexes.


2. Avoid SELECT *.
3. Use joins instead of subqueries when possible.
4. Limit results with LIMIT or TOP.

Practice Questions

Here are additional practice questions categorized by topic:

Basic Queries

31. Write a query to fetch the highest salary.


32. Write a query to count employees in each department.
33. Fetch rows where salary is between 50,000 and 100,000.

Joins

34. Write a query using a left join.


35. Fetch data from three tables using a join.
36. Demonstrate a self-join.

Aggregations

37. Calculate total salary by department.


38. Find the maximum and minimum salary for each job title.
39. Count rows where employees have more than 5 years of experience.

Advanced

40. Write a query using a CTE (Common Table Expression).


41. Create a recursive query.
42. Write a query to pivot data.

Q 1. What is SQL?

SQL stands for Structured Query Language. It is a programming language used for managing and manipulating
relational databases.

Q 2. What is a database?

A database is an organized collection of data stored and accessed electronically. It provides a way to store,
organize, and retrieve large amounts of data efficiently.

Q 3. What is a primary key?

A primary key is a column or combination of columns that uniquely identifies each row in a table. It enforces
the entity integrity rule in a relational database.

Q 4. What is a foreign key?

A foreign key is a column or combination of columns that establishes a link between data in two tables. It
ensures referential integrity by enforcing relationships between tables.

Q 5. What is the difference between a primary key and a unique key?

A primary key uniquely identifies each row in a table; it cannot contain NULL, and a table can have only one.
A unique key also enforces unique values in a column or combination of columns, but a table can have several
unique keys, and most databases allow NULL in them.

Q 6. What is normalization?

Normalization is the process of organizing data in a database to minimize redundancy and dependency. It
involves breaking down a table into smaller tables and establishing relationships between them.

The different types of normalization are: First Normal Form (1NF), Second Normal Form (2NF), Third Normal
Form (3NF), Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), and Fifth Normal Form (5NF) or
Project-Join Normal Form (PJNF).

Q 8. What is a join in SQL?

A join is an operation used to combine rows from two or more tables based on related columns. It allows you
to retrieve data from multiple tables simultaneously.

Q 9. What is the difference between DELETE and TRUNCATE in SQL?

The DELETE statement removes specific rows from a table based on a condition. It can be rolled back
and logs an individual delete operation for each row. TRUNCATE, on the other hand, removes all
rows from a table. In some databases (for example MySQL and Oracle) it cannot be rolled back, and it is
usually faster than DELETE because it deallocates the data pages instead of logging individual row deletions.

Q 10. What is the difference between UNION and UNION ALL?

UNION and UNION ALL are used to combine the result sets of two or more SELECT statements. UNION
removes duplicate rows from the combined result set, whereas UNION ALL includes all rows, including
duplicates.
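A quick way to confirm the difference is the sketch below, run through Python's sqlite3 module. The tables a and b are hypothetical and share the value 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical single-column tables that overlap on the value 2.
conn.executescript(
    """
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
    """
)

# UNION removes the duplicate 2; UNION ALL keeps both copies.
union_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
).fetchall()
union_all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
).fetchall()

print(union_rows)
print(union_all_rows)
```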

Q 11. What is the difference between the HAVING clause and the WHERE clause?

The WHERE clause is used to filter rows based on a condition before the data is grouped or aggregated. It
operates on individual rows. The HAVING clause, on the other hand, is used to filter grouped rows based on a
condition after the data is grouped or aggregated using the GROUP BY clause.

Q 12. What is a transaction in SQL?

A transaction is a sequence of SQL statements that are executed as a single logical unit of work. It ensures
data consistency and integrity by either committing all changes or rolling them back if an error occurs.

Q 14. What is ACID in the context of database transactions?

Q 15. What is a deadlock?

A deadlock occurs when two or more transactions are waiting for each other to release resources, resulting in
a circular dependency. As a result, none of the transactions can proceed, and the system may become
unresponsive.

Q 16. What is the difference between a database and a schema?

A database is a container that holds multiple objects, such as tables, views, indexes, and procedures. It
represents a logical grouping of related data. A schema, on the other hand, is a container within a database
that holds objects and defines their ownership. It provides a way to organize and manage database objects.

Q 19. What is the difference between CHAR and VARCHAR data types?

Q 20. What is a stored procedure?

A stored procedure is a set of SQL statements that are stored in the database and can be executed repeatedly.
It provides code reusability and better performance.

Q 21. What is a subquery?

A subquery is a query nested inside another query. It is used to retrieve data based on the result of an inner
query.

Q 23. What is the difference between a cross join and an inner join?

A cross join (Cartesian product) returns the combination of all rows from two or more tables. An inner join
returns only the matching rows based on a join condition.

Q 24. What is the purpose of the COMMIT statement?

The COMMIT statement is used to save changes made in a transaction permanently. It ends the transaction
and makes the changes visible to other users.

Q 25. What is the purpose of the ROLLBACK statement?

The ROLLBACK statement is used to undo changes made in a transaction. It reverts the database to its previous
state before the transaction started.

Q 26. What is the purpose of the NULL value in SQL?

NULL represents the absence of a value or unknown value. It is different from zero or an empty string and
requires special handling in SQL queries.
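The special handling can be sketched as follows with Python's sqlite3 module. The patients table here is a hypothetical three-row sample: one NULL, one real value, and one empty string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (name TEXT, allergies TEXT)")
# Hypothetical rows: Python None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [("Ann", None), ("Bob", "Penicillin"), ("Cid", "")],
)

# "= NULL" never matches, because any comparison with NULL yields NULL.
eq_null = conn.execute(
    "SELECT COUNT(*) FROM patients WHERE allergies = NULL"
).fetchone()[0]

# IS NULL is the correct test; the empty string is not NULL.
is_null = conn.execute(
    "SELECT COUNT(*) FROM patients WHERE allergies IS NULL"
).fetchone()[0]

# COALESCE substitutes a default for NULL only.
filled = conn.execute(
    "SELECT name, COALESCE(allergies, 'NKA') FROM patients ORDER BY name"
).fetchall()

print(eq_null, is_null, filled)
```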

Q 27. What is the difference between a view and a materialized view?

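A view is a virtual table defined by a stored query; it holds no data of its own, and the query runs each time
the view is referenced. A materialized view is a physical copy of the view's result set stored in the database,
which is refreshed periodically. It improves query performance at the cost of data freshness.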

Q 28. What is a correlated subquery?

A correlated subquery is a subquery that refers to a column from the outer query. It executes once for each
row processed by the outer query.
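As an illustration, the sketch below (Python's sqlite3 module, hypothetical employees data) finds employees who earn more than the average of their own department. The inner query refers to e.department from the outer query, which is what makes it correlated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "HR", 40000),
        ("Bob", "HR", 70000),
        ("Cid", "IT", 60000),
        ("Dee", "IT", 80000),
    ],
)

# The subquery is logically re-evaluated for each outer row,
# using that row's department.
rows = conn.execute(
    """
    SELECT e.name
    FROM employees AS e
    WHERE e.salary > (
        SELECT AVG(i.salary)
        FROM employees AS i
        WHERE i.department = e.department
    )
    ORDER BY e.name
    """
).fetchall()
print(rows)
```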

Q 29. What is the purpose of the DISTINCT keyword?

The DISTINCT keyword is used to retrieve unique values from a column or combination of columns in a SELECT
statement.

Q 30. What is the difference between the CHAR and VARCHAR data types?

CHAR stores fixed-length character strings, while VARCHAR stores variable-length character strings. The
storage size of CHAR is constant, while VARCHAR adjusts dynamically.

Q 31. What is the difference between the IN and EXISTS operators?

The IN operator checks for a value within a set of values or the result of a subquery. The EXISTS operator
checks for the existence of rows returned by a subquery.

Q 33. What is the difference between a unique constraint and a unique index?

A unique constraint ensures the uniqueness of values in one or more columns, while a unique index enforces
the uniqueness and also improves query performance.

Q 34. What is the purpose of the TOP or LIMIT clause?

The TOP clause (SQL Server) and the LIMIT clause (MySQL, PostgreSQL, SQLite) restrict the number of rows
returned by a query. They are commonly used to fetch the first N rows of a sorted result.

Q 35. What is the difference between the UNION and JOIN operators?

UNION combines the result sets of two or more SELECT statements vertically, while JOIN combines columns
from two or more tables horizontally based on a join condition.

Q 36. What is a data warehouse?

A data warehouse is a large, centralized repository that stores and manages data from various sources. It is
designed for efficient reporting, analysis, and business intelligence purposes.

Q 37. What is the difference between a primary key and a candidate key?

A primary key is a chosen candidate key that uniquely identifies a row in a table. A candidate key is a set of one
or more columns that could potentially become the primary key.

Q 38. What is the purpose of the GRANT statement?

The GRANT statement is used to grant specific permissions or privileges to users or roles in a database.

Q 39. What is a correlated update?

A correlated update is an update statement that refers to a column from the same table in a subquery. It
updates values based on the result of the subquery for each row.

Q 40. What is the purpose of the CASE statement?

The CASE statement is used to perform conditional logic in SQL queries. It allows you to return different values
based on specified conditions.

Q 41. What is the purpose of the COALESCE function?

The COALESCE function returns the first non-null expression from a list of expressions. It is often used to
handle null values effectively.

Q 42. What is the purpose of the ROW_NUMBER() function?

The ROW_NUMBER() function assigns a unique incremental number to each row in the result set. It is
commonly used for pagination or ranking purposes.

Q 43. What is the difference between a natural join and an inner join?

A natural join is an inner join that matches rows based on columns with the same name in the joined tables. It
is automatically determined by the database.

Q 44. What is the purpose of the CASCADE DELETE constraint?

The CASCADE DELETE constraint is used to automatically delete related rows in child tables when a row in the
parent table is deleted.

Q 45. What is the purpose of the ALL keyword in SQL?

Q 46. What is the difference between the EXISTS and NOT EXISTS operators?

The EXISTS operator returns true if a subquery returns any rows, while the NOT EXISTS operator returns true if
a subquery returns no rows.

Q 47. What is the purpose of the CROSS APPLY operator?

The CROSS APPLY operator is used to invoke a table-valued function for each row of a table expression. It
returns the combined result set.

Q 48. What is a self-join?

A self-join is a join operation where a table is joined with itself. It is useful when you want to compare rows
within the same table based on related columns. It requires table aliases to distinguish the two references to
the table.

Q 49. What is an ALIAS command?

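An alias is a temporary name given to a table or a column in a query, usually with the AS keyword. A table
alias can be referenced anywhere in the query, including the WHERE clause; a column alias defined in SELECT
generally cannot be used in WHERE, because WHERE is evaluated before SELECT.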

1.How to use aggregate functions as window functions?

Syntax for Window Functions

SELECT column_name,
       AGGREGATE_FUNCTION(column_name) OVER (PARTITION BY column_name ORDER BY column_name) AS alias_name
FROM table_name;

Common Aggregate Functions Used as Window Functions

1. SUM(): Calculates the sum of values.

2. AVG(): Calculates the average of values.

3. COUNT(): Counts the number of rows or non-null values.

4. MIN(): Finds the minimum value.

5. MAX(): Finds the maximum value.
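The syntax above can be tried out with a small sketch using Python's sqlite3 module (SQLite 3.25 or newer). The employees rows are hypothetical. SUM with ORDER BY inside OVER gives a running total per department, while AVG without ORDER BY gives the same department-wide average on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "HR", 40), ("Bob", "HR", 70), ("Cid", "IT", 60)],
)

# Unlike GROUP BY, the window versions keep one output row per input row.
rows = conn.execute(
    """
    SELECT department, name, salary,
           SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS running_total,
           AVG(salary) OVER (PARTITION BY department)                 AS dept_avg
    FROM employees
    ORDER BY department, salary
    """
).fetchall()

for row in rows:
    print(row)
```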

4. What is the order of execution of a SQL query?

Answer: Here’s the simplified order of execution for a SQL query:

1. FROM/JOIN: Identifies the table(s) to retrieve data from. Joins between tables are also processed here.

2. WHERE: Filters rows based on specified conditions. Only rows that match these conditions proceed.

3. GROUP BY: Groups rows that have the same values in specified columns. This is useful for aggregation.

4. HAVING: Filters groups based on conditions, similar to WHERE but works on grouped data.

5. SELECT: Selects the columns or expressions to return in the result. Performs any calculations needed.

6. ORDER BY: Sorts the result set based on one or more columns or expressions.

7. LIMIT: Limits the number of rows returned by the query.

5. What are DDL and DML languages? Give examples.

Answer:

1. DDL (Data Definition Language)


○ Purpose: DDL is used to define and modify the structure of database objects such as tables,
indexes, and schemas. It deals with the schema of the database.
○ Examples of DDL Commands:
■ CREATE: Used to create new database objects like tables and indexes.
■ ALTER: Used to modify existing database objects (e.g., adding a column to a table).
■ DROP: Used to delete existing database objects (e.g., dropping a table or index).
■ TRUNCATE: Used to remove all rows from a table but keep its structure intact.

2. DML (Data Manipulation Language)


○ Purpose: DML is used to manipulate data within the database. It deals with the data itself,
allowing you to perform operations such as inserting, updating, deleting, and retrieving data.
○ Examples of DML Commands:
■ INSERT: Used to add new records (rows) into a table.
■ UPDATE: Used to modify existing records in a table.
■ DELETE: Used to remove records from a table.
■ SELECT: Used to retrieve data from one or more tables.

6. What are the prerequisites to use the UNION operator in SQL?

Answer: To use the UNION operator in SQL, which combines the result sets of two or more SELECT statements
into a single result set, certain prerequisites must be met:

1. Same Number of Columns:
○ Each SELECT statement involved in the UNION must return the same number of columns. The
columns should match in count across all queries.

2. Same Data Types:
○ The corresponding columns in each SELECT statement must have compatible data types. For
example, if the first column in the first query is an integer, the first column in the other queries
should also be of an integer type or a compatible type.

3. Order of Columns:
○ The order of columns in each SELECT statement must match. The first column in each query
should correspond to the same kind of data, and so on.

4. Column Names:
○ The column names in the final result set are taken from the first SELECT statement. Subsequent
SELECT statements do not need to have the same column names, but the types and order must still match.

5. Usage of UNION vs UNION ALL:
○ UNION removes duplicate records from the combined result set, while UNION ALL includes all
records, including duplicates. Make sure to choose the appropriate one based on whether you
want duplicates removed.

7. What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() window functions
in SQL?

Answer:

RANK(), DENSE_RANK(), and ROW_NUMBER() are all window functions in SQL that assign a rank or
number to rows within a partition of the result set. They are often used to rank or order data within groups.
Here’s how they differ:

1. RANK()
○ Purpose: Assigns a rank to each row within a partition of the result set, with gaps in ranking if
there are ties.

2. DENSE_RANK()
○ Purpose: Similar to RANK(), but without gaps. Assigns consecutive ranks to rows within a
partition of the result set.

3. ROW_NUMBER()
○ Purpose: Assigns a unique sequential integer to each row within a partition of the result set,
without considering ties.

8. What are subqueries, and where can we use them?

Answer: Subqueries:

● Definition: A subquery is a query nested inside another query (the main query). It is also known as an
inner query or a nested query. Subqueries are used to perform operations that would otherwise
require multiple steps or complex logic.

9. Which is better to use, CTE or subquery?

Answer:

1. CTE (Common Table Expression)


○ Definition: Temporary result set defined using the WITH clause.
○ Pros: Improves readability, easier to maintain, can be reused multiple times, supports recursion.
○ Use When: Query is complex, needs to be broken into steps, or results need to be reused.
○ Example: Easier to read and understand in complex queries.

2. Subquery
○ Definition: Query nested inside another query.
○ Pros: Simple and concise, good for single-use calculations or filters.
○ Use When: Query is straightforward, result is used only once.
○ Example: Suitable for one-time filtering and calculations.
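To make the comparison concrete, here is a sketch using Python's sqlite3 module with hypothetical employees data. The first query names an intermediate result with a CTE and then joins to it; the second uses WITH RECURSIVE, something a plain subquery cannot express. The names dept_avg and counter are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "HR", 40000),
        ("Bob", "HR", 70000),
        ("Cid", "IT", 60000),
        ("Dee", "IT", 80000),
    ],
)

# A CTE gives the per-department average a name that the main query can join to.
cte_rows = conn.execute(
    """
    WITH dept_avg AS (
        SELECT department, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department
    )
    SELECT e.name, d.avg_salary
    FROM employees AS e
    JOIN dept_avg AS d ON d.department = e.department
    WHERE e.salary > d.avg_salary
    ORDER BY e.name
    """
).fetchall()

# A recursive CTE that counts from 1 to 5.
numbers = conn.execute(
    """
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter
    """
).fetchall()

print(cte_rows)
print(numbers)
```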

1.Show first name, last name, and gender of patients whose gender is 'M'.
SELECT first_name, last_name, gender
FROM patients where gender = "M";

2.Show first name and last name of patients who does not have allergies. (null).
SELECT first_name, last_name
FROM patients

where allergies is null;

3.Show first name of patients that start with the letter 'C'.
SELECT first_name FROM patients
where first_name like "C%";

4.Show first name and last name of patients that weight within the range of 100 to 120 (inclusive).
SELECT first_name, last_name FROM patients
where weight between 100 and 120;

5.Update the patients table for the allergies column. If the patient's allergies is null then replace it with
'NKA'.
update patients set allergies = "NKA"
where allergies is null;

6.Show first name and last name concatenated into one column to show their full name.
select concat(first_name, " ", last_name) as full_name from patients;

7.Show first name, last name, and the full province name of each patient.
select first_name, last_name, province_name from patients
join province_names ON patients.province_id =
province_names.province_id;

8.Show how many patients have a birth_date with 2010 as the birth year.

select count(*) from patients
where year(birth_date) = 2010;

9.Show the first_name, last_name, and height of the patient with the greatest height.
select first_name, last_name, height from patients
order by height desc limit 1;

OR

select first_name, last_name, height from patients
where height = (select max(height) from patients);

10.Show all columns for patients who have one of these patient_ids: 1,45,534,879,1000
select * from patients
where patient_id in (1,45,534,879,1000);

11.Show the total number of admissions.


select count(admission_date) from admissions;

12.Show all the columns from admissions where the patient was admitted and discharged on the same day.
select * from admissions
where admission_date = discharge_date;

13.Show the patient id and the total number of admissions for patient_id 579.
select patient_id, count(admission_date) from admissions where patient_id = 579;

14.Based on the cities that our patients live in, show unique cities that are in province_id 'NS'?
select distinct(city) from patients
where province_id = "NS";

15.Write a query to find the first_name, last name and birth date of patients who has height greater than
160 and weight greater than 70.
select first_name, last_name, birth_date from patients
where height > 160 and weight > 70;

16.Write a query to find list of patients first_name, last_name, and allergies where allergies are not null and
are from the city of 'Hamilton'
select first_name, last_name, allergies from patients
where allergies is not null and city = "Hamilton";

17.Show unique birth years from patients and order them by ascending.
select distinct(year(birth_date)) as birth_year from patients
order by birth_year;

18.Show unique first names from the patients table which only occurs once in the list. For example, if two or
more people are named 'John' in the first_name column then don't include their name in the output list. If
only 1 person is named 'Leo' then include them in the output.
select first_name from patients group by first_name
having count(*) = 1;

19.Show patient_id and first_name from patients where their first_name start and ends with 's' and is at
least 6 characters long.
select patient_id, first_name
from patients

where first_name like "s%" and first_name like "%s" and first_name like "%______%"; OR

select patient_id, first_name from patients


where first_name like "s%s" and first_name like "%______%";

OR
SELECT patient_id, first_name
FROM patients

WHERE first_name LIKE "s____%s";

20.Show patient_id, first_name, last_name from patients whos diagnosis is 'Dementia'. Primary diagnosis is
stored in the admissions table.
select p.patient_id, p.first_name, p.last_name from patients as p
join admissions as a

on p.patient_id = a.patient_id

where diagnosis = "Dementia";

21.Display every patient's first_name. Order the list by the length of each name and then by alphabetically.
select first_name from patients
order by len(first_name), first_name asc;

22.Show the total amount of male patients and the total amount of female patients in the patients table.
Display the two results in the same row.
select sum(case when gender = "M" then 1 else 0 end) as Male,
sum(case when gender = "F" then 1 else 0 end) as Female
from patients;
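One thing worth checking for this question is how COUNT treats a condition. COUNT(expr) counts every row where expr is not NULL, so a false comparison is still counted, whereas SUM adds up only the rows where the condition is true. The sketch below shows this with Python's sqlite3 module and a hypothetical three-row patients table. SUM(gender = 'M') relies on SQLite treating comparisons as 0 or 1, and the CASE form is the more portable spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (first_name TEXT, gender TEXT)")
# Hypothetical rows: one female, two male.
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [("Ann", "F"), ("Bob", "M"), ("Cid", "M")],
)

counts = conn.execute(
    """
    SELECT COUNT(gender = 'M')                           AS count_expr,
           SUM(gender = 'M')                             AS males,
           SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS females
    FROM patients
    """
).fetchone()
print(counts)
```

Here count_expr equals the total number of rows, not the number of male patients.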

23.Show first and last name, allergies from patients which have allergies to either 'Penicillin' or 'Morphine'.
Show results ordered ascending by allergies then by first_name then by last_name.
select first_name, last_name, allergies from patients
where allergies = "Penicillin" or allergies = "Morphine"
order by allergies, first_name, last_name;

24.Show patient_id, diagnosis from admissions. Find patients admitted multiple times for the same
diagnosis.
select patient_id, diagnosis from admissions group by patient_id, diagnosis
having count(*) > 1;

25.Show the city and the total number of patients in the city. Order from most to least patients and then by
city name ascending.
select city, count(*) as number_of_patients from patients group by city
order by number_of_patients desc, city;

26.Show first name, last name and role of every person that is either patient or doctor. The roles are either
"Patient" or "Doctor"
select first_name, last_name, "Patient" as role from patients union all
select first_name, last_name, "Doctor" as role from doctors;

27.Show all allergies ordered by popularity. Remove NULL values from query.
select allergies, count(*) as popular_allergies from patients where allergies is not null group by allergies
order by popular_allergies desc;

28.Show all patient's first_name, last_name, and birth_date who were born in the 1970s decade. Sort the
list starting from the earliest birth_date.
select first_name, last_name, birth_date from patients where birth_date like "%197%"
order by birth_date asc; OR

select first_name, last_name, birth_date from patients


where Year(birth_date) between 1970 and 1979 order by birth_date asc;

29.We want to display each patient's full name in a single column. Their last_name in all upper letters must
appear first, then first_name in all lower case letters. Separate the last_name and first_name with a comma.
Order the list by the first_name in descending order. EX: SMITH,jane
select concat(upper(last_name), "," ,lower(first_name)) as full_name from patients
order by first_name desc;

30.Show the province_id(s), sum of height; where the total sum of its patient's height is greater than or
equal to 7,000.
Select province_id, sum(height)
From patients

Group By province_id

Having sum(height) >= 7000;

31.Show the difference between the largest weight and smallest weight for patients with the last name
'Maroni'
select (max(weight) - min(weight)) as weight_diff from patients
where last_name = "Maroni";

32.Show all of the days of the month (1-31) and how many admission_dates occurred on that day. Sort by
the day with most admissions to least admissions.
select day(admission_date) as day_num, count(patient_id) as num_of_addmission from admissions group by
day_num
order by num_of_addmission Desc;

33.Show all columns for patient_id 542's most recent admission_date.


select * from admissions where patient_id = 542
order by admission_date desc

limit 1;

34.Show patient_id, attending_doctor_id, and diagnosis for admissions that match one of the two criteria:
(A). patient_id is an odd number and attending_doctor_id is either 1, 5, or 19. (B). attending_doctor_id contains
a 2 and the length of patient_id is 3 characters.
select patient_id, attending_doctor_id, diagnosis
from admissions
where (patient_id % 2 = 1 and attending_doctor_id in (1,5,19))
or (attending_doctor_id like "%2%" and len(patient_id) = 3);

35.Show first_name, last_name, and the total number of admissions attended for each doctor. Every
admission has been attended by a doctor.
select first_name, last_name, count(admission_date) as admissions_attended from admissions a
join doctors d

on a.attending_doctor_id = d.doctor_id

group by doctor_id;

36.For each doctor, display their id, full name, and the first and last admission date they attended.
select doctor_id,
concat(first_name, " ", last_name) as full_name,
min(admission_date) as first_date_attended,
max(admission_date) as last_date_attended
from admissions a
join doctors d
on a.attending_doctor_id = d.doctor_id
group by doctor_id;

37.Display the total amount of patients for each province. Order by descending.
select pr.province_name, count(p.patient_id) as total_patients from patients as p
join province_names as pr on p.province_id = pr.province_id group by
pr.province_name

order by total_patients desc;

38.For every admission, display the patient's full name, their admission diagnosis, and their doctor's full
name who diagnosed their problem.
select concat(p.first_name, " ", p.last_name) as patient_full_name, a.diagnosis, concat(d.first_name, " ",
d.last_name) as doc_full_name from patients as p
join admissions as a on p.patient_id = a.patient_id

join doctors as d

on d.doctor_id = a.attending_doctor_id;

39.display the first name, last name and number of duplicate patients based on their first name and last
name.
select first_name, last_name, count(*) as num_of_duplicates from patients
group by first_name, last_name

having count(*) > 1;

40.Display patient's full name, height in the units feet rounded to 1 decimal, weight in the unit pounds
rounded to 0 decimals, birth_date, gender non abbreviated. Convert CM to feet by dividing by 30.48.
Convert KG to pounds by multiplying by 2.205.
select concat(first_name, " ", last_name) as patient_full_name, round((height/30.48), 1) as height,
round((weight*2.205), 0) as weight, birth_date,
case

when gender = "M" then "Male"

when gender = "F" then "Female" end as gender

from patients;

41.Show patient_id, first_name, last_name from patients who do not have any records in the admissions
table. (Their patient_id does not exist in any admissions.patient_id rows.)
select p.patient_id, p.first_name, p.last_name
from patients as p

Left join admissions as a on p.patient_id =


a.patient_id where a.patient_id is null;

42.Show all of the patients grouped into weight groups. Show the total amount of patients in each weight
group. Order the list by the weight group descending. For example, if they weigh 100 to 109 they are placed
in the 100 weight group, 110-119 = 110 weight group, etc.

select (weight/10) * 10 as weight_group, count(*) as no_of_patients_in_grp from patients group by
weight_group order by weight_group desc;
// Comment: This relies on integer division of an integer weight column; where "/" returns a decimal (e.g. MySQL), use floor(weight/10) * 10 instead. //

43.Show patient_id, weight, height, isObese from the patients table. Display isObese as a boolean 0 or 1.
Obese is defined as weight(kg)/(height(m))^2 >= 30. Weight is in units kg. Height is in units cm.
// Comment: To convert height (CM) to height (M): divide the height by 100.00 (height/100.00) //
select patient_id, weight, height, case
when weight/power(height/100.00,2) >= 30 then 1 else 0 end as
isObese

from patients;

44.Show patient_id, first_name, last_name, and attending doctor's specialty. Show only the patients who
have a diagnosis of 'Epilepsy' and whose doctor's first name is 'Lisa'. Check the patients, admissions, and doctors
tables for required information.
select p.patient_id, p.first_name, p.last_name, d.specialty from patients as p
join admissions as a on p.patient_id = a.patient_id
join doctors as d on d.doctor_id =
a.attending_doctor_id

where a.diagnosis = "Epilepsy" and d.first_name = "Lisa";

45.All patients who have gone through admissions can see their medical documents on our site. Those
patients are given a temporary password after their first admission. Show the patient_id and
temp_password. The password must be the following, in order: (A) patient_id, (B) the numerical length of
the patient's last_name, (C) the year of the patient's birth_date.

select distinct(p.patient_id), concat(p.patient_id,len(p.last_name),year(p.birth_date)) as temp_password from


patients as p

join admissions as a

on p.patient_id = a.patient_id;

46.Each admission costs $50 for patients without insurance, and $10 for patients with insurance. All
patients with an even patient_id have insurance. Give each patient a 'Yes' if they have insurance, and a 'No'
if they don't have insurance. Add up the admission_total cost for each has_insurance group.
select case
when patient_id % 2 = 0 then "Yes" else "No"

end as has_insurance,

sum(case

when patient_id % 2 = 0 then 10 else 50

end) as cost_as_per_insurance_availability

from admissions

group by has_insurance;

47.Show the provinces that have more patients identified as 'M' than 'F'. Must only show full province_name.
select pn.province_name from patients as p
join province_names as pn

on p.province_id = pn.province_id group by province_name having sum(case

when p.gender = "M" then 1 else 0 end) >

sum(case

when p.gender = "F" then 1 else 0 end);

48.We are looking for a specific patient. Pull all columns for the patient
who matches the following criteria:
- First_name contains an 'r' after the first two letters.
- Identifies their gender as 'F'.
- Born in February, May, or December.
- Their weight would be between 60kg and 80kg.
- Their patient_id is an odd number.
- They are from the city 'Kingston'.

select * from patients where first_name like "__r%" and gender = "F" and month(birth_date) in
(2, 5, 12) and weight between 60 and 80 and patient_id % 2 = 1 and city = "Kingston";

49.Show the percent of patients that have 'M' as their gender. Round the answer to the nearest hundredth
and display it in percent form.

select concat(round((sum(case when gender = "M" then 1 else 0 end) *100.00 / count(*)), 2),
"%") as male_percentage from patients;

50.For each day display the total amount of admissions on that day. Display the amount changed from the
previous date.
SELECT admission_date, COUNT(admission_date) AS admission_count, COUNT(admission_date) -
LAG(COUNT(admission_date)) OVER (ORDER BY admission_date) AS admission_count_change
FROM admissions

GROUP BY admission_date;

51.Sort the province names in ascending order in such a way that the province 'Ontario' is always on top.
SELECT province_name

FROM province_names

ORDER BY (province_name = "Ontario") desc, province_name asc;

52.We need a breakdown for the total amount of admissions each doctor has started each year. Show the
doctor_id, doctor_full_name, specialty, year, total_admissions for that year.
select d.doctor_id, concat(d.first_name, " ", d.last_name) as Doc_full_name, d.specialty,
year(a.admission_date) as the_year, count(*) as
total_admissions_started from admissions as a join doctors as d

on a.attending_doctor_id = d.doctor_id group by d.doctor_id,
the_year;

1.Write a SQL query to find the top 5 customers with the highest total purchase amount.

Assume you have two tables: Customers (CustomerID, Name) and Orders (OrderID, CustomerID, Amount).
SELECT c.CustomerID, c.Name, SUM(o.Amount) AS TotalPurchase

FROM Customers c

JOIN Orders o ON c.CustomerID = o.CustomerID

GROUP BY c.CustomerID, c.Name

ORDER BY TotalPurchase DESC

LIMIT 5;

2.Write a query to find the nth highest salary from a table Employees with columns EmployeeID, Name, and
Salary.
SELECT DISTINCT Salary

FROM Employees

ORDER BY Salary DESC

LIMIT 1 OFFSET n-1;
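
Because n in the query above is a placeholder rather than valid SQL, one way to try it out is to bind the offset as a parameter. The sketch below is only an illustration: it uses Python's built-in sqlite3 module, the Employees rows are made up, and the helper name nth_highest_salary is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Salary INTEGER);
    -- Made-up rows for illustration only.
    INSERT INTO Employees VALUES
        (1, 'A', 100), (2, 'B', 200), (3, 'C', 200), (4, 'D', 300);
""")

def nth_highest_salary(conn, n):
    """Return the n-th highest distinct salary, or None if fewer than n exist."""
    row = conn.execute(
        "SELECT DISTINCT Salary FROM Employees "
        "ORDER BY Salary DESC LIMIT 1 OFFSET ?",
        (n - 1,),  # OFFSET skips the n-1 higher distinct salaries
    ).fetchone()
    return row[0] if row else None

second = nth_highest_salary(conn, 2)  # the two 200 salaries count once
fourth = nth_highest_salary(conn, 4)  # only three distinct salaries exist
```

DISTINCT matters here: without it, tied salaries would occupy several positions and the offset would no longer correspond to a rank.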

Replace n with the desired rank (e.g., 2 for the second highest).

3.Given a table Sales with columns SaleID, ProductID, SaleDate, and Quantity, write a query to find the total
quantity sold for each product per month.

SELECT ProductID, DATE_TRUNC('month', SaleDate) AS Month, SUM(Quantity) AS

TotalQuantity FROM Sales

GROUP BY ProductID, Month ORDER BY ProductID, Month;

4.Write a SQL query to find all employees who have more than one manager. Assume you have a table
Employees (EmployeeID, Name, ManagerID).
SELECT EmployeeID, Name

FROM Employees

GROUP BY EmployeeID, Name

HAVING COUNT(DISTINCT ManagerID) > 1;

5.Given a table Orders with columns OrderID, CustomerID, OrderDate, and a table OrderDetails with
columns OrderID, ProductID, Quantity, write a query to find the top 3 products with the highest sales
quantity.
SELECT ProductID, SUM(Quantity) AS TotalQuantity
FROM OrderDetails

GROUP BY ProductID

ORDER BY TotalQuantity DESC

LIMIT 3;

6.Write a SQL query to find the second most recent order date for each customer from a table Orders
(OrderID, CustomerID, OrderDate).
SELECT CustomerID, MAX(OrderDate) AS SecondRecentOrderDate
FROM Orders

WHERE OrderDate < (SELECT MAX(OrderDate) FROM Orders o2 WHERE o2.CustomerID = Orders.CustomerID)

GROUP BY CustomerID;

7.Given a table Employees with columns EmployeeID, Name, DepartmentID, Salary, write a query to find
the highest paid employee in each department.
SELECT DepartmentID, EmployeeID, Name, Salary

FROM Employees e1

WHERE Salary = (SELECT MAX(Salary) FROM Employees e2 WHERE e2.DepartmentID = e1.DepartmentID);

9.Given a table Products with columns ProductID, Name, Price, and a table Sales with columns SaleID,
ProductID, Quantity, write a query to find the product with the highest revenue.
SELECT p.ProductID, p.Name, SUM(p.Price * s.Quantity) AS Revenue
FROM Products p

JOIN Sales s ON p.ProductID = s.ProductID

GROUP BY p.ProductID, p.Name

ORDER BY Revenue DESC

LIMIT 1;

TYPES OF WINDOW FUNCTIONS:
In SQL, window functions are classified into several types based on their purpose and functionality. Here’s a
brief overview of the types of window functions:
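
As a rough illustration of three commonly distinguished families (ranking, aggregate, and value/offset functions), the sketch below applies one function of each kind to a small table. The sales table, its columns, and its rows are assumptions made up for this example, and SQLite (version 3.25 or later) is driven from Python only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (rep TEXT, region TEXT, amount INTEGER);
    -- Made-up rows for illustration only.
    INSERT INTO sales VALUES
        ('Ann', 'East', 300), ('Bob', 'East', 200),
        ('Cid', 'West', 500), ('Dee', 'West', 500);
""")

window_rows = conn.execute("""
    SELECT rep, region, amount,
           -- ranking function: position within the region, ties share a rank
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
           -- aggregate function used as a window: total per region on every row
           SUM(amount) OVER (PARTITION BY region) AS region_total,
           -- value (offset) function: amount of the previous row by rep name
           LAG(amount) OVER (ORDER BY rep) AS prev_amount
    FROM sales
    ORDER BY rep
""").fetchall()

for row in window_rows:
    print(row)
```

Unlike GROUP BY, none of these functions collapses rows: every input row is still returned, with the window result attached as an extra column.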

Practice Problems:
Q1: You have two tables: 'response_times' with columns (request_id, response_time_ms, device_type_id) and 'device_types' with columns (device_type_id, device_name, manufacturer). Write
a query to calculate the 95th percentile of response times for each device manufacturer.

Q2: Given a table 'daily_visits' with columns (visit_date, visit_count), write a query to calculate the 7-day moving average of daily visits for each date.

Q3: Given a table 'stock_prices' with columns (date, stock_symbol, closing_price). What's the cumulative change in stock price compared to the starting price of the year?

Q4: You have two tables: 'products' with columns (product_id, product_name, category_id, price) and 'categories' with columns (category_id, category_name). What is the price difference
between each product and the next most expensive product in that category?

Q5: Given a table 'customer_spending' with columns (customer_id, total_spend), how would you divide customers into 10 deciles based on their total spending?

Q6: Using a table 'daily_active_users' with columns (activity_date, user_count), write a query to calculate the day-over-day change in user count and the growth rate.

Q7: Given a table 'sales' with columns (sale_id, sale_date, amount), how would you calculate the total sales amount for each day of the current month, along with a running total of month-to-date sales?

Q8: You have two tables: 'employee_sales' with columns (employee_id, department_id, sales_amount) and 'employees' with columns (employee_id, employee_name). Write a query to identify
the top 5 employees by sales amount in each department.

Q9: Using a table 'employee_positions' with columns (employee_id, position, start_date, end_date), write a query to find employees who have been promoted (i.e., changed to a different
position) within 6 months of their initial hire.

What is SQL?

• SQL stands for Structured Query Language
• SQL lets you access and manipulate databases
• SQL is an ANSI (American National Standards Institute) standard

What Can SQL do?

• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert records in a database
• SQL can update records in a database
• SQL can delete records from a database
• SQL can create new databases
• SQL can create new tables in a database
• SQL can create stored procedures in a database
• SQL can create views in a database
• SQL can set permissions on tables, procedures, and views

RDBMS

• RDBMS stands for Relational Database Management System.
• RDBMS is the basis for SQL, and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
• The data in RDBMS is stored in database objects called tables.
• A table is a collection of related data entries and it consists of columns and rows.

SQL Statements

Most of the actions you need to perform on a database are done with SQL statements. The following SQL statement will select all the records in the "Persons" table:

SELECT * FROM Persons

Note: SQL keywords are not case sensitive.

Semicolon after SQL Statements?

• Some database systems require a semicolon at the end of each SQL statement.
• Semicolon is the standard way to separate each SQL statement in database systems that allow more than one SQL statement to be executed in the same call to the server.
• We are using MS Access and SQL Server 2000 and we do not have to put a semicolon after each SQL statement, but some database programs force you to use it.

SQL DML and DDL

SQL can be divided into two parts: The Data Manipulation Language (DML) and the Data Definition Language (DDL).

The query and update commands form the DML part of SQL:
SELECT - extracts data from a database
UPDATE - updates data in a database
DELETE - deletes data from a database
INSERT INTO - inserts new data into a database

The DDL part of SQL permits database tables to be created or deleted. It also defines indexes (keys), specifies links between tables, and imposes constraints between tables. The most important
DDL statements in SQL are:

CREATE DATABASE - creates a new database


ALTER DATABASE - modifies a database
CREATE TABLE - creates a new table
ALTER TABLE - modifies a table
DROP TABLE - deletes a table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index

The SQL SELECT Statement


• The SELECT statement is used to select data from a database.
• The result is stored in a result table, called the result-set.

• SQL SELECT Syntax

SELECT column_name(s)
FROM table_name

and

SELECT * FROM table_name

Note: SQL keywords are not case sensitive. SELECT is the same as select.

The SQL SELECT DISTINCT Statement


• In a table, some of the columns may contain duplicate values. This is not a problem; however, sometimes you will want to list only the different (distinct) values in a table.
• The DISTINCT keyword can be used to return only distinct (different) values.

• SQL SELECT DISTINCT Syntax


SELECT DISTINCT column_name(s)
FROM table_name

The WHERE Clause


• The WHERE clause is used to filter records.
• The WHERE clause is used to extract only those records that fulfill a specified criterion.
• SQL WHERE Syntax

SELECT column_name(s)

FROM table_name

WHERE column_name operator value

Operators Allowed in the WHERE Clause


With the WHERE clause, the following operators can be used:

Operator    Description
=           Equal
<>          Not equal
>           Greater than
<           Less than
>=          Greater than or equal
<=          Less than or equal
BETWEEN     Between an inclusive range
LIKE        Search for a pattern
IN          Match any one of a list of exact values

The AND & OR Operators


• The AND & OR operators are used to filter records based on more than one condition.
• The AND operator displays a record if both the first condition and the second condition are true.
• The OR operator displays a record if either the first condition or the second condition is true.
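
The behaviour of AND and OR can be checked against the "Persons" table used throughout these notes. The sketch below is only an illustration: it loads three of that table's rows into an in-memory SQLite database through Python's sqlite3 module, and the particular filter values are chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT,
                          Address TEXT, City TEXT);
    INSERT INTO Persons VALUES
        (1, 'Hansen', 'Ola', 'Timoteivn 10', 'Sandnes'),
        (2, 'Svendson', 'Tove', 'Borgvn 23', 'Sandnes'),
        (3, 'Pettersen', 'Kari', 'Storgt 20', 'Stavanger');
""")

# AND: both conditions must be true.
and_rows = conn.execute(
    "SELECT P_Id FROM Persons "
    "WHERE FirstName = 'Tove' AND LastName = 'Svendson'"
).fetchall()

# OR: one true condition is enough.
or_rows = conn.execute(
    "SELECT P_Id FROM Persons "
    "WHERE FirstName = 'Tove' OR FirstName = 'Ola' ORDER BY P_Id"
).fetchall()

# Parentheses group the OR so that it is evaluated before the AND.
mixed_rows = conn.execute(
    "SELECT P_Id FROM Persons "
    "WHERE LastName = 'Svendson' AND (FirstName = 'Tove' OR FirstName = 'Ola')"
).fetchall()
```

Without the parentheses, AND is evaluated before OR, so it is safer to group mixed conditions explicitly.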

The INSERT INTO Statement


• The INSERT INTO statement is used to insert a new row (record) in a table.
• SQL INSERT INTO Syntax
• It is possible to write the INSERT INTO statement in two forms.
• The first form doesn't specify the column names where the data will be inserted, only their values:

INSERT INTO table_name VALUES (value1, value2, value3,...)

• The second form specifies both the column names and the values to be inserted:

INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, value2, value3,...)

Insert Data Only in Specified Columns

It is also possible to only add data in specific columns.

The following SQL statement will add a new row, but only add data in the "P_Id", "LastName" and the "FirstName" columns:

INSERT INTO Persons (P_Id, LastName, FirstName) VALUES (5, 'Tjessem', 'Jakob')

The UPDATE Statement


• The UPDATE statement is used to update existing records in a table.
• SQL UPDATE Syntax

UPDATE table_name
SET column1=value1, column2=value2,...
WHERE some_column=some_value

SQL UPDATE Warning


Be careful when updating records. If the WHERE clause is omitted, like this:

UPDATE Persons
SET Address='Nissestien 67', City='Sandnes'

every row is updated, and the "Persons" table would look like this:

P_Id LastName FirstName Address City

1 Hansen Ola Nissestien 67 Sandnes
2 Svendson Tove Nissestien 67 Sandnes
3 Pettersen Kari Nissestien 67 Sandnes
4 Nilsen Johan Nissestien 67 Sandnes
5 Tjessem Jakob Nissestien 67 Sandnes
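
The difference a WHERE clause makes can be seen by counting the rows each UPDATE touches. The following is a minimal sketch using Python's sqlite3 module and three rows of the "Persons" table; the WHERE condition on 'Pettersen' and 'Kari' is an assumption chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT,
                          Address TEXT, City TEXT);
    INSERT INTO Persons VALUES
        (1, 'Hansen', 'Ola', 'Timoteivn 10', 'Sandnes'),
        (2, 'Svendson', 'Tove', 'Borgvn 23', 'Sandnes'),
        (3, 'Pettersen', 'Kari', 'Storgt 20', 'Stavanger');
""")

# With WHERE: only the matching row is changed.
cur = conn.execute(
    "UPDATE Persons SET Address = 'Nissestien 67', City = 'Sandnes' "
    "WHERE LastName = 'Pettersen' AND FirstName = 'Kari'"
)
rows_with_where = cur.rowcount

# Without WHERE: every row in the table is changed.
cur = conn.execute(
    "UPDATE Persons SET Address = 'Nissestien 67', City = 'Sandnes'"
)
rows_without_where = cur.rowcount

addresses = {row[0] for row in conn.execute("SELECT Address FROM Persons")}
```

Running the statement inside a transaction and checking the affected-row count before committing is one common way to catch a forgotten WHERE clause.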

The DELETE Statement


• The DELETE statement is used to delete rows (records) in a table.
• SQL DELETE Syntax
DELETE FROM table_name
WHERE some_column=some_value
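
As with UPDATE, the WHERE clause decides which rows are affected. The sketch below, which assumes three rows of the "Persons" table loaded into SQLite through Python's sqlite3 module, deletes one person and then shows that DELETE without WHERE empties the table while leaving the table itself in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT,
                          Address TEXT, City TEXT);
    INSERT INTO Persons VALUES
        (1, 'Hansen', 'Ola', 'Timoteivn 10', 'Sandnes'),
        (2, 'Svendson', 'Tove', 'Borgvn 23', 'Sandnes'),
        (3, 'Pettersen', 'Kari', 'Storgt 20', 'Stavanger');
""")

def count_persons(conn):
    """Number of rows currently in Persons."""
    return conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]

# Delete a single person identified by the WHERE clause.
conn.execute(
    "DELETE FROM Persons WHERE LastName = 'Hansen' AND FirstName = 'Ola'"
)
after_filtered_delete = count_persons(conn)

# No WHERE clause: all rows are removed, but the table still exists.
conn.execute("DELETE FROM Persons")
after_full_delete = count_persons(conn)
```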

SQL ADVANCED

The TOP Clause


• The TOP clause is used to specify the number of records to return.
• The TOP clause can be very useful on large tables with thousands of records. Returning a large number of records can hurt performance.

Note: Not all database systems support the TOP clause.

• SQL Server Syntax:

SELECT TOP number|percent column_name(s) FROM table_name

• SQL SELECT TOP Equivalent in MySQL and Oracle:

• MySQL Syntax:
SELECT column_name(s)
FROM table_name
LIMIT number

Example:
SELECT *
FROM Persons
LIMIT 5

• Oracle Syntax:
SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number

Example:
SELECT *
FROM Persons
WHERE ROWNUM <= 5

SQL TOP Example

The "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

4 Nilsen Tom Vingvn 23 Stavanger

Now we want to select only the two first records in the table above.

We use the following SELECT statement:

SELECT TOP 2 * FROM Persons

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes


SQL TOP PERCENT Example
The "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

4 Nilsen Tom Vingvn 23 Stavanger

Now we want to select only 50% of the records in the table above.

We use the following SELECT statement:

SELECT TOP 50 PERCENT * FROM Persons

The result-set will look like this:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

SQL Wildcards
• SQL wildcards can be used when searching for data in a database.
• SQL wildcards can substitute for one or more characters when searching for data in a database.
• SQL wildcards must be used with the SQL LIKE operator.
• With SQL, the following wildcards can be used:

Wildcard                      Description
%                             A substitute for zero or more characters
_                             A substitute for exactly one character
[charlist]                    Any single character in charlist
[^charlist] or [!charlist]    Any single character not in charlist

Note: The [charlist] forms are not available everywhere; SQL Server and MS Access support them with LIKE, while MySQL and Oracle support only % and _.

SQL Wildcard Examples


We have the following "Persons" table:

P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Using the % Wildcard


Now we want to select the persons living in a city that starts with "sa" from the "Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons WHERE City LIKE 'sa%'

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes

Next, we want to select the persons living in a city that contains the pattern "nes" from the "Persons" table.

We use the following SELECT statement:


SELECT * FROM Persons
WHERE City LIKE '%nes%'

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes

Using the _ Wildcard


Now we want to select the persons with a first name that starts with any character, followed by "la" from the "Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons


WHERE FirstName LIKE '_la'

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes

Next, we want to select the persons with a last name that starts with "S", followed by any character, followed by "end", followed by any character, followed by "on" from the "Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons WHERE LastName LIKE 'S_end_on'

The result-set will look like this:

P_Id LastName FirstName Address City


2 Svendson Tove Borgvn 23 Sandnes

Using the [charlist] Wildcard

Now we want to select the persons with a last name that starts with "b" or "s" or "p" from the "Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons WHERE LastName LIKE '[bsp]%'

The result-set will look like this:

P_Id LastName FirstName Address City


2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Next, we want to select the persons with a last name that does not start with "b" or "s" or "p" from the "Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons WHERE LastName LIKE '[!bsp]%'

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes

The LIKE Operator


• The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

• SQL LIKE Syntax:


SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern

LIKE Operator Example


The "Persons" table:
P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger


Now we want to select the persons living in a city that starts with "s" from the table above.

We use the following SELECT statement:

SELECT * FROM Persons


WHERE City LIKE 's%'

The "%" sign can be used to define wildcards (missing letters in the pattern) both before and after the pattern.

The result-set will look like this:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

Next, we want to select the persons living in a city that ends with an "s" from the "Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons


WHERE City LIKE '%s'

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

Next, we want to select the persons living in a city that contains the pattern "tav" from the "Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons


WHERE City LIKE '%tav%'

The result-set will look like this:

P_Id LastName FirstName Address City

3 Pettersen Kari Storgt 20 Stavanger

It is also possible to select the persons living in a city that does NOT contain the pattern "tav" from the "Persons" table, by using the NOT keyword.

We use the following SELECT statement:

SELECT * FROM Persons WHERE City NOT LIKE '%tav%'

The result-set will look like this:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

The IN Operator
• The IN operator allows you to specify multiple values in a WHERE clause.
• SQL IN Syntax:
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...)

IN Operator Example
Now we want to select the persons with a last name equal to "Hansen" or "Pettersen" from the table above.

We use the following SELECT statement:

SELECT * FROM Persons


WHERE LastName IN ('Hansen','Pettersen')

The result-set will look like this:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

The BETWEEN Operator


• The BETWEEN operator is used in a WHERE clause to select a range of data between two values. The values can be numbers, text, or dates.
• SQL BETWEEN Syntax:
SELECT column_name(s)
FROM table_name
WHERE column_name
BETWEEN value1 AND value2

BETWEEN Operator Example


The "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

Now we want to select the persons with a last name alphabetically between "Hansen" and "Pettersen" from the table above.

We use the following SELECT statement:

SELECT * FROM Persons


WHERE LastName
BETWEEN 'Hansen' AND 'Pettersen'
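
The notes do not show a result-set for this query. As a hedged check, the sketch below runs it in SQLite through Python's sqlite3 module, where BETWEEN includes both endpoints as in standard SQL; other database systems should be tested individually rather than assumed to behave the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT,
                          Address TEXT, City TEXT);
    INSERT INTO Persons VALUES
        (1, 'Hansen', 'Ola', 'Timoteivn 10', 'Sandnes'),
        (2, 'Svendson', 'Tove', 'Borgvn 23', 'Sandnes'),
        (3, 'Pettersen', 'Kari', 'Storgt 20', 'Stavanger');
""")

# Inclusive range: both 'Hansen' and 'Pettersen' themselves qualify.
between_rows = conn.execute(
    "SELECT LastName FROM Persons "
    "WHERE LastName BETWEEN 'Hansen' AND 'Pettersen' "
    "ORDER BY LastName"
).fetchall()

# NOT BETWEEN returns the rows outside the range.
not_between_rows = conn.execute(
    "SELECT LastName FROM Persons "
    "WHERE LastName NOT BETWEEN 'Hansen' AND 'Pettersen'"
).fetchall()
```

In this run 'Svendson' sorts after 'Pettersen', so it is the only last name outside the range.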

SQL Alias
• With SQL, an alias name can be given to a table or to a column.
• You can give a table or a column another name by using an alias. This can be a good thing to do if you have very long or complex table names or column names.
• An alias name could be anything, but usually it is short.

• SQL Alias Syntax for Tables:

SELECT column_name(s)
FROM table_name AS alias_name

• SQL Alias Syntax for Columns:

SELECT column_name AS alias_name
FROM table_name

Alias Example
Assume we have a table called "Persons" and another table called "Product_Orders". We will give the table aliases of "p" and "po" respectively.

Now we want to list all the orders that "Ola Hansen" is responsible for.

We use the following SELECT statement:

SELECT po.OrderID, p.LastName, p.FirstName


FROM Persons AS p, Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'

The same SELECT statement without aliases:


SELECT Product_Orders.OrderID, Persons.LastName, Persons.FirstName
FROM Persons, Product_Orders
WHERE Persons.LastName='Hansen' AND Persons.FirstName='Ola'

SQL JOIN
• SQL joins are used to query data from two or more tables, based on a relationship between certain columns in these tables.
• Tables in a database are often related to each other with keys.
• A primary key is a column (or a combination of columns) with a unique value for each row. Each primary key value must be unique within the table. The purpose is to bind data together across tables, without repeating all of the data in every table.

Different SQL JOINs


Before we continue with examples, we will list the types of JOIN you can use, and the differences between them.

• JOIN: Return rows when there is at least one match in both tables
• LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
• RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
• FULL JOIN: Return all rows from both tables, matched where possible and unmatched otherwise

SQL INNER JOIN Keyword


• The INNER JOIN keyword returns rows when there is at least one match in both tables.

• SQL INNER JOIN Syntax:


SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name

SQL INNER JOIN Example


The "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:


O_Id OrderNo P_Id

1 77895 3

2 44678 3

3 22456 1

4 24562 1

5 34764 15

Now we want to list all the persons with any orders.

We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo


FROM Persons INNER JOIN Orders ON Persons.P_Id=Orders.P_Id ORDER BY Persons.LastName

The result-set will look like this:

LastName FirstName OrderNo

Hansen Ola 22456

Hansen Ola 24562

Pettersen Kari 77895

Pettersen Kari 44678

The INNER JOIN keyword returns rows when there is at least one match in both tables. If there are rows in "Persons" that do not have matches in "Orders", those rows will NOT be listed.

SQL LEFT JOIN Keyword

• The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no matches in the right table (table_name2).

• SQL LEFT JOIN Syntax:


SELECT column_name(s)
FROM table_name1 LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name

• PS: In some databases LEFT JOIN is called LEFT OUTER JOIN.

SQL LEFT JOIN Example


The "Persons" table:
P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id

1 77895 3

2 44678 3

3 22456 1

4 24562 1

5 34764 15

Now we want to list all the persons and their orders - if any, from the tables above.
We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo


FROM Persons LEFT JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName FirstName OrderNo

Hansen Ola 22456

Hansen Ola 24562

Pettersen Kari 77895

Pettersen Kari 44678

Svendson Tove

Notes: The LEFT JOIN keyword returns all the rows from the left table (Persons), even if there are no matches in the right table (Orders).

SQL RIGHT JOIN Keyword


• The RIGHT JOIN keyword returns all rows from the right table (table_name2), even if there are no matches in the left table (table_name1).

• SQL RIGHT JOIN Syntax:


SELECT column_name(s)
FROM table_name1
RIGHT JOIN table_name2
ON table_name1.column_name=table_name2.column_name

• PS: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.


SQL RIGHT JOIN Example
The "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id

1 77895 3

2 44678 3

3 22456 1

4 24562 1

5 34764 15

Now we want to list all the orders and the persons who placed them - if any, from the tables above.

We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo


FROM Persons
RIGHT JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:


LastName FirstName OrderNo

Hansen Ola 22456

Hansen Ola 24562

Pettersen Kari 77895

Pettersen Kari 44678

34764

Notes: The RIGHT JOIN keyword returns all the rows from the right table (Orders), even if there are no matches in the left table (Persons).

SQL FULL JOIN Keyword


• The FULL JOIN keyword returns all rows from both tables, whether or not there is a match in the other table.

• SQL FULL JOIN Syntax:


SELECT column_name(s)
FROM table_name1
FULL JOIN table_name2
ON table_name1.column_name=table_name2.column_name

SQL FULL JOIN Example


The "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id


1 77895 3

2 44678 3

3 22456 1

4 24562 1

5 34764 15

Now we want to list all the persons and their orders, and all the orders with their persons.

We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo

FROM Persons
FULL JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName FirstName OrderNo

Hansen Ola 22456

Hansen Ola 24562

Pettersen Kari 77895

Pettersen Kari 44678

Svendson Tove

34764

Notes: The FULL JOIN keyword returns all the rows from the left table (Persons), and all the rows from the right table (Orders). If there are rows in "Persons" that do not have matches in
"Orders", or if there are rows in "Orders" that do not have matches in "Persons", those rows will be listed as well.

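Not every database system accepts FULL JOIN (MySQL, for example, does not). One common workaround, sketched below under that assumption, combines a LEFT JOIN with the unmatched rows from the other side using UNION ALL. The example loads the "Persons" and "Orders" rows shown above into SQLite through Python's sqlite3 module; the aliases p and o are chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT);
    CREATE TABLE Orders (O_Id INTEGER, OrderNo INTEGER, P_Id INTEGER);
    INSERT INTO Persons VALUES
        (1, 'Hansen', 'Ola'), (2, 'Svendson', 'Tove'), (3, 'Pettersen', 'Kari');
    INSERT INTO Orders VALUES
        (1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15);
""")

full_join_rows = conn.execute("""
    -- every person, with their orders if any
    SELECT p.LastName, p.FirstName, o.OrderNo
    FROM Persons AS p
    LEFT JOIN Orders AS o ON p.P_Id = o.P_Id
    UNION ALL
    -- plus the orders that match no person at all
    SELECT p.LastName, p.FirstName, o.OrderNo
    FROM Orders AS o
    LEFT JOIN Persons AS p ON p.P_Id = o.P_Id
    WHERE p.P_Id IS NULL
""").fetchall()

for row in full_join_rows:
    print(row)
```

The WHERE p.P_Id IS NULL filter in the second half keeps matched rows from being listed twice; missing values come back as NULL (None in Python).
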
The SQL UNION Operator


• The UNION operator is used to combine the result-set of two or more SELECT statements.
• Notice that each SELECT statement within the UNION must have the same number of columns. The columns must also have similar data types. Also, the columns in each SELECT statement must be in the same order.

• SQL UNION Syntax:


SELECT column_name(s) FROM table_name1
UNION
SELECT column_name(s) FROM table_name2

Note: The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL.

• SQL UNION ALL Syntax:


SELECT column_name(s) FROM table_name1
UNION ALL
SELECT column_name(s) FROM table_name2

• PS: The column names in the result-set of a UNION are always equal to the column names in the first SELECT statement in the UNION.

SQL UNION Example


Look at the following tables:

"Employees_Norway":

E_ID E_Name

01 Hansen, Ola

02 Svendson, Tove

03 Svendson, Stephen

04 Pettersen, Kari

"Employees_USA":

E_ID E_Name

01 Turner, Sally

02 Kent, Clark

03 Svendson, Stephen

04 Scott, Stephen

Now we want to list all the different employees in Norway and USA.

We use the following SELECT statement:

SELECT E_Name FROM Employees_Norway


UNION
SELECT E_Name FROM Employees_USA

The result-set will look like this:


E_Name

Hansen, Ola

Svendson, Tove

Svendson, Stephen

Pettersen, Kari

Turner, Sally

Kent, Clark

Scott, Stephen

Note: This command cannot be used to list all employees in Norway and USA. In the example above we have two employees with equal names, and only one of them will be listed. The UNION
command selects only distinct values.

SQL UNION ALL Example


Now we want to list all employees in Norway and USA:

SELECT E_Name FROM Employees_Norway


UNION ALL
SELECT E_Name FROM Employees_USA
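
The notes give no result-set for the UNION ALL query. As a small sketch, the two employee tables above can be loaded into SQLite through Python's sqlite3 module to compare both operators; the variable names are chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees_Norway (E_ID TEXT, E_Name TEXT);
    CREATE TABLE Employees_USA (E_ID TEXT, E_Name TEXT);
    INSERT INTO Employees_Norway VALUES
        ('01', 'Hansen, Ola'), ('02', 'Svendson, Tove'),
        ('03', 'Svendson, Stephen'), ('04', 'Pettersen, Kari');
    INSERT INTO Employees_USA VALUES
        ('01', 'Turner, Sally'), ('02', 'Kent, Clark'),
        ('03', 'Svendson, Stephen'), ('04', 'Scott, Stephen');
""")

# UNION removes duplicate rows from the combined result.
union_names = [row[0] for row in conn.execute(
    "SELECT E_Name FROM Employees_Norway "
    "UNION "
    "SELECT E_Name FROM Employees_USA"
)]

# UNION ALL keeps every row, including the repeated name.
union_all_names = [row[0] for row in conn.execute(
    "SELECT E_Name FROM Employees_Norway "
    "UNION ALL "
    "SELECT E_Name FROM Employees_USA"
)]
```

Here UNION returns seven names and UNION ALL returns eight, because "Svendson, Stephen" appears in both tables.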

The SQL SELECT INTO Statement

• The SELECT INTO statement selects data from one table and inserts it into a different table.
• The SELECT INTO statement is most often used to create backup copies of tables.

• SQL SELECT INTO Syntax

We can select all columns into the new table:

SELECT *
INTO new_table_name [IN externaldatabase]
FROM old_tablename

Or we can select only the columns we want into the new table:

SELECT column_name(s)
INTO new_table_name [IN externaldatabase]
FROM old_tablename
SQL SELECT INTO Example
Make a Backup Copy - Now we want to make an exact copy of the data in our "Persons" table.

We use the following SQL statement:

SELECT *
INTO Persons_Backup FROM Persons

We can also use the IN clause (MS Access syntax) to copy the table into another database:

SELECT *
INTO Persons_Backup IN 'Backup.mdb' FROM Persons

We can also copy only a few fields into the new table:

SELECT LastName,FirstName
INTO Persons_Backup
FROM Persons

SQL SELECT INTO - With a WHERE Clause


We can also add a WHERE clause.

The following SQL statement creates a "Persons_Backup" table with only the persons who live in the city "Sandnes":

SELECT LastName,Firstname
INTO Persons_Backup
FROM Persons
WHERE City='Sandnes'

SQL SELECT INTO - Joined Tables


Selecting data from more than one table is also possible.

The following example creates a "Persons_Order_Backup" table that contains data from the two tables "Persons" and "Orders":
SELECT Persons.LastName,Orders.OrderNo
INTO Persons_Order_Backup
FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id
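Not every database accepts SELECT ... INTO for creating a table (SQLite, for example, does not). The sketch below shows the same joined backup written as CREATE TABLE ... AS SELECT. It assumes SQLite through Python's sqlite3 module, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, City TEXT);
CREATE TABLE Orders  (O_Id INTEGER, OrderNo INTEGER, P_Id INTEGER);
-- Hypothetical sample rows
INSERT INTO Persons VALUES (1,'Hansen','Sandnes'), (2,'Pettersen','Stavanger');
INSERT INTO Orders  VALUES (1, 77895, 1), (2, 44678, 1), (3, 22456, 2);

-- SQLite has no SELECT ... INTO; CREATE TABLE ... AS SELECT plays the same role.
CREATE TABLE Persons_Order_Backup AS
SELECT Persons.LastName, Orders.OrderNo
FROM Persons
INNER JOIN Orders ON Persons.P_Id = Orders.P_Id;
""")

backup = conn.execute(
    "SELECT LastName, OrderNo FROM Persons_Order_Backup ORDER BY OrderNo"
).fetchall()
print(backup)  # [('Pettersen', 22456), ('Hansen', 44678), ('Hansen', 77895)]
```

The copy holds the data only; constraints and indexes of the source tables are typically not carried over by this kind of statement.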

The CREATE DATABASE Statement


 The CREATE DATABASE statement is used to create a database.

 SQL CREATE DATABASE Syntax:


CREATE DATABASE database_name

CREATE DATABASE Example


Now we want to create a database called "my_db".

We use the following CREATE DATABASE statement:

CREATE DATABASE my_db

Database tables can be added with the CREATE TABLE statement.

The CREATE TABLE Statement

The CREATE TABLE statement is used to create a table in a database.

SQL CREATE TABLE Syntax:


CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
column_name3 data_type,
....
)

The data type specifies what type of data the column can hold. The available data types differ between database systems such as MS Access, MySQL, and SQL Server.

CREATE TABLE Example


Now we want to create a table called "Persons" that contains five columns: P_Id, LastName, FirstName, Address, and City.

We use the following CREATE TABLE statement:


CREATE TABLE Persons
(
P_Id int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255) )

The P_Id column is of type int and will hold a number. The LastName, FirstName, Address, and City columns are of type varchar with a maximum length of 255 characters.

The empty "Persons" table will now look like this:

P_Id LastName FirstName Address City

The empty table can be filled with data with the INSERT INTO statement.

SQL Constraints
 Constraints are used to limit the type of data that can go into a table.
 Constraints can be specified when a table is created (with the CREATE TABLE statement) or after the table is created (with the ALTER TABLE statement).
We will focus on the following constraints:

• NOT NULL

• UNIQUE

• PRIMARY KEY

• FOREIGN KEY

• CHECK

• DEFAULT

SQL NOT NULL Constraint


 The NOT NULL constraint enforces a column to NOT accept NULL values.
 The NOT NULL constraint enforces a field to always contain a value. This means that you cannot insert a new record, or update a record without adding a value to this field.
The following SQL enforces the "P_Id" column and the "LastName" column to not accept NULL values:

CREATE TABLE Persons


(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
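A short sketch of the constraint in action, assuming SQLite through Python's sqlite3 module (an assumption, not part of the original text). The table definition is the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Persons (
    P_Id int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
)""")

# FirstName, Address and City may be omitted because they are nullable.
conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Hansen')")

# Omitting LastName leaves it NULL, which the constraint rejects.
try:
    conn.execute("INSERT INTO Persons (P_Id, FirstName) VALUES (2, 'Tove')")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
print(row_count)  # 1: only the valid row was stored
```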

SQL UNIQUE Constraint

 The UNIQUE constraint uniquely identifies each record in a database table.


 The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a column or set of columns.
 A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.
 Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table.

SQL UNIQUE Constraint on CREATE TABLE


The following SQL creates a UNIQUE constraint on the "P_Id" column when the "Persons" table is created:

MySQL:

CREATE TABLE Persons


(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
UNIQUE (P_Id) )

SQL Server / Oracle / MS Access:

CREATE TABLE Persons


(
P_Id int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
To allow naming of a UNIQUE constraint, and for defining a UNIQUE constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

CREATE TABLE Persons


(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName) )

SQL UNIQUE Constraint on ALTER TABLE


To create a UNIQUE constraint on the "P_Id" column when the table is already created, use the following SQL:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD UNIQUE (P_Id)

To allow naming of a UNIQUE constraint, and for defining a UNIQUE constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)

To DROP a UNIQUE Constraint

To drop a UNIQUE constraint, use the following SQL:

MySQL:

ALTER TABLE Persons


DROP INDEX uc_PersonID

SQL Server / Oracle / MS Access:


ALTER TABLE Persons
DROP CONSTRAINT uc_PersonID
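The multi-column form is easy to misread: it requires the combination of values to be unique, not each column on its own. A minimal sketch, assuming SQLite through Python's sqlite3 module and a shortened column list:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Persons (
    P_Id int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    CONSTRAINT uc_PersonID UNIQUE (P_Id, LastName)
)""")

conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Hansen')")
# Same P_Id but a different LastName: the pair is still unique, so this succeeds.
conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Svendson')")

# Repeating an existing (P_Id, LastName) pair violates uc_PersonID.
try:
    conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Hansen')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

unique_rows = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
print(duplicate_rejected, unique_rows)  # True 2
```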

SQL PRIMARY KEY Constraint


 The PRIMARY KEY constraint uniquely identifies each record in a database table.
 Primary keys must contain unique values.
 A primary key column cannot contain NULL values.
 Each table should have a primary key, and each table can have only ONE primary key.

SQL PRIMARY KEY Constraint on CREATE TABLE


The following SQL creates a PRIMARY KEY on the "P_Id" column when the "Persons" table is created:

MySQL:

CREATE TABLE Persons


(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
PRIMARY KEY (P_Id) )

SQL Server / Oracle / MS Access:

CREATE TABLE Persons


(
P_Id int NOT NULL PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255) )

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:


CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName) )

SQL PRIMARY KEY Constraint on ALTER TABLE


To create a PRIMARY KEY constraint on the "P_Id" column when the table is already created, use the following SQL:
MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD PRIMARY KEY (P_Id)

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)

Note: If you use the ALTER TABLE statement to add a primary key, the primary key column(s) must already have been declared to not contain NULL values (when the table was first created).

To DROP a PRIMARY KEY Constraint


To drop a PRIMARY KEY constraint, use the following SQL:

MySQL:

ALTER TABLE Persons DROP PRIMARY KEY

SQL Server / Oracle / MS Access:

ALTER TABLE Persons DROP CONSTRAINT pk_PersonID

SQL FOREIGN KEY Constraint


 A FOREIGN KEY in one table points to a PRIMARY KEY in another table.

SQL FOREIGN KEY Constraint on CREATE TABLE


The following SQL creates a FOREIGN KEY on the "P_Id" column when the "Orders" table is created:

MySQL:

CREATE TABLE Orders


(
O_Id int NOT NULL,
OrderNo int NOT NULL,
P_Id int,
PRIMARY KEY (O_Id),
FOREIGN KEY (P_Id) REFERENCES Persons(P_Id)
)
SQL Server / Oracle / MS Access:

CREATE TABLE Orders


(
O_Id int NOT NULL PRIMARY KEY,
OrderNo int NOT NULL,
P_Id int FOREIGN KEY REFERENCES Persons(P_Id) )

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:


CREATE TABLE Orders
(
O_Id int NOT NULL,
OrderNo int NOT NULL,
P_Id int,
PRIMARY KEY (O_Id),
CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id) )

SQL FOREIGN KEY Constraint on ALTER TABLE


To create a FOREIGN KEY constraint on the "P_Id" column when the "Orders" table is already created, use the following SQL:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Orders


ADD FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Orders


ADD CONSTRAINT fk_PerOrders
FOREIGN KEY ( P_Id )
REFERENCES Persons(P_Id)

To DROP a FOREIGN KEY Constraint


To drop a FOREIGN KEY constraint, use the following SQL:

MySQL:

ALTER TABLE Orders


DROP FOREIGN KEY fk_PerOrders

SQL Server / Oracle / MS Access:

ALTER TABLE Orders


DROP CONSTRAINT fk_PerOrders
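The effect of a foreign key is that a child row cannot point at a parent row that does not exist. The sketch below assumes SQLite through Python's sqlite3 module; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which is a SQLite-specific detail and not part of the syntax above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Persons (
    P_Id int NOT NULL PRIMARY KEY,
    LastName varchar(255) NOT NULL
);
CREATE TABLE Orders (
    O_Id int NOT NULL PRIMARY KEY,
    OrderNo int NOT NULL,
    P_Id int,
    CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id) REFERENCES Persons(P_Id)
);
INSERT INTO Persons VALUES (1, 'Hansen');
INSERT INTO Orders  VALUES (1, 77895, 1);   -- valid: person 1 exists
""")

# No row in Persons has P_Id 99, so this order would be an orphan.
try:
    conn.execute("INSERT INTO Orders VALUES (2, 44678, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(orphan_rejected, order_count)  # True 1
```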

SQL CHECK Constraint


 The CHECK constraint is used to limit the value range that can be placed in a column.
 If you define a CHECK constraint on a single column it allows only certain values for this column.
 If you define a CHECK constraint on a table it can limit the values in certain columns based on values in other columns in the row.

SQL CHECK Constraint on CREATE TABLE


The following SQL creates a CHECK constraint on the "P_Id" column when the "Persons" table is created. The CHECK constraint specifies that the column "P_Id" must only include integers greater
than 0.

MySQL:

CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CHECK (P_Id>0) )

SQL Server / Oracle / MS Access:

CREATE TABLE Persons


(
P_Id int NOT NULL CHECK (P_Id>0),
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

CREATE TABLE Persons


(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes') )

SQL CHECK Constraint on ALTER TABLE


To create a CHECK constraint on the "P_Id" column when the table is already created, use the following SQL:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD CHECK (P_Id>0)

To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes')

To DROP a CHECK Constraint


To drop a CHECK constraint, use the following SQL:

SQL Server / Oracle / MS Access:


ALTER TABLE Persons DROP CONSTRAINT chk_Person
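The named two-column CHECK above can be exercised as follows. This is a sketch that assumes SQLite through Python's sqlite3 module, with a shortened column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Persons (
    P_Id int NOT NULL,
    LastName varchar(255) NOT NULL,
    City varchar(255),
    CONSTRAINT chk_Person CHECK (P_Id > 0 AND City = 'Sandnes')
)""")

def try_insert(p_id, last_name, city):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Persons VALUES (?, ?, ?)", (p_id, last_name, city))
        return True
    except sqlite3.IntegrityError:
        return False

ok_row    = try_insert(1, "Hansen", "Sandnes")       # satisfies both conditions
bad_id    = try_insert(0, "Svendson", "Sandnes")     # P_Id is not greater than 0
bad_city  = try_insert(2, "Pettersen", "Stavanger")  # City is not 'Sandnes'
print(ok_row, bad_id, bad_city)  # True False False
```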

SQL DEFAULT Constraint


 The DEFAULT constraint is used to insert a default value into a column.
 The default value will be added to all new records, if no other value is specified.

SQL DEFAULT Constraint on CREATE TABLE


The following SQL creates a DEFAULT constraint on the "City" column when the "Persons" table is created:

MySQL / SQL Server / Oracle / MS Access:

CREATE TABLE Persons


(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255) DEFAULT 'Sandnes' )

The DEFAULT constraint can also be used to insert system values by using functions such as GETDATE() (SQL Server syntax; other systems provide their own current-date functions):

CREATE TABLE Orders


(
O_Id int NOT NULL,
OrderNo int NOT NULL,
P_Id int,
OrderDate date DEFAULT GETDATE()
)

SQL DEFAULT Constraint on ALTER TABLE


To create a DEFAULT constraint on the "City" column when the table is already created, use the following SQL:

MySQL:

ALTER TABLE Persons


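ALTER City SET DEFAULT 'Sandnes'

MS Access:

ALTER TABLE Persons
ALTER COLUMN City SET DEFAULT 'Sandnes'

SQL Server:

ALTER TABLE Persons
ADD CONSTRAINT df_City DEFAULT 'Sandnes' FOR City

Oracle:

ALTER TABLE Persons
MODIFY City DEFAULT 'Sandnes'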

To DROP a DEFAULT Constraint

To drop a DEFAULT constraint, use the following SQL:


MySQL:

ALTER TABLE Persons


ALTER City DROP DEFAULT

SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ALTER COLUMN City DROP DEFAULT
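A default only applies when the INSERT statement leaves the column out. The following sketch assumes SQLite through Python's sqlite3 module and a shortened "Persons" table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Persons (
    P_Id int NOT NULL,
    LastName varchar(255) NOT NULL,
    City varchar(255) DEFAULT 'Sandnes'
)""")

# City is omitted, so the default value is stored.
conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Hansen')")
# City is given explicitly, so the default is ignored.
conn.execute("INSERT INTO Persons (P_Id, LastName, City) VALUES (2, 'Pettersen', 'Stavanger')")

cities = [row[0] for row in conn.execute("SELECT City FROM Persons ORDER BY P_Id")]
print(cities)  # ['Sandnes', 'Stavanger']
```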

Indexes

 The CREATE INDEX statement is used to create indexes in tables.

 Indexes allow the database to find data quickly, without reading the whole table.

 An index is created on one or more columns of a table.
 Users cannot see the indexes; they are only used to speed up searches and queries.

Note: Updating a table with indexes takes more time than updating a table without (because the indexes also need an update). So you should only create indexes on columns (and tables) that
will be frequently searched against.

SQL CREATE INDEX Syntax

Creates an index on a table. Duplicate values are allowed:

CREATE INDEX index_name


ON table_name (column_name)

SQL CREATE UNIQUE INDEX Syntax

Creates a unique index on a table. Duplicate values are not allowed:

CREATE UNIQUE INDEX index_name


ON table_name (column_name)

Note: The syntax for creating indexes varies amongst different databases. Therefore: Check the syntax for creating indexes in your database.

CREATE INDEX Example


The SQL statement below creates an index named "PIndex" on the "LastName" column in the "Persons" table:

CREATE INDEX PIndex


ON Persons (LastName)
If you want to create an index on a combination of columns, you can list the column names within the parentheses, separated by commas:

CREATE INDEX PIndex ON Persons (LastName, FirstName)
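Whether a query actually uses an index can be checked by asking the database for its query plan. The sketch below assumes SQLite through Python's sqlite3 module; EXPLAIN QUERY PLAN is SQLite-specific, and other systems have their own EXPLAIN variants with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (P_Id int, LastName varchar(255), FirstName varchar(255))"
)
conn.execute("CREATE INDEX PIndex ON Persons (LastName, FirstName)")

# Ask SQLite how it would execute a lookup by LastName.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Persons WHERE LastName = 'Hansen'"
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)  # the exact wording varies by SQLite version, but it names PIndex
```

Because "LastName" is the leading column of the index, a filter on "LastName" alone can still use it; a filter on "FirstName" alone generally could not.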

The DROP INDEX Statement


 Indexes, tables, and databases can easily be deleted/removed with the DROP statement.
 The DROP INDEX statement is used to delete an index in a table.

DROP INDEX Syntax for MS Access: DROP INDEX index_name ON table_name

DROP INDEX Syntax for MS SQL Server: DROP INDEX table_name.index_name

DROP INDEX Syntax for DB2/Oracle: DROP INDEX index_name

DROP INDEX Syntax for MySQL:


ALTER TABLE table_name DROP INDEX index_name

The DROP TABLE Statement

The DROP TABLE statement is used to delete a table.

DROP TABLE table_name

The DROP DATABASE Statement


The DROP DATABASE statement is used to delete a database.

DROP DATABASE database_name

The TRUNCATE TABLE Statement


What if we only want to delete the data inside the table, and not the table itself?

Then, use the TRUNCATE TABLE statement:

TRUNCATE TABLE table_name

The ALTER TABLE Statement


The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.

SQL ALTER TABLE Syntax

To add a column in a table, use the following syntax:

ALTER TABLE table_name


ADD column_name datatype

To delete a column in a table, use the following syntax (notice that some database systems don't allow deleting a column):

ALTER TABLE table_name


DROP COLUMN column_name

To change the data type of a column in a table, use the following syntax:

ALTER TABLE table_name


ALTER COLUMN column_name datatype

SQL ALTER TABLE Example


Look at the "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

Now we want to add a column named "DateOfBirth" in the "Persons" table.


We use the following SQL statement:

ALTER TABLE Persons


ADD DateOfBirth date

Notice that the new column, "DateOfBirth", is of type date and is going to hold a date. The data type specifies what type of data the column can hold.

The "Persons" table will now look like this:


P_Id LastName FirstName Address City DateOfBirth

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

Change Data Type Example


Now we want to change the data type of the column named "DateOfBirth" in the "Persons" table.

We use the following SQL statement:

ALTER TABLE Persons


ALTER COLUMN DateOfBirth year

DROP COLUMN Example


Next, we want to delete the column named "DateOfBirth" in the "Persons" table.

We use the following SQL statement:

ALTER TABLE Persons


DROP COLUMN DateOfBirth
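Adding a column leaves existing rows intact and fills the new column with NULL. A minimal sketch, assuming SQLite through Python's sqlite3 module (PRAGMA table_info is a SQLite-specific way to list columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Persons (
    P_Id int, LastName varchar(255), FirstName varchar(255),
    Address varchar(255), City varchar(255)
)""")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen', 'Ola', 'Timoteivn 10', 'Sandnes')")

conn.execute("ALTER TABLE Persons ADD DateOfBirth date")

# Column names after the change (second field of each table_info row).
columns = [row[1] for row in conn.execute("PRAGMA table_info(Persons)")]
# The row inserted earlier has no value for the new column yet.
birth_date = conn.execute("SELECT DateOfBirth FROM Persons WHERE P_Id = 1").fetchone()[0]

print(columns)     # ['P_Id', 'LastName', 'FirstName', 'Address', 'City', 'DateOfBirth']
print(birth_date)  # None
```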

AUTO INCREMENT a Field


 Auto-increment allows a unique number to be generated when a new record is inserted into a table.
 Very often we would like the value of the primary key field to be created automatically every time a new record is inserted.

We would like to create an auto-increment field in a table.

Syntax for MySQL

The following SQL statement defines the "P_Id" column to be an auto-increment primary key field in the "Persons" table:

CREATE TABLE Persons


(
P_Id int NOT NULL AUTO_INCREMENT,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
PRIMARY KEY (P_Id) )

MySQL uses the AUTO_INCREMENT keyword to perform an auto-increment feature.

By default, the starting value for AUTO_INCREMENT is 1, and it will increment by 1 for each new record.

To let the AUTO_INCREMENT sequence start with another value, use the following SQL statement:

ALTER TABLE Persons AUTO_INCREMENT=100

To insert a new record into the "Persons" table, we will not have to specify a value for the "P_Id" column ( a unique value will be added automatically ):

INSERT INTO Persons (FirstName,LastName)


VALUES ('Lars','Monsen')

The SQL statement above would insert a new record into the "Persons" table. The "P_Id" column would be assigned a unique value. The "FirstName" column would be set to "Lars" and the
"LastName" column would be set to "Monsen".

Syntax for SQL Server

The following SQL statement defines the "P_Id" column to be an auto-increment primary key field in the "Persons" table:

CREATE TABLE Persons


(
P_Id int PRIMARY KEY IDENTITY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255) )

MS SQL Server uses the IDENTITY keyword to provide the auto-increment feature.

By default, the starting value for IDENTITY is 1, and it will increment by 1 for each new record.

To specify that the "P_Id" column should start at value 10 and increment by 5, change the identity to IDENTITY(10,5).

To insert a new record into the "Persons" table, we will not have to specify a value for the "P_Id" column ( a unique value will be added automatically ):

INSERT INTO Persons (FirstName,LastName)


VALUES ('Lars','Monsen')

The SQL statement above would insert a new record into the "Persons" table. The "P_Id" column would be assigned a unique value. The "FirstName" column would be set to "Lars" and the
"LastName" column would be set to "Monsen".

Syntax for Access

The following SQL statement defines the "P_Id" column to be an auto-increment primary key field in the "Persons" table:

CREATE TABLE Persons


(
P_Id AUTOINCREMENT PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255) )

MS Access uses the AUTOINCREMENT keyword to provide the auto-increment feature.

By default, the starting value for AUTOINCREMENT is 1, and it will increment by 1 for each new record.
To specify that the "P_Id" column should start at value 10 and increment by 5, change the autoincrement to AUTOINCREMENT(10,5).

To insert a new record into the "Persons" table, we will not have to specify a value for the "P_Id" column ( a unique value will be added automatically ):
INSERT INTO Persons (FirstName,LastName)
VALUES ('Lars','Monsen')

The SQL statement above would insert a new record into the "Persons" table. The "P_Id" column would be assigned a unique value. The "FirstName" column would be set to "Lars" and the
"LastName" column would be set to "Monsen".

Syntax for Oracle

In Oracle the code is a little bit more tricky.

You will have to create an auto-increment field with the sequence object (this object generates a number sequence).

Use the following CREATE SEQUENCE syntax:

CREATE SEQUENCE seq_person


MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 10

The code above creates a sequence object called seq_person, that starts with 1 and will increment by 1. It will also cache up to 10 values for performance. The cache option specifies how many
sequence values will be stored in memory for faster access.

To insert a new record into the "Persons" table, we will have to use the nextval function (this function retrieves the next value from seq_person sequence):

INSERT INTO Persons (P_Id,FirstName,LastName)


VALUES (seq_person.nextval,'Lars','Monsen')
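The same idea can be tried in SQLite, which is not covered above and has its own spelling: INTEGER PRIMARY KEY AUTOINCREMENT. The sketch assumes Python's sqlite3 module; the second person's name is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Persons (
    P_Id INTEGER PRIMARY KEY AUTOINCREMENT,   -- SQLite's auto-increment spelling
    LastName varchar(255) NOT NULL,
    FirstName varchar(255)
)""")

# No value is supplied for P_Id; the database generates it.
cur = conn.execute("INSERT INTO Persons (FirstName, LastName) VALUES ('Lars', 'Monsen')")
first_id = cur.lastrowid
cur = conn.execute("INSERT INTO Persons (FirstName, LastName) VALUES ('Kari', 'Pettersen')")
second_id = cur.lastrowid

print(first_id, second_id)  # 1 2
```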

SQL CREATE VIEW Statement


 In SQL, a view is a virtual table based on the result-set of an SQL statement.
 A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.

 You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from one single table.

 SQL CREATE VIEW Syntax:


CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

SQL CREATE VIEW Examples


If you have the Northwind database you can see that it has several views installed by default.

The view "Current Product List" lists all active products (products that are not discontinued) from the "Products" table. The view is created with the following SQL:

CREATE VIEW [Current Product List] AS


SELECT ProductID,ProductName
FROM Products
WHERE Discontinued=No

We can query the view above as follows:


SELECT * FROM [Current Product List]

Another view in the Northwind sample database selects every product in the "Products" table with a unit price higher than the average unit price:

CREATE VIEW [Products Above Average Price] AS


SELECT ProductName,UnitPrice
FROM Products
WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)

We can query the view above as follows:

SELECT * FROM [Products Above Average Price]

Another view in the Northwind database calculates the total sale for each category in 1997. Note that this view selects its data from another view called "Product Sales for 1997":

CREATE VIEW [Category Sales For 1997] AS


SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales
FROM [Product Sales for 1997] GROUP BY CategoryName

We can query the view above as follows:

SELECT * FROM [Category Sales For 1997]

We can also add a condition to the query. Now we want to see the total sale only for the category "Beverages":
SELECT * FROM [Category Sales For 1997]
WHERE CategoryName='Beverages'
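A view stores the query, not the data, so its result changes whenever the underlying table changes. The sketch below recreates the "Products Above Average Price" view under stated assumptions: SQLite through Python's sqlite3 module, and a three-row "Products" table with made-up prices rather than the real Northwind data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, UnitPrice REAL);
-- Hypothetical rows, not the real Northwind contents
INSERT INTO Products VALUES (1, 'Chai', 18.0), (2, 'Chang', 19.0), (3, 'Ikura', 31.0);

CREATE VIEW [Products Above Average Price] AS
SELECT ProductName, UnitPrice
FROM Products
WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM Products);
""")

before = conn.execute("SELECT * FROM [Products Above Average Price]").fetchall()

# A new expensive product raises the average; the view reflects this immediately.
conn.execute("INSERT INTO Products VALUES (4, 'Caviar', 100.0)")
after = conn.execute("SELECT * FROM [Products Above Average Price]").fetchall()

print(before)  # [('Ikura', 31.0)]
print(after)   # [('Caviar', 100.0)]
```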

SQL Updating a View

You can update a view by using the following syntax:

SQL CREATE OR REPLACE VIEW Syntax:


CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

Now we want to add the "Category" column to the "Current Product List" view. We will update the view with the following SQL:

CREATE OR REPLACE VIEW [Current Product List] AS


SELECT ProductID,ProductName,Category
FROM Products WHERE Discontinued=No

SQL Dropping a View


You can delete a view with the DROP VIEW command.

SQL DROP VIEW Syntax:


DROP VIEW view_name

SQL Dates

MySQL Date Functions


The following table lists the most important built-in date functions in MySQL:
Function Description
NOW() Returns the current date and time
CURDATE() Returns the current date
CURTIME() Returns the current time
DATE() Extracts the date part of a date or date/time expression
EXTRACT() Returns a single part of a date/time
DATE_ADD() Adds a specified time interval to a date
DATE_SUB() Subtracts a specified time interval from a date
DATEDIFF() Returns the number of days between two dates
DATE_FORMAT() Displays date/time data in different formats

10. Find Employees Whose Salary is Greater Than Their Manager's Salary

Sample Table: `Employees`

| EmployeeID | Name  | Salary | ManagerID |
|------------|-------|--------|-----------|
| 1          | Alice | 5000   | 3         |
| 2          | Bob   | 7000   | 1         |
| 3          | Carol | 6000   | NULL      |
| 4          | Dave  | 8000   | 3         |
| 5          | Eve   | 5500   | 2         |

Query Solution:

SELECT e1.Name, e1.Salary
FROM Employees e1
JOIN Employees e2 ON e1.ManagerID = e2.EmployeeID
WHERE e1.Salary > e2.Salary;
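The self-join can be run against the sample table as a check. This sketch assumes SQLite through Python's sqlite3 module; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Salary INTEGER, ManagerID INTEGER);
INSERT INTO Employees VALUES
  (1,'Alice',5000,3), (2,'Bob',7000,1), (3,'Carol',6000,NULL),
  (4,'Dave',8000,3), (5,'Eve',5500,2);
""")

# e1 is the employee, e2 is that employee's manager.
higher_paid = conn.execute("""
SELECT e1.Name, e1.Salary
FROM Employees e1
JOIN Employees e2 ON e1.ManagerID = e2.EmployeeID
WHERE e1.Salary > e2.Salary
ORDER BY e1.EmployeeID
""").fetchall()

print(higher_paid)  # [('Bob', 7000), ('Dave', 8000)]
```

Carol never appears because her ManagerID is NULL, so the inner join finds no manager row for her.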

11. Calculate Cumulative Sum in a New Column

Sample Table: `Sales`

| SaleID | Amount |
|--------|--------|
| 1      | 100    |
| 2      | 200    |
| 3      | 300    |
| 4      | 150    |

Query Solution:

SELECT SaleID, Amount,
       SUM(Amount) OVER (ORDER BY SaleID) AS CumulativeSum
FROM Sales;
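Running the window function on the sample rows shows the running total building up. The sketch assumes SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (SaleID INTEGER, Amount INTEGER);
INSERT INTO Sales VALUES (1, 100), (2, 200), (3, 300), (4, 150);
""")

# SUM(...) OVER (ORDER BY ...) adds up every row from the first to the current one.
running = conn.execute("""
SELECT SaleID, Amount,
       SUM(Amount) OVER (ORDER BY SaleID) AS CumulativeSum
FROM Sales
ORDER BY SaleID
""").fetchall()

print(running)  # [(1, 100, 100), (2, 200, 300), (3, 300, 600), (4, 150, 750)]
```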

12. Write a Query to Delete Duplicate Rows

Sample Table: `Employees`

| EmployeeID | Name  | Salary |
|------------|-------|--------|
| 1          | Alice | 5000   |
| 2          | Bob   | 7000   |
| 3          | Bob   | 7000   |
| 4          | Dave  | 8000   |
| 5          | Eve   | 5500   |

Query Solution:

WITH CTE AS (
    SELECT Name, Salary,
           ROW_NUMBER() OVER (PARTITION BY Name, Salary ORDER BY EmployeeID) AS RowNum
    FROM Employees
)
DELETE FROM CTE
WHERE RowNum > 1;

Note: Deleting through a CTE like this is SQL Server syntax; other systems usually need the duplicates identified in a subquery instead.
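Where deleting through a CTE is not available, a common alternative keeps the lowest EmployeeID of each (Name, Salary) group and deletes the rest. The sketch below is that alternative, not the query above, and assumes SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Salary INTEGER);
INSERT INTO Employees VALUES
  (1,'Alice',5000), (2,'Bob',7000), (3,'Bob',7000),
  (4,'Dave',8000), (5,'Eve',5500);
""")

# Keep the first row (smallest EmployeeID) per Name/Salary pair, delete the others.
conn.execute("""
DELETE FROM Employees
WHERE EmployeeID NOT IN (
    SELECT MIN(EmployeeID)
    FROM Employees
    GROUP BY Name, Salary
)
""")

remaining = [row[0] for row in conn.execute(
    "SELECT EmployeeID FROM Employees ORDER BY EmployeeID"
)]
print(remaining)  # [1, 2, 4, 5]
```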

21. Query to Calculate Mean, Median, and Mode


Sample Table: `Employees`

| EmployeeID | Name  | Salary |
|------------|-------|--------|
| 1          | Alice | 5000   |
| 2          | Bob   | 7000   |
| 3          | Carol | 6000   |
| 4          | Dave  | 8000   |
| 5          | Eve   | 5000   |

Query Solution (Mean):

SELECT AVG(Salary) AS MeanSalary
FROM Employees;

Query Solution (Median):

SELECT AVG(Salary) AS MedianSalary
FROM (
    SELECT Salary
    FROM Employees
    ORDER BY Salary
    -- Takes 1 row for an odd row count, 2 rows for an even one.
    -- Expressions in LIMIT/OFFSET are accepted by SQLite; MySQL requires constants here.
    LIMIT 2 - (SELECT COUNT(*) FROM Employees) % 2
    OFFSET (SELECT (COUNT(*) - 1) / 2 FROM Employees)
) AS MedianSubquery;

Query Solution (Mode):

SELECT Salary AS ModeSalary
FROM Employees
GROUP BY Salary
ORDER BY COUNT(*) DESC
LIMIT 1;
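The three queries can be run on the sample table as a check. This sketch assumes SQLite through Python's sqlite3 module, which accepts the subqueries in LIMIT and OFFSET that the median query relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Salary INTEGER);
INSERT INTO Employees VALUES
  (1,'Alice',5000), (2,'Bob',7000), (3,'Carol',6000),
  (4,'Dave',8000), (5,'Eve',5000);
""")

mean = conn.execute("SELECT AVG(Salary) FROM Employees").fetchone()[0]

# Sort, skip to the middle, and average one row (odd count) or two (even count).
median = conn.execute("""
SELECT AVG(Salary) FROM (
    SELECT Salary FROM Employees
    ORDER BY Salary
    LIMIT 2 - (SELECT COUNT(*) FROM Employees) % 2
    OFFSET (SELECT (COUNT(*) - 1) / 2 FROM Employees)
)
""").fetchone()[0]

# The most frequent salary; ties would be broken arbitrarily.
mode = conn.execute("""
SELECT Salary FROM Employees
GROUP BY Salary
ORDER BY COUNT(*) DESC
LIMIT 1
""").fetchone()[0]

print(mean, median, mode)  # 6200.0 6000.0 5000
```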

22. Query to Fetch Employees Earning More Than Their Department's Average Salary

Sample Table: `Employees`

| EmployeeID | Name  | Salary | DepartmentID |
|------------|-------|--------|--------------|
| 1          | Alice | 5000   | 1            |
| 2          | Bob   | 7000   | 1            |
| 3          | Carol | 6000   | 2            |
| 4          | Dave  | 8000   | 2            |
| 5          | Eve   | 5500   | 1            |

Query Solution:

SELECT e.Name, e.Salary, e.DepartmentID
FROM Employees e
JOIN (
    SELECT DepartmentID, AVG(Salary) AS AvgSalary
    FROM Employees
    GROUP BY DepartmentID
) AS dept_avg ON e.DepartmentID = dept_avg.DepartmentID
WHERE e.Salary > dept_avg.AvgSalary;
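Running the query on the sample rows confirms the expected result: department 1 averages about 5833 and department 2 averages 7000. The sketch assumes SQLite through Python's sqlite3 module, with an ORDER BY added for a predictable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Salary INTEGER, DepartmentID INTEGER);
INSERT INTO Employees VALUES
  (1,'Alice',5000,1), (2,'Bob',7000,1), (3,'Carol',6000,2),
  (4,'Dave',8000,2), (5,'Eve',5500,1);
""")

# The derived table holds one average per department; each employee is compared to it.
above_avg = conn.execute("""
SELECT e.Name, e.Salary, e.DepartmentID
FROM Employees e
JOIN (
    SELECT DepartmentID, AVG(Salary) AS AvgSalary
    FROM Employees
    GROUP BY DepartmentID
) AS dept_avg ON e.DepartmentID = dept_avg.DepartmentID
WHERE e.Salary > dept_avg.AvgSalary
ORDER BY e.EmployeeID
""").fetchall()

print(above_avg)  # [('Bob', 7000, 1), ('Dave', 8000, 2)]
```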

24. Calculate Month-over-Month Sales Growth

Sample Table: Sales

| SaleID | SaleDate   | SaleAmount |
|--------|------------|------------|
| 1      | 2024-01-15 | 500        |
| 2      | 2024-01-20 | 700        |
| 3      | 2024-02-10 | 600        |
| 4      | 2024-02-15 | 800        |
| 5      | 2024-03-05 | 900        |
| 6      | 2024-03-20 | 1000       |

Goal: Find the total sales amount for each month and calculate the month-over-month sales growth.

Query:

SELECT
    DATE_FORMAT(SaleDate, '%Y-%m') AS SaleMonth,
    SUM(SaleAmount) AS TotalSales,
    LAG(SUM(SaleAmount)) OVER (ORDER BY DATE_FORMAT(SaleDate, '%Y-%m')) AS PreviousMonthSales,
    (SUM(SaleAmount) - LAG(SUM(SaleAmount)) OVER (ORDER BY DATE_FORMAT(SaleDate, '%Y-%m')))
        / LAG(SUM(SaleAmount)) OVER (ORDER BY DATE_FORMAT(SaleDate, '%Y-%m')) * 100 AS MonthOverMonthGrowth
FROM Sales
GROUP BY DATE_FORMAT(SaleDate, '%Y-%m')
ORDER BY SaleMonth;
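The query above uses MySQL's DATE_FORMAT. The sketch below is an adapted version for SQLite through Python's sqlite3 module (both assumptions): strftime replaces DATE_FORMAT, a CTE computes the monthly totals first, and the multiplication by 100.0 avoids SQLite's integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (SaleID INTEGER, SaleDate TEXT, SaleAmount INTEGER);
INSERT INTO Sales VALUES
  (1,'2024-01-15',500), (2,'2024-01-20',700), (3,'2024-02-10',600),
  (4,'2024-02-15',800), (5,'2024-03-05',900), (6,'2024-03-20',1000);
""")

growth = conn.execute("""
WITH monthly AS (
    SELECT strftime('%Y-%m', SaleDate) AS SaleMonth,
           SUM(SaleAmount)             AS TotalSales
    FROM Sales
    GROUP BY strftime('%Y-%m', SaleDate)
)
SELECT SaleMonth,
       TotalSales,
       LAG(TotalSales) OVER (ORDER BY SaleMonth) AS PreviousMonthSales,
       ROUND((TotalSales - LAG(TotalSales) OVER (ORDER BY SaleMonth)) * 100.0
             / LAG(TotalSales) OVER (ORDER BY SaleMonth), 2) AS MonthOverMonthGrowth
FROM monthly
ORDER BY SaleMonth
""").fetchall()

for row in growth:
    print(row)
# ('2024-01', 1200, None, None)   no previous month, so growth is NULL
# ('2024-02', 1400, 1200, 16.67)
# ('2024-03', 1900, 1400, 35.71)
```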


How to Solve: Use the LIKE operator to filter email addresses by provider. Use % as a wildcard to match any email address ending with the specified provider.

1. How Does the CHECK Constraint Function?
Problem Statement: Explain the CHECK constraint in SQL and provide an example of how it can be used to ensure data integrity.
How to Solve: Define the CHECK constraint as a rule applied to column values. Provide examples that show how it maintains data quality, such as ensuring positive values or valid date ranges.

2. Calculate Average Card Usage Per Month
Problem Statement: Find the average transaction cost per cardholder for each month. The transactions are recorded in a table with transaction IDs, cardholder IDs, transaction dates, and transaction costs.
How to Solve: Extract the month from transaction_date. Group by month and card_holder_id. Calculate the average transaction cost for each cardholder in each month.

3. Calculating Click-Through-Rate for Marketing Campaigns
Problem Statement: Calculate the click-through rate (CTR) for each marketing campaign. CTR is the ratio of the number of clicks to the number of views, expressed as a percentage. You have two tables: campaigns and clicks.
How to Solve: Join the campaigns and clicks tables on campaign_id. Count the 'Clicked' and 'Viewed' actions for each campaign. Calculate CTR as (Clicked / Viewed) * 100.

4. Distinction Between Cross Join and Natural Join
Problem Statement: Explain the difference between a cross join and a natural join. Provide examples for both types of joins.
How to Solve: Define a cross join (Cartesian product) and a natural join (a join on the columns that share a name). Provide example queries for each type of join to illustrate the differences.

1. Identify the VIP Customers for American Express
Problem Statement: Find customers who have made transactions exceeding $5000 each and have done so more than once. These customers are considered 'VIP' or 'Whale' customers.
How to Solve: Filter transactions with amounts greater than $5000. Group by customer and count the number of qualifying transactions. Keep the groups with more than one qualifying transaction.

1. Employees Earning More Than Their Managers
Problem Statement: Identify employees whose salaries exceed those of their direct managers.
How to Solve: Perform a self-join on the employee table to compare employees with their managers. Filter where the employee's salary is greater than the manager's salary.

2. Calculate Average Transaction Amount per Year per Client
Problem Statement: Compute the average transaction amount for each client, segmented by year, for the years 2020 to 2024.
How to Solve: Extract the year from the transaction dates. Group by client and year. Calculate the average transaction amount.

3. Find Products with Sales Greater Than Their Average Sales in the Last 12 Months
Problem Statement: Identify products whose total sales in the last 12 months exceed their average monthly sales.
How to Solve: Aggregate monthly sales for each product. Compute the average monthly sales per product. Compare total sales to average sales.

4. Determine the Churn Rate for Customers Who Made Their First Purchase in the Last 6 Months
Problem Statement: Calculate the churn rate for customers who made their first purchase within the last 6 months but have not made any purchase in the last 30 days.
How to Solve: Identify customers whose first purchase was in the last 6 months. Among them, find those who have not made a purchase in the last 30 days (the churned customers). Compute the churn rate as churned customers divided by total new customers.

Data Engineering pdf


WRITE A QUERY TO FIND THE TOP 3 CUSTOMERS WHO HAVE MADE THE MOST PURCHASES IN THE LAST MONTH
WITH customer_purchases AS (
    SELECT customer_id,
           COUNT(*) AS purchase_count,
           ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS row_num
    FROM orders
    WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
    GROUP BY customer_id
)
SELECT customer_id, purchase_count
FROM customer_purchases
WHERE row_num <= 3;

CREATE A QUERY TO CALCULATE A RUNNING TOTAL OF SALES FOR EACH PRODUCT CATEGORY, ORDERED BY DATE

SELECT o.order_date,
       p.category,
       SUM(o.total_amount) AS daily_sales,
       SUM(SUM(o.total_amount)) OVER (
           PARTITION BY p.category
           ORDER BY o.order_date
           ROWS UNBOUNDED PRECEDING
       ) AS running_total
FROM orders o
JOIN products p ON o.product_id = p.id
GROUP BY o.order_date, p.category
ORDER BY p.category, o.order_date;

WRITE A QUERY TO FIND EMPLOYEES WHO EARN MORE THAN THEIR DEPARTMENT'S AVERAGE SALARY
WITH dept_avg AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
JOIN dept_avg d ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;

CREATE A QUERY TO IDENTIFY CUSTOMERS WHO HAVE MADE PURCHASES IN CONSECUTIVE MONTHS
WITH monthly_purchases AS (
    SELECT customer_id,
           DATE_TRUNC('month', order_date) AS order_month,
           LEAD(DATE_TRUNC('month', order_date), 1) OVER (
               PARTITION BY customer_id
               ORDER BY DATE_TRUNC('month', order_date)
           ) AS next_month
    FROM orders
)
SELECT DISTINCT customer_id
FROM monthly_purchases
WHERE next_month = order_month + INTERVAL '1 month';

WRITE A QUERY TO PIVOT SALES DATA FROM ROWS TO COLUMNS, SHOWING QUARTERLY SALES FOR EACH PRODUCT.
SELECT product_id,
  SUM(CASE WHEN EXTRACT(QUARTER FROM order_date) = 1 THEN total_amount ELSE 0 END) AS Q1,
  SUM(CASE WHEN EXTRACT(QUARTER FROM order_date) = 2 THEN total_amount ELSE 0 END) AS Q2,
  SUM(CASE WHEN EXTRACT(QUARTER FROM order_date) = 3 THEN total_amount ELSE 0 END) AS Q3,
  SUM(CASE WHEN EXTRACT(QUARTER FROM order_date) = 4 THEN total_amount ELSE 0 END) AS Q4
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2023
GROUP BY product_id;

CREATE A QUERY TO FIND THE MEDIAN SALARY FOR EACH DEPARTMENT.


WITH ranked_salaries AS (
  SELECT department_id, salary,
    ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary) AS row_num,
    COUNT(*) OVER (PARTITION BY department_id) AS dept_count
  FROM employees
)
SELECT department_id, AVG(salary) AS median_salary
FROM ranked_salaries
-- divide by 2.0 so that even-sized departments pick the two middle rows
WHERE row_num IN (FLOOR((dept_count + 1) / 2.0), CEIL((dept_count + 1) / 2.0))
GROUP BY department_id;

WRITE A QUERY TO FIND THE TOP PRODUCT IN EACH CATEGORY BASED ON TOTAL SALES AMOUNT.
WITH ranked_products AS (
  SELECT p.category, p.product_id, SUM(o.total_amount) AS total_sales,
    RANK() OVER (PARTITION BY p.category ORDER BY SUM(o.total_amount) DESC) AS sales_rank
  FROM products p
  JOIN orders o ON p.product_id = o.product_id
  GROUP BY p.category, p.product_id
)
SELECT category, product_id, total_sales FROM ranked_products WHERE sales_rank = 1;

CREATE A QUERY TO CALCULATE THE YEAR-OVER-YEAR GROWTH RATE FOR EACH PRODUCT.
WITH yearly_sales AS (
  SELECT EXTRACT(YEAR FROM order_date) AS year, product_id, SUM(total_amount) AS yearly_total
  FROM orders
  GROUP BY EXTRACT(YEAR FROM order_date), product_id
)
SELECT cur.year, cur.product_id, cur.yearly_total, prev.yearly_total AS prev_year_total,
  (cur.yearly_total - prev.yearly_total) * 100.0 / NULLIF(prev.yearly_total, 0) AS growth_rate
FROM yearly_sales cur
JOIN yearly_sales prev ON cur.product_id = prev.product_id AND cur.year = prev.year + 1;

WRITE A QUERY TO IDENTIFY CUSTOMERS WHO HAVE NEVER MADE A PURCHASE.


SELECT c.customer_id, c.name FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_id IS NULL;

CREATE A QUERY TO CALCULATE THE RUNNING TOTAL OF INVENTORY FOR EACH PRODUCT, CONSIDERING BOTH ADDITIONS AND SUBTRACTIONS.
WITH inventory_changes AS (
  SELECT product_id, change_date, quantity,
    SUM(quantity) OVER (PARTITION BY product_id ORDER BY change_date) AS running_total
  FROM (
    SELECT product_id, date AS change_date, received_quantity AS quantity FROM inventory_receipts
    UNION ALL
    SELECT product_id, date AS change_date, -shipped_quantity AS quantity FROM inventory_shipments
  ) all_changes
)
SELECT product_id, change_date, quantity, running_total
FROM inventory_changes
ORDER BY product_id, change_date;

Window Functions:
Advanced Analysis: Learn to use `OVER()` with ranking functions like `ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`, and `NTILE()`, and with aggregates like `SUM()` for running totals. These functions compute a value for each row without collapsing the rows the way `GROUP BY` does.
Partitioning and Ordering: Break your data into smaller sections with `PARTITION BY` and sort within these sections with `ORDER BY`, so that each calculation restarts for every section.
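To make the difference between the ranking functions concrete, here is a small sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The `employees` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two IT employees share the top salary.
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "IT", 900), ("Ben", "IT", 900), ("Cy", "IT", 700), ("Dee", "HR", 500)],
)

RANK_SQL = """
SELECT name, dept, salary,
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, name) AS row_num,
       RANK()       OVER (PARTITION BY dept ORDER BY salary DESC)       AS rnk,
       DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC)       AS dense_rnk
FROM employees
ORDER BY dept, row_num
"""

ranking_rows = conn.execute(RANK_SQL).fetchall()
for row in ranking_rows:
    print(row)
```

For the tied salaries, `ROW_NUMBER()` still assigns 1 and 2, `RANK()` assigns 1 and 1 and then jumps to 3, and `DENSE_RANK()` assigns 1 and 1 and continues with 2.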
CTEs and Temporary Tables:
Simplify Queries: Use Common Table Expressions (CTEs) and temporary tables to make complex queries easier to understand, especially with large data.
Recursive CTEs: Use these for tasks involving hierarchies, like creating organizational charts or analyzing relationships in data.
Performance: Know when to use CTEs or temporary tables to manage performance and resources.
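As an illustration of the recursive CTE idea, the sketch below walks a small reporting hierarchy with Python's built-in sqlite3 module. The `staff` table, its columns, and the four sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical org chart: manager_id points at the employee's manager.
conn.execute("CREATE TABLE staff (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "Root", None), (2, "Lead A", 1), (3, "Lead B", 1), (4, "Dev", 2)],
)

ORG_SQL = """
WITH RECURSIVE org_chart AS (
    -- anchor member: employees without a manager
    SELECT emp_id, name, 0 AS depth
    FROM staff
    WHERE manager_id IS NULL
    UNION ALL
    -- recursive member: direct reports of rows found so far
    SELECT s.emp_id, s.name, o.depth + 1
    FROM staff s
    JOIN org_chart o ON s.manager_id = o.emp_id
)
SELECT name, depth FROM org_chart ORDER BY depth, emp_id
"""

org_rows = conn.execute(ORG_SQL).fetchall()
for name, depth in org_rows:
    print("  " * depth + name)
```

The anchor query seeds the result and the recursive part keeps joining until no new rows appear, which is why the same pattern fits any parent/child relationship.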
Query Optimization:
Improve Performance: Learn how to make queries run faster by using techniques like indexing, restructuring queries, and understanding execution plans.
Indexing: Explore different types of indexes (e.g., clustered, non-clustered) and when to use them.
Execution Plans: Learn to read execution plans to find and fix performance issues in your queries.
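A quick way to see indexing and execution plans work together is to compare the plan of the same query before and after adding an index. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module; the `orders` table, the index name, and the helper `plan_for` are hypothetical, and other databases expose plans with different commands and output formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT order_id, total FROM orders WHERE customer_id = 7"


def plan_for(sql):
    """Return the human-readable plan steps (last column of each plan row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan_for(QUERY)  # full table scan: no index on customer_id yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan_for(QUERY)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, so read it for the scan-versus-index distinction rather than the literal string.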
PIVOT and UNPIVOT:
Transform Data: Use PIVOT to turn rows into columns and UNPIVOT to do the reverse. This helps in
reorganizing data for easier analysis.

Advanced Use: Combine PIVOT and UNPIVOT with other functions to create dynamic summaries and
reshape data for better visualization.

What approach did you take?

1. You have an Employee table with the following columns:

● EmpID (Employee ID)


● Emp_name (Employee Name)
● Manager_id (Manager ID)
● Salary (Employee Salary)
● Location (Employee Location)

Write a SQL query to find employees whose salary is greater than the average salary of their respective
location.

Answer:
SELECT e.EmpID, e.Emp_name, e.Salary, e.Location

FROM Employee e

JOIN (

SELECT Location, AVG(Salary) AS AvgSalary


FROM Employee

GROUP BY Location

) loc_avg ON e.Location = loc_avg.Location

WHERE e.Salary > loc_avg.AvgSalary;

2. You have a Trip table with the following columns:

● trip_id (Trip ID)


● driver_id (Driver ID)
● rider_id (Rider ID)
● trip_start_timestamp (Trip Start Timestamp)

Write a SQL query to find riders who have taken at least one trip each day for the last 10 days.
Answer:
SELECT rider_id
FROM Trip
-- today plus the previous 9 days = 10 calendar days
WHERE trip_start_timestamp >= CURRENT_DATE - INTERVAL '9 days'
GROUP BY rider_id
HAVING COUNT(DISTINCT DATE(trip_start_timestamp)) = 10;

3. Percentage of Successful Payments per Driver


Write a query to find the percentage of successful payments for each driver.

Table and Column Assumptions:


● Rides: ride_id, driver_id, fare_amount, driver_rating, start_time
● Payments: payment_id, ride_id, payment_status (payment_status =
'Completed' indicates success)
Query:

WITH RecentRides AS (

SELECT

r.ride_id,

r.driver_id,

r.fare_amount,

r.driver_rating,

r.start_time,

p.payment_status

FROM

Rides r

LEFT JOIN

Payments p ON r.ride_id = p.ride_id

WHERE

r.start_time >= DATEADD(MONTH, -3, CURRENT_DATE)

),

DriverMetrics AS (
SELECT

driver_id,

COUNT(ride_id) AS total_rides,

COUNT(CASE WHEN payment_status = 'Completed' THEN 1 END) * 100.0


/ COUNT(ride_id) AS percentage_successful_payments

FROM

RecentRides

GROUP BY driver_id

)

SELECT

driver_id,
percentage_successful_payments

FROM

DriverMetrics;

4. Calculate the Percentage of Menu Items Sold per Restaurant

Write a query to calculate the percentage of items sold at the restaurant level.

Table and Column Assumptions:


● Items: item_id, rest_id (restaurant ID)
● Orders: order_id, item_id, quantity, is_offer, client_id, Date_Timestamp

Query:

WITH TotalItemsPerRestaurant AS (

SELECT rest_id, COUNT(item_id) AS total_items

FROM Items

GROUP BY rest_id

),

SoldItemsPerRestaurant AS (

SELECT i.rest_id, COUNT(DISTINCT o.item_id) AS sold_items

FROM Orders o
JOIN Items i ON o.item_id = i.item_id

GROUP BY i.rest_id

)

SELECT

t.rest_id,

(s.sold_items * 100.0 / t.total_items) AS percentage_items_sold

FROM

TotalItemsPerRestaurant t

JOIN
SoldItemsPerRestaurant s ON t.rest_id = s.rest_id;

5. Time Taken for Next Order (Clients with Offers vs Without Offers)
Write a query to compare the time taken for clients who placed their first order with and without an offer to make their next order.

Table and Column Assumptions:

● Orders: order_id, user_id, is_offer, Date_Timestamp

Query:

WITH RankedOrders AS (
SELECT user_id, is_offer, Date_Timestamp,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY Date_Timestamp) AS order_seq
FROM Orders
),
FirstOrder AS (
SELECT user_id, is_offer, Date_Timestamp AS first_order_time
FROM RankedOrders
WHERE order_seq = 1
),
NextOrder AS (
SELECT user_id, Date_Timestamp AS next_order_time
FROM RankedOrders
WHERE order_seq = 2
)
SELECT
f.is_offer,
AVG(TIMESTAMPDIFF(SECOND, f.first_order_time, n.next_order_time)) AS avg_time_to_next_order
FROM FirstOrder f
JOIN NextOrder n ON f.user_id = n.user_id
GROUP BY f.is_offer;

6. Find All Numbers Appearing at Least Three Times Consecutively
Write a query to find all numbers that appear at least three times consecutively in a log table.

Table and Column Assumptions:


● Logs: Id, Num

Query (Using Self-Join):

SELECT DISTINCT l1.Num

FROM Logs l1

JOIN Logs l2 ON l1.Num = l2.Num AND l1.Id = l2.Id - 1

JOIN Logs l3 ON l1.Num = l3.Num AND l1.Id = l3.Id - 2;

7. Find the Length of the Longest Consecutive Sequence


Write a Query to find the length of the longest consecutive sequence of numbers in a table.

Table and Column Assumptions:


● Consecutive: number
Query:
WITH NumberWithRank AS (

SELECT

number,

ROW_NUMBER() OVER (ORDER BY number) AS row_num

FROM

Consecutive

),
ConsecutiveGroups AS (

SELECT

number,

row_num - number AS group_id

FROM

NumberWithRank

)

SELECT

COUNT(*) AS longest_consecutive_sequence

FROM

ConsecutiveGroups

GROUP BY group_id

ORDER BY

longest_consecutive_sequence DESC

LIMIT 1;
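The row-number trick used in this query can be checked on a concrete input. The sketch below runs the same grouping idea with Python's built-in sqlite3 module on the sample values that accompany this question later in the document (1, 2, 3, 4, 10, 11, 20, 21, 22, 23, 24, 30); reporting each run's start and end is an addition made here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Consecutive (number INTEGER)")
conn.executemany(
    "INSERT INTO Consecutive VALUES (?)",
    [(n,) for n in (1, 2, 3, 4, 10, 11, 20, 21, 22, 23, 24, 30)],
)

ISLANDS_SQL = """
WITH NumberWithRank AS (
    SELECT number, ROW_NUMBER() OVER (ORDER BY number) AS row_num
    FROM Consecutive
),
ConsecutiveGroups AS (
    -- within a run of consecutive numbers, number - row_num stays constant
    SELECT number, number - row_num AS group_id
    FROM NumberWithRank
)
SELECT MIN(number) AS run_start, MAX(number) AS run_end, COUNT(*) AS run_length
FROM ConsecutiveGroups
GROUP BY group_id
ORDER BY run_start
"""

runs = conn.execute(ISLANDS_SQL).fetchall()
longest = max(length for _, _, length in runs)
print(runs)
print("longest run:", longest)
```

The approach assumes the numbers are distinct; duplicates would need to be removed first, for example with `SELECT DISTINCT`.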

8. Percentage of Promo Trips – Members vs Non-Members
Write a query to calculate the percentage of promo trips comparing members with non-members.

Table and Column Assumptions:


● Pass_Subscriptions: user_id, pass_id, start_date, end_date, status
● Orders: order_id, user_id, is_offer, Date_Timestamp
Query:

WITH Members AS (

SELECT DISTINCT user_id

FROM Pass_Subscriptions

WHERE status = 'PAID'


),

PromoTrips AS (

SELECT user_id, is_offer

FROM Orders

WHERE is_offer = 1

),

PromoTripsByMemberStatus AS (

SELECT

CASE WHEN m.user_id IS NOT NULL THEN 'Member' ELSE 'Non-Member' END AS member_status,

COUNT(*) AS total_promo_trips

FROM PromoTrips p

LEFT JOIN Members m ON p.user_id = m.user_id

GROUP BY member_status

)

SELECT

member_status,

(total_promo_trips * 100.0 / SUM(total_promo_trips) OVER ()) AS promo_trip_percentage

FROM

PromoTripsByMemberStatus;

Data Analyst Interview Questions: SQL


RDBMSs are among the most commonly used databases to date, so SQL skills are indispensable
in most job roles, such as Data Analyst. Knowing Structured Query Language boosts your path to
becoming a data analyst, as it will be clear in your interviews that you know how to handle databases.
Q1. What is the default port for SQL?

The default TCP port assigned by the Internet Assigned Numbers Authority (IANA) for SQL Server is 1433.

Q2. What do you mean by DBMS? What are its different types?
A Database Management System (DBMS) is a software application that interacts with the user,
applications and the database itself to capture and analyze data. The data stored in the database can be
modified, retrieved and deleted, and can be of any type like strings, numbers, images etc.

There are mainly 4 types of DBMS, which are Hierarchical, Relational, Network, and Object-Oriented
DBMS.

• Hierarchical DBMS: As the name suggests, this type of DBMS has a style of predecessor-
successor type of relationship. So, it has a structure similar to that of a tree, wherein the nodes
represent records and the branches of the tree represent fields.
• Relational DBMS (RDBMS): This type of DBMS, uses a structure that allows the users to identify
and access data in relation to another piece of data in the database.
• Network DBMS: This type of DBMS supports many to many relations wherein multiple member
records can be linked.
• Object-oriented DBMS: This type of DBMS uses small individual software called objects. Each
object contains a piece of data and the instructions for the actions to be done with the data.

Q3. What is ACID property in a database?


ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. These properties ensure
that database transactions are processed reliably. Each term is defined below.

• Atomicity: A transaction either completes in full or not at all. A transaction may group several
operations; if any one of them fails, the entire transaction fails and the database state is left
unchanged.

• Consistency: A transaction must satisfy all validation rules (constraints), so it always takes
the database from one valid state to another.

• Isolation: Isolation keeps transactions separated from each other until they’re finished. So
basically each and every transaction is independent.
• Durability: Durability makes sure that your committed transaction is never lost. So, this
guarantees that the database will keep track of pending changes in such a way that even if there
is a power loss, crash or any sort of error the server can recover from an abnormal termination.
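Atomicity is the easiest of these properties to observe directly. The sketch below uses Python's built-in sqlite3 module with an invented `accounts` table and a hypothetical `transfer` helper: the second transfer violates a CHECK constraint halfway through, so both of its updates are rolled back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(amount):
    """Move `amount` from alice to bob as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = transfer(30)   # both updates succeed and are committed
ok_large = transfer(500)  # second update breaks the CHECK, so the first is undone too
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok_small, ok_large, balances)
```

After the failed transfer, bob's balance shows no trace of the credit that was applied before the error, which is exactly the all-or-nothing behavior described above.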

Q4. What is Normalization? Explain different types of Normalization with advantages.
Normalization is the process of organizing data to avoid duplication and redundancy. There are many
successive levels of normalization. These are called normal forms. Each consecutive normal form
depends on the previous one. The first three normal forms are usually adequate.

• First Normal Form (1NF) – No repeating groups within rows


• Second Normal Form (2NF) – Every non-key (supporting) column value is dependent on the
whole primary key.
• Third Normal Form (3NF) – Dependent solely on the primary key and no other non-key
(supporting) column value.
• Boyce- Codd Normal Form (BCNF) – BCNF is the advanced version of 3NF. A table is said to be in
BCNF if it is 3NF and for every X ->Y, relation X should be the super key of the table.

Some of the advantages are:

• Better Database organization


• More Tables with smaller rows
• Efficient data access
• Greater Flexibility for Queries
• Quickly find the information
• Easier to implement Security
• Allows easy modification
• Reduction of redundant and duplicate data
• More Compact Database
• Ensure Consistent data after modification

Q5. What are the different types of Joins?


The various types of joins used to retrieve data between tables are Inner Join, Left Join, Right Join and
Full Outer Join.
• Inner join: Inner Join in MySQL is the most common type of join. It is used to return all the rows
from multiple tables where the join condition is satisfied.

• Left Join: Left Join in MySQL is used to return all the rows from the left table, but only the
matching rows from the right table where the join condition is fulfilled.

• Right Join: Right Join in MySQL is used to return all the rows from the right table, but only the
matching rows from the left table where the join condition is fulfilled.

• Full Join: Full join returns all the records when there is a match in any of the tables. Therefore, it
returns all the rows from the left-hand side table and all the rows from the right-hand side table.

Q6. Suppose you have a table of employee details consisting of the columns
(employeeId, employeeName), and you want to fetch alternate
records from the table. How do you think you can perform this task?
You can fetch alternate tuples by using the row number of each tuple. For example, to display the
employeeId of the even-numbered records, use the mod function and write the following query:

SELECT employeeId FROM (SELECT ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId FROM employee) numbered WHERE MOD(rownumber, 2) = 0;

where ‘employee’ is the table name.

Similarly, to display the employeeId of the odd-numbered records, write the following query:

SELECT employeeId FROM (SELECT ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId FROM employee) numbered WHERE MOD(rownumber, 2) = 1;

Q7. Consider the following two tables.

Table 5: Example Table – Data Analyst Interview Questions

Now, write a query to get the list of customers who took the course more than
once on the same day. The customers should be grouped by customer, and course
and the list should be ordered according to the most recent date.

SELECT
c.Customer_Id,
CustomerName,
Course_Id,
Course_Date,
count(Customer_Course_Id) AS count
FROM customers c JOIN course_details d ON d.Customer_Id = c.Customer_Id
GROUP BY c.Customer_Id,
CustomerName,
Course_Id,
Course_Date
HAVING count(Customer_Course_Id) > 1
ORDER BY Course_Date DESC;
Table 6: Output Table – Data Analyst Interview Questions

Q8. Consider the below Employee_Details table. Here the table has various
features such as Employee_Id, EmployeeName, Age, Gender, and Shift. The
Shift has m = Morning Shift and e = Evening Shift. Now, you have to swap
the ‘m’ and the ‘e’ values and vice versa, with a single update query.

Table 7: Example Table – Data Analyst Interview Questions

You can write the below query:

UPDATE Employee_Details SET Shift = CASE Shift WHEN 'm' THEN 'e' ELSE 'm' END;

Table 8: Output Table – Data Analyst Interview Questions

Q9. Write a SQL query to get the third highest salary of an employee from
Employee_Details table as illustrated below.

Table 9: Example Table – Data Analyst Interview Questions

SELECT TOP 1 Salary
FROM (
SELECT TOP 3 Salary
FROM Employee_Details
ORDER BY salary DESC) AS emp
ORDER BY salary ASC;

Q10. What is the difference between NVL and NVL2 functions in SQL?
NVL(exp1, exp2) and NVL2(exp1, exp2, exp3) are functions which check whether the value of exp1 is null
or not.

If we use NVL(exp1,exp2) function, then if exp1 is not null, then the value of exp1 will be returned; else
the value of exp2 will be returned. But, exp2 must be of the same data type of exp1.

Similarly, if we use NVL2(exp1, exp2, exp3) function, then if exp1 is not null, exp2 will be returned, else
the value of exp3 will be returned.

Basic SQL Questions

1. What is SQL?
o SQL (Structured Query Language) is used to manage and manipulate relational
databases.

2. What are the different types of SQL commands?


o DDL (Data Definition Language): CREATE, ALTER, DROP
o DML (Data Manipulation Language): SELECT, INSERT, UPDATE, DELETE
o DCL (Data Control Language): GRANT, REVOKE
o TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT

3. What is the difference between WHERE and HAVING?


o WHERE: Filters rows before aggregation.
o HAVING: Filters groups after aggregation.

4. What is the difference between DELETE and TRUNCATE?


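o DELETE: Removes rows based on a condition; it is logged row by row and can be rolled back.
o TRUNCATE: Removes all rows from a table; faster, but whether it can be rolled back depends on the DBMS (for example, it can be rolled back inside a transaction in SQL Server and PostgreSQL, but not in MySQL or Oracle).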

5. What is a Primary Key?


o A unique identifier for each record in a table. It cannot contain NULL values.
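The WHERE versus HAVING distinction from question 3 is easier to remember with a side-by-side run. This sketch uses Python's built-in sqlite3 module; the `employees` table and its four rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "IT", 90000), ("Ben", "IT", 40000), ("Cy", "HR", 60000), ("Dee", "HR", 30000)],
)

# WHERE removes individual rows before they are grouped.
where_rows = conn.execute("""
    SELECT department, COUNT(*)
    FROM employees
    WHERE salary > 50000
    GROUP BY department
    ORDER BY department
""").fetchall()

# HAVING removes whole groups after the aggregate has been computed.
having_rows = conn.execute("""
    SELECT department, AVG(salary)
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 50000
    ORDER BY department
""").fetchall()

print(where_rows)
print(having_rows)
```

With WHERE, both departments survive because each has one employee above 50000; with HAVING, only IT survives because HR's average falls below the threshold.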

Intermediate SQL Questions

6. What is the difference between INNER JOIN and OUTER JOIN?


o INNER JOIN: Returns matching rows from both tables.
o OUTER JOIN: Returns matching rows and unmatched rows from one or both tables
(LEFT, RIGHT, FULL OUTER JOIN).
7. What is a Subquery?
o A query nested within another query. It can return scalar, column, or table results.

8. Explain the difference between UNION and UNION ALL.


o UNION: Combines results and removes duplicates.
o UNION ALL: Combines results without removing duplicates.

9. What are Window Functions in SQL?


o Functions that perform calculations across a set of rows related to the current row.
Examples: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE().

10. What is a View? How do you create one?


o A View is a virtual table based on the result of a query.

CREATE VIEW ViewName AS
SELECT column1, column2 FROM TableName WHERE condition;

11. What are the differences between CHAR and VARCHAR?


o CHAR: Fixed length; pads with spaces if data is shorter.
o VARCHAR: Variable length; saves space for shorter data.

12. What is the difference between a CROSS JOIN and a SELF JOIN?
o CROSS JOIN: Combines all rows from two tables (Cartesian product).
o SELF JOIN: Joins a table with itself.
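To see the UNION versus UNION ALL difference from question 8 in action, here is a small sketch with Python's built-in sqlite3 module. The two customer tables and their email values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE online_customers (email TEXT)")
conn.execute("CREATE TABLE offline_customers (email TEXT)")
conn.executemany("INSERT INTO online_customers VALUES (?)", [("a@x.com",), ("b@x.com",)])
conn.executemany("INSERT INTO offline_customers VALUES (?)", [("b@x.com",), ("c@x.com",)])

# UNION removes the duplicate b@x.com.
union_rows = conn.execute("""
    SELECT email FROM online_customers
    UNION
    SELECT email FROM offline_customers
    ORDER BY email
""").fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute("""
    SELECT email FROM online_customers
    UNION ALL
    SELECT email FROM offline_customers
    ORDER BY email
""").fetchall()

print(len(union_rows), len(union_all_rows))
```

Because UNION has to detect duplicates, UNION ALL is usually the cheaper choice when you know the inputs do not overlap or duplicates are acceptable.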

Advanced SQL Questions

13. What is Normalization? Explain its types.


o Process of organizing data to reduce redundancy. Forms include:
 1NF: Atomic columns.
 2NF: No partial dependency.
 3NF: No transitive dependency.

14. What are Common Table Expressions (CTEs)?


o Temporary result sets defined within a WITH clause.

WITH CTE_Name AS (
SELECT column1, column2 FROM TableName WHERE condition
)
SELECT * FROM CTE_Name;

15. How do you find the second-highest salary in a table?


o Using a subquery:
SELECT MAX(Salary) FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);

o Using DENSE_RANK() (RANK() skips rank 2 when the top salary is tied):

SELECT DISTINCT Salary FROM (
SELECT Salary, DENSE_RANK() OVER (ORDER BY Salary DESC) AS rnk FROM Employee
) AS Ranked WHERE rnk = 2;

16. What is the difference between OLTP and OLAP?


o OLTP (Online Transaction Processing): Handles transactional data.
o OLAP (Online Analytical Processing): Handles analytical queries for reporting.

17. Explain the concept of Indexing in SQL.


o Indexes improve query performance by enabling faster data retrieval. Types:
 Clustered Index: Reorders the table data.
 Non-Clustered Index: Creates a separate structure.

18. How do you handle duplicate records in SQL?


o Use DISTINCT to filter unique records:

SELECT DISTINCT column1 FROM TableName;

o Delete duplicates, keeping the row with the lowest EmployeeID for each Name:

DELETE FROM Employee WHERE EmployeeID NOT IN (
SELECT MIN(EmployeeID) FROM Employee GROUP BY Name
);

19. What are Transactions in SQL?


o A sequence of operations executed as a single unit. Use:
 BEGIN TRANSACTION
 COMMIT
 ROLLBACK

20. Explain the EXISTS clause in SQL.


o Checks for the existence of rows returned by a subquery.

SELECT Name FROM Employee WHERE EXISTS (
SELECT 1 FROM Department WHERE Employee.DeptID = Department.DeptID
);

Scenario-Based Questions

21. How do you find employees with salaries above the department average?

SELECT e.Name FROM Employee e WHERE e.Salary > (
SELECT AVG(d.Salary) FROM Employee d WHERE d.DeptID = e.DeptID
);

22. How would you join three tables in SQL?


o Example with INNER JOIN:

SELECT A.column1, B.column2, C.column3
FROM TableA A
INNER JOIN TableB B ON A.ID = B.ID
INNER JOIN TableC C ON B.ID = C.ID;

23. How do you find the cumulative sum of a column?

SELECT Name, Salary, SUM(Salary) OVER (ORDER BY Name) AS CumulativeSalary
FROM Employee;

24. How do you find the number of employees in each department?

SELECT DeptID, COUNT(*) AS EmployeeCount
FROM Employee
GROUP BY DeptID;

25. How do you rank employees by salary within each department?

SELECT Name, Salary, RANK() OVER (PARTITION BY DeptID ORDER BY Salary DESC) AS SalaryRank
FROM Employee;

Optimization and Best Practices

26. How do you optimize SQL queries?


o Use proper indexing.
o Avoid SELECT *; specify needed columns.
o Consider EXISTS instead of IN for correlated subqueries; many optimizers treat the two alike, so confirm the benefit in the execution plan.
o Apply appropriate joins.
o Use query execution plans for analysis.

27. How do you detect and resolve deadlocks in SQL?


o Identify with SQL Profiler or logs.
o Resolve by:
 Minimizing transaction scope.
 Acquiring locks in a consistent order.

28. What is Query Execution Plan?


o A tool to visualize how SQL queries are executed, helping identify inefficiencies.

29. What is the difference between Scalar and Table-Valued Functions?


o Scalar Function: Returns a single value.
o Table-Valued Function: Returns a table.

30. How do you schedule automated SQL jobs?


o Use tools like SQL Server Agent to schedule tasks like backups or query executions.

Soft Skills & Real-World Scenarios

31. Describe a challenging SQL query you wrote in a previous project.


o Be ready to discuss a query with complex joins, subqueries, or performance tuning.

32. How do you ensure data accuracy in your queries?


o Use testing datasets and verify results with stakeholders.

33. Explain a time when you optimized a slow query.


o Share steps like analyzing execution plans, adding indexes, or restructuring queries.

Tips for Cracking SQL Interviews

 Practice Queries: Platforms like LeetCode, HackerRank, and SQLZoo are great for practicing SQL.
 Understand Business Context: Focus on how SQL helps solve real-world business problems.
 Optimize Performance: Be ready to discuss query performance and indexing strategies.
 Hands-On Experience: Mention projects or dashboards where you used SQL extensively.

These questions cover most of what companies typically ask during SQL interviews, from
fundamentals to advanced scenarios.
Practical Questions
1. Retrieve Data Based on Conditions

 Question: Write a query to retrieve all employees whose salary is above $50,000 and belong to
the "IT" department.

SELECT EmployeeID, Name, Salary, Department
FROM Employees
WHERE Salary > 50000 AND Department = 'IT';

2. Aggregate Data

 Question: Find the total revenue generated by each product category.

SELECT Category, SUM(Revenue) AS TotalRevenue
FROM Sales
GROUP BY Category;

3. Using HAVING

 Question: Retrieve departments with an average salary greater than $70,000.

SELECT Department, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY Department
HAVING AVG(Salary) > 70000;

4. Find Duplicate Records

 Question: Find duplicate email addresses in a user table.

SELECT Email, COUNT(*) AS Count
FROM Users
GROUP BY Email
HAVING COUNT(*) > 1;

5. Subqueries

 Question: Retrieve employees whose salary is greater than the average salary in the company.
SELECT EmployeeID, Name, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);

6. Ranking with Window Functions

 Question: Rank employees by salary within their departments.

SELECT EmployeeID, Name, Department, Salary,
RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS SalaryRank
FROM Employees;

7. Top-N Results

 Question: Find the top 3 best-selling products by revenue.

SELECT ProductID, ProductName, Revenue
FROM (
SELECT ProductID, ProductName, Revenue,
RANK() OVER (ORDER BY Revenue DESC) AS RevenueRank
FROM Sales
) AS RankedSales
WHERE RevenueRank <= 3;

8. Joins

 Question: Retrieve the list of employees along with their department names (from Employees
and Departments tables).

SELECT E.EmployeeID, E.Name, D.DepartmentName
FROM Employees E
INNER JOIN Departments D ON E.DepartmentID = D.DepartmentID;

9. Self-Join

 Question: Find employees who have the same manager.

SELECT A.EmployeeID AS Employee1, B.EmployeeID AS Employee2, A.ManagerID
FROM Employees A
INNER JOIN Employees B ON A.ManagerID = B.ManagerID
WHERE A.EmployeeID <> B.EmployeeID;

10. Using UNION

 Question: Combine lists of customers from two tables, removing duplicates.

SELECT CustomerName, Email FROM OnlineCustomers
UNION
SELECT CustomerName, Email FROM OfflineCustomers;

11. Date Functions

 Question: Find all orders placed in the last 30 days.

SELECT OrderID, CustomerID, OrderDate
FROM Orders
WHERE OrderDate >= DATEADD(DAY, -30, GETDATE());

12. Cumulative Totals

 Question: Calculate the running total of sales for each product.

SELECT ProductID, SaleDate, Revenue,
SUM(Revenue) OVER (PARTITION BY ProductID ORDER BY SaleDate) AS RunningTotal
FROM Sales;

13. Handle NULL Values

 Question: Retrieve all customers and their orders, including those without orders.

SELECT C.CustomerID, C.Name, O.OrderID, O.TotalAmount
FROM Customers C
LEFT JOIN Orders O ON C.CustomerID = O.CustomerID;
14. Find the Second-Highest Value

 Question: Retrieve the second-highest salary from the Employees table.

SELECT MAX(Salary) AS SecondHighestSalary
FROM Employees
WHERE Salary < (SELECT MAX(Salary) FROM Employees);

15. Conditional Aggregation

 Question: Count the number of orders for each customer, showing 0 for customers with no
orders.

SELECT C.CustomerID, C.Name, COUNT(O.OrderID) AS OrderCount
FROM Customers C
LEFT JOIN Orders O ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID, C.Name;

16. Recursive Query (CTE)

 Question: Write a query to calculate the factorial of 5 using recursion.

WITH RECURSIVE Factorial AS (
SELECT 1 AS Num, 1 AS Result
UNION ALL
SELECT Num + 1, Result * (Num + 1)
FROM Factorial
WHERE Num < 5
)
SELECT MAX(Result) AS FactorialOf5 FROM Factorial;

17. Find Gaps in Data

 Question: Identify missing sequential order IDs in an Orders table.

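SELECT O1.OrderID + 1 AS MissingOrderID
FROM Orders O1
LEFT JOIN Orders O2 ON O1.OrderID + 1 = O2.OrderID
WHERE O2.OrderID IS NULL
AND O1.OrderID < (SELECT MAX(OrderID) FROM Orders); -- skip the ID after the last order; returns the first missing ID of each gap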

18. Pivot Tables

 Question: Transform sales data into a pivot table showing monthly sales for each product.

SELECT ProductID,
SUM(CASE WHEN MONTH(SaleDate) = 1 THEN Revenue ELSE 0 END) AS Jan,
SUM(CASE WHEN MONTH(SaleDate) = 2 THEN Revenue ELSE 0 END) AS Feb,
SUM(CASE WHEN MONTH(SaleDate) = 3 THEN Revenue ELSE 0 END) AS Mar
FROM Sales
GROUP BY ProductID;

19. Dynamic Filtering

 Question: Retrieve orders filtered by a list of order IDs provided dynamically.

SELECT OrderID, CustomerID, TotalAmount
FROM Orders
WHERE OrderID IN (101, 102, 103); -- Replace with dynamic IDs

20. JSON and XML Data

 Question: Extract specific fields from JSON data stored in a column.

SELECT JSON_VALUE(JsonColumn, '$.CustomerName') AS CustomerName,
JSON_VALUE(JsonColumn, '$.OrderID') AS OrderID
FROM Orders;

Tips for SQL Practical Questions

 Read the Question Carefully: Ensure you understand the requirements, like filters, grouping,
and joins.
 Optimize Queries: Avoid inefficiencies like unnecessary joins or subqueries.
 Test Queries: Use sample data to validate correctness.
 Use Comments: Explain your thought process if needed during interviews.

1. How would you find customers who have bought a product at least 3 times consecutively?
Table Name: Orders
Columns: CustomerID, ProductID, OrderDate

2. How can you find employees whose salary is greater than their manager's salary?
Table Name: Employees

1. Write a SQL query to find employees whose salary is greater than the average
salary of employees in their respective location.

Table Name: Employee


Column Names: EmpID (Employee ID), Emp_name (Employee Name), Manager_id
(Manager ID), Salary (Employee Salary), Location (Employee Location)

---
2. Write a SQL query to identify riders who have taken at least one trip every day
for the last 10 days.

Table Name: Trip


Column Names: trip_id (Trip ID), driver_id (Driver ID), rider_id (Rider ID),
trip_start_timestamp (Trip Start Timestamp)

---

3. Write a SQL query to calculate the percentage of successful payments for each
driver. A payment is considered successful if its status is 'Completed'.

Table Name: Rides


Column Names: ride_id (Ride ID), driver_id (Driver ID), fare_amount (Fare
Amount), driver_rating (Driver Rating), start_time (Start Time)

Table Name: Payments


Column Names: payment_id (Payment ID), ride_id (Ride ID), payment_status
(Payment Status)

---

4. Write a SQL query to calculate the percentage of menu items sold for each
restaurant.

Table Name: Items


Column Names: item_id (Item ID), rest_id (Restaurant ID)

Table Name: Orders


Column Names: order_id (Order ID), item_id (Item ID), quantity (Quantity),
is_offer (Is Offer), client_id (Client ID), Date_Timestamp (Date Timestamp)

---

5. Write a SQL query to compare the time taken for clients who placed their first
order with an offer versus those without an offer to make their next order.

Table Name: Orders


Column Names: order_id (Order ID), user_id (User ID), is_offer (Is Offer),
Date_Timestamp (Date Timestamp)

---

6. Write a SQL query to find all numbers that appear at least three times
consecutively in the log.

Table Name: Logs


Column Names: Id (ID), Num (Number)
---

7. Write a SQL query to find the length of the longest sequence of consecutive
numbers in the table.

Table Name: Consecutive


Column Names: number (Number)
Sample Table -
Number
1
2
3
4
10
11
20
21
22
23
24
30

---

8. Write a SQL query to calculate the percentage of promo trips, comparing


members versus non-members.

Table Name: Pass_Subscriptions


Column Names: user_id (User ID), pass_id (Pass ID), start_date (Start Date),
end_date (End Date), status (Status)

Table Name: Orders


Column Names: order_id (Order ID), user_id (User ID), is_offer (Is Offer),
Date_Timestamp (Date Timestamp)

Here are the most frequently asked SQL interview questions for a data analyst role:

1. How would you find customers who have bought a product at least 3 times
consecutively?
- Table name: Orders
- Columns: CustomerID, ProductID, OrderDate

2. How can you find employees whose salary is greater than their manager's salary?
- Table name: Employees
- Columns: EmployeeID, Salary, ManagerID

3. Write a query to calculate the cumulative sum of a column in a table.
- Table name: Sales
- Columns: SaleID, Amount

4. How would you count the number of rows returned by each type of join between
the following two tables?
- Table_1 (column Id): 1, 1, 1, 2, 2, NULL, 3, 4, 5, NULL, NULL
- Table_2 (column Id): 1, 1, NULL, 2, 2, 2, 3, 3, 3, NULL, 4

5. Explain the order of execution in SQL.

6. What is the difference between row_number(), rank(), and dense_rank() in SQL?

7. What is the difference between count(*) and count(column_name) in the same
table?

8. How would you find the second highest salary in a table?
- Table name: Employees
- Columns: EmployeeID, Salary

9. How can you find duplicate records in a table?
- Table name: Orders
- Columns: OrderID, CustomerID, ProductID, OrderDate

10. How would you optimize a slow-running query in SQL?
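
For question 4 above, the row counts can be checked by loading the two Id columns into an in-memory SQLite database. This is a sketch under assumptions: SQLite is used only for convenience, the RIGHT JOIN is emulated as a LEFT JOIN with the tables swapped, and the FULL JOIN count is derived arithmetically, so the code does not depend on newer SQLite join syntax. NULL never equals NULL, so NULL rows only appear as unmatched rows in outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (Id INTEGER)")
conn.execute("CREATE TABLE Table_2 (Id INTEGER)")
conn.executemany("INSERT INTO Table_1 VALUES (?)",
                 [(v,) for v in (1, 1, 1, 2, 2, None, 3, 4, 5, None, None)])
conn.executemany("INSERT INTO Table_2 VALUES (?)",
                 [(v,) for v in (1, 1, None, 2, 2, 2, 3, 3, 3, None, 4)])

def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]

inner_rows = count("SELECT COUNT(*) FROM Table_1 a JOIN Table_2 b ON a.Id = b.Id")
left_rows = count("SELECT COUNT(*) FROM Table_1 a LEFT JOIN Table_2 b ON a.Id = b.Id")
# RIGHT JOIN expressed as a LEFT JOIN with the table order swapped
right_rows = count("SELECT COUNT(*) FROM Table_2 b LEFT JOIN Table_1 a ON a.Id = b.Id")
# FULL JOIN = matched rows + left-only rows + right-only rows
full_rows = left_rows + right_rows - inner_rows

print(inner_rows, left_rows, right_rows, full_rows)  # 16 20 18 22
```

The inner count comes from multiplying the duplicates per value: 3x2 for Id 1, 2x3 for Id 2, 1x3 for Id 3, and 1x1 for Id 4.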


As a beginner, here’s how you can start building logic for SQL interview questions:

1. Understand the flow of execution: Learn the order in which SQL processes
different clauses (e.g., `FROM`, `JOIN`, `WHERE`, `GROUP BY`, `SELECT`, `ORDER
BY`). This knowledge will guide you in logically structuring your queries.

2. Study others' SQL queries: Analyze existing queries to see how they’re
constructed. Visualize or write down the output of each step according to the
execution flow to understand how different parts of the query interact.

3. Build queries step-by-step: Start writing your queries by following the flow of
execution. Begin with the `FROM` clause, then add `JOIN`, `WHERE`, and so on. Run
your query after adding each step to visualize the intermediate output and refine
your logic.

4. Practice on online platforms: Use online SQL practice platforms to solve as many
queries as possible, starting with easy-level questions. This practice will build your
confidence and skill gradually.

5. Experiment with different solutions: Solve the same problem using various
methods. This will help you understand SQL's flexibility and learn new approaches.
Watching others’ solutions on online platforms can also provide valuable insights.

6. Review and refine your queries: Regularly revisit the queries you've written to
identify areas for improvement. This practice will help you enhance your query-
building skills over time.

Consistent practice and a solid understanding of SQL’s execution flow are essential
for mastering SQL queries. Keep experimenting, learning, and refining your
approach.
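
The execution order described in point 1 can be seen in a small experiment: `WHERE` filters individual rows before grouping, while `HAVING` filters whole groups afterwards. The sketch below uses a hypothetical `employees` table with made-up rows, run against in-memory SQLite purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(10, 40000), (10, 60000), (10, 70000),
     (20, 55000), (20, 30000),
     (30, 80000), (30, 90000), (30, 45000)],
)

rows = conn.execute("""
    SELECT department_id, COUNT(*) AS well_paid   -- 4. SELECT
    FROM employees                                -- 1. FROM
    WHERE salary > 50000                          -- 2. WHERE (row filter)
    GROUP BY department_id                        -- 3. GROUP BY
    HAVING COUNT(*) >= 2                          --    then HAVING (group filter)
    ORDER BY department_id                        -- 5. ORDER BY
""").fetchall()
print(rows)  # [(10, 2), (30, 2)]
```

Department 20 drops out because only one of its rows survives the `WHERE` step, so its group fails the `HAVING` condition.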

SQL:

1. Write a SQL query to see employee name and manager name using a self-join on
'employees' table with columns 'emp_id', 'name', and 'manager_id'.
2. Explain the order of execution in SQL queries.

Recent Interview Experience for Data Analyst Role at Deloitte:

Tell me about yourself and your roles and responsibilities at your current workplace.

Describe some challenges you faced in your recent project related to Tableau
dashboarding and how you overcame them.

What is the difference between a KPI (Key Performance Indicator) and a dimension?

Write SQL code to extract the third highest salary from an employee table with
columns - EID, ESalary.

Write SQL code to create a procedure with ESalary as one parameter - select all EIDs
from Employee table where ESalary < 50000.

For table Employee (columns - EID, ESalary), filter all EIDs where the salary is an
odd number and join with another table empdetails (columns - EID, EDOB) to get
EDOB.

Name one chart type in Power BI that is different from a normal chart and explain its
use.

Most frequently asked SQL interview questions with their answers:-

1. Write a SQL query to find the top 5 customers with the highest total purchase
amount. Assume you have two tables: Customers (CustomerID, Name) and Orders
(OrderID, CustomerID, Amount).

SELECT c.CustomerID, c.Name, SUM(o.Amount) AS TotalPurchase
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID, c.Name
ORDER BY TotalPurchase DESC
LIMIT 5;

2. Write a query to find the nth highest salary from a table Employees with columns
EmployeeID, Name, and Salary.

SELECT DISTINCT Salary
FROM Employees
ORDER BY Salary DESC
LIMIT 1 OFFSET n-1;

Replace `n-1` with a literal value, since most databases do not accept an expression
such as `n-1` here (e.g., use OFFSET 1 for the second highest salary).
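
A minimal sketch of how the offset can be supplied from application code, assuming SQLite and a handful of made-up salaries. The helper name `nth_highest_salary` is hypothetical; it binds `n - 1` as a query parameter rather than pasting it into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "A", 100), (2, "B", 200), (3, "C", 200), (4, "D", 300)],
)

def nth_highest_salary(conn, n):
    """Return the nth highest distinct salary, or None if there is no such rank."""
    row = conn.execute(
        "SELECT DISTINCT Salary FROM Employees "
        "ORDER BY Salary DESC LIMIT 1 OFFSET ?",
        (n - 1,),
    ).fetchone()
    return row[0] if row else None

second = nth_highest_salary(conn, 2)
third = nth_highest_salary(conn, 3)
fourth = nth_highest_salary(conn, 4)
print(second, third, fourth)  # 200 100 None
```

Because of `DISTINCT`, the two employees earning 200 count as a single rank.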

3. Given a table Sales with columns SaleID, ProductID, SaleDate, and Quantity, write
a query to find the total quantity sold for each product per month.

SELECT ProductID, DATE_TRUNC('month', SaleDate) AS Month, SUM(Quantity) AS TotalQuantity
FROM Sales
GROUP BY ProductID, Month
ORDER BY ProductID, Month;

4. Write a SQL query to find all employees who have more than one manager.
Assume you have a table Employees (EmployeeID, Name, ManagerID).

SELECT EmployeeID, Name
FROM Employees
GROUP BY EmployeeID, Name
HAVING COUNT(DISTINCT ManagerID) > 1;

5. Given a table Orders with columns OrderID, CustomerID, OrderDate, and a table
OrderDetails with columns OrderID, ProductID, Quantity, write a query to find the top
3 products with the highest sales quantity.

SELECT ProductID, SUM(Quantity) AS TotalQuantity
FROM OrderDetails
GROUP BY ProductID
ORDER BY TotalQuantity DESC
LIMIT 3;

6. Write a SQL query to find the second most recent order date for each customer
from a table Orders (OrderID, CustomerID, OrderDate).

SELECT CustomerID, MAX(OrderDate) AS SecondRecentOrderDate
FROM Orders
WHERE OrderDate < (SELECT MAX(OrderDate) FROM Orders o2
                   WHERE o2.CustomerID = Orders.CustomerID)
GROUP BY CustomerID;

7. Given a table Employees with columns EmployeeID, Name, DepartmentID, Salary,
write a query to find the highest paid employee in each department.

SELECT DepartmentID, EmployeeID, Name, Salary
FROM Employees e1
WHERE Salary = (SELECT MAX(Salary) FROM Employees e2
                WHERE e2.DepartmentID = e1.DepartmentID);

8. Write a SQL query to calculate the cumulative sales for each day in a table Sales
with columns SaleID, SaleDate, and Amount.

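SELECT SaleDate,
SUM(SUM(Amount)) OVER (ORDER BY SaleDate ROWS BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW) AS CumulativeSales
FROM Sales
GROUP BY SaleDate
ORDER BY SaleDate;

Grouping by SaleDate first yields one row per day, even when a day has several sales.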

9. Given a table Products with columns ProductID, Name, Price, and a table Sales
with columns SaleID, ProductID, Quantity, write a query to find the product with the
highest revenue.

SELECT p.ProductID, p.Name, SUM(p.Price * s.Quantity) AS Revenue
FROM Products p
JOIN Sales s ON p.ProductID = s.ProductID
GROUP BY p.ProductID, p.Name
ORDER BY Revenue DESC
LIMIT 1;

Write a query to rank employees by their performance score within each department,
resetting the rank for each new department.
Create a query to find gaps in a series of dates for a given employee's attendance
records.

Question 1: Monthly Revenue Trends by Category

Scenario: Analyze monthly revenue trends for each product category.


Table:
1. transactions (Transaction_id, Product_id, Amount_spent, Transaction_date),

2. products (Product_id, Category)

Challenge: Write a SQL query to calculate the total revenue for each category on a
monthly basis and identify the top 3 categories with the highest revenue growth
month-over-month.

Question 2: Customer Retention Analysis


Scenario: Determine the retention rate of customers.

Table:
1. customer_visits (Customer_id, Visit_date)

Challenge: Write a SQL query to calculate the retention rate of customers month-
over-month for the past year, identifying the percentage of customers who return
the following month.

Question 3: Product Affinity Analysis


Scenario: Identify products that are frequently bought together.
Table:
1. order_details (Order_id, Product_id, Quantity)

Challenge: Write a SQL query to find pairs of products that are frequently bought
together. Include the count of how many times each pair appears in the same order
and rank them by frequency.
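
One way to approach the product affinity challenge is a self-join of `order_details` on the order ID, keeping only pairs where the first product ID is smaller than the second so each pair is counted once. The sketch below runs against in-memory SQLite with made-up orders; it assumes each product appears at most once per order, and the names `pair_counts`, `times_together`, and `frequency_rank` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_details (Order_id INTEGER, Product_id INTEGER, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (1, 3, 2),   # order 1: products 1, 2, 3
     (2, 1, 1), (2, 2, 1),              # order 2: products 1, 2
     (3, 2, 1), (3, 3, 1),              # order 3: products 2, 3
     (4, 1, 3), (4, 2, 1)],             # order 4: products 1, 2
)

pairs = conn.execute("""
    WITH pair_counts AS (
        SELECT a.Product_id AS product_a,
               b.Product_id AS product_b,
               COUNT(*) AS times_together
        FROM order_details a
        JOIN order_details b
          ON a.Order_id = b.Order_id
         AND a.Product_id < b.Product_id   -- avoid self-pairs and mirrored pairs
        GROUP BY a.Product_id, b.Product_id
    )
    SELECT product_a, product_b, times_together,
           RANK() OVER (ORDER BY times_together DESC) AS frequency_rank
    FROM pair_counts
    ORDER BY frequency_rank, product_a, product_b
""").fetchall()
print(pairs)  # [(1, 2, 3, 1), (2, 3, 2, 2), (1, 3, 1, 3)]
```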

Question 4: Customer Purchase Segmentation

Scenario: Segment customers based on their purchase behavior.


Table:
1. purchases (Customer_id, Product_id, Amount_spent, Purchase_date)

Challenge: Write a SQL query to segment customers into different groups based on
their total spending and purchase frequency in the last year. Classify them into
categories like 'High Spenders', 'Medium Spenders', and 'Low Spenders'.

Question 5: Anomaly Detection in Transactions

Scenario: Detect anomalies in transaction amounts.


Table:
1. transactions (Transaction_id, Customer_id, Amount_spent, Transaction_date)

Challenge: Write a SQL query to identify transactions that deviate significantly from
the customer's average spending. Flag transactions that are more than three
standard deviations away from the mean spending amount for each customer.

1. Proper Indexing
- Ensure columns in 'WHERE', 'JOIN', 'ORDER BY', and 'GROUP BY' clauses are
indexed.
- Avoid over-indexing to prevent slowdowns in 'INSERT', 'UPDATE', and 'DELETE'
operations.

2. Optimize SELECT Statements
- Select only the necessary columns; avoid 'SELECT *'.
- Prefer 'EXISTS' over 'IN' for correlated subqueries, and 'NOT EXISTS' over
'NOT IN' when the subquery column can contain NULLs.

3. Simplify Joins
- Break down complex joins into simpler queries.
- Index foreign keys used in joins.

4. Proper Query Structure
- Apply 'WHERE' clauses early.
- Use 'LIMIT' or 'TOP' to restrict result sets.

5. Optimize Subqueries and Derived Tables
- Rewrite subqueries as joins when possible.
- Use temporary tables for complex operations.

6. Use Query Execution Plans
- Regularly check execution plans to identify bottlenecks.
- Avoid full table scans by ensuring proper indexing.

7. Avoid Functions on Indexed Columns
- Don’t use functions on indexed columns in 'WHERE' clauses; a plain index on
the column usually cannot be used when the column is wrapped in a function.

8. Partition Large Tables
- Split large tables into smaller partitions.
- Ensure partitions are indexed.

9. Batch Processing
- Process large datasets in batches to avoid locking and reduce transaction log size.
- Keep transactions short to minimize locking and blocking.

10. Monitoring and Continuous Improvement
- Regularly monitor performance and refine queries and indexing strategies.
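
Tips 6 and 7 can be tried out together. The sketch below, which assumes SQLite and a hypothetical `orders` table with an index on `order_date`, compares the query plan for a filter that wraps the indexed column in a function with an equivalent range filter on the bare column. The exact plan wording differs between SQLite versions and between database systems, so the code only checks whether the plan reports an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_order_date ON orders (order_date)")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Function applied to the indexed column: the index cannot narrow the search
wrapped = plan(
    "SELECT id FROM orders WHERE strftime('%Y', order_date) = '2024'"
)
# Same intent written as a range on the bare column: the index can be searched
sargable = plan(
    "SELECT id FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'"
)

uses_index_search = any(step.startswith("SEARCH") for step in sargable)
wrapped_uses_search = any(step.startswith("SEARCH") for step in wrapped)
print(wrapped)
print(sargable)
```

Other databases expose the same information through their own commands (for example `EXPLAIN` in PostgreSQL and MySQL).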

SQL:
1. Beginner
- Fundamentals: SELECT, WHERE, ORDER BY, GROUP BY, HAVING
- Essential JOINS: INNER, LEFT, RIGHT, FULL
- Basics of database and table creation

2. Intermediate
- Aggregate functions: COUNT, SUM, AVG, MAX, MIN
- Subqueries and nested queries
- Common Table Expressions with the WITH clause
- Conditional logic in queries using CASE statements

3. Advanced
- Complex JOIN techniques: self-join, non-equi join
- Window functions: OVER, PARTITION BY, ROW_NUMBER, RANK, DENSE_RANK, lead,
lag
- Query optimization through indexing
- Manipulating data: INSERT, UPDATE, DELETE

1. Find Total Number of Employees in Each Department


SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id;
Pandas:
employee_count_by_department = employees.groupby('department_id').size().reset_index(name='employee_count')

2. Calculate Moving Average for Sales


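SELECT date, sales,
AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS
moving_average
FROM sales_table;
Pandas:
sales_table['moving_average'] = sales_table['sales'].rolling(window=7, min_periods=1).mean()
(min_periods=1 mirrors the SQL window, which averages fewer than 7 rows at the
start; sales_table is assumed to be sorted by date.)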

3. Find the Most Recent Sale for Each Product


SELECT product_id, MAX(sale_date) AS most_recent_sale
FROM sales
GROUP BY product_id;
Pandas:
most_recent_sale = sales.groupby('product_id')['sale_date'].max().reset_index(name='most_recent_sale')

4. Calculate the Difference Between Current and Previous Rows


SELECT id, value,
value - LAG(value) OVER (ORDER BY id) AS difference
FROM your_table;
Pandas:
your_table['difference'] = your_table['value'].diff()

5. Find Employees with the Lowest Salary in Each Department


WITH department_salaries AS (
SELECT department_id, MIN(salary) AS min_salary
FROM employees
GROUP BY department_id
)
SELECT e.*
FROM employees e
JOIN department_salaries ds ON e.department_id = ds.department_id AND e.salary =
ds.min_salary;
Pandas:
min_salary_by_department = employees.groupby('department_id')['salary'].min().reset_index()
employees_with_lowest_salary = employees.merge(min_salary_by_department,
                                               on=['department_id', 'salary'])

6. Calculate Yearly Sales Growth


SELECT EXTRACT(YEAR FROM sale_date) AS year,
SUM(sales) AS total_sales,
LAG(SUM(sales)) OVER (ORDER BY EXTRACT(YEAR FROM sale_date)) AS
previous_year_sales,
(SUM(sales) - LAG(SUM(sales)) OVER (ORDER BY EXTRACT(YEAR FROM sale_date))) /
LAG(SUM(sales)) OVER (ORDER BY EXTRACT(YEAR FROM sale_date)) * 100 AS
sales_growth
FROM sales_table
GROUP BY EXTRACT(YEAR FROM sale_date);
Pandas:
sales_table['year'] = sales_table['sale_date'].dt.year
yearly_sales = sales_table.groupby('year')['sales'].sum()
yearly_sales_growth = (yearly_sales - yearly_sales.shift()) / yearly_sales.shift() * 100

7. Identify the Top 5 Products by Revenue


SELECT product_id, SUM(revenue) AS total_revenue
FROM sales
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT 5;
Pandas:
top_5_products = sales.groupby('product_id')['revenue'].sum().nlargest(5)

SQL Essentials:
- SELECT statements including WHERE, ORDER BY, GROUP BY, HAVING
- Basic JOINS: INNER, LEFT, RIGHT, FULL
- Aggregate functions: COUNT, SUM, AVG, MAX, MIN
- Subqueries, Common Table Expressions (WITH clause)
- CASE statements, advanced JOIN techniques, and Window functions (OVER,
PARTITION BY, ROW_NUMBER, RANK)

2) Write SQL code to extract the third highest salary from an employee table with
columns EID and ESalary.

SELECT DISTINCT ESalary
FROM (
SELECT ESalary,
DENSE_RANK() OVER (ORDER BY ESalary DESC) AS salary_rank
FROM employee
) AS ranked_salaries
WHERE salary_rank = 3;

Explanation of the query:-

➡️ Step 1: Create a Subquery: We start with a subquery to rank all the salaries in
descending order (from highest to lowest).

➡️ Step 2: Rank the Salaries: We use the DENSE_RANK() function to assign a rank to
each salary:

The highest salary gets rank 1.

The second highest salary gets rank 2.

The third highest salary gets rank 3, and so on.

➡️ Step 3: Assign an Alias: We give a name (ranked_salaries) to our subquery so that
we can use it in the main query.

➡️ Step 4: Filter for the Third Highest Salary: In the main query, we filter the results
to find only the salary with rank 3.

➡️ Step 5: Select the Third Highest Salary: Finally, we select and display the third
highest salary from the filtered results.

Problem: You are working with a database that tracks customer orders for an e-
commerce platform. You have two tables:

Customers Table:
CID CName
1 Alice
2 Bob
3 Charlie

Orders Table:
OID CID OAmount ODate
101 1 100 2024-01-15
102 2 250 2024-01-20
103 1 300 2024-02-12
104 3 400 2024-02-28
105 2 150 2024-03-05

2. Write a query to find all customers who placed at least two orders and whose
second order amount was higher than their first order amount.

Expected Output:
CID CName FirstOrderAmount SecondOrderAmount
1 Alice 100 300
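
A possible solution sketch for this problem, using the sample rows above in an in-memory SQLite database (the choice of SQLite, the CTE name `ranked`, and the column alias `rn` are assumptions). `ROW_NUMBER()` orders each customer's orders by date, and the first and second orders are then compared side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CID INTEGER, CName TEXT);
    CREATE TABLE Orders (OID INTEGER, CID INTEGER, OAmount INTEGER, ODate TEXT);
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
    INSERT INTO Orders VALUES
        (101, 1, 100, '2024-01-15'),
        (102, 2, 250, '2024-01-20'),
        (103, 1, 300, '2024-02-12'),
        (104, 3, 400, '2024-02-28'),
        (105, 2, 150, '2024-03-05');
""")

result = conn.execute("""
    WITH ranked AS (
        -- Number each customer's orders from oldest to newest
        SELECT CID, OAmount,
               ROW_NUMBER() OVER (PARTITION BY CID ORDER BY ODate) AS rn
        FROM Orders
    )
    SELECT c.CID, c.CName,
           f.OAmount AS FirstOrderAmount,
           s.OAmount AS SecondOrderAmount
    FROM Customers c
    JOIN ranked f ON f.CID = c.CID AND f.rn = 1
    JOIN ranked s ON s.CID = c.CID AND s.rn = 2
    WHERE s.OAmount > f.OAmount
    ORDER BY c.CID
""").fetchall()
print(result)  # [(1, 'Alice', 100, 300)]
```

Bob is excluded because his second order (150) is smaller than his first (250), and Charlie has only one order.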

3. Question - You are given a table Sales with the following structure:

s_id c_id p_id s_date s_amt
1 101 1 2023-01-10 150
2 102 2 2023-02-14 200
3 103 1 2023-03-20 120
4 101 2 2023-04-25 250
5 102 1 2023-05-15 180
6 104 3 2023-06-10 300
7 101 3 2023-07-22 350

Write a query to find customers who bought products from three or more distinct
categories (p_id) within a six-month period.

Expected Result:

c_id p_categories total_sales


101 3 750

1. Write a SQL query to find employees whose salary is greater than the average
salary of employees in their respective location.

Table Name: Employee


Column Names: EmpID (Employee ID), Emp_name (Employee Name), Manager_id
(Manager ID), Salary (Employee Salary), Location (Employee Location)

SQL Questions:
1. Joins and Relationships:
- Can you explain the different types of joins (INNER, LEFT, RIGHT, FULL) and
provide an example of when to use each?
- Write a query to find the top 5 products by sales.
- How would you optimize a query with multiple joins on large tables?

2. Aggregations and Grouping:
- How would you calculate the Year-over-Year growth in sales for each product
category?
- Write a query to calculate the total sales for each region.
- How do window functions work in SQL (e.g., ROW_NUMBER, RANK, SUM)?

3. Subqueries and CTEs:
- What is the difference between a subquery and a Common Table Expression
(CTE)?
- Write a query using a CTE to find the second-highest revenue-generating store.

4. Data Manipulation:
- Write a query to update prices for a product category by 10%.
- How would you delete duplicate records from a table while keeping the latest
entry?

5. Database Design:
- Explain normalization and why it is important in database design.
- What are indexes, and how do they impact query performance?

6. Practical Scenario:
- Suppose Walmart wants to analyze customer purchase behavior based on
transactional data. How would you design a query to identify the top customers
by spend for a given period?

You are given a table Orders that contains the following data:
Column Name Type
order_id INT
customer_id INT
order_date DATE
amount DECIMAL

Question:

Write an SQL query to find the total amount spent by each customer who made
more than 3 orders in the system. Return the customer ID and total amount,
sorted by the total amount in descending order.

Question: Find the customers who have never placed an order.

You are given two tables:

Customers Table:
customer_id (Primary Key), customer_name, location

Orders Table:
order_id (Primary Key), customer_id (Foreign Key from Customers table),
order_date, total_amount

Question - You are provided with a dataset of customer transactions containing
the following columns:

CustomerID: Unique identifier for each customer
TransactionID: Unique identifier for each transaction
TransactionDate: The date of the transaction
Amount: The amount spent in each transaction

Write an SQL query to find the top 3 customers who have spent the highest total
amount in any single month across the entire dataset.

Question - Walmart wants to analyze the inventory status of products in their
stores to ensure they are always adequately stocked. You are provided with two
tables:

Sales
sale_id (INT)
product_id (INT)
store_id (INT)
sale_date (DATE)
quantity_sold (INT)

Inventory
product_id (INT)
store_id (INT)
inventory_date (DATE)
quantity_in_stock (INT)

Write an SQL query to find the products that were out of stock (i.e.,
quantity_in_stock = 0) on the day after they were sold. Return the product_id,
store_id, and the date when they went out of stock.

Question - Consider a table Employee with the following schema:

Column Type
EmployeeID INT
Name VARCHAR
Department VARCHAR
Salary INT
JoiningDate DATE

You need to find the highest-paid employee(s) in each department who joined
the company after a certain date (e.g., '2020-01-01'). If there are multiple
employees with the same highest salary in a department, include all of them.

Table Example:
EmployeeID Name Department Salary JoiningDate
1 John HR 50000 2019-05-15
2 Alice HR 60000 2021-08-21
3 Bob IT 70000 2020-02-12
4 Carol IT 70000 2021-03-18
5 Eve IT 60000 2022-06-10
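
A sketch of one way to answer this, run on the example rows in in-memory SQLite (an assumption made so the query can be executed). The date filter is applied before ranking, and `RANK()` is used so that ties for the top salary are all kept; the alias `rnk` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmployeeID INTEGER, Name TEXT, Department TEXT,
        Salary INTEGER, JoiningDate TEXT
    );
    INSERT INTO Employee VALUES
        (1, 'John',  'HR', 50000, '2019-05-15'),
        (2, 'Alice', 'HR', 60000, '2021-08-21'),
        (3, 'Bob',   'IT', 70000, '2020-02-12'),
        (4, 'Carol', 'IT', 70000, '2021-03-18'),
        (5, 'Eve',   'IT', 60000, '2022-06-10');
""")

top_paid = conn.execute("""
    SELECT EmployeeID, Name, Department, Salary
    FROM (
        SELECT EmployeeID, Name, Department, Salary,
               RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS rnk
        FROM Employee
        WHERE JoiningDate > '2020-01-01'   -- ISO dates compare correctly as text
    ) AS ranked
    WHERE rnk = 1
    ORDER BY Department, EmployeeID
""").fetchall()
print(top_paid)
# [(2, 'Alice', 'HR', 60000), (3, 'Bob', 'IT', 70000), (4, 'Carol', 'IT', 70000)]
```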

Question - You are given two tables:

Table 1: Orders

OrderID (int): Unique identifier for each order.


CustomerID (int): Unique identifier for each customer.
OrderDate (date): Date when the order was placed.

Table 2: OrderDetails

OrderID (int): Unique identifier for each order.


ProductID (int): Unique identifier for each product.
Quantity (int): Number of units ordered.

Write an SQL query to find customers who have ordered more than the average
number of products across all orders.

Question - You are given a Sales table containing daily sales data for multiple
stores. Some of the data entries are duplicated.

Table: Sales
SaleID StoreID SaleDate Amount
1 101 2024-01-01 500
2 101 2024-01-01 500
3 102 2024-01-02 300
4 102 2024-01-02 300
5 101 2024-01-03 450

Write a query to identify the duplicate rows in the table and delete the extra
copies, keeping only one.
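
A minimal sketch of one common approach, assuming SQLite and the sample rows above: duplicates are found by grouping on every column except SaleID, and the delete keeps the row with the smallest SaleID in each group. Keeping the smallest ID is an arbitrary choice made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (SaleID INTEGER PRIMARY KEY, StoreID INTEGER, "
    "SaleDate TEXT, Amount INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [(1, 101, "2024-01-01", 500), (2, 101, "2024-01-01", 500),
     (3, 102, "2024-01-02", 300), (4, 102, "2024-01-02", 300),
     (5, 101, "2024-01-03", 450)],
)

# Step 1: identify groups of rows that share StoreID, SaleDate and Amount
duplicates = conn.execute("""
    SELECT StoreID, SaleDate, Amount, COUNT(*) AS copies
    FROM Sales
    GROUP BY StoreID, SaleDate, Amount
    HAVING COUNT(*) > 1
    ORDER BY StoreID
""").fetchall()

# Step 2: delete every row except the lowest SaleID of each group
conn.execute("""
    DELETE FROM Sales
    WHERE SaleID NOT IN (
        SELECT MIN(SaleID) FROM Sales GROUP BY StoreID, SaleDate, Amount
    )
""")
remaining = [row[0] for row in conn.execute("SELECT SaleID FROM Sales ORDER BY SaleID")]
print(duplicates)  # [(101, '2024-01-01', 500, 2), (102, '2024-01-02', 300, 2)]
print(remaining)   # [1, 3, 5]
```
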
Question - You have the following two tables:

Transactions:
transaction_id (INT)
customer_id (INT)
transaction_date (DATE)
amount (DECIMAL)

Customers:
customer_id (INT)
customer_name (VARCHAR)

Write a SQL query to find the average transaction amount for each customer who
made more than 5 transactions in September 2023.

Problem: You are given a table Employee_Salaries that contains the following
columns -

Employee_ID (INT): Unique ID for each employee


Department (VARCHAR): Department of the employee
Salary (DECIMAL): Salary of the employee

Write a SQL query to find the second highest salary in each department. If there
is no second highest salary in a department, the department should not be
included in the results.

Table Schema:
Column Type
Employee_ID INT
Department VARCHAR
Salary DECIMAL

Expected Output:
Department Second_Highest_Salary
Finance 120000
Marketing 85000

Problem: You have two tables -

- Projects
- Employees

Table 1: Projects
PID PName Start_Date End_Date
1 Alpha 2023-01-15 2023-06-30
2 Beta 2023-02-10 2023-07-15
3 Gamma 2023-03-01 2023-08-01

Table 2: Employees
Employee_ID Employee_Name Project_ID Role Hours_Worked
101 Ayesha 1 Manager 120
102 John 2 Analyst 140
103 Priya 1 Analyst 130
104 Rajesh 3 Manager 110
105 Anjali 2 Developer 150
106 Dev 3 Developer 160

Write an SQL query to find the total hours worked by employees in each role
across all projects.

Expected Output:
Role Total_Hours
Manager 230
Analyst 270
Developer 310
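
The expected output can be reproduced with a single `GROUP BY` on the Employees table; the Projects table is not needed for this total. The sketch assumes SQLite, and the `ORDER BY Total_Hours` is an assumption chosen only to match the row order shown in the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (Employee_ID INTEGER, Employee_Name TEXT, "
    "Project_ID INTEGER, Role TEXT, Hours_Worked INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [(101, "Ayesha", 1, "Manager", 120), (102, "John", 2, "Analyst", 140),
     (103, "Priya", 1, "Analyst", 130), (104, "Rajesh", 3, "Manager", 110),
     (105, "Anjali", 2, "Developer", 150), (106, "Dev", 3, "Developer", 160)],
)

hours_by_role = conn.execute("""
    SELECT Role, SUM(Hours_Worked) AS Total_Hours
    FROM Employees
    GROUP BY Role
    ORDER BY Total_Hours
""").fetchall()
print(hours_by_role)  # [('Manager', 230), ('Analyst', 270), ('Developer', 310)]
```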

Question: You are given two tables: fb_users_activity and fb_friends, which store
user activity and friendship relationships on Facebook.

fb_users_activity table stores information about user activity:

user_id INT
post_id INT
post_date DATE

fb_friends table stores friendship relationships between users:

user_id INT
friend_id INT

Write an SQL query to find the top 3 users who have the most active friends (in
terms of posts made by their friends) over the past 30 days.

Problem: You are given the following two tables:

Transactions Table:
transaction_id (INT): Unique ID for each transaction.
customer_id (INT): The ID of the customer who made the transaction.
transaction_date (DATE): The date of the transaction.
amount (DECIMAL): The transaction amount.
Customers Table:
customer_id (INT): Unique ID for each customer.
customer_name (VARCHAR): Name of the customer.

Task:
Write an SQL query to find the top 3 customers who have the highest total
transaction amounts in the last 6 months from today's date.

You are provided with a table transactions that logs Paytm users' transaction
details. Write a query to find the top 3 transactions (by amount) made by
each user in every month.

Table Structure:

transactions
transaction_id (INT)
user_id (INT)
transaction_date (DATE)
transaction_amount (DECIMAL)

The result should display:


user_id
month
transaction_id
transaction_amount

The output must be ordered by user_id, month, and transaction_amount (in
descending order).

SQL Interview Question for Data Analyst Role at HDFC Bank

Problem: You are given two tables - customers and transactions.

customers table: Contains customer information.


customer_id: Unique ID for each customer.
customer_name: Name of the customer.

transactions table: Contains transaction details.


transaction_id: Unique ID for each transaction.
customer_id: The ID of the customer who made the transaction.
transaction_amount: Amount of each transaction.
transaction_date: Date of the transaction.

Write an SQL query to find the top 5 customers who have made the highest total
transactions in terms of the amount. Display the customer’s name and their total
transaction amount, sorted in descending order of total transaction amount.

Schema:

customers(customer_id, customer_name)
transactions(transaction_id, customer_id, transaction_amount, transaction_date)

Problem: You are given two tables:

Employees:
EID Name MID
1 John 3
2 Jane 3
3 Alice NULL
4 Bob 3
5 Tom 1

Salaries:
EID Sal
1 50000
2 60000
3 80000
4 55000
5 40000

Write an SQL query to find the average salary of employees managed by each manager.
Exclude managers who do not manage any employees.
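
One way to solve this is a self-join of Employees (employee to manager through MID) combined with a join to Salaries for the employee's pay. The sketch below assumes SQLite and uses the sample rows; the inner join naturally drops anyone who manages nobody.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EID INTEGER, Name TEXT, MID INTEGER);
    CREATE TABLE Salaries (EID INTEGER, Sal INTEGER);
    INSERT INTO Employees VALUES
        (1, 'John', 3), (2, 'Jane', 3), (3, 'Alice', NULL),
        (4, 'Bob', 3), (5, 'Tom', 1);
    INSERT INTO Salaries VALUES
        (1, 50000), (2, 60000), (3, 80000), (4, 55000), (5, 40000);
""")

avg_by_manager = conn.execute("""
    SELECT m.EID AS ManagerID, m.Name AS ManagerName, AVG(s.Sal) AS AvgReportSalary
    FROM Employees e
    JOIN Employees m ON e.MID = m.EID   -- m is the manager of e
    JOIN Salaries s ON s.EID = e.EID    -- salary of the managed employee
    GROUP BY m.EID, m.Name
    ORDER BY m.EID
""").fetchall()
print(avg_by_manager)  # [(1, 'John', 40000.0), (3, 'Alice', 55000.0)]
```
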
Problem: You are given two tables: Orders and Products. The Orders table contains
details about customer orders, and the Products table contains details about the
products. You need to find the top 3 performing products based on the total sales
amount (quantity * price).

Table: Orders

Col Name Type


order_id int
product_id int
quantity int

Table: Products

Col Name Type


product_id int
product_name varchar
price decimal

Write an SQL query to return the top 3 products by sales amount (quantity * price).

Expected Output:
prod_name total_sales
Product A 15000.00
Product B 12000.00
Product C 10000.00


Problem: You are given a table Sales with the following columns:

sale_id: Unique identifier for each sale.


product_id: ID of the product sold.
customer_id: ID of the customer who made the purchase.
sale_date: Date of the sale.
sale_amount: Total amount of the sale.

Table: Sales
sid pid cid sale_date sale_amount
1 101 1001 2023-09-01 500
2 102 1002 2023-09-02 300
3 101 1001 2023-09-03 400
4 103 1003 2023-09-04 700
5 102 1002 2023-09-05 600

Write a query to find customers who have made more than one purchase on different
dates and calculate their total purchase amount.
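
A short sketch of a possible answer, assuming SQLite and the abbreviated column names from the sample table (`sid`, `pid`, `cid`). Counting distinct sale dates in `HAVING` keeps only customers who bought on more than one day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (sid INTEGER, pid INTEGER, cid INTEGER, "
    "sale_date TEXT, sale_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
    [(1, 101, 1001, "2023-09-01", 500), (2, 102, 1002, "2023-09-02", 300),
     (3, 101, 1001, "2023-09-03", 400), (4, 103, 1003, "2023-09-04", 700),
     (5, 102, 1002, "2023-09-05", 600)],
)

repeat_buyers = conn.execute("""
    SELECT cid,
           COUNT(DISTINCT sale_date) AS purchase_days,
           SUM(sale_amount) AS total_purchase_amount
    FROM Sales
    GROUP BY cid
    HAVING COUNT(DISTINCT sale_date) > 1
    ORDER BY cid
""").fetchall()
print(repeat_buyers)  # [(1001, 2, 900), (1002, 2, 900)]
```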

Problem: You have two tables -

- Employees
- Salaries

Employees Table:
EID Name Dept
1 John HR
2 Sarah IT
3 Mark Sales
4 Jane IT
5 Bob Sales

Salaries Table:
EID Sal
1 70000
2 95000
3 60000
4 105000
5 75000
Write an SQL query to find the second highest salary in each department.
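
A sketch of one approach, assuming SQLite and the sample rows above: join the two tables, rank salaries inside each department with `DENSE_RANK()`, and keep rank 2. Departments with a single salary (HR here) produce no rank 2 and therefore do not appear; the names `ranked` and `rnk` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EID INTEGER, Name TEXT, Dept TEXT);
    CREATE TABLE Salaries (EID INTEGER, Sal INTEGER);
    INSERT INTO Employees VALUES
        (1, 'John', 'HR'), (2, 'Sarah', 'IT'), (3, 'Mark', 'Sales'),
        (4, 'Jane', 'IT'), (5, 'Bob', 'Sales');
    INSERT INTO Salaries VALUES
        (1, 70000), (2, 95000), (3, 60000), (4, 105000), (5, 75000);
""")

second_highest = conn.execute("""
    WITH ranked AS (
        SELECT e.Dept, s.Sal,
               DENSE_RANK() OVER (PARTITION BY e.Dept ORDER BY s.Sal DESC) AS rnk
        FROM Employees e
        JOIN Salaries s ON s.EID = e.EID
    )
    SELECT DISTINCT Dept, Sal
    FROM ranked
    WHERE rnk = 2
    ORDER BY Dept
""").fetchall()
print(second_highest)  # [('IT', 95000), ('Sales', 60000)]
```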

Problem: You have a table named Employee with the following columns:

- EmployeeID
- DepartmentID
- Salary
- JoiningDate

Write a query to find the rank of each employee's salary within their
respective departments.

Table Structure:

EID DID Sal JoiningDate


1 101 60000 2022-01-10
2 101 75000 2021-03-15
3 102 50000 2020-06-22
4 102 60000 2023-01-05
5 101 55000 2020-11-30
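
A sketch of a possible answer, assuming SQLite and the abbreviated column names from the sample table (`EID`, `DID`, `Sal`). `RANK()` with `PARTITION BY DID` restarts the ranking for each department; `DENSE_RANK()` could be swapped in if gaps after ties are not wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EID INTEGER, DID INTEGER, Sal INTEGER, JoiningDate TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, 101, 60000, "2022-01-10"), (2, 101, 75000, "2021-03-15"),
     (3, 102, 50000, "2020-06-22"), (4, 102, 60000, "2023-01-05"),
     (5, 101, 55000, "2020-11-30")],
)

salary_ranks = conn.execute("""
    SELECT EID, DID, Sal,
           RANK() OVER (PARTITION BY DID ORDER BY Sal DESC) AS SalaryRank
    FROM Employee
    ORDER BY DID, SalaryRank
""").fetchall()
print(salary_ranks)
# [(2, 101, 75000, 1), (1, 101, 60000, 2), (5, 101, 55000, 3),
#  (4, 102, 60000, 1), (3, 102, 50000, 2)]
```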

Problem: You are given two tables -> Orders and Customers.

Orders:
OrderID: Unique identifier for each order.
CustomerID: Unique identifier for each customer.
OrderAmount: The total amount of the order.
OrderDate: The date the order was placed.

Customers:
CustomerID: Unique identifier for each customer.
CustomerName: The name of the customer.
City: The city the customer resides in.

Write an SQL query to find the CustomerID and CustomerName of all
customers who have placed more than three orders with an order amount
greater than $500. Additionally, display the total number of orders they have
placed.
Activate to view larger image,
1. Joins and Relationships:
- Can you explain the different types of joins (INNER, LEFT, RIGHT, FULL) and provide an
example of when to use each?

- Write a query to find the top 5 products by sales.

- How would you optimize a query with multiple joins on large tables?

2. Aggregations and Grouping:

- How would you calculate the Year-over-Year growth in sales for each product
category?

- Write a query to calculate the total sales for each region.

- How do window functions work in SQL (e.g., ROW_NUMBER, RANK, SUM)?

3. Subqueries and CTEs:

- What is the difference between a subquery and a Common Table Expression (CTE)?

- Write a query using a CTE to find the second-highest revenue-generating store.

4. Data Manipulation:

- Write a query to update prices for a product category by 10%.

- How would you delete duplicate records from a table while keeping the latest entry?

5. Database Design:

- Explain normalization and why it is important in database design.

- What are indexes, and how do they impact query performance?

6. Practical Scenario:

- Suppose Walmart wants to analyze customer purchase behavior based on
transactional data. How would you design a query to identify the top customers by
spend for a given period?
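
As a rough illustration of the practical scenario, the sketch below sums spend per customer inside a half-open date range and keeps the top N. The table transactions, its columns, and the function top_customers are hypothetical names, and SQLite is used only to make the example runnable.

```python
import sqlite3

TOP_CUSTOMERS_QUERY = """
SELECT customer_id, SUM(amount) AS total_spend
FROM transactions
WHERE purchased_at >= ? AND purchased_at < ?   -- half-open period [start, end)
GROUP BY customer_id
ORDER BY total_spend DESC
LIMIT ?;
"""


def top_customers(start, end, limit):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount INTEGER, purchased_at TEXT)")
    # Made-up rows; dates are ISO strings, so text comparison matches date order.
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [
            (1, 100, "2024-01-05"),
            (1, 50, "2024-01-20"),
            (2, 300, "2024-01-10"),
            (3, 1000, "2024-02-01"),  # outside January
            (1, 500, "2023-12-31"),   # outside January
        ],
    )
    rows = conn.execute(TOP_CUSTOMERS_QUERY, (start, end, limit)).fetchall()
    conn.close()
    return rows


print(top_customers("2024-01-01", "2024-02-01", 2))
```

Passing the period as bound parameters keeps the query reusable, and a range predicate on the raw date column leaves room for an index on that column to be used.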

2. Write a query to find the nth highest salary from a table Employees with
columns EmployeeID, Name, and Salary.

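SELECT DISTINCT Salary
FROM Employees
ORDER BY Salary DESC
LIMIT 1 OFFSET n-1;
Replace `n` with the desired rank (e.g., 2 for the second highest). Some engines
(MySQL, for instance) accept only a literal integer after OFFSET, so write the
computed value there (OFFSET 1 for the second highest).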
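
Because n-1 is a placeholder rather than literal SQL, application code usually computes the offset and binds it as a parameter. A small sketch under that assumption, using SQLite through Python's sqlite3 module (the helper names and sample salaries are made up):

```python
import sqlite3


def nth_highest_salary(conn, n):
    """Return the nth highest distinct salary, or None if fewer than n exist."""
    row = conn.execute(
        "SELECT DISTINCT Salary FROM Employees ORDER BY Salary DESC LIMIT 1 OFFSET ?",
        (n - 1,),  # the offset is computed in Python and bound as a parameter
    ).fetchone()
    return row[0] if row else None


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)")
    conn.executemany(
        "INSERT INTO Employees VALUES (?, ?, ?)",
        [(1, "A", 100), (2, "B", 200), (3, "C", 200), (4, "D", 300)],
    )
    return conn


conn = demo_connection()
print(nth_highest_salary(conn, 2))
```

DISTINCT matters here: the two employees earning 200 count as a single salary level, so the third highest is 100.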

3. Given a table Sales with columns SaleID, ProductID, SaleDate, and Quantity,
write a query to find the total quantity sold for each product per month.
SELECT ProductID, DATE_TRUNC('month', SaleDate) AS Month, SUM(Quantity) AS TotalQuantity
FROM Sales
GROUP BY ProductID, Month
ORDER BY ProductID, Month;

4. Write a SQL query to find all employees who have more than one manager.
Assume you have a table Employees (EmployeeID, Name, ManagerID).
SELECT EmployeeID, Name
FROM Employees
GROUP BY EmployeeID, Name
HAVING COUNT(DISTINCT ManagerID) > 1;

5. Given a table Orders with columns OrderID, CustomerID, OrderDate, and a
table OrderDetails with columns OrderID, ProductID, Quantity, write a query to
find the top 3 products with the highest sales quantity.
SELECT ProductID, SUM(Quantity) AS TotalQuantity
FROM OrderDetails
GROUP BY ProductID
ORDER BY TotalQuantity DESC
LIMIT 3;

6. Write a SQL query to find the second most recent order date for each
customer from a table Orders (OrderID, CustomerID, OrderDate).
SELECT CustomerID, MAX(OrderDate) AS SecondRecentOrderDate
FROM Orders
WHERE OrderDate < (SELECT MAX(OrderDate) FROM Orders o2 WHERE
o2.CustomerID = Orders.CustomerID)
GROUP BY CustomerID;

7. Given a table Products with columns ProductID, Name, Price, and a table Sales
with columns SaleID, ProductID, Quantity, write a query to find the product with
the highest revenue.
SELECT p.ProductID, p.Name, SUM(p.Price * s.Quantity) AS Revenue
FROM Products p
JOIN Sales s ON p.ProductID = s.ProductID
GROUP BY p.ProductID, p.Name
ORDER BY Revenue DESC
LIMIT 1;

SQL:
Basic
➡️SELECT statements with WHERE, ORDER BY, GROUP BY, HAVING
➡️Basic JOINS (INNER, LEFT, RIGHT, FULL)
➡️Creating and using simple databases and tables
Intermediate
➡️Aggregate functions (COUNT, SUM, AVG, MAX, MIN)
➡️Subqueries and nested queries
➡️Common Table Expressions (WITH clause)
➡️CASE statements for conditional logic in queries
Advanced
➡️Advanced JOIN techniques (self-join, non-equi join)
➡️Window functions (OVER, PARTITION BY, ROW_NUMBER, RANK, DENSE_RANK,
LEAD, LAG)
➡️Optimization with indexing
➡️Data manipulation (INSERT, UPDATE, DELETE)

Q2: Given the employee table (EmpID, ManagerID, JoinDate, Dept, Salary), write a
query to find the nth highest salary.

SELECT DISTINCT Salary
FROM Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET n-1;

Some important topics for a SQL interview:

1. Window Functions & Their Variations: Know the differences between
ROW_NUMBER(), RANK(), and DENSE_RANK().

2. Creating and Utilizing CTEs (Common Table Expressions): Learn to use
CTEs to simplify complex queries and enhance readability.

3. Commonly Asked Queries: Practice solving frequent problems such as
finding the nth highest salary, calculating cumulative totals, using LEAD()
and LAG() functions, and effectively implementing self-joins and other join
types.

4. Subqueries and Their Application: Understand how to use nested queries for
complex data manipulations or filtering.

5. Indexing and Its Importance in Query Performance: Recognize how
indexing improves data retrieval speed by optimizing search pathways.

6. Aggregation Functions as Window Functions: Explore advanced operations
by using aggregate functions such as SUM() and AVG() with an OVER clause for
more detailed analytical queries.

7. Handling NULL Values in SQL: Master techniques for managing NULL values
to ensure accurate data processing in queries.

8. Joins vs. Subqueries: Differentiate between joins and subqueries to design
efficient queries based on data relationships and performance considerations.
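
To make the first topic concrete, here is a small sketch comparing the three ranking functions on a salary column that contains a tie. The sample rows and the function name compare_ranking_functions are made up, and SQLite stands in for whichever engine is used in practice.

```python
import sqlite3

QUERY = """
SELECT EmpID, Salary,
       ROW_NUMBER() OVER (ORDER BY Salary DESC, EmpID) AS row_num,   -- always unique
       RANK()       OVER (ORDER BY Salary DESC)        AS rnk,       -- ties share a rank, gap follows
       DENSE_RANK() OVER (ORDER BY Salary DESC)        AS dense_rnk  -- ties share a rank, no gap
FROM employee
ORDER BY Salary DESC, EmpID;
"""


def compare_ranking_functions():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (EmpID INTEGER PRIMARY KEY, Salary INTEGER)")
    # Employees 2 and 3 earn the same salary.
    conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, 300), (2, 200), (3, 200), (4, 100)])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in compare_ranking_functions():
    print(row)  # (EmpID, Salary, row_num, rnk, dense_rnk)
```

The employee earning 100 gets 4 from ROW_NUMBER() and RANK() but 3 from DENSE_RANK(), which is why DENSE_RANK() is the usual choice for "nth highest" questions.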

1. Data Types: Familiarize yourself with common data types (e.g., integers, strings,
dates) and how they are used.
Basic Commands: Be comfortable with SELECT, INSERT, UPDATE, DELETE, and
WHERE clauses.

2. Complex Queries
Prepare to write and understand complex queries involving:
Joins: Know how to perform INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN to
combine data from multiple tables.
Subqueries: Understand how to nest queries and when to use them effectively.
Common Table Expressions (CTEs): Learn how to use CTEs for better readability
and organization of your SQL code.

3. Data Aggregation and Grouping
Be proficient in using aggregate functions (e.g., COUNT, SUM, AVG) and the
GROUP BY clause to summarize data. Practice writing queries that require
filtering aggregated results using the HAVING clause.

4. Window Functions
Get comfortable with window functions, which allow you to perform calculations
across a set of table rows related to the current row. This is particularly useful for
running totals, moving averages, and ranking.

5. Data Manipulation
Know how to effectively manipulate data using SQL:
Insertions and Updates: Be prepared to write queries that add new records or
update existing ones based on specific conditions.
Transactions: Understand the basics of SQL transactions, including the concepts
of COMMIT and ROLLBACK for ensuring data integrity.
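
A minimal sketch of COMMIT and ROLLBACK in practice, assuming a hypothetical accounts table with a CHECK constraint that forbids negative balances. It uses Python's sqlite3 module, where a connection used as a context manager commits if the block succeeds and rolls back if an exception escapes.

```python
import sqlite3


def open_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return True if committed."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was rolled back too.
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


conn = open_bank()
print(transfer(conn, 1, 2, 30), balances(conn))
print(transfer(conn, 1, 2, 500), balances(conn))
```

The credit is deliberately applied before the debit to show that the rollback undoes a statement that had already succeeded inside the same transaction.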

6. Database Design Principles
Familiarize yourself with basic database design concepts, such as normalization
and the importance of primary and foreign keys. Understanding how data is
structured will help you write more efficient queries.

7. Performance Optimization
Learn about query optimization techniques, including:
Indexing: Understand how indexes improve query performance and when to use
them.
Query Execution Plans: Get comfortable reading execution plans to identify
performance bottlenecks in your SQL queries.
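
As an illustration of reading an execution plan, the sketch below asks SQLite for the plan of the same query before and after an index is created. EXPLAIN QUERY PLAN is SQLite-specific (other engines have their own EXPLAIN variants), the exact wording of the plan differs between SQLite versions, and the table and index names are made up.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


def compare_plans():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER)")
    conn.executemany(
        "INSERT INTO employees (name, department_id) VALUES (?, ?)",
        [("emp%d" % i, i % 10) for i in range(1000)],
    )
    sql = "SELECT name FROM employees WHERE department_id = 3"
    before = query_plan(conn, sql)  # no usable index: full table scan
    conn.execute("CREATE INDEX idx_emp_dept ON employees (department_id)")
    after = query_plan(conn, sql)   # the planner can now search the index
    conn.close()
    return before, after


before, after = compare_plans()
print("before:", before)
print("after: ", after)
```

The plan changing from a scan to an index search is the signal to look for; the same check on a production engine would use that engine's own plan output.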

8. Hands-On Practice
Finally, practice is key. Work on real datasets to build your confidence and skills
in writing SQL queries. Consider using platforms like SQLZoo, LeetCode, or
Kaggle to find exercises and projects.
Problem: You have two tables:

Employees
emp_id (INT)
name (VARCHAR)
department_id (INT)

Departments
department_id (INT)
department_name (VARCHAR)

Write an SQL query to find the names of all employees who work in a department with
at least 3 employees.
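
One way to approach this problem is a subquery that keeps only the departments with three or more employees. The sketch below assumes made-up sample rows and a hypothetical helper name, and runs on SQLite for convenience.

```python
import sqlite3

QUERY = """
SELECT e.name
FROM Employees e
WHERE e.department_id IN (
    SELECT department_id
    FROM Employees
    GROUP BY department_id
    HAVING COUNT(*) >= 3          -- departments with at least 3 employees
)
ORDER BY e.name;
"""


def employees_in_large_departments():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Departments (department_id INTEGER PRIMARY KEY, department_name TEXT)")
    conn.execute("CREATE TABLE Employees (emp_id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER)")
    conn.executemany("INSERT INTO Departments VALUES (?, ?)", [(1, "Sales"), (2, "HR")])
    conn.executemany(
        "INSERT INTO Employees VALUES (?, ?, ?)",
        [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 1), (4, "Dee", 2), (5, "Eli", 2)],
    )
    names = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return names


print(employees_in_large_departments())
```

The Departments table is not needed for the filter itself; joining it becomes useful only if the department name should appear in the output.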

Here's my step-by-step explanation:

Step 1: Extract the Month

We use the EXTRACT(MONTH FROM sale_date) function to pull out the month from the
sale_date. This will allow us to display the sales totals by month.

Step 2: Running Total Calculation

The core part of the solution is using the SUM() OVER() window function.

We calculate the running total using SUM(sale_amount) while partitioning by product_id.
This ensures that each product's sales are summed individually.

Step 3: Order the results

The ORDER BY inside the window function ensures that the sum is computed in the
correct order, based on the month the sales occurred.

Step 4: Final Output

The query outputs the month, product_id, and the computed running_total for each
product, ordered by both month and product for readability.
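
The four steps above can be sketched as a single query. The table name sales and the sample rows are assumptions, since the original problem statement is not shown, and because SQLite (used here only to make the example runnable) has no EXTRACT, strftime('%m', ...) stands in for EXTRACT(MONTH FROM sale_date).

```python
import sqlite3

RUNNING_TOTAL_QUERY = """
SELECT CAST(strftime('%m', sale_date) AS INTEGER) AS month,    -- Step 1: extract the month
       product_id,
       SUM(sale_amount) OVER (                                  -- Step 2: running total
           PARTITION BY product_id                              --         per product
           ORDER BY CAST(strftime('%m', sale_date) AS INTEGER)  -- Step 3: in month order
       ) AS running_total
FROM sales
ORDER BY month, product_id;                                     -- Step 4: final ordering
"""


def running_totals():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product_id INTEGER, sale_date TEXT, sale_amount INTEGER)")
    # Made-up rows: one sale per product per month.
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            (1, "2024-01-15", 100),
            (1, "2024-02-10", 50),
            (1, "2024-03-05", 25),
            (2, "2024-01-20", 10),
            (2, "2024-03-12", 30),
        ],
    )
    rows = conn.execute(RUNNING_TOTAL_QUERY).fetchall()
    conn.close()
    return rows


for row in running_totals():
    print(row)  # (month, product_id, running_total)
```

If a product has several sales in the same month, the default window frame gives all of those rows the same running total; aggregating to one row per product and month first avoids repeated lines.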

Problem: You are given a table Employee that stores information about employees and
their managers in the following structure:

EmployeeID  Name     ManagerID
1           Alice    NULL
2           Bob      1
3           Charlie  1
4           David    2
5           Eve      2
6           Frank    3

➡️ EmployeeID: Unique identifier for each employee.

➡️ Name: Name of the employee.

➡️ ManagerID: References the EmployeeID of the manager. If NULL, the employee has no
manager (i.e., they are at the top of the hierarchy).

Write an SQL query to find all employees who directly or indirectly report to Alice.
Return their Name and EmployeeID.
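
A possible solution uses a recursive CTE: start from Alice's direct reports, then repeatedly add the reports of everyone found so far. The sketch loads the sample rows into SQLite through Python's sqlite3 module; the function name reports_to_alice is made up, and some engines (SQL Server, for example) omit the RECURSIVE keyword.

```python
import sqlite3

ROWS = [
    (1, "Alice", None),
    (2, "Bob", 1),
    (3, "Charlie", 1),
    (4, "David", 2),
    (5, "Eve", 2),
    (6, "Frank", 3),
]

HIERARCHY_QUERY = """
WITH RECURSIVE reports AS (
    -- anchor member: Alice's direct reports
    SELECT EmployeeID, Name
    FROM Employee
    WHERE ManagerID = (SELECT EmployeeID FROM Employee WHERE Name = 'Alice')
    UNION ALL
    -- recursive member: reports of anyone already in the result
    SELECT e.EmployeeID, e.Name
    FROM Employee e
    JOIN reports r ON e.ManagerID = r.EmployeeID
)
SELECT Name, EmployeeID
FROM reports
ORDER BY EmployeeID;
"""


def reports_to_alice():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT, ManagerID INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(HIERARCHY_QUERY).fetchall()
    conn.close()
    return rows


print(reports_to_alice())
```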

1. How do you find duplicate records in a table based on a particular column?

SELECT column_name, COUNT(*)
FROM your_table
GROUP BY column_name
HAVING COUNT(*) > 1;

2. How would you join three tables — customers, orders, and products — to get a list of
all customers, their orders, and product names for each order?

SELECT c.customer_id, c.customer_name, o.order_id, p.product_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;

3. How do you calculate the total sales for each product in the last 30 days?

SELECT p.product_id, p.product_name, SUM(oi.quantity * oi.unit_price) AS total_sales
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= CURDATE() - INTERVAL 30 DAY
GROUP BY p.product_id, p.product_name;

4. How do you find customers who have made orders in both 2023 and 2024?

SELECT customer_id
FROM orders
WHERE YEAR(order_date) = 2023
INTERSECT
SELECT customer_id
FROM orders
WHERE YEAR(order_date) = 2024;

5. How do you calculate the total sales, ensuring that NULL values are treated as 0?

SELECT product_id,
SUM(COALESCE(sales_amount, 0)) AS total_sales
FROM sales
GROUP BY product_id;

6. How do you calculate the total sales for each employee using a CTE?
WITH EmployeeSales AS (
SELECT employee_id, SUM(order_amount) AS total_sales
FROM orders
GROUP BY employee_id
)
SELECT e.employee_id, e.employee_name, es.total_sales
FROM employees e
JOIN EmployeeSales es ON e.employee_id = es.employee_id;

1. Find the second highest salary from the Employees table.

SELECT MAX(Salary) AS SecondHighestSalary
FROM Employees
WHERE Salary < (SELECT MAX(Salary) FROM Employees);

2. Find employees who earn more than the average salary.

SELECT Name, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);

3. List employees who have worked in more than one department.

SELECT Emp_ID
FROM EmployeeDepartments
GROUP BY Emp_ID
HAVING COUNT(DISTINCT Dept_ID) > 1;

4. Find the department with the highest average salary.

SELECT Dept_ID
FROM Employees
GROUP BY Dept_ID
ORDER BY AVG(Salary) DESC
LIMIT 1;

5. Find the cumulative sum of salary for each employee, ordered by name.

SELECT Emp_ID, Name, Salary,
SUM(Salary) OVER (ORDER BY Name) AS CumulativeSalary
FROM Employees;

6. Find the total number of employees per department.

SELECT Dept_ID, COUNT(*) AS TotalEmployees
FROM Employees
GROUP BY Dept_ID;

1.CURDATE() / CURRENT_DATE
Returns the current date (without time).
SELECT CURDATE(); -- Output: 2024-11-19
2.NOW()
Returns the current date and time.
SELECT NOW(); -- Output: 2024-11-19 15:30:00

3.DATE_ADD()
Adds a specific time interval (like days, months, etc.) to a date.
SELECT DATE_ADD('2024-11-19', INTERVAL 5 DAY); -- Output: 2024-11-24

4.DATE_SUB()
Subtracts a time interval from a date.
SELECT DATE_SUB('2024-11-19', INTERVAL 10 DAY); -- Output: 2024-11-09

5.DATEDIFF()
Calculates the difference between two dates (returns days).
SELECT DATEDIFF('2024-11-19', '2024-11-10'); -- Output: 9

6.YEAR()
Extracts the year from a date.
SELECT YEAR('2024-11-19'); -- Output: 2024

7.MONTH()
Extracts the month from a date.
SELECT MONTH('2024-11-19'); -- Output: 11

8.DAY()
Extracts the day of the month from a date.
SELECT DAY('2024-11-19'); -- Output: 19

9.EXTRACT()
Extracts a part of a date (year, month, day, etc.).
SELECT EXTRACT(YEAR FROM '2024-11-19'); -- Output: 2024
SELECT EXTRACT(MONTH FROM '2024-11-19'); -- Output: 11

10.DATE_FORMAT()
Formats a date according to a specified format.
SELECT DATE_FORMAT('2024-11-19', '%M %d, %Y'); -- Output: November 19, 2024

Q1: Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() functions using an
example. Use Table: employee (EmpID, ManagerID, JoinDate, Dept, Salary)

Q2: Find the nth highest salary from the Employee table.
Q3: You have an employee table with employee ID and manager ID. Find all employees under a specific
manager, including their subordinates at any level.
Q4: Write a query to find the department-wise cumulative salary of employees who have joined the
company in the last 30 days.
Q5: Find the top 2 customers with the highest order amount for each product category, handling ties
appropriately. Table: customer (CustomerID, ProductCategory, OrderAmount)
Calculate the average salary for each department from the table.
Write a SQL query to display the employee’s name along with their manager’s name using a self-join on
the ‘employees’ table, which contains ‘emp_id’, ‘name’, and ‘manager_id’ columns.

Find the most recent hire for each department (solved using LEAD/LAG functions).

Write a query to retrieve the nth highest salary from the Employees table, which has ‘EmployeeID’,
‘Name’, and ‘Salary’ columns.

Write a query to calculate the total revenue generated by each region.


Display the names of employees who have a salary above the average salary in their department.

Identify the second highest salary in each department from the ‘employees’ table, which has ‘emp_id’,
‘department_id’, and ‘salary’ columns.

Write a SQL query to find employees who have not had any recent sales in the last 3 months.

How would you optimize a slow-running query with multiple joins?


What is a recursive CTE, and can you provide an example of when to use it?
Explain the difference between clustered and non-clustered indexes and when to use each.

Write a query to find the second highest salary in each department.


How would you detect and resolve deadlocks in SQL?
Explain window functions and provide examples of ROW_NUMBER, RANK, and DENSE_RANK.

Describe the ACID properties in database transactions and their significance.


Write a query to calculate a running total with partitions based on specific conditions.
Describe a scenario where you used SQL to analyze customer data. What insights did you uncover?

Rate your SQL skills on a scale of 1-10 and provide examples of advanced queries you’ve written.

Write a query to identify the second-highest salary in each department.


Explain the concept of JOINs and provide a query that joins three tables (Orders, Customers, Products)
to find the top 5 customers by revenue.

What is a Common Table Expression (CTE) in SQL, and when would you use it?
Write a CTE query to calculate cumulative monthly sales.

Write an SQL query to find all employees whose salaries are above the department average.
Describe your approach to optimizing SQL queries. Can you share an example where optimization made
a noticeable difference?

Write a SQL query to find the third most recent order date for each customer from a table Orders
(OrderID, CustomerID, OrderDate).

Write a query to find the employee with the second-highest salary in a departmentwise ranking.

Explain the difference between WHERE and HAVING clauses in SQL.


Given a table Sales with columns SaleID, ProductID, Quantity, and Price, write a query to find the
product with the highest total sales revenue.

Write a query to calculate the cumulative sales for each product category in the last 90 days.
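
Several of the questions above ask for the second-highest salary in each department. A minimal sketch of one common approach uses DENSE_RANK() so that tied salaries share a rank. The sample rows and the helper name are made up, and SQLite is used only so the example runs standalone.

```python
import sqlite3

SECOND_HIGHEST_QUERY = """
WITH ranked AS (
    SELECT department_id, salary,
           DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT DISTINCT department_id, salary AS second_highest_salary
FROM ranked
WHERE rnk = 2
ORDER BY department_id;
"""


def second_highest_per_department():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, department_id INTEGER, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees (department_id, salary) VALUES (?, ?)",
        [
            (1, 100), (1, 200), (1, 200), (1, 300),  # tie on the second-highest salary
            (2, 50), (2, 80),
            (3, 70),                                  # only one salary: no second highest
        ],
    )
    rows = conn.execute(SECOND_HIGHEST_QUERY).fetchall()
    conn.close()
    return rows


print(second_highest_per_department())
```

Departments with a single distinct salary simply drop out of the result; a LEFT JOIN from the department list would be needed to report them with NULL.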

1. Explain the outputs for all the joins.
2. How would you write a query to find the top five products by sales?
3. Explain the purpose of indexes in SQL. How do they improve query performance?
4. Can you describe a situation where you had to optimize a slow-running SQL query?

Q1: Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()
functions using an example. Use Table: employee (EmpID, ManagerID, JoinDate, Dept, Salary)

Q2: Find the nth highest salary from the Employee table.

Q3: You have an employee table with employee ID and manager ID. Find all
employees under a specific manager, including their subordinates at any level.

Q4: Write a query to find the department-wise cumulative salary of employees who
have joined the company in the last 30 days.

Q5: Find the top 2 customers with the highest order amount for each product
category, handling ties appropriately. Table: customer (CustomerID, ProductCategory,
OrderAmount)
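
For the last question, one reasonable reading of "handling ties appropriately" is to total each customer's orders per category and keep everyone whose DENSE_RANK() is 1 or 2, so tied customers are all returned. That reading, the sample rows, and the helper name are assumptions; SQLite is used for runnability.

```python
import sqlite3

TOP_TWO_QUERY = """
WITH totals AS (
    SELECT ProductCategory, CustomerID, SUM(OrderAmount) AS total_amount
    FROM customer
    GROUP BY ProductCategory, CustomerID
),
ranked AS (
    SELECT ProductCategory, CustomerID, total_amount,
           DENSE_RANK() OVER (PARTITION BY ProductCategory ORDER BY total_amount DESC) AS rnk
    FROM totals
)
SELECT ProductCategory, CustomerID, total_amount, rnk
FROM ranked
WHERE rnk <= 2
ORDER BY ProductCategory, rnk, CustomerID;
"""


def top_two_customers_per_category():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (CustomerID INTEGER, ProductCategory TEXT, OrderAmount INTEGER)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [
            (1, "Books", 100), (2, "Books", 100),  # tied for first place
            (3, "Books", 50), (3, "Books", 30),    # two orders totalling 80
            (4, "Books", 50),
            (1, "Toys", 30), (5, "Toys", 20), (6, "Toys", 10),
        ],
    )
    rows = conn.execute(TOP_TWO_QUERY).fetchall()
    conn.close()
    return rows


for row in top_two_customers_per_category():
    print(row)  # (ProductCategory, CustomerID, total_amount, rnk)
```

Swapping DENSE_RANK() for RANK() would return only the two tied customers in Books, and ROW_NUMBER() would pick two rows arbitrarily among ties, so the choice of function is the substance of the answer.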
1. Write a SQL query to find the third most recent order date for each customer from a
table Orders (OrderID, CustomerID, OrderDate).
2. Write a query to find the employee with the second-highest salary in a department-wise
ranking.
3. Explain the difference between WHERE and HAVING clauses in SQL.
4. Given a table Sales with columns SaleID, ProductID, Quantity, and Price, write a query
to find the product with the highest total sales revenue.
5. Write a query to calculate the cumulative sales for each product category in the last
90 days.
6. Write a SQL query to perform a running total that resets based on a certain condition
(e.g., monthly running total that resets each year).
7. Write a SQL query to identify and handle duplicate records in a table without using
the ROW_NUMBER() function.
8. Write a SQL query to find the second highest salary for each department, handling
ties appropriately.
9. Write a query to calculate the cumulative percentage of sales by product category.
10. Generate a report that lists the top 5 products by revenue for each quarter over the
past 3 years.
11. Write a recursive query to find the hierarchical path from the CEO to each employee in
an organization.
12. Create a query to identify anomalies in monthly expenditure across departments.
13. Write a query to perform a full outer join on two tables and then filter the results to
show only mismatched records.
14. Construct a query to dynamically pivot data where the columns are derived from a
subquery.
15. Write a query to calculate the moving average of sales for the past 6 months for each
product.
16. Generate a query to compare the sales performance of products before and after a
specific marketing campaign.
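
As a sketch for the moving-average question, the query below averages the current month and the five before it with a ROWS frame. It assumes a hypothetical monthly_sales table holding exactly one row per product per month with no gaps; with missing months, a ROWS frame would reach further back than six calendar months.

```python
import sqlite3

MOVING_AVG_QUERY = """
SELECT product_id, month, sales,
       AVG(sales) OVER (
           PARTITION BY product_id
           ORDER BY month
           ROWS BETWEEN 5 PRECEDING AND CURRENT ROW   -- current month plus the 5 before it
       ) AS moving_avg_6m
FROM monthly_sales
ORDER BY product_id, month;
"""


def six_month_moving_average():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monthly_sales (product_id INTEGER, month TEXT, sales INTEGER)")
    # Made-up data: seven consecutive months for a single product.
    conn.executemany(
        "INSERT INTO monthly_sales VALUES (?, ?, ?)",
        [(1, "2024-%02d" % m, 10 * m) for m in range(1, 8)],
    )
    rows = conn.execute(MOVING_AVG_QUERY).fetchall()
    conn.close()
    return rows


for row in six_month_moving_average():
    print(row)  # (product_id, month, sales, moving_avg_6m)
```

For the first five months the window holds fewer than six rows, so the average covers only the months available so far.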
