Course Pack - Introduction To Databases
Study Material
Bachelor in
Data Science
Subject
Introduction to Databases
Faculty
Nitish Patil
School of Data Science Introduction to Databases
A. COURSE DESCRIPTION
This course is a semester-long, project-based curriculum on SQL in Data Science that
develops proficiency in data analytics using databases such as MySQL and Oracle. Each
database project follows a research and development process, from project planning to a
final outcome ready for client delivery. Students gain real-world project experience
throughout the learning cycle, which helps them better understand the roles and processes
in a wide range of Data Science careers.
B. LEARNING OBJECTIVES
C. LEARNING OUTCOME
1. Employee Database
2. Banking Database
3. University Database
a. https://dev.mysql.com/doc/refman/8.0/en/tutorial.html
b. https://www.w3schools.com/mysql/default.asp
Database
A database is an organized collection of structured information, or data, typically stored electronically in
a computer system. A database is usually controlled by a database management system (DBMS).
Together, the data and the DBMS, along with the applications that are associated with them, are referred
to as a database system, often shortened to just database.
Data within the most common types of databases in operation today is typically modeled in rows and
columns in a series of tables to make processing and data querying efficient. The data can then be easily
accessed, managed, modified, updated, controlled, and organized. Most databases use structured query
language (SQL) for writing and querying data.
Types of databases
There are many different types of databases. The best database for a specific organization depends on
how the organization intends to use the data.
Relational databases
Relational databases became dominant in the 1980s. Items in a relational database are organized as a
set of tables with columns and rows. Relational database technology provides the most efficient and
flexible way to access structured information.
Object-oriented databases
Information in an object-oriented database is represented in the form of objects, as in object-oriented
programming.
Distributed databases
A distributed database consists of two or more files located in different sites. The database may be
stored on multiple computers, located in the same physical location, or scattered over different
networks.
Data warehouses
A central repository for data, a data warehouse is a type of database specifically designed for fast
query and analysis.
NoSQL databases
A NoSQL, or nonrelational database, allows unstructured and semistructured data to be stored and
manipulated (in contrast to a relational database, which defines how all data inserted into the database
must be composed). NoSQL databases grew popular as web applications became more common and
more complex.
Graph databases
A graph database stores data in terms of entities and the relationships between entities.
OLTP databases
An OLTP (online transaction processing) database is a fast, transaction-oriented database designed for
large numbers of transactions performed by multiple users.
These are only a few of the several dozen types of databases in use today. Other, less common databases
are tailored to very specific scientific, financial, or other functions. In addition to the different database
types, changes in technology development approaches and dramatic advances such as the cloud and
automation are propelling databases in entirely new directions. Some of the latest databases include:
Cloud databases
A cloud database is a collection of data, either structured or unstructured, that resides on a private,
public, or hybrid cloud computing platform. There are two types of cloud database models: traditional
and database as a service (DBaaS). With DBaaS, administrative tasks and maintenance are performed
by a service provider.
Multimodel database
Multimodel databases combine different types of database models into a single, integrated back end.
This means they can accommodate various data types.
Document/JSON database
Designed for storing, retrieving, and managing document-oriented information, document
databases are a modern way to store data in JSON format rather than rows and columns.
Self-driving databases
The newest and most groundbreaking type of database, self-driving databases (also known as
autonomous databases) are cloud-based and use machine learning to automate database tuning,
security, backups, updates, and other routine management tasks traditionally performed by database
administrators.
Clauses are built-in components of SQL statements, such as WHERE, ORDER BY, GROUP BY, HAVING, and LIMIT.
With the help of clauses, we can easily work with the data stored in a table. Clauses help us filter and
analyze data quickly. When large amounts of data are stored in the database, we use clauses to query and
retrieve only the data required by the user.
Normalization
Normalization is a database design technique that reduces data redundancy and eliminates undesirable
characteristics like Insertion, Update and Deletion Anomalies. Normalization rules divide larger tables
into smaller tables and link them using relationships. The purpose of normalization in SQL is to
eliminate redundant (repetitive) data and ensure data is stored logically.
The inventor of the relational model, Edgar Codd, proposed the theory of normalization of data with the
introduction of the First Normal Form, and he continued to extend the theory with the Second and Third
Normal Forms. Later he joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
Create Table
Syntax
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
....
);
The column parameters specify the names of the columns of the table.
The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer, date,
etc.).
The following example creates a table called "Persons" that contains five columns: PersonID,
LastName, FirstName, Address, and City:
Example
CREATE TABLE Persons (
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
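The statement above can be tried out without installing a database server. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL or Oracle; SQLite accepts the same CREATE TABLE text, and the PRAGMA used to list the columns is SQLite-specific.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in
# for MySQL or Oracle (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        PersonID  int,
        LastName  varchar(255),
        FirstName varchar(255),
        Address   varchar(255),
        City      varchar(255)
    )
""")

# PRAGMA table_info is SQLite-specific; each row describes one column
# as (cid, name, declared type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Persons)")]
print(columns)
```

In MySQL, DESCRIBE Persons; gives a comparable column listing.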
The INSERT INTO SELECT statement copies data from one table and inserts it into another
table.
The INSERT INTO SELECT statement requires that the data types in the source and target tables
match.
Copy only some columns from one table into another table:
"Suppliers" table:
SupplierID  SupplierName                ContactName    Address         City         PostalCode  Country
2           New Orleans Cajun Delights  Shelley Burke  P.O. Box 78934  New Orleans  70117       USA
3           Grandma Kelly's Homestead   Regina Murphy  707 Oxford Rd.  Ann Arbor    48104       USA
The following SQL statement copies "Suppliers" into "Customers" (the columns that are not
filled with data, will contain NULL):
Example
INSERT INTO Customers (CustomerName, City, Country)
SELECT SupplierName, City, Country FROM Suppliers;
The following SQL statement copies "Suppliers" into "Customers" (fill all columns):
Example
INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)
SELECT SupplierName, ContactName, Address, City, PostalCode, Country FROM Suppliers;
The following SQL statement copies only the German suppliers into "Customers":
Example
INSERT INTO Customers (CustomerName, City, Country)
SELECT SupplierName, City, Country FROM Suppliers
WHERE Country='Germany';
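The behaviour described above (unlisted columns become NULL, and a WHERE clause restricts which rows are copied) can be checked with a small sketch. It assumes Python's sqlite3 module as a stand-in database, simplified table definitions, and invented supplier rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the two tables.
    CREATE TABLE Suppliers (SupplierName TEXT, City TEXT, Country TEXT);
    CREATE TABLE Customers (CustomerName TEXT, ContactName TEXT, City TEXT, Country TEXT);
    INSERT INTO Suppliers VALUES
        ('Hypothetical Berlin Supply', 'Berlin', 'Germany'),
        ('Hypothetical Tokyo Supply', 'Tokyo', 'Japan');
""")

# Copy only the German suppliers; ContactName is not listed, so it stays NULL.
conn.execute("""
    INSERT INTO Customers (CustomerName, City, Country)
    SELECT SupplierName, City, Country FROM Suppliers
    WHERE Country = 'Germany'
""")

copied = conn.execute(
    "SELECT CustomerName, ContactName, City, Country FROM Customers"
).fetchall()
print(copied)
```

Python shows the SQL NULL in the ContactName column as None.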
Syntax
CREATE DATABASE databasename;
Example
CREATE DATABASE testDB;
WHERE Clause
WHERE Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Note: The WHERE clause is not only used in SELECT statements, it is also used
in UPDATE, DELETE, etc.!
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
CustomerID  CustomerName             ContactName     Address          City         PostalCode  Country
3           Antonio Moreno Taquería  Antonio Moreno  Mataderos 2312   México D.F.  05023       Mexico
4           Around the Horn          Thomas Hardy    120 Hanover Sq.  London       WA1 1DP     UK
Example
SELECT * FROM Customers
WHERE Country = 'Mexico';
SQL requires single quotes around text values (most database systems will also allow double quotes).
Example
SELECT * FROM Customers
WHERE CustomerID = 1;
Logical Operators
AND, OR and NOT Operators
The WHERE clause can be combined with AND, OR, and NOT operators.
The AND and OR operators are used to filter records based on more than one condition:
The AND operator displays a record if all the conditions separated by AND are TRUE.
The OR operator displays a record if any of the conditions separated by OR is TRUE.
AND Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;
OR Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;
NOT Syntax
SELECT column1, column2, ...
FROM table_name
WHERE NOT condition;
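The three operators can be compared side by side on a tiny table. This is a minimal sketch that assumes Python's sqlite3 module as a stand-in database and four invented customer rows; the helper function customer_ids is hypothetical and exists only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT, City TEXT, Country TEXT);
    INSERT INTO Customers VALUES
        (1, 'Customer A', 'Berlin', 'Germany'),
        (2, 'Customer B', 'Munich', 'Germany'),
        (3, 'Customer C', 'London', 'UK'),
        (4, 'Customer D', 'Madrid', 'Spain');
""")

def customer_ids(where_clause):
    """Return the CustomerID values that satisfy the given WHERE condition.

    The condition is pasted into the SQL text, which is acceptable only for
    the fixed strings below; real code should use query parameters.
    """
    sql = f"SELECT CustomerID FROM Customers WHERE {where_clause} ORDER BY CustomerID"
    return [row[0] for row in conn.execute(sql)]

and_ids = customer_ids("Country = 'Germany' AND City = 'Berlin'")  # both must hold
or_ids = customer_ids("City = 'Berlin' OR City = 'London'")        # either may hold
not_ids = customer_ids("NOT Country = 'Germany'")                  # condition must be false
print(and_ids, or_ids, not_ids)
```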
ORDER BY Keyword
The ORDER BY keyword is used to sort the result-set in ascending or descending order.
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in
descending order, use the DESC keyword.
ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
LIMIT Clause
The LIMIT clause is useful on large tables with thousands of records. Returning a large number
of records can impact performance.
LIMIT Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
LIMIT number;
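ORDER BY and LIMIT are often combined to fetch the first few rows of a sorted result. The sketch below assumes Python's sqlite3 module as a stand-in (SQLite supports LIMIT in the same way as MySQL; Oracle uses FETCH FIRST or ROWNUM instead) and invented customer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerName TEXT, Country TEXT);
    INSERT INTO Customers VALUES
        ('Customer A', 'Germany'),
        ('Customer B', 'Germany'),
        ('Customer C', 'UK'),
        ('Customer D', 'Spain');
""")

# Sort by Country ascending, then by CustomerName descending,
# and return only the first three rows of the sorted result.
top_three = conn.execute("""
    SELECT CustomerName, Country
    FROM Customers
    ORDER BY Country ASC, CustomerName DESC
    LIMIT 3
""").fetchall()
print(top_three)
```

Without an ORDER BY clause, which rows LIMIT returns is not guaranteed, so the two are usually written together.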
CONCAT(column1|expression1, column2|expression2)
    Concatenates the first character value to the second character
    value; equivalent to the concatenation operator (||)
SUBSTR(column|expression, m [,n])
    Returns specified characters from the character value starting at
    character position m, n characters long. (If m is negative, the
    count starts from the end of the character value. If n is
    omitted, all characters to the end of the string are returned.)
TRIM() Function
TRIM is an Oracle string function. It removes a specified character from the head (leading end) of the
string, the tail (trailing end) of the string, or both.
Syntax
Parameters
BOTH : it will trim from head as well as from tail of the string
LENGTH() Function
LENGTH is a String function of Oracle. This function returns the size of the given string.
Syntax
LENGTH( string1 )
Parameters
Return
Example 1
In some database systems (for example, older Oracle releases), a Boolean data type cannot be specified for a
column during table creation, unlike other data types. Boolean expressions are mainly used in WHERE clauses to
filter the data from a table. They can include comparison operators and logical operators such as AND and OR.
CONCAT Function
The CONCAT function in SQL is a string function that merges two or more strings into a single string. Some
systems, such as Oracle, treat NULL values as empty strings when concatenating, whereas MySQL's CONCAT returns
NULL if any argument is NULL. The arguments can be character strings or column values.
We can use a literal in the CONCAT function. A literal is a number, character, or date that is included in the
SELECT statement.
Example-
SELECT CONCAT(id, name, work_date)
FROM employee_tbl;
Result column: CONCAT(id, name, work_date)
String functions:
String functions perform an operation on an input string and return an output string.
The following string functions are available in SQL (some are specific to one database system, as noted):
1. ASCII(): This function is used to find the ASCII value of a character.
Syntax: SELECT ascii('t');
Output: 116
2. CHAR_LENGTH(): This function is used to find the length of a string in characters. Doesn’t work for
SQL Server. Use LEN() for SQL Server.
Syntax: SELECT char_length('Hello!');
Output: 6
3. CHARACTER_LENGTH(): A synonym for CHAR_LENGTH(). Doesn’t work for SQL Server. Use LEN() for
SQL Server.
Syntax: SELECT CHARACTER_LENGTH('geeks for geeks');
Output: 15
4. CONCAT(): This function is used to join two or more strings. In Oracle, the same result can be written
with the || operator, as shown here.
Syntax: SELECT 'Geeks' || ' ' || 'forGeeks' FROM dual;
Output: ‘Geeks forGeeks’
5. CONCAT_WS(): This function is used to join two or more strings with a given separator between them.
Syntax: SELECT CONCAT_WS('_', 'geeks', 'for', 'geeks');
Output: geeks_for_geeks
6. FIND_IN_SET(): This function is used to find the position of a string within a comma-separated list. The
list must not contain spaces after the commas.
Syntax: SELECT FIND_IN_SET('b', 'a,b,c,d,e,f');
Output: 2
7. FORMAT(): This function is used to display a number in the given format. The example below uses the
MS Access form. In MySQL, FORMAT(number, decimals) rounds to the given number of decimal places; for
example, SELECT FORMAT(0.981, 2); returns 0.98.
Syntax: Format("0.981", "Percent");
Output: ‘98.10%’
8. INSERT(): This function is used to insert a string into another string at the given position, replacing the
given number of characters. It is not the same as the INSERT INTO statement, which adds rows to a table.
Syntax: SELECT INSERT('geeksforgeeks', 1, 5, 'GEEKS');
Output: GEEKSforgeeks
9. INSTR(): This function is used to find the position of an occurrence of a substring.
Syntax: INSTR('geeks for geeks', 'e');
Output: 2 (the first occurrence of ‘e’)
Syntax (Oracle four-argument form): INSTR('geeks for geeks', 'e', 1, 2 );
Output: 3 (the second occurrence of ‘e’)
10. LCASE(): This function is used to convert the given string into lower case.
Syntax: LCASE ("GeeksFor Geeks To Learn");
Output: geeksfor geeks to learn
11. LEFT(): This function is used to select a substring of the given length from the left of a string.
Syntax: SELECT LEFT('geeksforgeeks.org', 5);
Output: geeks
12. LENGTH(): This function is used to find the length of a string (in MySQL, the length is measured in bytes).
Syntax: LENGTH('GeeksForGeeks');
Output: 13
13. LOCATE(): This function is used to find the position of a substring in a string, starting the search at the
given position.
Syntax: SELECT LOCATE('for', 'geeksforgeeks', 1);
Output: 6
14. LOWER(): This function is used to convert an upper case string into lower case.
Syntax: SELECT LOWER('GEEKSFORGEEKS.ORG');
Output: geeksforgeeks.org
15. LPAD(): This function is used to pad the given string on the left with the given symbol until it reaches the
given length.
Syntax: LPAD('geeks', 8, '0');
Output: 000geeks
16. LTRIM(): This function is used to remove leading characters from a string. The Oracle form shown here
removes any leading characters found in the given set; MySQL's LTRIM takes only the string and removes
leading spaces.
Syntax: LTRIM('123123geeks', '123');
Output: geeks
17. MID(): This function is used to extract a substring of the given length starting at the given position.
Syntax: MID('geeksforgeeks', 6, 3);
Output: for
18. POSITION(): This function is used to find the position of the first occurrence of the given substring.
Syntax: SELECT POSITION('e' IN 'geeksforgeeks');
Output: 2
19. REPEAT(): This function is used to repeat the given string the given number of times.
Syntax: SELECT REPEAT('geeks', 2);
Output: geeksgeeks
20. REPLACE(): This function is used to replace all occurrences of a substring with another string. Replacing
with an empty string removes the substring.
Syntax: REPLACE('123geeks123', '123', '');
Output: geeks
21. REVERSE(): This function is used to reverse a string.
Syntax: SELECT REVERSE('geeksforgeeks.org');
Output: ‘gro.skeegrofskeeg’
22. RIGHT(): This function is used to select a substring of the given length from the right end of a string.
Syntax: SELECT RIGHT('geeksforgeeks.org', 4);
Output: ‘.org’
23. RPAD(): This function is used to pad the given string on the right with the given symbol until it reaches
the given length.
Syntax: RPAD('geeks', 8, '0');
Output: ‘geeks000’
24. RTRIM(): This function is used to remove trailing characters from a string. The Oracle form shown here
removes any trailing characters found in the given set; MySQL's RTRIM takes only the string and removes
trailing spaces.
Syntax: RTRIM('geeksxyxzyyy', 'xyz');
Output: ‘geeks’
25. SPACE(): This function is used to return a string of the given number of spaces.
Syntax: SELECT SPACE(7);
Output: ‘       ’ (seven spaces)
26. STRCMP(): This function is used to compare 2 strings.
If string1 and string2 are the same, the STRCMP function will return 0.
If string1 is smaller than string2, the STRCMP function will return -1.
If string1 is larger than string2, the STRCMP function will return 1.
Syntax: SELECT STRCMP('google.com', 'geeksforgeeks.com');
Output: 1 (‘google.com’ sorts after ‘geeksforgeeks.com’ because ‘o’ comes after ‘e’)
27. SUBSTR(): This function is used to extract a substring of the given length from the given position.
Syntax: SUBSTR('geeksforgeeks', 1, 5);
Output: ‘geeks’
28. SUBSTRING(): This function is used to extract a substring of the given length from the given position.
Syntax: SELECT SUBSTRING('GeeksForGeeks.org', 9, 1);
Output: ‘G’
29. SUBSTRING_INDEX(): This function is used to find the substring before the given number of occurrences of
the given delimiter.
Syntax: SELECT SUBSTRING_INDEX('www.geeksforgeeks.org', '.', 1);
Output: ‘www’
30. TRIM(): This function is used to remove the given symbol from the start or end of the string.
Syntax: TRIM(LEADING '0' FROM '000123');
Output: 123
31. UCASE(): This function is used to convert the string into upper case.
Syntax: UCASE ("GeeksForGeeks");
Output: GEEKSFORGEEKS
Group Functions
• DISTINCT makes the function consider only nonduplicate values; ALL makes it consider every
value including duplicates. The default is ALL and therefore does not need to be specified.
• The data types for the functions with an expr argument may be CHAR, VARCHAR2, NUMBER,
or DATE.
• All group functions ignore null values. To substitute a value for null values, use the NVL, NVL2,
or COALESCE functions.
• Do not rely on a GROUP BY clause to sort the result set. To guarantee a particular order (ascending or
descending), use an ORDER BY clause.
You can use the AVG, SUM, MIN, and MAX functions against columns that can store numeric data. For
example, these functions can display the average, highest, lowest, and sum of monthly salaries for all sales
representatives.
COUNT Function
The COUNT function has three formats:
• COUNT(*)
• COUNT(expr)
• COUNT(DISTINCT expr)
COUNT(*) returns the number of rows in a table that satisfy the criteria of the SELECT statement,
including duplicate rows and rows containing null values in any of the columns. If a WHERE clause is
included in the SELECT statement, COUNT(*) returns the number of rows that satisfy the condition
in the WHERE clause.
In contrast, COUNT(expr) returns the number of non-null values in the column identified by expr.
COUNT(DISTINCT expr) returns the number of unique, non-null values in the column identified
by expr.
SELECT COUNT(*)
FROM employees
WHERE department_id = 50;
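The difference between the three COUNT formats shows up as soon as a column contains NULLs or duplicates. The following sketch assumes Python's sqlite3 module as a stand-in database and a hypothetical bonus column with invented values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: department 50 has one NULL bonus and one duplicate.
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, bonus INTEGER);
    INSERT INTO employees VALUES
        (1, 50, 100),
        (2, 50, NULL),
        (3, 50, 100),
        (4, 80, 200);
""")

count_all, count_expr, count_distinct = conn.execute("""
    SELECT COUNT(*),               -- every row that passes the WHERE clause
           COUNT(bonus),           -- only rows where bonus is not NULL
           COUNT(DISTINCT bonus)   -- only unique, non-NULL bonus values
    FROM employees
    WHERE department_id = 50
""").fetchone()
print(count_all, count_expr, count_distinct)
```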
...
WHEN comparison_exprn THEN return_exprn
ELSE else_expr]
END
DECODE Function : Facilitates conditional inquiries by doing the work of a CASE or IF-THEN-
ELSE statement.
The DECODE function decodes an expression in a way similar to the IF-THEN-ELSE logic used in
various languages. The DECODE function decodes expression after comparing it to each search value.
If the expression is the same as search, result is returned.
If the default value is omitted, a null value is returned where a search value does not match any of the
result values.
The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to
group the result-set by one or more columns.
GROUP BY Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
GROUP BY Examples
The following SQL statement lists the number of customers in each country:
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
The following SQL statement lists the number of customers in each country, sorted high to low:
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;
HAVING Clause
The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.
HAVING Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
HAVING Examples
The following SQL statement lists the number of customers in each country. Only include countries with more than
5 customers:
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;
The following SQL statement lists the number of customers in each country, sorted high to low (only include
countries with more than 5 customers):
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5
ORDER BY COUNT(CustomerID) DESC;
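The interplay of GROUP BY, HAVING, and ORDER BY can be checked on a handful of rows. This sketch assumes Python's sqlite3 module as a stand-in database and invented customers; the threshold is lowered from 5 to 1 only so that the sample data can stay small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, Country TEXT);
    INSERT INTO Customers VALUES
        (1, 'Germany'), (2, 'Germany'), (3, 'Germany'),
        (4, 'UK'), (5, 'UK'),
        (6, 'Spain');
""")

# WHERE filters rows before grouping; HAVING filters the groups afterwards,
# which is why the aggregate condition has to go in HAVING.
per_country = conn.execute("""
    SELECT Country, COUNT(CustomerID)
    FROM Customers
    GROUP BY Country
    HAVING COUNT(CustomerID) > 1
    ORDER BY COUNT(CustomerID) DESC
""").fetchall()
print(per_country)
```

Spain is dropped because its group contains only one customer.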
MySQL Aliases
Aliases are used to give a table, or a column in a table, a temporary name.
The following SQL statement creates two aliases, one for the CustomerID column and one for the CustomerName column:
Example
SELECT CustomerID AS ID, CustomerName AS Customer
FROM Customers;
Subqueries
A subquery is a SELECT statement that is embedded in a clause of another SELECT statement. You
can build powerful statements out of simple ones by using subqueries. They can be very useful when
you need to select rows from a table with a condition that depends on the data in the table itself.
You can place the subquery in a number of SQL clauses, including:
• The WHERE clause
• The HAVING clause
• The FROM clause
In the syntax:
operator includes a comparison condition such as >, =, or IN
Note: Comparison conditions fall into two classes: single-row operators (>, =, >=, <, <>, <=) and
multiple-row operators (IN, ANY, ALL).
The subquery is often referred to as a nested SELECT, sub-SELECT, or inner SELECT statement. The
subquery generally executes first, and its output is used to complete the query condition for the
main or outer query.
SELECT last_name
FROM employees
WHERE salary >
(SELECT salary
FROM employees
WHERE last_name = 'Abel');
Multiple-Row Subqueries
• Return more than one row
• Use multiple-row comparison operators
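Operator   Meaning
IN         Equal to any member in the list
ANY        Compare value to each value returned by the subquery
ALL        Compare value to every value returned by the subquery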
Subqueries that return more than one row are called multiple-row subqueries. You use a
multiple-row operator, instead of a single-row operator, with a multiple-row subquery. The
multiple-row operator expects one or more values.
Example
Find the employees who earn the same salary as the minimum salary for each department.
The inner query is executed first, producing a query result. The main query block is then processed and
uses the values returned by the inner query to complete its search condition. In fact, the main query would
appear to the Oracle server as follows:
SELECT last_name, salary, department_id
FROM employees
WHERE salary IN (2500, 4200, 4400, 6000, 7000, 8300, 8600, 17000);
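A multiple-row subquery of this kind can be sketched as follows. The example assumes Python's sqlite3 module as a stand-in database and a small invented employees table; it is one plausible way to write the query described above, not necessarily the original one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample rows; minimum salaries are 2500 (dept 50),
    -- 4200 (dept 60) and 8600 (dept 80).
    CREATE TABLE employees (last_name TEXT, salary INTEGER, department_id INTEGER);
    INSERT INTO employees VALUES
        ('Abel',    11000, 80),
        ('Taylor',   8600, 80),
        ('Rajs',     3500, 50),
        ('Vargas',   2500, 50),
        ('Davies',   4200, 50),
        ('Lorentz',  4200, 60),
        ('Hunold',   9000, 60);
""")

# The inner query returns one minimum salary per department (several rows),
# so the outer query must use a multiple-row operator such as IN.
matches = conn.execute("""
    SELECT last_name, salary, department_id
    FROM employees
    WHERE salary IN (SELECT MIN(salary)
                     FROM employees
                     GROUP BY department_id)
    ORDER BY last_name
""").fetchall()
print(matches)
```

Davies appears in the result even though 4200 is not the minimum in department 50, because IN only checks whether the salary matches any value in the list.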
CASE Syntax
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN conditionN THEN resultN
ELSE result
END;
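The syntax above evaluates the conditions in order and returns the result of the first one that is true. A minimal sketch, assuming Python's sqlite3 module as a stand-in database and a hypothetical OrderDetails table with invented quantities:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrderDetails (OrderID INTEGER, Quantity INTEGER);
    INSERT INTO OrderDetails VALUES (1, 12), (2, 30), (3, 45);
""")

# Conditions are checked top to bottom; ELSE covers everything left over.
labelled = conn.execute("""
    SELECT OrderID,
           CASE
               WHEN Quantity > 30 THEN 'above 30'
               WHEN Quantity = 30 THEN 'exactly 30'
               ELSE 'below 30'
           END AS QuantityText
    FROM OrderDetails
    ORDER BY OrderID
""").fetchall()
print(labelled)
```

If the ELSE part is omitted and no condition is true, CASE returns NULL.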
"Orders" table:
OrderID  CustomerID  OrderDate
10308    2           1996-09-18
10309    37          1996-09-19
10310    77          1996-09-20
Notice that the "CustomerID" column in the "Orders" table refers to the "CustomerID" in the "Customers" table. The
relationship between the two tables above is the "CustomerID" column.
Then, we can create the following SQL statement (that contains an INNER JOIN), that selects records that have
matching values in both tables:
Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
The following SQL statement selects all orders with customer information:
Example
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
The following SQL statement selects all customers, and any orders they might have:
Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
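The difference between the two joins is easiest to see when one customer has no orders. The sketch below assumes Python's sqlite3 module as a stand-in database and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
    -- Customer A has no orders in this invented data set.
    INSERT INTO Customers VALUES (1, 'Customer A'), (2, 'Customer B'), (3, 'Customer C');
    INSERT INTO Orders VALUES (10308, 2), (10309, 3);
""")

# INNER JOIN: only rows with a match in both tables.
inner_rows = conn.execute("""
    SELECT Orders.OrderID, Customers.CustomerName
    FROM Orders
    INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()

# LEFT JOIN: every customer, with NULL where no order matches.
left_rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerName
""").fetchall()
print(inner_rows)
print(left_rows)
```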
The RIGHT JOIN keyword returns all records from the right table (table2), and the matching records (if any) from
the left table (table1).
Example
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;
Self Join
A self join is a regular join, but the table is joined with itself.
Example
SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;
UNION Operator
The UNION operator is used to combine the result-set of two or more SELECT statements.
Every SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in every SELECT statement must also be in the same order
UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
Example
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;
Note: If some customers or suppliers have the same city, each city will only be listed once, because UNION selects
only distinct values. Use UNION ALL to also select duplicate values!
Example
SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers
ORDER BY City;
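The effect of UNION versus UNION ALL can be verified with one city that appears in both tables. This sketch assumes Python's sqlite3 module as a stand-in database and invented city values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (City TEXT);
    CREATE TABLE Suppliers (City TEXT);
    -- Berlin appears in both tables on purpose.
    INSERT INTO Customers VALUES ('Berlin'), ('London');
    INSERT INTO Suppliers VALUES ('Berlin'), ('Tokyo');
""")

union_cities = [row[0] for row in conn.execute("""
    SELECT City FROM Customers
    UNION
    SELECT City FROM Suppliers
    ORDER BY City
""")]

union_all_cities = [row[0] for row in conn.execute("""
    SELECT City FROM Customers
    UNION ALL
    SELECT City FROM Suppliers
    ORDER BY City
""")]
print(union_cities)       # duplicates removed
print(union_all_cities)   # duplicates kept
```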
Example
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;
Example
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION ALL
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;
Example
SELECT 'Customer' AS Type, ContactName, City, Country
FROM Customers
UNION
SELECT 'Supplier', ContactName, City, Country
FROM Suppliers;
A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables
in the database.
You can add SQL statements and functions to a view and present the data as if the data were coming from one single
table.
Note: A view always shows up-to-date data! The database engine recreates the view, every time a user queries it.
Example
CREATE VIEW [Brazil Customers] AS
SELECT CustomerName, ContactName
FROM Customers
WHERE Country = 'Brazil';
Example
SELECT * FROM [Brazil Customers];
The following SQL creates a view that selects every product in the "Products" table with a price higher than the
average price:
Example
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);
Example
SELECT * FROM [Products Above Average Price];
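The note that a view always shows up-to-date data can be demonstrated directly. This is a minimal sketch assuming Python's sqlite3 module as a stand-in database (SQLite accepts the bracketed view name; MySQL would need backticks instead) and invented customer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerName TEXT, ContactName TEXT, Country TEXT);
    INSERT INTO Customers VALUES
        ('Customer A', 'Contact A', 'Brazil'),
        ('Customer B', 'Contact B', 'Germany');

    CREATE VIEW [Brazil Customers] AS
    SELECT CustomerName, ContactName
    FROM Customers
    WHERE Country = 'Brazil';
""")

query = "SELECT * FROM [Brazil Customers] ORDER BY CustomerName"
before = conn.execute(query).fetchall()

# The view stores the query, not the data, so a row added to the base
# table afterwards is visible the next time the view is queried.
conn.execute("INSERT INTO Customers VALUES ('Customer C', 'Contact C', 'Brazil')")
after = conn.execute(query).fetchall()
print(before)
print(after)
```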
Updating a View
The following SQL adds the "City" column to the "Brazil Customers" view:
Example
CREATE OR REPLACE VIEW [Brazil Customers] AS
SELECT CustomerName, ContactName, City
FROM Customers
WHERE Country = 'Brazil';
Example
DROP VIEW [Brazil Customers];
DENSE_RANK() –
It assigns rank to each row within partition. Just like rank function first row is assigned rank 1 and
rows having same value have same rank. The difference between RANK() and DENSE_RANK() is
that in DENSE_RANK(), for the next rank after two same rank, consecutive integer is used, no rank
is skipped.
ROW_NUMBER() –
It assigns consecutive integers to all the rows within partition. Within a partition, no two rows can
have same row number.
Note –
An ORDER BY clause must be specified inside OVER() when using the ranking window functions.
Example –
Calculate the row number, rank, and dense rank of employees in the employee table according to salary
within each department.
SELECT
    ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC)
        AS emp_row_no, Name, Department, Salary,
    RANK() OVER (PARTITION BY Department
        ORDER BY Salary DESC) AS emp_rank,
    DENSE_RANK() OVER (PARTITION BY Department
        ORDER BY Salary DESC)
        AS emp_dense_rank
FROM employee;
The output of the above query has the following columns:
emp_row_no  Name  Department  Salary  emp_rank  emp_dense_rank
So, as mentioned in the definition of ROW_NUMBER(), the row numbers are consecutive integers within
each partition. We can also see the difference between rank and dense rank: in dense rank there is no gap
between rank values, while in rank there is a gap after repeated rank values.
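The three ranking functions can be compared on a small table with one tie. The sketch assumes Python's sqlite3 module as a stand-in database (window functions need SQLite 3.25 or newer, which recent Python builds normally bundle) and five invented employees. Name is added as a tie-breaker for ROW_NUMBER() only, so that the numbering of tied rows is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented rows; Ann and Bob tie on salary within Sales.
    CREATE TABLE employee (Name TEXT, Department TEXT, Salary INTEGER);
    INSERT INTO employee VALUES
        ('Ann', 'Sales', 5000), ('Bob', 'Sales', 5000), ('Cid', 'Sales', 4000),
        ('Dee', 'IT', 7000), ('Eli', 'IT', 6000);
""")

ranked = conn.execute("""
    SELECT ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC, Name) AS emp_row_no,
           Name, Department, Salary,
           RANK()       OVER (PARTITION BY Department ORDER BY Salary DESC) AS emp_rank,
           DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS emp_dense_rank
    FROM employee
    ORDER BY Department, emp_row_no
""").fetchall()
for row in ranked:
    print(row)
```

For Cid, RANK() gives 3 (rank 2 is skipped after the tie) while DENSE_RANK() gives 2.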
The window frame is a set of rows related to the current row where the window function is used for
calculation. The window frame can be a different set of rows for the next row in the query result,
since it depends on the current row being processed. Every row in the result set of the query has its
own window frame.
In the rest of this article, we will show example queries based on a database of a car dealership
group. The group stores the sales information grouped by month in a table called monthly_car_sales.
Below is the table with some sample data:
monthly_car_sales
year  month  make  model  type    quantity  revenue
2021  01     Ford  F100   PickUp  40        2500000
A simple way to create a window frame is by using an OVER clause with a PARTITION BY subclause.
In the following SQL example, we generate a report of revenue by make of the car for the year 2021.
SELECT make,
FROM monthly_car_sales
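As an illustration of an OVER clause with a PARTITION BY subclause, the sketch below totals revenue per make while keeping every detail row. It assumes Python's sqlite3 module as a stand-in database (SQLite 3.25 or newer), invented sample rows, and a hypothetical column alias total_revenue; it is one possible query of this kind, not necessarily the one intended above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_car_sales (
        year INTEGER, month INTEGER, make TEXT, model TEXT,
        type TEXT, quantity INTEGER, revenue INTEGER
    );
    -- Invented rows for illustration.
    INSERT INTO monthly_car_sales VALUES
        (2021, 1, 'Ford',    'F100',    'PickUp', 40, 2500000),
        (2021, 2, 'Ford',    'Mustang', 'Car',    10, 1000000),
        (2021, 1, 'Renault', 'Fuego',   'Car',    20,  900000);
""")

# The window frame for each row is the set of rows with the same make,
# so SUM(revenue) repeats the per-make total on every row of that make.
report = conn.execute("""
    SELECT make, model, revenue,
           SUM(revenue) OVER (PARTITION BY make) AS total_revenue
    FROM monthly_car_sales
    WHERE year = 2021
    ORDER BY make, model
""").fetchall()
for row in report:
    print(row)
```

Unlike GROUP BY, the window function does not collapse the rows: each sale is still listed, with its make's total alongside.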
Given an ordered set of rows, FIRST_VALUE returns the value of the specified expression
with respect to the first row in the window frame. The LAST_VALUE function returns the
value of the expression with respect to the last row in the frame.
Syntax
FIRST_VALUE | LAST_VALUE
[ PARTITION BY expr_list ]
SELECT
    [OrderQty],
    LAG([OrderQty]) OVER (ORDER BY [OrderQty] DESC) AS [Lag "OrderQty" Column]
FROM [Sales].[SalesOrderDetail]
ORDER BY [OrderQty] DESC;
In the example above, the first column in the SQL query is the order quantity column, the
second column makes use of the LAG function, acting on the order quantity column. The
OVER() clause is then applied (because LAG and LEAD are window functions), wherein the
new column being formed - the [Lag “OrderQty” Column] - is ordered by order quantity in
descending order. The entire query is also ordered by the order quantity in descending order.
The result of this query will appear like this:
In the result, the first value in the Lag “OrderQty” column is NULL. This is because the LAG function
returns the value from the row before the current row. The first row (where OrderQty is 44) has no
previous row, so there is no value for the LAG function to return, hence the NULL value in the first cell
of the second column.
In the third row of the result, the values for the two columns are 40 and 41. The value 41 in the second
column is the previous row's value of the first column, brought forward. This is how the LAG function
works.
For the LEAD function, imagine a manager asks the data analyst to produce a query showing
all the order quantity values, along with each following order quantity value. The query for
that would look like this:
SELECT
    [OrderQty],
    LEAD([OrderQty]) OVER (ORDER BY [OrderQty] DESC) [Lead "OrderQty" Column]
FROM [Sales].[SalesOrderDetail]
ORDER BY [OrderQty] DESC;
This query is similar to the previous one, except that the LEAD function replaces the LAG function.
The result of the query above would appear like this:
In the result, no NULL value appears in the first row (with LEAD, the NULL appears in the last row
instead), but there is a difference. In row one, the value 41 in the Lead “OrderQty” column is alongside
the value 44 in the OrderQty column. The value 41 is brought up by the LEAD function from the
second row, where 41 resides in the OrderQty column. This is how the LEAD function works.
It is also possible to LEAD or LAG by a specific number of rows. What this means is that,
with the LEAD function, for example, I can specify to start bringing forward values starting
from after the next N rows. Let’s look at another example.
SELECT
    [OrderQty],
    LEAD([OrderQty], 2) OVER (ORDER BY [OrderQty] DESC) [Lead "OrderQty" Column]
FROM [Sales].[SalesOrderDetail]
ORDER BY [OrderQty] DESC;
In the query above, the LEAD function returns values not from the next row, but from two rows after
the current row. The result of this query will be this:
In the result, both rows one and two of the second column contain the value 40. This is
because in column one (OrderQty) the third and fourth values are 40 and 40. This is how
offsets work in the LAG and LEAD functions.
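The LAG and LEAD behaviour described above can be reproduced on five invented order quantities. The sketch assumes Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or newer), a hypothetical table sales_order_detail, and an added id column that breaks the tie between the two rows with quantity 40 so that the row order is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_order_detail (id INTEGER PRIMARY KEY, order_qty INTEGER);
    INSERT INTO sales_order_detail (order_qty) VALUES (44), (41), (40), (40), (39);
""")

rows = conn.execute("""
    SELECT order_qty,
           LAG(order_qty)     OVER (ORDER BY order_qty DESC, id) AS lag_qty,
           LEAD(order_qty)    OVER (ORDER BY order_qty DESC, id) AS lead_qty,
           LEAD(order_qty, 2) OVER (ORDER BY order_qty DESC, id) AS lead_qty_2
    FROM sales_order_detail
    ORDER BY order_qty DESC, id
""").fetchall()
for row in rows:
    print(row)
```

LAG leaves a NULL (None in Python) in the first row, LEAD in the last row, and LEAD with an offset of 2 in the last two rows.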
ROLLUP
Syntax
ROLLUP appears in the GROUP BY clause in a SELECT statement. Its form is:
ROLLUP's action is straightforward: it creates subtotals which "roll up" from the most detailed
level to a grand total, following a grouping list specified in the ROLLUP clause. ROLLUP takes
as its argument an ordered list of grouping columns. First, it calculates the standard aggregate
values specified in the GROUP BY clause. Then, it creates progressively higher-level subtotals,
moving from right to left through the list of grouping columns. Finally, it creates a grand total.
ROLLUP will create subtotals at n+1 levels, where n is the number of grouping columns. For
instance, if a query specifies ROLLUP on grouping columns of Time, Region, and Department
( n=3), the result set will include rows at four aggregation levels.
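The n+1 aggregation levels can be made visible with a small emulation. SQLite, used here through Python's sqlite3 module, has no ROLLUP, so this sketch swaps in three plain GROUP BY queries combined with UNION ALL, which is what ROLLUP(region, department) is equivalent to for n=2 grouping columns. The sales table and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, department TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('East', 'Video', 100),
        ('East', 'Games',  50),
        ('West', 'Video',  70);
""")

# Emulation of GROUP BY ROLLUP(region, department); not ROLLUP itself.
rollup_rows = conn.execute("""
    SELECT region, department, SUM(amount) FROM sales GROUP BY region, department  -- detail level
    UNION ALL
    SELECT region, NULL, SUM(amount) FROM sales GROUP BY region                    -- subtotal per region
    UNION ALL
    SELECT NULL, NULL, SUM(amount) FROM sales                                      -- grand total
""").fetchall()

# The number of NULL grouping columns identifies the aggregation level of a row.
levels = {sum(value is None for value in row[:2]) for row in rollup_rows}
print(rollup_rows)
print(len(levels))
```

As in real ROLLUP output, a NULL in a grouping column marks a subtotal or grand-total row; there is no subtotal per department across regions, which is the gap CUBE fills.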
Example
This example of ROLLUP uses the data in the video store database.
CUBE
Note that the subtotals created by ROLLUP are only a fraction of possible subtotal
combinations. For instance, the departmental totals across regions (279,000 and 319,000) would
not be calculated by a ROLLUP(Time, Region, Department) clause. To generate those numbers would
require a ROLLUP clause with the grouping columns specified in a different
order: ROLLUP(Time, Department, Region). The easiest way to generate the full set of subtotals
needed for cross-tabular reports is to use the CUBE extension.
CUBE enables a SELECT statement to calculate subtotals for all possible combinations of a
group of dimensions. It also calculates a grand total. This is the set of information typically
needed for all cross-tabular reports, so CUBE can calculate a cross-tabular report with a
single SELECT statement. Like ROLLUP, CUBE is a simple extension to
the GROUP BY clause, and its syntax is also easy to learn.
Syntax
CUBE appears in the GROUP BY clause in a SELECT statement. Its form is: