Course Pack - Introduction To Databases

This document provides an introduction to databases. It describes a semester-long course on databases and data science that uses projects to develop skills in SQL, MySQL, and Oracle. The course objectives are to identify database applications and tools, develop queries and joining tables, and demonstrate data analytics skills. Learning outcomes include identifying database queries, implementing SQL, creating and using databases, and performing operations like inserting, updating, and deleting values. The document also defines what a database is, compares databases to spreadsheets, describes different types of databases, explains structured query language and clauses, defines MySQL databases, and describes database normalization.

School of Data Science Introduction to Databases

Study Material

Bachelor in
Data Science

Subject
Introduction to Databases

Faculty
Nitish Patil

School of Data Science


Asian School of Media Studies


A. COURSE DESCRIPTION

This course is a semester-long, project-based curriculum on SQL in Data Science that
develops proficiency in Data Analytics through the use of databases such as
MySQL and Oracle. Each database project follows a research and development process, from project
planning to a final outcome ready for client delivery. Students gain real-world project
experience throughout the learning cycle, which helps them better understand the roles
and processes in a wide range of Data Science careers.
B. LEARNING OBJECTIVES

Students will be able to –

• Identify the applications of databases

• Recognize the tools and techniques for databases

• Develop database queries and join tables

• Demonstrate their skills in data analytics

C. LEARNING OUTCOME

At the end of this course, participants will be able to –

1. Identify database queries and solve database-related problems.
2. Implement Structured Query Language to solve data analytics problems.
3. Create and use databases.
4. Insert, update, and delete values in tables.
5. Join tables and write the subqueries required for present-day real-life scenarios.
D. PROJECTS

1. Employee Database
2. Banking Database
3. University Database

E. LEARNING RESOURCE MATERIAL

Online References: SQL in Data Science -

a. https://dev.mysql.com/doc/refman/8.0/en/tutorial.html

b. https://www.w3schools.com/mysql/default.asp


Unit-1: Introduction to Database

Database
A database is an organized collection of structured information, or data, typically stored electronically in
a computer system. A database is usually controlled by a database management system (DBMS).
Together, the data and the DBMS, along with the applications that are associated with them, are referred
to as a database system, often shortened to just database.
Data within the most common types of databases in operation today is typically modeled in rows and
columns in a series of tables to make processing and data querying efficient. The data can then be easily
accessed, managed, modified, updated, controlled, and organized. Most databases use structured query
language (SQL) for writing and querying data.

What’s the difference between a database and a spreadsheet?


Databases and spreadsheets (such as Microsoft Excel) are both convenient ways to store information. The
primary differences between the two are:

• How the data is stored and manipulated
• Who can access the data
• How much data can be stored
Spreadsheets were originally designed for one user, and their characteristics reflect that. They’re great for
a single user or small number of users who don’t need to do a lot of incredibly complicated data
manipulation. Databases, on the other hand, are designed to hold much larger collections of organized
information—massive amounts, sometimes. Databases allow multiple users at the same time to quickly
and securely access and query the data using highly complex logic and language.

Types of databases
There are many different types of databases. The best database for a specific organization depends on
how the organization intends to use the data.

Relational databases
• Relational databases became dominant in the 1980s. Items in a relational database are organized as a
set of tables with columns and rows. Relational database technology provides the most efficient and
flexible way to access structured information.

Object-oriented databases
• Information in an object-oriented database is represented in the form of objects, as in object-oriented
programming.


Distributed databases
• A distributed database consists of two or more files located in different sites. The database may be
stored on multiple computers, located in the same physical location, or scattered over different
networks.

Data warehouses
• A central repository for data, a data warehouse is a type of database specifically designed for fast
query and analysis.

NoSQL databases
• A NoSQL, or nonrelational, database allows unstructured and semistructured data to be stored and
manipulated (in contrast to a relational database, which defines how all data inserted into the database
must be composed). NoSQL databases grew popular as web applications became more common and
more complex.

Graph databases
• A graph database stores data in terms of entities and the relationships between entities.

OLTP databases
• An OLTP (online transaction processing) database is a fast, transaction-oriented database designed for
large numbers of transactions performed by multiple users.

These are only a few of the several dozen types of databases in use today. Other, less common databases
are tailored to very specific scientific, financial, or other functions. In addition to the different database
types, changes in technology development approaches and dramatic advances such as the cloud and
automation are propelling databases in entirely new directions. Some of the latest databases include:

Open source databases
• An open source database system is one whose source code is open source; such databases could be
SQL or NoSQL databases.

Cloud databases
• A cloud database is a collection of data, either structured or unstructured, that resides on a private,
public, or hybrid cloud computing platform. There are two types of cloud database models: traditional
and database as a service (DBaaS). With DBaaS, administrative tasks and maintenance are performed
by a service provider.

Multimodel databases
• Multimodel databases combine different types of database models into a single, integrated back end.
This means they can accommodate various data types.

Document/JSON databases
• Designed for storing, retrieving, and managing document-oriented information, document
databases are a modern way to store data in JSON format rather than rows and columns.


Self-driving databases
• The newest and most groundbreaking type of database, self-driving databases (also known as
autonomous databases) are cloud-based and use machine learning to automate database tuning,
security, backups, updates, and other routine management tasks traditionally performed by database
administrators.

What is Structured Query Language (SQL)?


SQL is a programming language used by nearly all relational databases to query, manipulate, and define
data, and to provide access control. SQL was first developed at IBM in the 1970s with Oracle as a major
contributor, which led to implementation of the SQL ANSI standard. SQL has since spurred many extensions
from companies such as IBM, Oracle, and Microsoft. Although SQL is still widely used today, new
programming languages are beginning to appear.

What are clauses in database?

Clauses are the built-in keywords that make up an SQL statement, such as WHERE, ORDER BY, GROUP BY,
HAVING, and LIMIT. With the help of clauses, we can work easily with the data stored in a table. Clauses
help us filter and analyze data quickly: when large amounts of data are stored in the database, we use
clauses to query it and return only the data the user requires.

What is a MySQL database?


MySQL is an open source relational database management system based on SQL. It was designed and
optimized for web applications and can run on any platform. As new and different requirements emerged
with the internet, MySQL became the platform of choice for web developers and web-based applications.
Because it’s designed to process millions of queries and thousands of transactions, MySQL is a popular
choice for ecommerce businesses that need to manage multiple money transfers. On-demand flexibility is
the primary feature of MySQL.
MySQL is the DBMS behind some of the top websites and web-based applications in the world, including
Airbnb, Uber, LinkedIn, Facebook, Twitter, and YouTube.


Normalization
Normalization is a database design technique that reduces data redundancy and eliminates undesirable
characteristics such as insertion, update, and deletion anomalies. Normalization rules divide larger tables
into smaller tables and link them using relationships. The purpose of normalization in SQL is to
eliminate redundant (repetitive) data and ensure data is stored logically.
Edgar Codd, the inventor of the relational model, proposed the theory of normalization of data with the
introduction of the First Normal Form, and he continued to extend the theory with the Second and Third Normal
Forms. Later he joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
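
The idea of splitting a larger table into smaller, linked tables can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, and the tables orders_flat, customers and orders and their rows are hypothetical, not part of the course projects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized (hypothetical) table: the customer's city is repeated on every order row.
cur.execute("CREATE TABLE orders_flat (order_id INTEGER, customer TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Asha", "Pune"), (3, "Ravi", "Delhi")],
)

# Normalized: each customer is stored once and orders refer to it by key.
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id))"
)
cur.execute("INSERT INTO customers (name, city) SELECT DISTINCT customer, city FROM orders_flat")
cur.execute(
    "INSERT INTO orders (order_id, customer_id) "
    "SELECT f.order_id, c.customer_id FROM orders_flat f "
    "JOIN customers c ON c.name = f.customer"
)

# A change of city is now a single-row update, which avoids the update anomaly.
cur.execute("UPDATE customers SET city = 'Mumbai' WHERE name = 'Asha'")
customer_count = cur.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
rows = cur.execute(
    "SELECT o.order_id, c.name, c.city FROM orders o "
    "JOIN customers c ON c.customer_id = o.customer_id ORDER BY o.order_id"
).fetchall()
print(customer_count, rows)
```

Because the city lives in one place only, both of Asha's orders pick up the new value without any order row being touched.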

Create Table

The CREATE TABLE statement is used to create a new table in a database.

Syntax
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
....
);

The column parameters specify the names of the columns of the table.

The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer, date,
etc.).

CREATE TABLE Example

The following example creates a table called "Persons" that contains five columns: PersonID,
LastName, FirstName, Address, and City:

Example
CREATE TABLE Persons (
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
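
To try the statement above without a MySQL server, it can be run through Python's built-in sqlite3 module. This is a sketch that assumes SQLite as a stand-in: SQLite accepts the same CREATE TABLE text, but the PRAGMA used to list the columns is SQLite-specific (in MySQL, DESCRIBE Persons plays a similar role).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Persons (
        PersonID int,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    )
    """
)

# Each row of table_info is (cid, name, type, notnull, default, pk); keep the names.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(Persons)")]
print(column_names)
```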

MySQL INSERT INTO SELECT Statement


The INSERT INTO SELECT statement copies data from one table and inserts it into another
table.

The INSERT INTO SELECT statement requires that the data types in the source and target tables
match.

Note: The existing records in the target table are unaffected.

INSERT INTO SELECT Syntax

Copy all columns from one table to another table:

INSERT INTO table2
SELECT * FROM table1
WHERE condition;

Copy only some columns from one table into another table:

INSERT INTO table2 (column1, column2, column3, ...)
SELECT column1, column2, column3, ...
FROM table1
WHERE condition;

"Customers" table:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico

"Suppliers" table:

SupplierID | SupplierName | ContactName | Address | City | PostalCode | Country
1 | Exotic Liquid | Charlotte Cooper | 49 Gilbert St. | Londona | EC1 4SD | UK
2 | New Orleans Cajun Delights | Shelley Burke | P.O. Box 78934 | New Orleans | 70117 | USA
3 | Grandma Kelly's Homestead | Regina Murphy | 707 Oxford Rd. | Ann Arbor | 48104 | USA

INSERT INTO SELECT Examples

The following SQL statement copies "Suppliers" into "Customers" (columns that are not
filled with data will contain NULL):

Example
INSERT INTO Customers (CustomerName, City, Country)
SELECT SupplierName, City, Country FROM Suppliers;

The following SQL statement copies "Suppliers" into "Customers" (fill all columns):

Example
INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)
SELECT SupplierName, ContactName, Address, City, PostalCode, Country FROM Suppliers;


The following SQL statement copies only the German suppliers into "Customers":

Example
INSERT INTO Customers (CustomerName, City, Country)
SELECT SupplierName, City, Country FROM Suppliers
WHERE Country='Germany';
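
The effect of INSERT INTO SELECT, including the NULLs left in columns that are not copied, can be checked with a small runnable sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and uses trimmed-down, partly hypothetical copies of the two tables; the filter is on 'USA' rather than 'Germany' because the sample suppliers include no German rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, ContactName TEXT,
        City TEXT, Country TEXT
    );
    CREATE TABLE Suppliers (
        SupplierID INTEGER PRIMARY KEY, SupplierName TEXT, ContactName TEXT,
        City TEXT, Country TEXT
    );
    INSERT INTO Customers (CustomerName, ContactName, City, Country)
        VALUES ('Alfreds Futterkiste', 'Maria Anders', 'Berlin', 'Germany');
    INSERT INTO Suppliers (SupplierName, ContactName, City, Country) VALUES
        ('Exotic Liquid', 'Charlotte Cooper', 'Londona', 'UK'),
        ('New Orleans Cajun Delights', 'Shelley Burke', 'New Orleans', 'USA');

    -- Copy only the US suppliers; ContactName is not listed, so it stays NULL.
    INSERT INTO Customers (CustomerName, City, Country)
        SELECT SupplierName, City, Country FROM Suppliers
        WHERE Country = 'USA';
    """
)

customers = conn.execute(
    "SELECT CustomerName, ContactName, Country FROM Customers ORDER BY CustomerID"
).fetchall()
print(customers)
```

The existing customer row is unchanged, and the copied row shows None (NULL) for ContactName.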

CREATE DATABASE Statement

The CREATE DATABASE statement is used to create a new SQL database.

Syntax
CREATE DATABASE databasename;

CREATE DATABASE Example

The following SQL statement creates a database called "testDB":

Example
CREATE DATABASE testDB;

WHERE Clause

The WHERE clause is used to filter records.

It is used to extract only those records that fulfill a specified condition.

WHERE Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Note: The WHERE clause is not only used in SELECT statements, it is also used
in UPDATE, DELETE, etc.!

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:


CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

The following SQL statement selects all the customers from the country "Mexico":

Example
SELECT * FROM Customers
WHERE Country = 'Mexico';

SQL requires single quotes around text values (most database systems will also allow double quotes).

However, numeric fields should not be enclosed in quotes:

Example
SELECT * FROM Customers
WHERE CustomerID = 1;
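
Both WHERE examples can be run against a small copy of the demo data. The sketch below assumes Python's built-in sqlite3 module in place of MySQL and keeps only three of the columns; an ORDER BY is added so the row order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Alfreds Futterkiste", "Germany"),
        (2, "Ana Trujillo Emparedados y helados", "Mexico"),
        (3, "Antonio Moreno Taquería", "Mexico"),
        (4, "Around the Horn", "UK"),
        (5, "Berglunds snabbköp", "Sweden"),
    ],
)

# Text value: enclosed in single quotes inside the SQL statement.
mexico_ids = conn.execute(
    "SELECT CustomerID FROM Customers WHERE Country = 'Mexico' ORDER BY CustomerID"
).fetchall()

# Numeric value: written without quotes.
first_customer = conn.execute(
    "SELECT CustomerName FROM Customers WHERE CustomerID = 1"
).fetchall()

print(mexico_ids, first_customer)
```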


Logical Operators
AND, OR and NOT Operators
The WHERE clause can be combined with AND, OR, and NOT operators.

The AND and OR operators are used to filter records based on more than one condition:

• The AND operator displays a record if all the conditions separated by AND are TRUE.
• The OR operator displays a record if any of the conditions separated by OR is TRUE.

The NOT operator displays a record if the condition(s) is NOT TRUE.

AND Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;

OR Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;

NOT Syntax
SELECT column1, column2, ...
FROM table_name
WHERE NOT condition;
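
The three operators can be compared side by side on a few rows. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL and a two-column, cut-down version of the Customers table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Germany"), (2, "Mexico"), (3, "Mexico"), (4, "UK"), (5, "Sweden")],
)


def ids(where_clause):
    """Return the CustomerIDs matching a WHERE clause, in ascending order."""
    sql = f"SELECT CustomerID FROM Customers WHERE {where_clause} ORDER BY CustomerID"
    return [row[0] for row in conn.execute(sql)]


# AND: every condition must be true.
and_ids = ids("Country = 'Mexico' AND CustomerID > 2")
# OR: at least one condition must be true.
or_ids = ids("Country = 'Germany' OR Country = 'UK'")
# NOT: the condition must not be true.
not_ids = ids("NOT Country = 'Mexico'")

print(and_ids, or_ids, not_ids)
```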

ORDER BY Keyword
The ORDER BY keyword is used to sort the result-set in ascending or descending order.

The ORDER BY keyword sorts the records in ascending order by default. To sort the records in
descending order, use the DESC keyword.

ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
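
A short sketch of the default ascending sort, a descending sort, and a sort on two columns with different directions. It assumes Python's built-in sqlite3 module in place of MySQL and shortened, illustrative customer names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Alfreds", "Germany"),
        (2, "Ana", "Mexico"),
        (3, "Antonio", "Mexico"),
        (4, "Around", "UK"),
    ],
)


def ids(order_by):
    """Return CustomerIDs in the order produced by an ORDER BY clause."""
    sql = f"SELECT CustomerID FROM Customers ORDER BY {order_by}"
    return [row[0] for row in conn.execute(sql)]


ascending = ids("CustomerName")        # ASC is the default
descending = ids("CustomerName DESC")
# Country ascending; within the same country, names descending.
mixed = ids("Country ASC, CustomerName DESC")

print(ascending, descending, mixed)
```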


LIMIT Clause

The LIMIT clause is used to specify the number of records to return.

The LIMIT clause is useful on large tables with thousands of records. Returning a large number
of records can impact performance.

LIMIT Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
LIMIT number;
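
The following sketch shows LIMIT cutting a result set down to a fixed number of rows. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (both accept the LIMIT clause) and a hypothetical five-row table; combining LIMIT with ORDER BY makes it clear which rows are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Germany"), (2, "Mexico"), (3, "Mexico"), (4, "UK"), (5, "Sweden")],
)

# Return only the first three rows of the sorted result.
first_three = conn.execute(
    "SELECT CustomerID FROM Customers ORDER BY CustomerID LIMIT 3"
).fetchall()

# LIMIT is applied after the WHERE filter.
one_mexican = conn.execute(
    "SELECT CustomerID FROM Customers WHERE Country = 'Mexico' "
    "ORDER BY CustomerID LIMIT 1"
).fetchall()

print(first_three, one_mexican)
```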


Unit 2: Introduction to functions in SQL


Character Functions
Single-row character functions accept character data as input and can return both character and
numeric values. Character functions can be divided into the following:
• Case-manipulation functions
• Character-manipulation functions

Function | Purpose
LOWER(column|expression) | Converts alpha character values to lowercase
UPPER(column|expression) | Converts alpha character values to uppercase
CONCAT(column1|expression1, column2|expression2) | Concatenates the first character value to the second character value; equivalent to concatenation operator (||)
SUBSTR(column|expression, m[, n]) | Returns specified characters from character value starting at character position m, n characters long (If m is negative, the count starts from the end of the character value. If n is omitted, all characters to the end of the string are returned.)
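
The functions above can be tried with the sketch below. It assumes Python's built-in sqlite3 module rather than Oracle or MySQL; SQLite provides LOWER, UPPER, SUBSTR and the || operator with the behaviour described here, so || is used instead of CONCAT, which older SQLite versions lack.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
lowered, uppered, joined, head, tail = conn.execute(
    """
    SELECT LOWER('SQL Course'),          -- case manipulation
           UPPER('SQL Course'),
           'Hello' || 'World',           -- concatenation operator
           SUBSTR('HelloWorld', 1, 5),   -- 5 characters starting at position 1
           SUBSTR('HelloWorld', -5)      -- negative m counts from the end; n omitted
    """
).fetchone()

print(lowered, uppered, joined, head, tail)
```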


TRIM() Function

TRIM is a String function of Oracle. This function is used to remove the specified character from
head of the string or tail of the string.

Syntax

TRIM( [ [ LEADING | TRAILING | BOTH ] trim_character FROM ] string1 )

Parameters

LEADING : it will trim from head of the string.

TRAILING: it will trim from tail of the string.

BOTH : it will trim from head as well as from tail of the string

LENGTH() Function

LENGTH is a String function of Oracle. This function returns the size of the given string.

Syntax

LENGTH( string1 )

Parameters

string1: string for getting the length.

Return

This function returns a numeric value.

Example 1

SELECT LENGTH('oracle') FROM dual;
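
Trimming from the head, the tail, or both ends can be illustrated as follows. This sketch assumes Python's built-in sqlite3 module, which does not accept the LEADING/TRAILING/BOTH keywords; its LTRIM, RTRIM and two-argument TRIM functions are used here as rough equivalents, and there is no dual table in SQLite, so the FROM clause is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
leading, trailing, both, size = conn.execute(
    """
    SELECT LTRIM('xxSQLxx', 'x'),   -- like TRIM(LEADING 'x' FROM 'xxSQLxx')
           RTRIM('xxSQLxx', 'x'),   -- like TRIM(TRAILING 'x' FROM 'xxSQLxx')
           TRIM('xxSQLxx', 'x'),    -- like TRIM(BOTH 'x' FROM 'xxSQLxx')
           LENGTH('oracle')
    """
).fetchone()

print(leading, trailing, both, size)
```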

Boolean Expressions in SQL


Boolean expressions are expressions that return a boolean value as their result. In SQL there are three
boolean values:
• TRUE
• FALSE
• UNKNOWN

Unlike other data types, the boolean data type cannot always be specified during table creation; support
depends on the database system (MySQL, for example, treats BOOLEAN as a synonym for TINYINT(1)). Boolean
expressions are mainly used with WHERE clauses to filter the data from a table. They can include
comparison operators and logical operators such as AND, OR, and NOT.
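
The third value, UNKNOWN, arises when a comparison involves NULL. The sketch below assumes Python's built-in sqlite3 module as a stand-in: SQLite reports TRUE as 1, FALSE as 0 and UNKNOWN as NULL (None in Python). The table t is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# TRUE, FALSE, UNKNOWN, then UNKNOWN combined with FALSE and with TRUE.
truth = conn.execute("SELECT 1 = 1, 1 = 2, NULL = 1, NULL AND 0, NULL OR 1").fetchone()

conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# WHERE keeps a row only when the condition is TRUE, so the NULL row
# is returned neither by the condition nor by its negation.
greater = conn.execute("SELECT x FROM t WHERE x > 1").fetchall()
not_greater = conn.execute("SELECT x FROM t WHERE NOT (x > 1)").fetchall()

print(truth, greater, not_greater)
```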

CONCAT Function
The CONCAT function in SQL is a string function used to merge two or more strings into a single string.
How NULL arguments are handled depends on the database system: in MySQL, CONCAT returns NULL if any
argument is NULL, whereas SQL Server's CONCAT treats NULL as an empty string. The function can link
literal character strings and string columns.

We can use a literal in the CONCAT function. A literal is a number, character, or date that is included in the
SELECT statement.

Syntax of CONCAT function:

SELECT CONCAT(string1, string2, string3, ..., stringN)
FROM [Source]

Example-
SELECT CONCAT(id, name, work_date)
FROM employee_tbl;

The result is a single column, headed CONCAT(id, name, work_date), that holds the merged value for each row.

String functions:
String functions perform an operation on an input string and return a result. The functions below are
common in SQL, but availability and exact syntax vary between database systems.
1. ASCII(): Returns the ASCII value of a character.
   Syntax: SELECT ASCII('t');
   Output: 116
2. CHAR_LENGTH(): Returns the length of a string in characters. Not available in SQL Server; use LEN() there.
   Syntax: SELECT CHAR_LENGTH('Hello!');
   Output: 6
3. CHARACTER_LENGTH(): A synonym for CHAR_LENGTH(). Not available in SQL Server; use LEN() there.
   Syntax: SELECT CHARACTER_LENGTH('geeks for geeks');
   Output: 15
4. CONCAT(): Joins two or more strings. The example uses the equivalent || operator (Oracle syntax).
   Syntax: SELECT 'Geeks' || ' ' || 'forGeeks' FROM dual;
   Output: 'Geeks forGeeks'
5. CONCAT_WS(): Joins two or more strings, placing the given separator between them.
   Syntax: SELECT CONCAT_WS('_', 'geeks', 'for', 'geeks');
   Output: geeks_for_geeks
6. FIND_IN_SET(): Returns the position of a string within a comma-separated list. The list must not contain spaces after the commas.
   Syntax: SELECT FIND_IN_SET('b', 'a,b,c,d,e,f');
   Output: 2
7. FORMAT(): Displays a number in the given format. The example appears to use the MS Access form; MySQL's FORMAT(number, decimals) instead rounds the number to the given number of decimal places.
   Syntax: Format("0.981", "Percent");
   Output: '98.10%'
8. INSERT(): Inserts a string at the given position of another string, replacing the given number of characters. It should not be confused with the INSERT INTO statement, which adds rows to a table.
   Syntax: SELECT INSERT('Quadratic', 3, 4, 'What');
   Output: 'QuWhattic'
9. INSTR(): Returns the position of an occurrence of a substring. The four-argument form (start position, occurrence number) is Oracle syntax.
   Syntax: INSTR('geeks for geeks', 'e');
   Output: 2 (the first occurrence of 'e')
   Syntax: INSTR('geeks for geeks', 'e', 1, 2);
   Output: 3 (the second occurrence of 'e')
10. LCASE(): Converts the given string to lower case.
   Syntax: LCASE("GeeksFor Geeks To Learn");
   Output: geeksfor geeks to learn
11. LEFT(): Returns the given number of characters from the left of a string.
   Syntax: SELECT LEFT('geeksforgeeks.org', 5);
   Output: geeks
12. LENGTH(): Returns the length of a string.
   Syntax: LENGTH('GeeksForGeeks');
   Output: 13
13. LOCATE(): Returns the position of the first occurrence of a substring in a string, searching from the given start position.
   Syntax: SELECT LOCATE('for', 'geeksforgeeks', 1);
   Output: 6
14. LOWER(): Converts an upper-case string to lower case.
   Syntax: SELECT LOWER('GEEKSFORGEEKS.ORG');
   Output: geeksforgeeks.org
15. LPAD(): Pads a string on the left with the given symbol until it reaches the given length.
   Syntax: LPAD('geeks', 8, '0');
   Output: 000geeks
16. LTRIM(): Removes leading characters from a string. In the two-argument form shown (Oracle syntax), every leading character that appears in the given set is removed.
   Syntax: LTRIM('123123geeks', '123');
   Output: geeks
17. MID(): Returns a substring of the given length starting at the given position.
   Syntax: MID('geeksforgeeks', 6, 3);
   Output: for
18. POSITION(): Returns the position of the first occurrence of a substring.
   Syntax: SELECT POSITION('e' IN 'geeksforgeeks');
   Output: 2
19. REPEAT(): Repeats the given string the given number of times.
   Syntax: SELECT REPEAT('geeks', 2);
   Output: geeksgeeks
20. REPLACE(): Replaces every occurrence of a substring with another string; replacing with an empty string removes the substring.
   Syntax: REPLACE('123geeks123', '123', '');
   Output: geeks
21. REVERSE(): Reverses a string.
   Syntax: SELECT REVERSE('geeksforgeeks.org');
   Output: 'gro.skeegrofskeeg'
22. RIGHT(): Returns the given number of characters from the right end of a string.
   Syntax: SELECT RIGHT('geeksforgeeks.org', 4);
   Output: '.org'
23. RPAD(): Pads a string on the right with the given symbol until it reaches the given length.
   Syntax: RPAD('geeks', 8, '0');
   Output: 'geeks000'
24. RTRIM(): Removes trailing characters from a string. In the two-argument form shown (Oracle syntax), every trailing character that appears in the given set is removed.
   Syntax: RTRIM('geeksxyxzyyy', 'xyz');
   Output: 'geeks'
25. SPACE(): Returns a string made of the given number of spaces.
   Syntax: SELECT SPACE(7);
   Output: '       '
26. STRCMP(): Compares two strings.
   • If string1 and string2 are the same, the STRCMP function will return 0.
   • If string1 is smaller than string2, the STRCMP function will return -1.
   • If string1 is larger than string2, the STRCMP function will return 1.
   Syntax: SELECT STRCMP('google.com', 'geeksforgeeks.com');
   Output: 1 ('go' sorts after 'ge')
27. SUBSTR(): Returns a substring of the given length starting at the given position.
   Syntax: SUBSTR('geeksforgeeks', 1, 5);
   Output: 'geeks'
28. SUBSTRING(): Returns a substring of the given length starting at the given position.
   Syntax: SELECT SUBSTRING('GeeksForGeeks.org', 9, 1);
   Output: 'G'
29. SUBSTRING_INDEX(): Returns the part of a string before the given number of occurrences of a delimiter.
   Syntax: SELECT SUBSTRING_INDEX('www.geeksforgeeks.org', '.', 1);
   Output: 'www'
30. TRIM(): Removes the given symbol from the start, the end, or both ends of a string.
   Syntax: TRIM(LEADING '0' FROM '000123');
   Output: 123
31. UCASE(): Converts a string to upper case.
   Syntax: UCASE("GeeksForGeeks");
   Output: GEEKSFORGEEKS
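
A handful of the functions listed above can be checked without a database server. The sketch below assumes Python's built-in sqlite3 module, which implements only some of them (INSTR, REPLACE, LENGTH, and the set-based two-argument LTRIM and RTRIM); the others, such as LPAD or FIND_IN_SET, would need MySQL or Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
first_e, replaced, left_trimmed, right_trimmed, size = conn.execute(
    """
    SELECT INSTR('geeks for geeks', 'e'),     -- position of the first 'e'
           REPLACE('123geeks123', '123', ''), -- remove every '123'
           LTRIM('123123geeks', '123'),       -- strip leading characters in the set
           RTRIM('geeksxyxzyyy', 'xyz'),      -- strip trailing characters in the set
           LENGTH('GeeksForGeeks')
    """
).fetchone()

print(first_e, replaced, left_trimmed, right_trimmed, size)
```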

Group Functions
• DISTINCT makes the function consider only nonduplicate values; ALL makes it consider every
value including duplicates. The default is ALL and therefore does not need to be specified.


• The data types for the functions with an expr argument may be CHAR, VARCHAR2, NUMBER,
or DATE.
• All group functions ignore null values. To substitute a value for null values, use the NVL, NVL2,
or COALESCE functions.
• Do not rely on a GROUP BY clause to sort the result set. Although older Oracle releases returned
grouped rows in ascending order, that order is not guaranteed; use an ORDER BY clause (with DESC for
descending order) to control it.

SELECT AVG(salary), MAX(salary),
       MIN(salary), SUM(salary)
FROM employees
WHERE job_id LIKE '%REP%';

You can use the AVG, SUM, MIN, and MAX functions against columns that can store numeric data. The
example above displays the average, highest, lowest, and sum of monthly salaries for all sales
representatives.

MIN and MAX can also be used with date and character data. The following example displays the
earliest and latest hire dates:

SELECT MIN(hire_date), MAX(hire_date)
FROM employees;
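
Both queries can be run on a few sample rows. The sketch assumes Python's built-in sqlite3 module in place of Oracle and a hypothetical employees table whose names, salaries and hire dates are made up for illustration; dates are stored as ISO text so that MIN and MAX compare them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, job_id TEXT, salary INTEGER, hire_date TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Abel", "SA_REP", 11000, "2004-05-11"),
        ("Taylor", "SA_REP", 8600, "2006-03-24"),
        ("Grant", "SA_REP", 7100, "2007-05-24"),
        ("King", "AD_PRES", 24000, "2003-06-17"),
    ],
)

# Group functions over the sales representatives only (King is excluded).
rep_stats = conn.execute(
    "SELECT AVG(salary), MAX(salary), MIN(salary), SUM(salary) "
    "FROM employees WHERE job_id LIKE '%REP%'"
).fetchone()

# MIN and MAX also work on dates.
hire_range = conn.execute("SELECT MIN(hire_date), MAX(hire_date) FROM employees").fetchone()

print(rep_stats, hire_range)
```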

COUNT Function
The COUNT function has three formats:
• COUNT(*)
• COUNT(expr)
• COUNT(DISTINCT expr)
COUNT(*) returns the number of rows in a table that satisfy the criteria of the SELECT statement,
including duplicate rows and rows containing null values in any of the columns. If a WHERE clause is
included in the SELECT statement, COUNT(*) returns the number of rows that satisfy the condition
in the WHERE clause.
In contrast, COUNT(expr) returns the number of non-null values in the column identified by expr.
COUNT(DISTINCT expr) returns the number of unique, non-null values in the column identified
by expr.
SELECT COUNT(*)
FROM employees
WHERE department_id = 50;
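
The difference between the three COUNT formats shows up as soon as a column contains NULLs and duplicates. This is a minimal sketch assuming Python's built-in sqlite3 module and a hypothetical employees table with a commission_pct column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, commission_pct REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(50, None), (50, 0.1), (50, 0.1), (80, 0.2)],
)

# COUNT(*): rows in department 50, including the one with a NULL commission.
rows_in_50 = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE department_id = 50"
).fetchone()[0]

# COUNT(expr): non-null values only. COUNT(DISTINCT expr): unique non-null values.
non_null, distinct = conn.execute(
    "SELECT COUNT(commission_pct), COUNT(DISTINCT commission_pct) FROM employees"
).fetchone()

print(rows_in_50, non_null, distinct)
```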

Conditional Expressions in SQL


1. The CASE Expression: Lets you use IF-THEN-ELSE logic without having to invoke
procedures.
In a simple CASE expression, SQL searches for the first WHEN ... THEN pair for which expr
is equal to comparison_expr and returns return_expr. If no WHEN ... THEN pair satisfies this condition and an ELSE
clause exists, SQL returns else_expr. Otherwise, it returns NULL.
You cannot specify the literal NULL for all of the return_exprs and the else_expr. All of the expressions (expr,
comparison_expr, return_expr) must be of the same data type.
Syntax:
CASE expr WHEN comparison_expr1 THEN return_expr1
[WHEN comparison_expr2 THEN return_expr2
.
.
.
WHEN comparison_exprn THEN return_exprn
ELSE else_expr]
END

2. The DECODE Function: Facilitates conditional inquiries by doing the work of a CASE or IF-THEN-ELSE
statement. DECODE is specific to Oracle.
The DECODE function decodes an expression in a way similar to the IF-THEN-ELSE logic used in
various languages. The DECODE function decodes the expression after comparing it to each search value.
If the expression is the same as a search value, the corresponding result is returned.
If the default value is omitted, a null value is returned where the expression does not match any of the
search values.
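
A simple CASE expression can be sketched as below. The example assumes Python's built-in sqlite3 module, which supports CASE but not Oracle's DECODE, and a hypothetical employees table; the second column leaves out ELSE to show the NULL that is returned when nothing matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT, job_id TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ernst", "IT_PROG"), ("Rajs", "ST_CLERK"), ("Abel", "SA_REP")],
)

rows = conn.execute(
    """
    SELECT last_name,
           CASE job_id WHEN 'IT_PROG'  THEN 'Programmer'
                       WHEN 'ST_CLERK' THEN 'Clerk'
                       ELSE 'Other'
           END,
           CASE job_id WHEN 'IT_PROG'  THEN 'Programmer'
           END                          -- no ELSE: unmatched rows give NULL
    FROM employees
    ORDER BY last_name
    """
).fetchall()

print(rows)
```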

Unit 3: Understanding Grouping:


GROUP BY and HAVING Clauses:
GROUP BY Statement
The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of
customers in each country".


The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to
group the result-set by one or more columns.

GROUP BY Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

GROUP BY Examples
The following SQL statement lists the number of customers in each country:

Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

The following SQL statement lists the number of customers in each country, sorted high to low:

Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;
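
The second GROUP BY example can be run on a small sample. The sketch assumes Python's built-in sqlite3 module as a stand-in for MySQL and six hypothetical customer rows, not the full Northwind data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Germany"), (2, "Mexico"), (3, "Mexico"), (4, "UK"), (5, "Mexico"), (6, "UK")],
)

# One summary row per country, largest group first.
per_country = conn.execute(
    "SELECT COUNT(CustomerID), Country FROM Customers "
    "GROUP BY Country ORDER BY COUNT(CustomerID) DESC"
).fetchall()

print(per_country)
```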

HAVING Clause
The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.

HAVING Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);

HAVING Examples
The following SQL statement lists the number of customers in each country. Only include countries with more than
5 customers:


Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;

The following SQL statement lists the number of customers in each country, sorted high to low (only countries
with more than 5 customers are included):

Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5
ORDER BY COUNT(CustomerID) DESC;
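
The sketch below contrasts WHERE, which filters individual rows before grouping, with HAVING, which filters the groups afterwards. It assumes Python's built-in sqlite3 module and six hypothetical customer rows, so the threshold is lowered from 5 to 1 to suit the small sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Germany"), (2, "Mexico"), (3, "Mexico"), (4, "UK"), (5, "Mexico"), (6, "UK")],
)

# HAVING keeps only the groups whose aggregate passes the test.
big_groups = conn.execute(
    "SELECT COUNT(CustomerID), Country FROM Customers "
    "GROUP BY Country HAVING COUNT(CustomerID) > 1 "
    "ORDER BY COUNT(CustomerID) DESC"
).fetchall()

# WHERE runs first, on rows; HAVING then runs on the remaining groups.
after_where = conn.execute(
    "SELECT COUNT(CustomerID), Country FROM Customers "
    "WHERE CustomerID > 2 "
    "GROUP BY Country HAVING COUNT(CustomerID) > 1"
).fetchall()

print(big_groups, after_where)
```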

MySQL Aliases
Aliases are used to give a table, or a column in a table, a temporary name.

Aliases are often used to make column names more readable.

An alias only exists for the duration of that query.

An alias is created with the AS keyword.

Alias Column Syntax


SELECT column_name AS alias_name
FROM table_name;

Alias Table Syntax


SELECT column_name(s)
FROM table_name AS alias_name;

The following SQL statement creates two aliases, one for the CustomerID column and one for the CustomerName column:

Example
SELECT CustomerID AS ID, CustomerName AS Customer
FROM Customers;
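
An alias changes only the name under which a column is returned. In the sketch below, which assumes Python's built-in sqlite3 module and a single hypothetical row, the returned column names are read from the cursor's description attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT)")
conn.execute("INSERT INTO Customers VALUES (1, 'Alfreds Futterkiste')")

cursor = conn.execute(
    "SELECT CustomerID AS ID, CustomerName AS Customer FROM Customers"
)
# cursor.description holds one entry per result column; item 0 is its name.
result_columns = [entry[0] for entry in cursor.description]
rows = cursor.fetchall()

print(result_columns, rows)
```

The table itself still has its original column names; the aliases exist only in this query's result.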


Subqueries
A subquery is a SELECT statement that is embedded in a clause of another SELECT statement. You
can build powerful statements out of simple ones by using subqueries. They can be very useful when
you need to select rows from a table with a condition that depends on the data in the table itself.
You can place the subquery in a number of SQL clauses, including:
• The WHERE clause
• The HAVING clause
• The FROM clause
In a condition of the form expr operator (subquery), operator is a comparison condition such as >, =, or IN.
Note: Comparison conditions fall into two classes: single-row operators (>, =, >=, <, <>, <=) and
multiple-row operators (IN, ANY, ALL).
The subquery is often referred to as a nested SELECT, sub-SELECT, or inner SELECT statement. The
subquery generally executes first, and its output is used to complete the query condition for the
main or outer query.

The following example returns the employees whose salary is greater than Abel's:

SELECT last_name
FROM employees
WHERE salary >
      (SELECT salary
       FROM employees
       WHERE last_name = 'Abel');
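
The single-row subquery above can be run on a few rows. The sketch assumes Python's built-in sqlite3 module in place of Oracle and a hypothetical employees table with made-up salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Abel", 11000), ("King", 24000), ("Grant", 7000), ("Hartstein", 13000)],
)

# The inner query returns one value (Abel's salary); the outer query compares against it.
earn_more = conn.execute(
    """
    SELECT last_name
    FROM employees
    WHERE salary > (SELECT salary FROM employees WHERE last_name = 'Abel')
    ORDER BY last_name
    """
).fetchall()

print(earn_more)
```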

Multiple-Row Subqueries
• Return more than one row
• Use multiple-row comparison operators

Operator | Meaning
IN | Equal to any member in the list
ANY | Compare value to each value returned by the subquery
ALL | Compare value to every value returned by the subquery

Subqueries that return more than one row are called multiple-row subqueries. You use a
multiple-row operator, instead of a single-row operator, with a multiple-row subquery. The
multiple-row operator expects one or more values.

SELECT last_name, salary, department_id


FROM employees
WHERE salary IN (SELECT MIN(salary)
FROM employees
GROUP BY department_id);

Example
The query above finds the employees whose salary equals the minimum salary of any department.
The inner query is executed first, producing a query result. The main query block is then processed and uses the values returned by the inner query to complete its search condition. In fact, the main query would appear to the Oracle server as follows:
SELECT last_name, salary, department_id
FROM employees
WHERE salary IN (2500, 4200, 4400, 6000, 7000, 8300, 8600, 17000);

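The following example uses the ANY operator. It displays employees who are not IT programmers and whose salary is less than that of at least one IT programmer:

SELECT employee_id, last_name, job_id, salary
FROM employees
WHERE salary < ANY
      (SELECT salary
       FROM employees
       WHERE job_id = 'IT_PROG')
AND job_id <> 'IT_PROG';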
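The difference between ANY and ALL is easiest to see side by side. The following sketch assumes MySQL 8.0 and a hypothetical table emp_jobs_demo with made-up rows; it is an illustration, not part of the course schema.

```sql
-- Hypothetical sample table; names and values are illustrative only.
CREATE TABLE emp_jobs_demo (
    employee_id INT PRIMARY KEY,
    last_name   VARCHAR(30),
    job_id      VARCHAR(10),
    salary      DECIMAL(10, 2)
);

INSERT INTO emp_jobs_demo (employee_id, last_name, job_id, salary) VALUES
    (103, 'Hunold',  'IT_PROG',  9000),
    (104, 'Ernst',   'IT_PROG',  6000),
    (107, 'Lorentz', 'IT_PROG',  4200),
    (141, 'Rajs',    'ST_CLERK', 3500),
    (174, 'Abel',    'SA_REP',  11000),
    (176, 'Taylor',  'SA_REP',   8600);

-- < ANY: salary is below at least one IT_PROG salary,
-- which is the same as being below the maximum (9000).
-- Returns Rajs (3500) and Taylor (8600).
SELECT employee_id, last_name, job_id, salary
FROM emp_jobs_demo
WHERE salary < ANY (SELECT salary
                    FROM emp_jobs_demo
                    WHERE job_id = 'IT_PROG')
  AND job_id <> 'IT_PROG';

-- < ALL: salary is below every IT_PROG salary,
-- which is the same as being below the minimum (4200).
-- Returns only Rajs (3500).
SELECT employee_id, last_name, job_id, salary
FROM emp_jobs_demo
WHERE salary < ALL (SELECT salary
                    FROM emp_jobs_demo
                    WHERE job_id = 'IT_PROG')
  AND job_id <> 'IT_PROG';
```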

Conditional Expressions Using CASE Clause


The CASE expression goes through conditions and returns a value when the first condition is met (like an if-then-else statement). Once a condition is true, it stops evaluating the remaining conditions and returns the result. If no conditions are true, it returns the value in the ELSE clause.

If there is no ELSE part and no conditions are true, it returns NULL.


CASE Syntax
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN conditionN THEN resultN
ELSE result
END;
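As an illustration of the syntax above, the sketch below labels each row according to its quantity. It assumes MySQL 8.0; the table order_details_demo and its rows are hypothetical.

```sql
-- Hypothetical sample table; names and values are illustrative only.
CREATE TABLE order_details_demo (
    order_id INT,
    quantity INT
);

INSERT INTO order_details_demo (order_id, quantity) VALUES
    (10248, 12),
    (10249, 30),
    (10250, 45);

-- Conditions are checked top to bottom; the first true one wins.
-- 12 -> 'under 30', 30 -> 'is 30', 45 -> 'greater than 30'.
SELECT order_id,
       quantity,
       CASE
           WHEN quantity > 30 THEN 'The quantity is greater than 30'
           WHEN quantity = 30 THEN 'The quantity is 30'
           ELSE 'The quantity is under 30'
       END AS quantity_text
FROM order_details_demo;
```

Inside a query the CASE expression ends with END, optionally followed by an alias; the semicolon belongs to the whole statement.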

Unit 4: Working with Tables


Joining Tables
A JOIN clause is used to combine rows from two or more tables, based on a related column between them.

Let's look at a selection from the "Orders" table:

OrderID    CustomerID    OrderDate
10308      2             1996-09-18
10309      37            1996-09-19
10310      77            1996-09-20

Then, look at a selection from the "Customers" table:

CustomerID    CustomerName                          ContactName       Country
1             Alfreds Futterkiste                   Maria Anders      Germany
2             Ana Trujillo Emparedados y helados    Ana Trujillo      Mexico
3             Antonio Moreno Taquería               Antonio Moreno    Mexico

Notice that the "CustomerID" column in the "Orders" table refers to the "CustomerID" in the "Customers" table. The
relationship between the two tables above is the "CustomerID" column.

Then, we can create the following SQL statement (that contains an INNER JOIN), that selects records that have
matching values in both tables:

Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;

INNER JOIN Keyword


The INNER JOIN keyword selects records that have matching values in both tables.


INNER JOIN Syntax


SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

The following SQL statement selects all orders with customer information:

Example
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

LEFT JOIN Keyword


The LEFT JOIN keyword returns all records from the left table (table1), and the matching records (if any) from the right table (table2). For left-table rows that have no match, the right-table columns contain NULL.

LEFT JOIN Syntax


SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

The following SQL statement selects all customers, and any orders they might have:

Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
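To compare INNER JOIN and LEFT JOIN on concrete rows, the sketch below loads the sample rows shown earlier in this unit into two hypothetical tables (customers_join_demo and orders_join_demo, with only the columns needed here). It assumes MySQL 8.0.

```sql
-- Hypothetical tables holding the sample rows shown earlier in this unit.
CREATE TABLE customers_join_demo (
    CustomerID   INT PRIMARY KEY,
    CustomerName VARCHAR(60)
);

CREATE TABLE orders_join_demo (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    OrderDate  DATE
);

INSERT INTO customers_join_demo (CustomerID, CustomerName) VALUES
    (1, 'Alfreds Futterkiste'),
    (2, 'Ana Trujillo Emparedados y helados'),
    (3, 'Antonio Moreno Taquería');

INSERT INTO orders_join_demo (OrderID, CustomerID, OrderDate) VALUES
    (10308, 2,  '1996-09-18'),
    (10309, 37, '1996-09-19'),
    (10310, 77, '1996-09-20');

-- INNER JOIN: only customer 2 has a matching order, so one row
-- is returned (order 10308).
SELECT c.CustomerName, o.OrderID
FROM customers_join_demo c
INNER JOIN orders_join_demo o ON c.CustomerID = o.CustomerID;

-- LEFT JOIN: all three customers are returned; OrderID is NULL
-- for customers 1 and 3, who have no order in this sample.
SELECT c.CustomerName, o.OrderID
FROM customers_join_demo c
LEFT JOIN orders_join_demo o ON c.CustomerID = o.CustomerID
ORDER BY c.CustomerName;
```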

RIGHT JOIN Keyword


The RIGHT JOIN keyword returns all records from the right table (table2), and the matching records (if any) from the left table (table1). For right-table rows that have no match, the left-table columns contain NULL.

RIGHT JOIN Syntax


SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;

RIGHT JOIN Example


The following SQL statement will return all employees, and any orders they might have placed:

Example
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;

CROSS JOIN Keyword


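The CROSS JOIN keyword returns every combination of rows from both tables (table1 and table2): each row of table1 is paired with each row of table2, so the result contains (rows in table1) × (rows in table2) rows.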


CROSS JOIN Syntax


SELECT column_name(s)
FROM table1
CROSS JOIN table2;
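A small sketch of the syntax above, assuming MySQL 8.0 and two hypothetical tables (sizes_demo and colors_demo) created only for this illustration:

```sql
-- Hypothetical sample tables; names and values are illustrative only.
CREATE TABLE sizes_demo  (size_name  VARCHAR(10));
CREATE TABLE colors_demo (color_name VARCHAR(10));

INSERT INTO sizes_demo  (size_name)  VALUES ('S'), ('M'), ('L');
INSERT INTO colors_demo (color_name) VALUES ('Red'), ('Blue');

-- Every size is paired with every color: 3 x 2 = 6 rows.
SELECT s.size_name, c.color_name
FROM sizes_demo s
CROSS JOIN colors_demo c
ORDER BY s.size_name, c.color_name;
```

Because the result grows multiplicatively, a CROSS JOIN between two large tables can produce a very large result set.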

Self Join
A self join is a regular join, but the table is joined with itself.

Self Join Syntax


SELECT column_name(s)
FROM table1 T1, table1 T2
WHERE condition;

Self Join Example


The following SQL statement matches customers that are from the same city:

Example
SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;

UNION Operator
The UNION operator is used to combine the result-set of two or more SELECT statements.

• Every SELECT statement within UNION must have the same number of columns
• The columns must also have similar data types
• The columns in every SELECT statement must also be in the same order

UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

UNION ALL Syntax


The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL:

SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;

SQL UNION Example


The following SQL statement returns the cities (only distinct values) from both the "Customers" and the "Suppliers"
table:

Example
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;

Note: If some customers or suppliers have the same city, each city will only be listed once, because UNION selects
only distinct values. Use UNION ALL to also select duplicate values!

SQL UNION ALL Example


The following SQL statement returns the cities (duplicate values also) from both the "Customers" and the
"Suppliers" table:

Example
SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers
ORDER BY City;
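The difference between the two operators can be checked on a few rows. The sketch below assumes MySQL 8.0; the tables customers_city_demo and suppliers_city_demo and their contents are hypothetical.

```sql
-- Hypothetical sample tables; names and values are illustrative only.
CREATE TABLE customers_city_demo (city VARCHAR(30));
CREATE TABLE suppliers_city_demo (city VARCHAR(30));

INSERT INTO customers_city_demo (city) VALUES ('Berlin'), ('London'), ('London');
INSERT INTO suppliers_city_demo (city) VALUES ('Berlin'), ('Paris');

-- UNION removes duplicates: Berlin, London, Paris (3 rows).
SELECT city FROM customers_city_demo
UNION
SELECT city FROM suppliers_city_demo
ORDER BY city;

-- UNION ALL keeps every row:
-- Berlin, Berlin, London, London, Paris (5 rows).
SELECT city FROM customers_city_demo
UNION ALL
SELECT city FROM suppliers_city_demo
ORDER BY city;
```

Note that UNION also removes duplicates that come from a single table (London appears twice in the customer table but once in the UNION result).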

SQL UNION With WHERE


The following SQL statement returns the German cities (only distinct values) from both the "Customers" and the
"Suppliers" table:

Example
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;

SQL UNION ALL With WHERE


The following SQL statement returns the German cities (duplicate values also) from both the "Customers" and the
"Suppliers" table:

Example
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION ALL
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;

Another UNION Example


The following SQL statement lists all customers and suppliers:

Example
SELECT 'Customer' AS Type, ContactName, City, Country
FROM Customers
UNION
SELECT 'Supplier', ContactName, City, Country
FROM Suppliers;

CREATE VIEW Statement


In SQL, a view is a virtual table based on the result-set of an SQL statement.

A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables
in the database.

You can add SQL statements and functions to a view and present the data as if the data were coming from one single
table.

A view is created with the CREATE VIEW statement.


CREATE VIEW Syntax


CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Note: A view always shows up-to-date data. The database engine runs the view's underlying query every time a user queries the view.
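The note above can be checked with a short sketch: a row inserted into the base table after the view was created still appears in the view. The example assumes MySQL 8.0; the table customers_view_demo and the view brazil_customers_demo are hypothetical names.

```sql
-- Hypothetical sample table and view; names and values are illustrative.
CREATE TABLE customers_view_demo (
    customer_name VARCHAR(50),
    country       VARCHAR(30)
);

INSERT INTO customers_view_demo (customer_name, country) VALUES
    ('Comércio Mineiro',    'Brazil'),
    ('Alfreds Futterkiste', 'Germany');

CREATE VIEW brazil_customers_demo AS
SELECT customer_name
FROM customers_view_demo
WHERE country = 'Brazil';

-- One Brazilian customer so far.
SELECT COUNT(*) AS brazil_count FROM brazil_customers_demo;

-- Add a row to the base table, not to the view.
INSERT INTO customers_view_demo (customer_name, country)
VALUES ('Hanari Carnes', 'Brazil');

-- The view now returns two rows without being recreated.
SELECT COUNT(*) AS brazil_count FROM brazil_customers_demo;

-- Clean up the view; the base table is not affected.
DROP VIEW brazil_customers_demo;
```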

MySQL CREATE VIEW Examples


The following SQL creates a view that shows all customers from Brazil. Because the view name contains a space, MySQL requires it to be quoted with backticks:

Example
CREATE VIEW `Brazil Customers` AS
SELECT CustomerName, ContactName
FROM Customers
WHERE Country = 'Brazil';

We can query the view above as follows:

Example
SELECT * FROM `Brazil Customers`;

The following SQL creates a view that selects every product in the "Products" table with a price higher than the
average price:

Example
CREATE VIEW `Products Above Average Price` AS
SELECT ProductName, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);

We can query the view above as follows:

Example
SELECT * FROM `Products Above Average Price`;

Updating a View


A view can be updated with the CREATE OR REPLACE VIEW statement.

CREATE OR REPLACE VIEW Syntax


CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

The following SQL adds the "City" column to the "Brazil Customers" view:

Example
CREATE OR REPLACE VIEW `Brazil Customers` AS
SELECT CustomerName, ContactName, City
FROM Customers
WHERE Country = 'Brazil';

MySQL Dropping a View


A view is deleted with the DROP VIEW statement.

DROP VIEW Syntax


DROP VIEW view_name;

The following SQL drops the "Brazil Customers" view:

Example
DROP VIEW `Brazil Customers`;


Unit 5: Data Analytics using SQL


Window functions apply aggregate and ranking functions over a particular window (set of rows).
The OVER clause is used with window functions to define that window. The OVER clause does two things:
• Partitions rows into sets of rows (using the PARTITION BY clause).
• Orders rows within those partitions (using the ORDER BY clause).
Note: If no partitions are defined, ORDER BY orders all rows of the table.
Syntax:
SELECT column_name1,
       window_function(column_name2)
       OVER([PARTITION BY column_name1] [ORDER BY column_name3]) AS new_column
FROM table_name;

window_function = any aggregate or ranking function
column_name1 = column to be selected, used here as the partitioning column
column_name2 = column on which the window function is to be applied
column_name3 = column by which rows are ordered within each partition
new_column = name of the new column
table_name = name of the table
Aggregate Window Function:
Aggregate functions such as SUM(), COUNT(), AVG(), MAX(), and MIN() applied over a particular window (set of rows) are called aggregate window functions.
Example –
Find the average salary of employees for each department, and order employees within a department by age.
SELECT Name, Age, Department, Salary,
       AVG(Salary) OVER (PARTITION BY Department) AS Avg_Salary
FROM employee
ORDER BY Department, Age;
Ranking Window Functions:
The ranking functions are RANK(), DENSE_RANK(), and ROW_NUMBER().
• RANK() –
As the name suggests, the RANK function assigns a rank to every row within each partition. The first row is given rank 1, and rows with the same value are assigned the same rank. After a tie, ranks are skipped: if two rows share rank 1, the next row receives rank 3.

• DENSE_RANK() –
It assigns a rank to each row within a partition. As with RANK(), the first row is assigned rank 1 and rows with the same value receive the same rank. The difference is that DENSE_RANK() never skips a rank: after a tie, the next consecutive integer is used.

• ROW_NUMBER() –
It assigns consecutive integers to the rows within a partition. Within a partition, no two rows can have the same row number.

Note –
An ORDER BY clause should always be specified in the OVER clause when using ranking window functions, because the ranking is defined by that order.
Example –
Calculate the row number, rank, and dense rank of employees in the employee table according to salary within each department.
SELECT
    ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS emp_row_no,
    Name, Department, Salary,
    RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS emp_rank,
    DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS emp_dense_rank
FROM employee;
The output of the above query will be:

emp_row_no  Name     Department  Salary  emp_rank  emp_dense_rank
1           Suresh   Finance     50,000  1         1
2           Ramesh   Finance     50,000  1         1
3           Ram      Finance     20,000  3         2
1           Deep     Sales       30,000  1         1
2           Pradeep  Sales       20,000  2         2


As stated in the definition of ROW_NUMBER(), the row numbers are consecutive integers within each partition. The output also shows the difference between rank and dense rank: dense rank leaves no gap between rank values, while rank leaves a gap after repeated values.
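The ranking behaviour described above can be reproduced with a self-contained sketch. It assumes MySQL 8.0 (window functions are not available in earlier MySQL versions) and a hypothetical table employee_rank_demo holding the five rows from the output table. The tie-breaker on Name in ROW_NUMBER() is an addition made here so that the numbering of tied salaries is deterministic.

```sql
-- Hypothetical sample table; it mirrors the rows in the output above.
CREATE TABLE employee_rank_demo (
    Name       VARCHAR(30),
    Department VARCHAR(30),
    Salary     INT
);

INSERT INTO employee_rank_demo (Name, Department, Salary) VALUES
    ('Ramesh',  'Finance', 50000),
    ('Suresh',  'Finance', 50000),
    ('Ram',     'Finance', 20000),
    ('Deep',    'Sales',   30000),
    ('Pradeep', 'Sales',   20000);

-- Finance: row numbers 1, 2, 3; ranks 1, 1, 3; dense ranks 1, 1, 2.
-- Sales:   row numbers 1, 2;    ranks 1, 2;    dense ranks 1, 2.
SELECT
    -- Name DESC breaks the salary tie so Suresh is numbered before Ramesh.
    ROW_NUMBER() OVER (PARTITION BY Department
                       ORDER BY Salary DESC, Name DESC) AS emp_row_no,
    Name,
    Department,
    Salary,
    RANK()       OVER (PARTITION BY Department ORDER BY Salary DESC) AS emp_rank,
    DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS emp_dense_rank
FROM employee_rank_demo
ORDER BY Department, emp_row_no;
```

Without a tie-breaker, the database is free to number tied rows in either order, so Suresh and Ramesh could swap row numbers between runs.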

Using PARTITION BY to Define a Window Frame


SQL window functions perform calculations based on a set of records. For example, you might want
to calculate the average salary of a specific group of employee records. This group of records is
called the window frame, and its definition is central to understanding how window functions work
and how we can take advantage of them.

The window frame is a set of rows related to the current row where the window function is used for
calculation. The window frame can be a different set of rows for the next row in the query result,
since it depends on the current row being processed. Every row in the result set of the query has its
own window frame.

In the rest of this section, we will show example queries based on a database of a car dealership group. The group stores the sales information grouped by month in a table called monthly_car_sales. Below is the table with some sample data:

monthly_car_sales

year  month  make     model    type    quantity  revenue
2021  01     Ford     F100     PickUp  40        2500000
2021  01     Ford     Mustang  Car     9         1010000
2021  01     Renault  Fuego    Car     20        9000000
2021  02     Renault  Fuego    Car     50        23000000
2021  02     Ford     F100     PickUp  20        1200000
2021  02     Ford     Mustang  Car     10        1050000
2021  03     Renault  Megane   Car     50        20000000
2021  03     Renault  Koleos   Car     15        1004000
2021  03     Ford     Mustang  Car     20        2080000
2021  04     Renault  Megane   Car     50        20000000
2021  04     Renault  Koleos   Car     15        1004000
2021  04     Ford     Mustang  Car     25        2520000

A simple way to create a window frame is by using an OVER clause with a PARTITION BY subclause.
In the following SQL example, we generate a report of revenue by make of the car for the year 2021.

SELECT make,
       SUM(revenue) OVER (PARTITION BY make) AS total_revenue
FROM monthly_car_sales
WHERE year = 2021;
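A point that is easy to miss is that a window function does not collapse rows the way GROUP BY does: the query returns one row per input row, with the partition total repeated on each. The sketch below contrasts the two. It assumes MySQL 8.0 and a hypothetical table car_sales_demo containing three of the January rows from the sample data.

```sql
-- Hypothetical sample table; a three-row subset of the data above.
CREATE TABLE car_sales_demo (
    make    VARCHAR(20),
    model   VARCHAR(20),
    revenue BIGINT
);

INSERT INTO car_sales_demo (make, model, revenue) VALUES
    ('Ford',    'F100',    2500000),
    ('Ford',    'Mustang', 1010000),
    ('Renault', 'Fuego',   9000000);

-- Window function: 3 rows come back. Both Ford rows show 3510000,
-- the Renault row shows 9000000.
SELECT make,
       model,
       revenue,
       SUM(revenue) OVER (PARTITION BY make) AS total_revenue
FROM car_sales_demo;

-- GROUP BY: 2 rows come back, one per make, and the per-model
-- detail is no longer available.
SELECT make,
       SUM(revenue) AS total_revenue
FROM car_sales_demo
GROUP BY make;
```

Keeping the detail rows is what makes calculations such as "share of the make's total revenue" possible in a single query.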


FIRST_VALUE and LAST_VALUE functions

Given an ordered set of rows, FIRST_VALUE returns the value of the specified expression with respect to the first row in the window frame. The LAST_VALUE function returns the value of the expression with respect to the last row in the frame.

Syntax

FIRST_VALUE | LAST_VALUE
( expression [ IGNORE NULLS | RESPECT NULLS ] ) OVER
(
[ PARTITION BY expr_list ]
[ ORDER BY order_list frame_clause ]
)
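The frame clause matters in practice, especially for LAST_VALUE. The sketch below assumes MySQL 8.0 and a hypothetical table mustang_sales_demo holding the four Ford Mustang revenue figures from the sample data. It states the frame explicitly so that both functions look at the whole ordered set.

```sql
-- Hypothetical sample table; the Ford Mustang rows from the data above.
CREATE TABLE mustang_sales_demo (
    sale_month INT,
    revenue    BIGINT
);

INSERT INTO mustang_sales_demo (sale_month, revenue) VALUES
    (1, 1010000),
    (2, 1050000),
    (3, 2080000),
    (4, 2520000);

-- With the frame covering all rows, every output row shows
-- first_month_revenue = 1010000 and last_month_revenue = 2520000.
SELECT sale_month,
       revenue,
       FIRST_VALUE(revenue) OVER (
           ORDER BY sale_month
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS first_month_revenue,
       LAST_VALUE(revenue) OVER (
           ORDER BY sale_month
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_month_revenue
FROM mustang_sales_demo
ORDER BY sale_month;
```

If the frame clause is omitted while ORDER BY is present, the default frame ends at the current row, so LAST_VALUE would simply return each row's own revenue.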

LEAD and LAG functions


LEAD and LAG functions were first introduced in SQL Server 2012. They are window
functions.
The LEAD function is used to access data from SUBSEQUENT rows along with data from
the current row.
The LAG function is used to access data from PREVIOUS rows along with data from the
current row.
An ORDER BY clause is required when working with LEAD and LAG functions, but a
PARTITION BY clause is optional.
Now, let’s look at some examples. Imagine a manager asks an analyst to run a query of all the
order quantity values, along with another column for each preceding order quantity value.
That query would look like this:


SELECT
    [OrderQty],
    LAG([OrderQty]) OVER (ORDER BY [OrderQty] DESC) AS [Lag "OrderQty" Column]
FROM [Sales].[SalesOrderDetail]
ORDER BY [OrderQty] DESC;
In the example above, the first column in the SQL query is the order quantity column, the
second column makes use of the LAG function, acting on the order quantity column. The
OVER() clause is then applied (because LAG and LEAD are window functions), wherein the
new column being formed - the [Lag “OrderQty” Column] - is ordered by order quantity in
descending order. The entire query is also ordered by the order quantity in descending order.
In the result of this query, the first value in the Lag "OrderQty" column is NULL. This is because the LAG function returns the value from the row before the current row. The first row, where OrderQty is 44, has no preceding row, so there is no value for the LAG function to return, hence the NULL value in the first cell of the second column.
In the third row of the result, the values of the two columns are 40 and 41. The value 41 in the second column is the OrderQty value of the previous row, carried down to the current row. This is how the LAG function works.
For the LEAD function, imagine a manager asks the data analyst to produce a query showing
all the order quantity values, along with each following order quantity value. The query for
that would look like this:


SELECT
    [OrderQty],
    LEAD([OrderQty]) OVER (ORDER BY [OrderQty] DESC) AS [Lead "OrderQty" Column]
FROM [Sales].[SalesOrderDetail]
ORDER BY [OrderQty] DESC;
This query is the same as the previous one, except that the LEAD function replaces the LAG function.
In the first rows of the result, no NULL value appears, but there is a difference. In row one, the value 41 in the Lead "OrderQty" column is alongside the value 44 in the OrderQty column. The value 41 is taken by the LEAD function from the second row, where 41 resides in the OrderQty column. This is how the LEAD function works. LEAD returns NULL only where no following row exists, that is, on the last row.

It is also possible to LEAD or LAG by a specific number of rows. With the LEAD function, for example, you can pass an offset N as the second argument to return the value from the row N rows after the current row. Let's look at another example.

SELECT
    [OrderQty],
    LEAD([OrderQty], 2) OVER (ORDER BY [OrderQty] DESC) AS [Lead "OrderQty" Column]
FROM [Sales].[SalesOrderDetail]
ORDER BY [OrderQty] DESC;
In the query above, the LEAD function returns values not from the next row, but from the row two rows after the current row. In the result, rows one and two of the second column both contain the value 40, because the third and fourth values in column one (OrderQty) are both 40. This is how the offset in the LAG and LEAD functions works.
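The behaviour described in this section can be reproduced on a four-row table. The sketch below uses unbracketed identifiers so that it runs on MySQL 8.0 as well as SQL Server; the table order_qty_demo is hypothetical and holds the first four quantities mentioned in the text (44, 41, 40, 40). The order_id tie-breaker and the third argument of LEAD (a default returned instead of NULL) are additions made for this illustration.

```sql
-- Hypothetical sample table; quantities taken from the text above.
CREATE TABLE order_qty_demo (
    order_id  INT PRIMARY KEY,
    order_qty INT
);

INSERT INTO order_qty_demo (order_id, order_qty) VALUES
    (1, 44),
    (2, 41),
    (3, 40),
    (4, 40);

-- Result rows, in order:
-- order_qty  prev_qty  next_qty  next_qty_2  next_qty_2_or_0
-- 44         NULL      41        40          40
-- 41         44        40        40          40
-- 40         41        40        NULL        0
-- 40         40        NULL      NULL        0
SELECT order_qty,
       LAG(order_qty)        OVER (ORDER BY order_qty DESC, order_id) AS prev_qty,
       LEAD(order_qty)       OVER (ORDER BY order_qty DESC, order_id) AS next_qty,
       LEAD(order_qty, 2)    OVER (ORDER BY order_qty DESC, order_id) AS next_qty_2,
       LEAD(order_qty, 2, 0) OVER (ORDER BY order_qty DESC, order_id) AS next_qty_2_or_0
FROM order_qty_demo
ORDER BY order_qty DESC, order_id;
```

The order_id tie-breaker keeps the two rows with quantity 40 in a fixed order, which makes the LAG and LEAD results repeatable.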

ROLLUP

ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. It also calculates a grand total. ROLLUP is a simple extension to the GROUP BY clause, so its syntax is extremely easy to use. The ROLLUP extension is highly efficient, adding minimal overhead to a query.

Syntax

ROLLUP appears in the GROUP BY clause in a SELECT statement. Its form is:

SELECT ... GROUP BY
ROLLUP(grouping_column_reference_list)

Details

ROLLUP's action is straightforward: it creates subtotals which "roll up" from the most detailed level to a grand total, following a grouping list specified in the ROLLUP clause. ROLLUP takes as its argument an ordered list of grouping columns. First, it calculates the standard aggregate values specified in the GROUP BY clause. Then, it creates progressively higher-level subtotals, moving from right to left through the list of grouping columns. Finally, it creates a grand total.

ROLLUP will create subtotals at n+1 levels, where n is the number of grouping columns. For
instance, if a query specifies ROLLUP on grouping columns of Time, Region, and Department
(n=3), the result set will include rows at four aggregation levels.

Example

This example of ROLLUP uses the data in the video store database.

SELECT Time, Region, Department,
       SUM(Profit) AS Profit
FROM sales
GROUP BY ROLLUP(Time, Region, Department);
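To see the subtotal levels on concrete numbers, the sketch below uses two grouping columns (n=2), so rows appear at three aggregation levels. The table rollup_sales_demo and its figures are hypothetical. The ROLLUP(...) form shown here is the one used in this section and is accepted by PostgreSQL and SQL Server; MySQL writes the same query with WITH ROLLUP, as noted in the comment.

```sql
-- Hypothetical sample table; names and values are illustrative only.
CREATE TABLE rollup_sales_demo (
    region     VARCHAR(10),
    department VARCHAR(20),
    profit     INT
);

INSERT INTO rollup_sales_demo (region, department, profit) VALUES
    ('East', 'Video Rental', 100),
    ('East', 'Video Sales',   50),
    ('West', 'Video Rental',  80),
    ('West', 'Video Sales',   70);

-- 7 rows in total:
--   4 detail rows  (region, department)
--   2 subtotals    (region, NULL): East 150, West 150
--   1 grand total  (NULL, NULL):   300
SELECT region, department, SUM(profit) AS profit
FROM rollup_sales_demo
GROUP BY ROLLUP(region, department);

-- MySQL equivalent of the GROUP BY line:
--   GROUP BY region, department WITH ROLLUP;
```

In the subtotal and grand-total rows, the columns that have been "rolled up" are returned as NULL.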

CUBE

Note that the subtotals created by ROLLUP are only a fraction of the possible subtotal combinations. For instance, the departmental totals across regions (279,000 and 319,000) would not be calculated by a ROLLUP(Time, Region, Department) clause. To generate those numbers would require a ROLLUP clause with the grouping columns specified in a different order: ROLLUP(Time, Department, Region). The easiest way to generate the full set of subtotals needed for cross-tabular reports is to use the CUBE extension.

CUBE enables a SELECT statement to calculate subtotals for all possible combinations of a
group of dimensions. It also calculates a grand total. This is the set of information typically
needed for all cross-tabular reports, so CUBE can calculate a cross-tabular report with a
single SELECT statement. Like ROLLUP, CUBE is a simple extension to
the GROUP BY clause, and its syntax is also easy to learn.

Syntax

CUBE appears in the GROUP BY clause in a SELECT statement. Its form is:

SELECT ... GROUP BY
CUBE (grouping_column_reference_list)
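The following sketch shows what CUBE adds compared with ROLLUP on two grouping columns. The table cube_sales_demo and its figures are hypothetical. The syntax is accepted by PostgreSQL and SQL Server (and, for the SELECT itself, by Oracle); MySQL does not provide CUBE, which is an assumption to verify against the version in use.

```sql
-- Hypothetical sample table; names and values are illustrative only.
CREATE TABLE cube_sales_demo (
    region     VARCHAR(10),
    department VARCHAR(20),
    profit     INT
);

INSERT INTO cube_sales_demo (region, department, profit) VALUES
    ('East', 'Video Rental', 100),
    ('East', 'Video Sales',   50),
    ('West', 'Video Rental',  80),
    ('West', 'Video Sales',   70);

-- 9 rows in total:
--   4 detail rows           (region, department)
--   2 region subtotals      (region, NULL): East 150, West 150
--   2 department subtotals  (NULL, department):
--                           Video Rental 180, Video Sales 120
--   1 grand total           (NULL, NULL): 300
SELECT region, department, SUM(profit) AS profit
FROM cube_sales_demo
GROUP BY CUBE (region, department);
```

The two department subtotals are exactly the rows that ROLLUP(region, department) would not produce.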
