About handling databases with multiple tables (the actual concept of relational databases).
This is the point where redundancy becomes an important factor, because the database gets
more complicated as we expand it.
Tables must be efficiently related to minimize or eradicate redundancy. The relationship
between entities must be carefully analyzed and formatted in order to make the best possible
database structure.
.tables (SQLite shell dot-command, not an SQL keyword, that lists all tables)
Relationships
1. One-to-one
2. One-to-many
3. Many-to-many
Entity Relationship Diagram (ER Diagram) - one of the tools used to visualize these kinds of
relationships. Each end of a relationship line is marked with one of the following:
1. Zero - doesn't have to have anything related to it.
2. One - has to have at least one thing that relates to it in some other table.
3. Many - many (multiple) can be related to it in some other table.
A book has to be written by at least one author, but it can be written by multiple as well.
An author must have written at least one book, but they can write multiple.
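|| (two bars on a line, in crow's foot notation) - means "one and only one": exactly one
related entry, no fewer and no more.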
Keys - "a fundamental idea in databases." They help relate one table to another.
A unique identifier that an entity already has (such as the ISBN for books) can be used as the
primary key of a table.
1. Primary Key - a column whose value is unique for every row, so it identifies each row.
2. Foreign Key - a column in one table that stores the primary key of a row in another table.
This is how a one-to-many relationship is represented.
An ISBN takes up more space than is desirable for a key that gets repeated in other tables
(about 17 bytes/ISBN, which is a lot of space). This can be fixed by giving each row our own
shorter primary key (preferably an integer id starting from 1).
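A minimal sketch of a primary key and a foreign key for the one-to-many case, run through Python's sqlite3 module. The column layout, the placeholder ISBN values, and the choice to switch on foreign key enforcement are assumptions made for illustration, not the schema of the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Assumed schema: one publisher has many books.
conn.executescript("""
CREATE TABLE "publishers" (
    "id" INTEGER,
    "publisher" TEXT NOT NULL,
    PRIMARY KEY("id")
);
CREATE TABLE "books" (
    "id" INTEGER,
    "isbn" TEXT NOT NULL UNIQUE,
    "title" TEXT NOT NULL,
    "publisher_id" INTEGER,
    PRIMARY KEY("id"),
    FOREIGN KEY("publisher_id") REFERENCES "publishers"("id")
);
""")

# The integer primary key is filled in automatically, starting from 1.
conn.execute(
    'INSERT INTO "publishers" ("publisher") VALUES (?)',
    ("Fitzcarraldo Editions",),
)
publisher_id = conn.execute('SELECT "id" FROM "publishers"').fetchone()[0]

# A book row points at its publisher through the short integer key.
# The ISBN strings here are placeholders, not real ISBNs.
conn.execute(
    'INSERT INTO "books" ("isbn", "title", "publisher_id") VALUES (?, ?, ?)',
    ("000-0-00-000000-0", "Flights", publisher_id),
)

# A book that points at a publisher id that does not exist is rejected.
try:
    conn.execute(
        'INSERT INTO "books" ("isbn", "title", "publisher_id") VALUES (?, ?, ?)',
        ("000-0-00-000000-1", "Orphan Book", 99),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(publisher_id, rejected)
```

The books table repeats only the small integer id of the publisher, not the publisher's name or any other long identifier.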
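Many-to-many approach - use a third table (such as "authored" in the examples below) in which
each row pairs a primary key from one table with a primary key from the other, e.g. an
author_id with a book_id.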
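One common way to model a many-to-many relationship is a third table that holds pairs of ids. The sketch below assumes an "authored" table with "author_id" and "book_id" columns, matching the names used in the later queries; the composite primary key and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "authors" ("id" INTEGER, "name" TEXT NOT NULL, PRIMARY KEY("id"));
CREATE TABLE "books" ("id" INTEGER, "title" TEXT NOT NULL, PRIMARY KEY("id"));

-- Each row of "authored" links one author to one book.
CREATE TABLE "authored" (
    "author_id" INTEGER,
    "book_id" INTEGER,
    PRIMARY KEY("author_id", "book_id"),
    FOREIGN KEY("author_id") REFERENCES "authors"("id"),
    FOREIGN KEY("book_id") REFERENCES "books"("id")
);

-- Made-up sample rows: Book X has two authors, Author A has two books.
INSERT INTO "authors" ("id", "name") VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO "books" ("id", "title") VALUES (1, 'Book X'), (2, 'Book Y');
INSERT INTO "authored" ("author_id", "book_id") VALUES (1, 1), (2, 1), (1, 2);
""")

# All authors of Book X (book id 1).
authors_of_x = [row[0] for row in conn.execute("""
    SELECT "name" FROM "authors"
    WHERE "id" IN (SELECT "author_id" FROM "authored" WHERE "book_id" = 1)
    ORDER BY "name"
""")]

# All books by Author A (author id 1).
books_by_a = [row[0] for row in conn.execute("""
    SELECT "title" FROM "books"
    WHERE "id" IN (SELECT "book_id" FROM "authored" WHERE "author_id" = 1)
    ORDER BY "title"
""")]

print(authors_of_x, books_by_a)
```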
We use some approaches to produce better queries. These include:
a. Subqueries - a querying technique that puts one SQL query inside another to form a
nested query. For e.g. a) query to obtain the list of books published by a certain publisher
(one-to-many)
-- To obtain publisher id of Fitzcarraldo Editions (5)
SELECT "id" FROM "publishers"
WHERE "publisher" = 'Fitzcarraldo Editions';
-- To obtain list of books with publisher_id as 5
SELECT "title" FROM "books"
WHERE "publisher_id" = 5;
-- Nesting both
SELECT "title" FROM "books"
WHERE "publisher_id" = (
SELECT "id" FROM "publishers"
WHERE "publisher" = 'Fitzcarraldo Editions'
);
-- Note: the query furthest inside the parentheses runs first
-- Another example to find the average rating of a particular book
SELECT AVG("rating") FROM "ratings"
WHERE "book_id" = (
SELECT "id" FROM "books"
WHERE "title" = 'In Memory of Memory'
);
b) query to obtain which author(s) wrote a particular book (many-to-many)
-- To obtain id of Flights
SELECT "id" FROM "books"
WHERE "title" = 'Flights';
-- Nesting it within another query to find the author ID of the person who
-- wrote Flights
SELECT "author_id" FROM "authored"
WHERE "book_id" = (
SELECT "id" FROM "books"
WHERE "title" = 'Flights'
);
-- Nesting again to find the name of that author
SELECT "name" FROM "authors"
WHERE "id" = (
SELECT "author_id" FROM "authored"
WHERE "book_id" = (
SELECT "id" FROM "books"
WHERE "title" = 'Flights'
)
);
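To see that the nested query is the three separate steps glued together, the sketch below runs each step by hand and then runs the fully nested version. The table layouts and the sample rows (placeholder author names and ids) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT);
CREATE TABLE "authored" ("author_id" INTEGER, "book_id" INTEGER);

-- Placeholder rows, not real data.
INSERT INTO "authors" VALUES (1, 'Author One'), (2, 'Author Two');
INSERT INTO "books" VALUES (10, 'Flights'), (11, 'Some Other Book');
INSERT INTO "authored" VALUES (2, 10), (1, 11);
""")

# Step 1: the innermost query on its own.
book_id = conn.execute(
    'SELECT "id" FROM "books" WHERE "title" = ?', ("Flights",)
).fetchone()[0]

# Step 2: feed that id into the middle query by hand.
author_id = conn.execute(
    'SELECT "author_id" FROM "authored" WHERE "book_id" = ?', (book_id,)
).fetchone()[0]

# Step 3: feed the author id into the outer query by hand.
name_stepwise = conn.execute(
    'SELECT "name" FROM "authors" WHERE "id" = ?', (author_id,)
).fetchone()[0]

# The same three steps as one nested query.
name_nested = conn.execute("""
    SELECT "name" FROM "authors"
    WHERE "id" = (
        SELECT "author_id" FROM "authored"
        WHERE "book_id" = (
            SELECT "id" FROM "books"
            WHERE "title" = 'Flights'
        )
    )
""").fetchone()[0]

print(book_id, author_id, name_stepwise, name_nested)
```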
19. IN - checks whether a value belongs to a set of values, such as the rows returned by a
subquery. For e.g. to find all the books written by an author
SELECT "title" FROM "books"
WHERE "id" IN (
SELECT "book_id" FROM "authored"
WHERE "author_id" = (
SELECT "id" FROM "authors"
WHERE "name" = 'Fernanda Melchor'
)
);
-- We use = when we only care about one particular value, while we use IN when
-- we care about multiple values.
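A small sketch of why the choice between = and IN matters when the subquery returns several rows. In SQLite, a subquery used with = contributes only its first row, so matches are silently lost (some other database systems reject such a query outright). The table layouts and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT);
CREATE TABLE "authored" ("author_id" INTEGER, "book_id" INTEGER);

-- Sample rows for the sketch: one author linked to two books.
INSERT INTO "authors" VALUES (1, 'Fernanda Melchor');
INSERT INTO "books" VALUES (1, 'Hurricane Season'), (2, 'Paradais'), (3, 'Flights');
INSERT INTO "authored" VALUES (1, 1), (1, 2);
""")

# Same query twice; only the comparison operator changes.
QUERY = """
SELECT "title" FROM "books"
WHERE "id" {op} (
    SELECT "book_id" FROM "authored"
    WHERE "author_id" = (
        SELECT "id" FROM "authors"
        WHERE "name" = 'Fernanda Melchor'
    )
)
"""

# IN compares against every row of the subquery.
with_in = sorted(row[0] for row in conn.execute(QUERY.format(op="IN")))

# = compares against a single value, so only one book comes back in SQLite.
with_equals = [row[0] for row in conn.execute(QUERY.format(op="="))]

print(with_in)
print(with_equals)
```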
b. Joins - the idea is to take a table and combine it with some other table. It can be an
alternative to nested queries, since the joined data ends up together in one table.
Two tables can be joined on a column they have in common. The keyword for doing so is
JOIN. Joins are of multiple types as well.
20. JOIN - combines the rows of two tables.
21. ON - specifies the condition that rows from the two tables must satisfy to be matched.
SELECT * FROM "sea_lions"
JOIN "migrations" ON "migrations"."id" = "sea_lions"."id";
Reason for the key column occurring twice: SELECT * returns every column from both tables, so
each table's id column shows up. To avoid it, list the wanted columns explicitly or use
NATURAL JOIN.
(https://stackoverflow.com/questions/45311834/in-sql-why-is-this-join-returning-the-key-column-twice)
1. Inner Join - the regular kind of join (at least in SQLite) we just did. It combines tables by
only keeping the rows which match between them (based on a particular column). A special
form of inner join:
a. Natural Join - automatically picks the joining column(s) based on the column names the
two tables have in common. For e.g. the column id in the above example would not need to
be mentioned explicitly in a natural join.
SELECT * FROM "sea_lions"
NATURAL JOIN "migrations";
-- We don't get the duplicate id column in this case!
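The duplicate id column can be checked directly by asking the cursor for the column names of each result. This is a sketch with assumed columns ("name" and "distance") and made-up rows, since the real layout of "sea_lions" and "migrations" is not listed in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed columns and placeholder rows.
CREATE TABLE "sea_lions" ("id" INTEGER PRIMARY KEY, "name" TEXT);
CREATE TABLE "migrations" ("id" INTEGER PRIMARY KEY, "distance" INTEGER);
INSERT INTO "sea_lions" VALUES (1, 'Lion A'), (2, 'Lion B');
INSERT INTO "migrations" VALUES (1, 100), (2, 200);
""")


def column_names(query):
    """Return the column names of the result set produced by query."""
    return [col[0] for col in conn.execute(query).description]


# JOIN ... ON with SELECT * keeps the id column of both tables.
on_columns = column_names("""
    SELECT * FROM "sea_lions"
    JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"
""")

# NATURAL JOIN merges the shared column into one.
natural_columns = column_names("""
    SELECT * FROM "sea_lions"
    NATURAL JOIN "migrations"
""")

# Listing the columns explicitly also avoids the duplicate.
explicit_columns = column_names("""
    SELECT "sea_lions"."id", "name", "distance" FROM "sea_lions"
    JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"
""")

print(on_columns)
print(natural_columns)
print(explicit_columns)
```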
2. Outer Join - it lets you keep rows even if there isn't a match between the tables
(based on a particular column). The columns that have no match are filled with NULL
values. There are multiple types of outer join:
a. Left Join - Prioritizes the data on the "left table" (the first table you start with).
SELECT * FROM "sea_lions"
LEFT JOIN "migrations" ON "migrations"."id" = "sea_lions"."id";
b. Right Join - Prioritizes the data on the "right table" (the second table, the one being joined).
SELECT * FROM "sea_lions"
RIGHT JOIN "migrations" ON "migrations"."id" = "sea_lions"."id";
c. Full Join - combines left and right join: the unmatched rows of both tables are kept.
SELECT * FROM "sea_lions"
FULL JOIN "migrations" ON "migrations"."id" = "sea_lions"."id";
JOIN results in a temporary table or a result set. It can only be used for the duration of the
query; it is not stored in the database itself.
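A sketch comparing an inner join with a left join on made-up rows where some ids have no partner. It sticks to LEFT JOIN because RIGHT JOIN and FULL JOIN are only available from SQLite 3.39.0 onward, and the SQLite bundled with a given Python install may be older. The "name" and "distance" columns are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Placeholder rows: sea lion 3 has no migration, migration 4 has no sea lion.
CREATE TABLE "sea_lions" ("id" INTEGER PRIMARY KEY, "name" TEXT);
CREATE TABLE "migrations" ("id" INTEGER PRIMARY KEY, "distance" INTEGER);
INSERT INTO "sea_lions" VALUES (1, 'Lion A'), (2, 'Lion B'), (3, 'Lion C');
INSERT INTO "migrations" VALUES (1, 100), (2, 200), (4, 400);
""")

# Inner join: only ids present in both tables survive.
inner_rows = conn.execute("""
    SELECT "sea_lions"."id", "name", "distance" FROM "sea_lions"
    JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"
    ORDER BY "sea_lions"."id"
""").fetchall()

# Left join: every sea lion is kept; a missing distance becomes NULL,
# which Python's sqlite3 module returns as None.
left_rows = conn.execute("""
    SELECT "sea_lions"."id", "name", "distance" FROM "sea_lions"
    LEFT JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"
    ORDER BY "sea_lions"."id"
""").fetchall()

print(inner_rows)
print(left_rows)
```

Swapping the two table names in the LEFT JOIN gives the rows a RIGHT JOIN would keep, which is a workaround on older SQLite versions.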
c. Sets - results obtained from queries are known as result sets.
22. UNION - combines the results of two queries into one result set, removing duplicate rows.
-- Obtains name of all authors and translators
SELECT "name" FROM "authors"
UNION
SELECT "name" FROM "translators";
-- This assumes that names are unique: UNION removes duplicate rows, so two
-- people sharing a name would appear only once
-- Modifying the query a bit to distinguish between authors and translators
SELECT 'author' AS "profession", "name" FROM "authors"
UNION
SELECT 'translator' AS "profession", "name" FROM "translators";
23. INTERSECT - to get common results.
-- Obtains names of those who are both authors and translators
SELECT "name" FROM "authors"
INTERSECT
SELECT "name" FROM "translators";
24. EXCEPT - to exclude certain entries from result set.
-- Obtains names of only those who're exclusively authors
SELECT "name" FROM "authors"
EXCEPT
SELECT "name" FROM "translators";
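The three set operators can be compared side by side on a tiny made-up data set in which exactly one person is both an author and a translator. The names and table layouts are placeholders for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
CREATE TABLE "translators" ("id" INTEGER PRIMARY KEY, "name" TEXT);

-- Placeholder names: 'Person C' appears in both tables.
INSERT INTO "authors" ("name") VALUES ('Person A'), ('Person B'), ('Person C');
INSERT INTO "translators" ("name") VALUES ('Person C'), ('Person D');
""")


def names(operator):
    """Combine author and translator names with the given set operator."""
    query = (
        'SELECT "name" FROM "authors" '
        f"{operator} "
        'SELECT "name" FROM "translators"'
    )
    return sorted(row[0] for row in conn.execute(query))


everyone = names("UNION")           # each name once, duplicates removed
with_repeats = names("UNION ALL")   # duplicates kept
both_roles = names("INTERSECT")     # only names found in both tables
authors_only = names("EXCEPT")      # authors who are not translators

print(everyone)
print(with_repeats)
print(both_roles)
print(authors_only)
```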
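Example of a more complex query. It finds the books translated by both of two translators; IN
is used because the intersection can contain more than one book_id.
SELECT "title" FROM "books"
WHERE "id" IN (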
SELECT "book_id" FROM "translated"
WHERE "translator_id" = (
SELECT "id" FROM "translators"
WHERE "name" = 'Sophie Hughes'
)
INTERSECT
SELECT "book_id" FROM "translated"
WHERE "translator_id" = (
SELECT "id" FROM "translators"
WHERE "name" = 'Margaret Jull Costa'
)
);
Operations like UNION, INTERSECT, or EXCEPT can be nested among themselves. The queries on
both sides must always return the same number of columns, and the corresponding columns
should be of the same type.
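A quick check of the column-count rule in SQLite, using made-up tables. It also shows that the two sides do not need matching column names: in this sketch the result takes its column name from the first SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Placeholder tables and rows.
CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
CREATE TABLE "translators" ("id" INTEGER PRIMARY KEY, "name" TEXT);
INSERT INTO "authors" ("name") VALUES ('Person A');
INSERT INTO "translators" ("name") VALUES ('Person B');
""")

# Two columns on the left, one on the right: SQLite refuses to run this.
try:
    conn.execute("""
        SELECT "id", "name" FROM "authors"
        UNION
        SELECT "name" FROM "translators"
    """)
    mismatch_error = False
except sqlite3.OperationalError:
    mismatch_error = True

# Same column count on both sides works, even with different column names.
cursor = conn.execute("""
    SELECT "name" AS "person" FROM "authors"
    UNION
    SELECT "name" FROM "translators"
""")
result_columns = [col[0] for col in cursor.description]
people = sorted(row[0] for row in cursor)

print(mismatch_error, result_columns, people)
```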
d. Groups - collapsing rows into groups and computing a statistic for each group.
25. GROUP BY - collapses the rows that share a value in a specified column into one group
per value, so that aggregate statistics (as specified) can be computed for each of those groups.
-- Obtains average rating of each book
SELECT "book_id", AVG("rating") AS "average rating"
FROM "ratings"
GROUP BY "book_id";
-- Modifying a bit
SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"
FROM "ratings"
GROUP BY "book_id"
HAVING "average rating" > 4.0;
-- Note that SQL uses one keyword for conditioning on rows (WHERE) and a
-- different keyword for conditioning on groups (HAVING)
-- The names of the books can also be displayed by using JOIN
-- Obtains the number of reviews each book has
SELECT "book_id", COUNT("rating") FROM "ratings"
GROUP BY "book_id";
-- Combining it with ORDER BY to sort the data
SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"
FROM "ratings"
GROUP BY "book_id"
HAVING "average rating" > 4.0
ORDER BY "average rating" DESC;
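A sketch that puts GROUP BY, HAVING, and ORDER BY together and adds a JOIN so that titles are shown in place of book ids. The ratings and titles are made up, and the table layouts are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT);
CREATE TABLE "ratings" ("book_id" INTEGER, "rating" INTEGER);

-- Placeholder rows. Averages: Book A 4.5, Book B 3.5, Book C about 4.67.
INSERT INTO "books" VALUES (1, 'Book A'), (2, 'Book B'), (3, 'Book C');
INSERT INTO "ratings" VALUES
    (1, 5), (1, 4),
    (2, 3), (2, 4),
    (3, 5), (3, 5), (3, 4);
""")

# One output row per book: title, rounded average, number of ratings.
# HAVING filters whole groups after the averages have been computed.
rows = conn.execute("""
    SELECT "title",
           ROUND(AVG("rating"), 2) AS "average rating",
           COUNT("rating") AS "reviews"
    FROM "ratings"
    JOIN "books" ON "books"."id" = "ratings"."book_id"
    GROUP BY "book_id"
    HAVING "average rating" > 4.0
    ORDER BY "average rating" DESC
""").fetchall()

for title, average, reviews in rows:
    print(title, average, reviews)
```

Book B is dropped by HAVING because its group average is not above 4.0, even though one of its individual ratings is 4.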