Learn SQL
Table of Contents
Basic SQL
Mid-Level SQL
WHERE
Operators
Aggregate Functions
GROUP BY
JOIN Relationships and JOINing Tables
DATE and TIME Functions
Mid Level SQL Practice Grounds
Extras
I think it’s best to just dive right in, but it’s going to be incredibly beneficial to go over just
a few quick concepts first (trust me, we get to running your first SQL on the very next
page).
SQL
SQL might seem intimidating but it’s really fairly easy to understand. SQL stands for
Structured Query Language and simply put, it’s a search language for you to instruct a
database about what information you’d like retrieved from it.
Just think of it as an advanced, really structured Google search. For example, in Google you
might ask something like "what are the longest Green Day songs?", and in SQL, if you had a
database with that information in it, the equivalent question might be answered with
something like the query below (an illustrative sketch using the tracks table we'll meet shortly):
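SELECT name, milliseconds
FROM tracks
WHERE composer LIKE '%Green Day%'
ORDER BY milliseconds DESC;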
Don’t worry about understanding the above query yet, you’ll get that in no time.
In this tutorial we’ll be using an example data set that has a bunch of information on
tracks, albums and artists in a music collection. Most database architects will typically
split those items into their own tables rather than group them all in one. You'll learn all
about linking them together when we get to the section on Table Relationships and JOINs.
After you run a query, a table with the results of your query will show up below it. All of
the queries you run in this tutorial are being executed against a real PostgreSQL database.
SQL Boxes that have a quiz will have a checkbox to their left. Once the answer is
correct, the checkbox will be checked! Some of these will have a Hint you can view, and if
you ever get really stuck feel free to email us for some help.
Let’s start with the most simple query and just select a value back. The SQL Box is there
for you to try running your SQL in. You can put any SQL you want in there, and don’t be
afraid as you’re not going to break anything. Try whatever you’d like. Experimentation is
the best way to learn! Here’s an example query we’ll start with:
SELECT 42;
So first, try typing this statement in the SQL Box below and hit "Run SQL" to return the
number 42.
Awesome, you just returned a number. There are other things you can fetch besides
numbers, like "strings" of characters. Try running one of these, or choose your own string to
return.
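For example (any single-quoted text will work):

SELECT 'Hello World';

or

SELECT 'Anything you want to say';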
Note that each query needs to end in a semi-colon. That’s just how the database knows
that you’re done giving it instructions. The SQL Box isn’t picky about it so you can get
away without using it, but other tools you use may be a bit more strict.
Math
While we’re playing with numbers we can point out that SQL can also instruct a database
to do some math on a result. Try out some queries like these:
SELECT 2 + 3;
SELECT 5 * 12;
SELECT 164 / 8;
And we’ll get more into DATE and TIME queries later but here’s a quick example of
selecting a date, which besides numbers and strings is another common data type category
in PostgreSQL.
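For example (a minimal sketch using the DATE keyword):

SELECT DATE '2019-12-10';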
Here we just SELECTed data that we typed in ourselves. Obviously SQL would be quite
useless if that’s all it did, but next we’ll cover how you can choose where to SELECT data
FROM!
FROM
So now you know how to SELECT data but not yet how to choose where to get that data
FROM. Let’s get into the real stuff and SELECT data FROM a specific table.
In our example database we have a table called albums, which holds info on some music
albums. It has three columns, id, title, artist_id. Here’s what it looks like in Excel:
To get data that’s in this table we need to specify what columns we want to SELECT and
FROM where we want to select it. So let’s try to get a list of all the album titles we’ve got
stored. We can use the following template to do so:
SELECT [stuff you want to select] FROM [the table that it is in];
Let’s start with a simple one and query for everything (all of it!) from the albums table.
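Since we haven't yet covered the * shortcut (it's coming up shortly), we can list all three columns explicitly:

SELECT id, title, artist_id FROM albums;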
Look at all that data! Notice at the bottom of the table we’ve paginated it for you so it
doesn’t take up the whole page. All of the columns and rows in the table albums have been
fetched. You can see the table above looks similar to what the data looks like in the Excel
image above.
But of course we don't have to query for all of the columns if we don't want to. If we wanted
to just get all the album titles and ids (we didn't care about artist_ids) we can query for
just those columns.
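SELECT title, id FROM albums;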
Notice that the columns will come back in the order that you list them in. Try reversing the
column order in the above query by selecting id first and then title.
* Splat
Sometimes it’s annoying to have to list out all the columns that you want to fetch. If you
simply want all the columns available SQL has the * shortcut. The * is called a “splat” and
is a handy, frequently used shortcut to get all columns.
There are a lot of other Tables in our example database like artists and tracks. See if you
can use the SELECT * FROM [tablename] structure to explore some of those tables.
Now it’s getting interesting right? Right now though we’re getting a list of all the results in
the table back. We need to learn how to filter, group, manipulate and limit these results.
ORDER BY
By default results are returned in the order that they’re stored in the database. But
sometimes you’ll want to sort them differently. You can do that with the “ORDER BY”
command at the end of your queries as shown in the expanded version of our SQL
template here
SELECT [stuff you want to select] FROM [the table that it is in] ORDER BY [column you want to order by];
For example, the following query shows all the tracks ordered by the album_id. Try
sorting it by other columns. Can you modify it to be sorted by their name?
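SELECT * FROM tracks ORDER BY album_id;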
You can list multiple things to ORDER BY, which is useful in the case where there are a lot
of duplicate rows. In tracks for instance we can order all of the data by the composer and
then by how long the song is (milliseconds) by listing both of those sorting columns.
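SELECT * FROM tracks ORDER BY composer, milliseconds;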
Try reversing the order of the columns above (ORDER BY milliseconds, composer)
and you’ll see what happens with the reverse prioritization of first sorting by milliseconds.
To test your skills, try getting all the tracks in order of most expensive to least expensive:
LIMIT and OFFSET
If you want to LIMIT the number of results that are returned you can simply use the LIMIT
command with a number of rows to LIMIT by.
For example
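SELECT * FROM tracks LIMIT 3;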
This ensures only the first 3 results are returned. Besides returning fewer results, LIMITing
queries can greatly reduce the time they take to run and make your database administrator
a lot less angry with you.
Give it a try by fetching yourself the first 6 rows of the artists table:
OFFSET
You can also specify an OFFSET from where to start returning data.
Say you want to get 5 artists, but skip the first two, getting rows 3 through 7. You'll
want to add an OFFSET of 2 to skip the first two rows:
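SELECT * FROM artists LIMIT 5 OFFSET 2;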
Here’s a challenge for you. Write a query to fetch the Artists in rows 10 through 20:
Browsing the SCHEMA
The word SCHEMA is used to describe a collection of tables and their relationships in your
database. A database instance may have several different schemas. When you’re working
with a set of data, it’s useful to be able to browse that schema to get a sense for what data is
available to you.
You can browse a schema visually using popular database interfaces like PGAdmin,
Postico and Chartio, or in a text-based manner by using SQL itself.
Typically using a visual tool is much easier, but it’s totally up to you. Here we’re going to
quickly cover both, but if you already have a handle on one of the visual editors and are
comfortable with finding out what schema is available feel free to skip this part and move
on to the Basic SQL Practice Grounds.
Schemas in PGAdmin
Once connected to a database, you can expand the trees in the left sidebar in PGAdmin to
find the database, schema, tables and columns available:
The “Properties” tab in the right top of the interface will display all of the extra properties
that the information_schema holds on the table or column including default values, data
type, and more.
Schemas in Chartio
Chartio’s schema viewer simply lists the tables in the Schema tab of any data source
connection.
Each table can be expanded to show the columns underneath. In Chartio you can actually
change the name/alias, define relationships and create custom tables and columns. This
isn't mapped back to the database, but is used only for the Chartio Visual Data Explorer.
Clicking on “Visualize” from the data source Schema page will also create a nice
visualization of all of the tables, with their columns listed. In this view, relationships that
are defined are also drawn as connections from one table to another.
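You can also browse the schema with SQL itself by querying the built-in information_schema. A simple sketch (the original tutorial may have selected slightly different columns):

SELECT * FROM information_schema.tables;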
The above will get all the tables from all schemas. If we want to look at only the tables in
our chinook dataset we can query for only things in the public schema, which FYI is the
default. We do this by just adding a condition on the table_schema column.
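SELECT * FROM information_schema.tables WHERE table_schema = 'public';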
You'll notice the artists, albums and tracks tables we've been playing with so far in our
tutorial, but look at all those others we've been holding out on! There are also customers,
employees, genres, etc. We'll dig into looking at the columns in those tables next, but first
take a moment to change the query above to look at what tables are available in
PostgreSQL’s information_schema schema.
You'll see that one of the tables available in the information_schema is columns, and as
you might guess, just as tables held the info on what tables are available, columns holds
the info on the columns.
Let’s take a look at the columns in the tracks table with the condition table_name =
'tracks'.
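SELECT * FROM information_schema.columns WHERE table_name = 'tracks';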
As shown, PostgreSQL stores all kinds of information about each column including the
data_type, character_maximum_length, various precision options and an optional
default value.
Another great (and probably easier) way of checking out what columns and data is
available in a table is to simply grab a few rows of it. Don’t grab all the data, it may be a
really big table and we don’t need it all! Let’s take a look at what’s available in the playlists
table with a limited SELECT *.
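SELECT * FROM playlists LIMIT 5;  -- any small LIMIT will do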
And of course if you want to see how much data there is in there you can run other
diagnostic queries like a COUNT(*).
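SELECT COUNT(*) FROM playlists;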
If you’re using psql as your connection to your database there are a number of helpful
schema browsing shortcuts. The following are to me the most useful for schema browsing
and, if interested, you can find the full list of psql commands here.
Shortcut Description
\d list of all tables, views and sequences
\d+ the same list, with additional detail
\d [table name] list of the columns, indexes and relations for the [table name]
\dn list of all schemas (namespaces)
\l list of all databases
\z list tables with access privileges
Also, if you want to refer to a visual representation of the schema, we have provided it here:
Basic SQL Practice Grounds
You’re through the basics of SQL! This is a great place to stop and get more practice on
what you’ve learned so far. Here we’ve constructed a list of challenges to give you that
practice. Take some time to go through these before moving on to the Mid-Level SQL
section.
If no specific columns or values are called out to return, assume that it’s asking for
all the columns (splat *).
If it does ask for specific information like “names”, only return that column. If you
return other things as well it won’t be able to match the correct answer.
If you’re having trouble with a question use the ‘Hint’ button. If you’re really having
trouble or think that the answer might be wrong, send us a note at
support@chartio.com.
We check if things are correct not by the query you wrote, but by the results that are
returned. This is the best way as there are often a few different ways to get the same
result.
Good luck!
Q. Fetch the 12th through 32nd (21 total) rows of the tracks table.
Q. Fetch the 9th through 49th (41 total) rows of the artists table.
references: limit
references: from
references: order-by
Conditions
Conditions are simply statements that are either true or false. The database takes these
statements and evaluates them across all the rows as it scans through your tables and only
returns the results that are true.
Let's say for instance that we'd like to see the name of the artist whose id is 85. The
condition would be id = 85. Try the condition by running the following query:
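SELECT * FROM artists WHERE id = 85;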
The query instructed the database to scan the artists table and fetch all the rows where the
condition (id = 85) was true. As you can see, the only artist with id 85 is Frank Sinatra.
Or if we wanted to look up all the information on 'Santana', the condition would be name =
'Santana':
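SELECT * FROM artists WHERE name = 'Santana';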
And another quiz: return the id from one of my favorite albums in High School, ‘American
Idiot’.
Multiple matches
In the above examples we were querying on unique fields so we were only getting one
answer in response. That's not always the case however. In the tracks table many different
tracks belong to the same album, and you can see that the tracks table has an
album_id column. For example if we want to get all of the tracks belonging to an album
whose id is 89 we could run:
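SELECT * FROM tracks WHERE album_id = 89;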
89 just happens to be the same album_id as “American Idiot” had. We’ve just pulled all
the tracks from the album American Idiot! We’ll get more into how we can JOIN this data
based on the common key of album_id in a later section.
See if you can modify the query above to also filter on tracks that are longer than 200000
milliseconds.
NOT
You can invert a condition by simply putting the NOT operator in front of it. For example,
the following queries for everything that is NOT composed by Green Day.
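SELECT * FROM tracks WHERE NOT composer = 'Green Day';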
Also like math, you can use parenthesis to specify the order of operations. As a best
practice it’s good to use parenthesis wherever it seems like the logic and order might not
be too clear. To explore, let's attempt to pull all the tracks composed by Green Day, as well
as any track by AC/DC that is over 240,000 milliseconds.
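SELECT * FROM tracks
WHERE composer = 'Green Day'
OR (composer = 'AC/DC' AND milliseconds > 240000);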
Notice the use of parenthesis making it clear that we only wanted the longer AC/DC songs.
You can see that the Green Day songs under 240,000 milliseconds are still listed. If we
change the parenthesis however, the logic applies the millisecond condition to all Green
Day songs as well.
Practice
Test your skills out and see if you can query for all tracks with a price greater than a dollar
and a genre (genre_id) of 22.
Now see if you can query for all tracks with a price greater than a dollar and a genre
(genre_id) of either 22 or 19.
There are a lot more Operators than just the equal sign that enable us to do some really
complex things. We'll dive into those operators next.
Operators
So far we’ve only made conditions using the equal (=) or greater than (>) operators. There
are many more at our disposal. They are fairly self-explanatory and just need some
practice to get down. Here’s the table describing the most commonly used operators:
Operator Description
= equal
< less than
> greater than
<= less than or equal
>= greater than or equal
!= not equal
<> not equal (yup, there are two ways)
Take a few moments to get familiar with these operators by filtering out some tracks data.
Here’s a query to get started with:
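For instance (an arbitrary starting point; swap in different operators and values):

SELECT * FROM tracks WHERE milliseconds >= 300000;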
If we want to find all the tracks that were composed by Green Day (either alone, or in
conjunction with other artists) we need to be able to match rows where the composer isn’t
equal to ‘Green Day’ but contain ‘Green Day’ somewhere in them.
To match part of a string, or to identify strings following a pattern, we can use
either of these string pattern matching operators.
Operator Description
LIKE a string matches a pattern
ILIKE case insensitive version of LIKE
SIMILAR TO a string matches a regex pattern
They take a bit more explanation than the simple comparators above.
LIKE
Like is the easy/lightweight way to match a string to a pattern. A pattern is a string that
can use some special symbols that represent wildcard characters. Besides regular
characters, the two wildcard symbols LIKE can use are
Symbol Description
_ matches any single character
% matches any number of characters
To make a pattern that will match ‘Green Day’ inside of any string we put % symbols on
either side, meaning any number of characters can be before or after Green Day. So with
this pattern as our condition, on running the following query the database will scan for
matches in each row and return those that are true.
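SELECT * FROM tracks WHERE composer LIKE '%Green Day%';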
Test your skill: can you create a query to return all of the artists with ‘Black’ in their
name?
ILIKE
If you want your pattern to not care about whether characters are upper or lower case you
can use ILIKE. The I stands for “case (I)nsensitive”. So if we wanted to find all composers
that had the word “day” in it regardless of case, we could use:
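SELECT * FROM tracks WHERE composer ILIKE '%day%';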
Note that in the above query if we switched ILIKE to LIKE we wouldn’t match any Green
Day tracks because Day is capitalized.
Here are a few more examples of what patterns will and won’t match.
'Little Richard' LIKE '%Richard' true
'Little Richard' LIKE '_______Richard' true
'Little Richard' LIKE '______Richard' false
'Little Richard' LIKE '%richard' false
'Little Richard' ILIKE '%richARD' true
'Little Richard' LIKE '_ittle %' true
You can play around with patterns yourself by switching the LIKE statements out here
SIMILAR TO
SIMILAR TO is the more advanced way to match a string to a pattern, using a standard
pattern format called regular expressions (regexp). These can get really advanced (too
advanced for this tutorial) so we won't go over them in detail. If you'd like to dig in further
however, we have our full PostgreSQL Regular Expressions page here.
For a quick example of SIMILAR TO, here is a query with a regex to match all tracks
composed by either AC/DC or Green Day.
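SELECT * FROM tracks WHERE composer SIMILAR TO '%(AC/DC|Green Day)%';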
NULL
A NULL is the database's way of representing a missing or unknown value. You can't test
for it with =; instead SQL has the special operators IS NULL and IS NOT NULL. The
following query will fetch all tracks where the composer IS NOT NULL. Try running
it, and also change it up to return only the rows that do have a NULL composer.
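SELECT * FROM tracks WHERE composer IS NOT NULL;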
Progress Checkin!
The above describes the main toolset of operators you'll need, but if you're interested in
learning more, check out the full list of PostgreSQL operators.
You've learned a huge chunk of SQL so far, keep it up! Are you seeing how SQL is almost
English-like, or at least like an advanced Google search? I hope it's starting to make sense
and is getting less intimidating. A few more concepts and a bit of practice and you'll be
quite fluent in no time!
Aggregate Functions
Fetching the raw data is nice and all, but now we’re going to start actually doing some
aggregations and transformations to it! The first and probably most commonly used
aggregation function we are going to learn is COUNT. The COUNT function takes
whatever you give it and returns the count of how many there are.
The following SQL will count how many albums are in our database. Put another way,
we're going to query for a count of the number of rows in the albums table. Play
around yourself and find how many are in the artists and tracks tables as well.
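SELECT COUNT(*) FROM albums;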
Note that counting a specific column, like COUNT(composer) on the tracks table, can return
a smaller number than COUNT(*), because COUNT only tallies non-NULL values and the
composer column has some NULL values (aka it's empty sometimes).
COUNT DISTINCT
A commonly used clause with the count function is DISTINCT. The DISTINCT clause
changes the count to only tally the number of unique values in the data. Above we fetched
how many tracks had composers listed. If we actually wanted to see how many unique
composers were in our tracks table we could use the COUNT with the DISTINCT clause as
shown here:
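SELECT COUNT(DISTINCT composer) FROM tracks;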
Can you modify the query above to find how many different genre_ids are in the tracks
table?
Aliases
A quick aside here: notice that the column headers on the above datasets weren't all that
clear. SQL does an okay job of finding a name for what you're fetching, but often, especially
as we start using more complex functions, you'll want to use your own alias for the data.
You can do so with the AS keyword following your selections:
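For example (the alias text is up to you):

SELECT COUNT(DISTINCT composer) AS "Number of Composers" FROM tracks;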
Be sure to use double quotes (") around your aliases, as double quotes are used for column
titles.
Functions
The following is a list of the most commonly used functions in SQL. They work similar to
COUNT but perform different calculations on the data.
Function Description
MAX returns the largest (maximum) number in a set
MIN returns the smallest (minimum) number in a set
COUNT returns a count of the # of values in a set
COUNT DISTINCT returns a count of the # of unique (distinct) values in a set
EVERY returns true if all data inside is true (same as bool_and)
AVG returns the average (mean) of the set of numbers
SUM returns the sum of all the values in the set
The following example gives the range and average prices of the tracks using the MIN,
MAX and AVG functions.
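SELECT MIN(unit_price), MAX(unit_price), AVG(unit_price) FROM tracks;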
Can you modify the above query to return how much it would cost to buy one of every
track in the database?
We only covered the most commonly used aggregation functions here. If you'd like to see
more, check out the full list of PostgreSQL Functions.
GROUP BY
So far our aggregation functions have run across all of the data, but it’s often useful to split
the aggregation into groups.
Let’s say for example that we wanted to get not a count of all of the tracks, but how many
tracks were in each genre. One way of doing this would be to write a separate query for
each genre like this:
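SELECT COUNT(*) FROM tracks WHERE genre_id = 1;
SELECT COUNT(*) FROM tracks WHERE genre_id = 2;
SELECT COUNT(*) FROM tracks WHERE genre_id = 3;
-- ... and so on for every genre_id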
But we’d have to know what all the genre_id’s were and use some other tool to combine all
of the results back together. Not ideal.
Luckily, we have the GROUP BY clause which makes this a whole lot simpler. The GROUP
BY clause tells the database how to group a result set, so we can more simply write the
queries above as:
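SELECT genre_id, COUNT(*) FROM tracks GROUP BY genre_id;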
How cool is that?! Can you get a count of all tracks by composer?
It’s useful here to order the results of this query by the count, so we can see which
composers have produced the largest number of tracks (at least in our database).
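SELECT composer, COUNT(*)
FROM tracks
GROUP BY composer
ORDER BY COUNT(*) DESC;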
Above, the NULL composer is being counted as having the most tracks. That's just noise.
Using what we just learned about NULL operators, can you modify the query to filter out
the NULL composers?
The priority/order of the groups is the same as how you list them. You can see that
switching the order of genre_id and composer in the GROUP BY clause makes quite a
different query:
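A sketch of the two variants (note the ORDER BY clauses mentioned below):

SELECT genre_id, composer, COUNT(*)
FROM tracks
GROUP BY genre_id, composer
ORDER BY genre_id, composer;

SELECT composer, genre_id, COUNT(*)
FROM tracks
GROUP BY composer, genre_id
ORDER BY composer, genre_id;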
Notice that I also added ORDER BY clauses to make the output a little more clear. ORDER
BY’s are quite useful and common when using GROUP BY.
GROUP BY Rules
There are a few rules to follow when using GROUP BYs. The largest is that all data that
isn't listed as a parameter to GROUP BY needs an aggregation function applied to it. Think
about what the following query is asking:
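SELECT genre_id, unit_price FROM tracks GROUP BY genre_id;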
It throws an error because the database doesn’t know what to do about unit_price. While
there is only one genre_id per group, there are many unit_prices. They all can’t just be
output as a value without some aggregation function.
Can you correct the above query to get the average unit_price by genre_id?
GROUP BY Errors
It's easy to forget this rule, and if so you're going to see an error like the following:
ERROR: column "tracks.composer" must appear in the GROUP BY clause or be used in an aggregate function
Just remember that that means you have to either add that column to the GROUP BY or
apply an aggregation function to it so the database knows what to do.
The following example will throw this error because the database doesn't know what to do
with all of the unit prices. Can you modify it to return the average unit_price by
genre_id?
JOIN Relationships and JOINing Tables
So far we’ve been working with each table separately, but as you may have guessed by the
tables being named tracks, albums, and artists and some of the columns having names
like album_id, it is possible to JOIN these tables together to fetch results from both!
There are a couple of key concepts to describe before we start JOINing data:
Relationships
PostgreSQL is a Relational Database, which means it stores data in tables that can have
relationships (connections) to other tables. Relationships are defined in each table by
connecting Foreign Keys from one table to a Primary Key in another.
The relationships for the 3 tables we’ve been using so far are visualized here:
Primary Keys
A primary key is a column (or sometimes set of columns) in a table that is a unique
identifier for each row. It is very common for databases to have a column named id (short
for identification number) as an enumerated Primary Key for each table.
It doesn’t have to be id. It can be email, username, or any other column as long as it can be
counted on to uniquely identify that row of data in the table.
Foreign Keys
Foreign keys are columns in a table that specify a link to a primary key in another table. A
great example of this is the artist_id column in the albums table. It contains a value of the
id of the correct artist that produced that album.
Another example is the album_id in the tracks table. Earlier in this tutorial we looked
up all the tracks with an album_id of 89, and saw that they all belonged to the album
"American Idiot", whose id in the albums table is 89.
It is very common for foreign keys to be named in the format of [other table
name]_id as album_id and artist_id are, but again it's not required. The foreign key
column could be of any type and link to any column in another table as long as that other
column is a Primary Key uniquely identifying a single row.
Why Relationships
If we didn’t have relationships we’d have to keep all the data in one giant table like the one
in the figure here.
Each track, for example, would have to hold all of the information on its album and on the
artist. That's a lot of duplicate data to store, and if a parameter in any of that changes,
you'd have to update it in many different rows.
It gets messy already even for our small example, and just wouldn't be realistic for a real
company implementation. The world (and data) works better with relationships.
JOINing Tables
So let’s get to it! To specify how we join two tables we use the following format
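A sketch of the general format (all names here are placeholders):

SELECT * FROM table1 JOIN table2 ON table1.foreign_key = table2.primary_key;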
Note that the order of table1 and table2 and the keys really doesn’t matter.
Let’s join the artists and albums tables. In the above figure we can see that their
relationship is defined by the artist_id in the albums table acting as a foreign key to the id
column in the artists table. We can get the joined data from both tables by running the
following query:
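SELECT * FROM albums JOIN artists ON albums.artist_id = artists.id;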
We can even join all 3 tables together if we’d like using multiple JOIN commands
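SELECT * FROM tracks
JOIN albums ON tracks.album_id = albums.id
JOIN artists ON albums.artist_id = artists.id;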
JOIN types
There are a few different types of JOINs, each which specifies a different way for the
database to handle data that doesn’t match the join condition. These Venn diagrams are a
nice way of demonstrating what data is returned in these joins.
We can demonstrate each of these by doing a COUNT(*) and showing how many rows are
in each dataset. First, the following queries show us how many rows are in the artists
and albums tables.
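SELECT COUNT(*) FROM artists;
SELECT COUNT(*) FROM albums;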
And we know that each album does have an artist, but not all artists have an album in our
database.
INNER JOIN
The inner join is going to fetch a list of all the albums tied to their artists. So we know that
as long as each album does have an artist in the database (and it does) we’ll get back 347
rows of data as there are 347 albums in the database. And indeed, that is what we get back
from the INNER JOIN:
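SELECT COUNT(*) FROM albums INNER JOIN artists ON albums.artist_id = artists.id;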
An OUTER JOIN is going to fetch all joined rows, and also any rows from the specified
direction (RIGHT or LEFT) that didn’t have any connections. In our database, many
artists don’t have an album stored. So if we do a RIGHT OUTER JOIN here which specifies
that the right listed artists table is the target OUTER table, we will get back all of the
matches that we got from the INNER JOIN above AND all of the non-matched rows from the
artists table. And here we show that we do:
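SELECT COUNT(*) FROM albums RIGHT OUTER JOIN artists ON albums.artist_id = artists.id;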
418 OUTER results minus 347 INNER results shows that there are 71 artists in the
database that aren’t associated with one of our albums. Can you double check that that’s
the case with SQL, by adding a WHERE condition to the above query filtering the results
for those where there is no albums.id?
If we chose to do a LEFT OUTER JOIN we’d be choosing the albums table as the OUTER
target. And here we are verifying that there are no extra albums that don’t have an artist
associated with them.
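SELECT COUNT(*) FROM albums LEFT OUTER JOIN artists ON albums.artist_id = artists.id;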
And finally a FULL OUTER JOIN is going to return the JOINed results and any non-
matched rows from either of the tables. We know that in the case of this dataset those will
only come from the artists table, and the result will be the same as our RIGHT OUTER
JOIN above.
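SELECT COUNT(*) FROM albums FULL OUTER JOIN artists ON albums.artist_id = artists.id;

And to look at the joined rows themselves (this time pulling in tracks as well):

SELECT *
FROM tracks
FULL OUTER JOIN albums ON tracks.album_id = albums.id
FULL OUTER JOIN artists ON albums.artist_id = artists.id;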
Scrolling right you can see that there are a lot of columns, as the result has all of the
columns of each joined set. You can also see that there's a conflict: there are 2 columns
titled name. One is from the tracks table and one is from the artists table, and the result set
isn't handling that properly. It's just using the names from the artists table in both
columns!
We can fix this by using aliases. In the following we’re trying to get the names of 8 tracks
along with the name of the artist. Run it and you’ll see for yourself. Can you fix the mixup
in them both having the same column name using the aliases AS "Track" and AS
"Artist".
You have now unlocked the knowledge to fully enjoy most of the double entendres in this
amazing song about Relationships. Do take a moment to enjoy.
DATE and TIME Functions
DATE and TIME values in PostgreSQL have a whole special set of functions and operators
for their proper use. So many queries deal with DATE and TIME information that it’s
important to get to know the date tools. Below we’ll cover and practice the main functions
you’ll likely need. If you want to get detailed you can checkout the full list of PostgreSQL
functions here.
DATE/TIME Datatypes
There are 4 main ways to store date values in a PostgreSQL database:
YYYY-MM-DD HH:MM:SS
where the letters stand for Year, Month, Day, Hour, Minutes and Seconds. Let’s say for
example that we want to record that we got a new user on December 10, 2019 at exactly
01:14. To represent that exact date and time we would use the format:
2019-12-10 01:14:00
To get some familiarity try creating and SELECTing a few TIMESTAMPS below. I was
born on May 1st, 1983 at exactly 4:00am. Can you fetch that timestamp?
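For reference, the example timestamp from above can be selected like so (the birthday one is left to you):

SELECT '2019-12-10 01:14:00'::timestamp;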
We’re just going to jump in here. We need to use a different table as none of the previous
ones we’ve been using have had date fields in them. Another table available to us in
chinook is employees. Let’s get familiar with what columns are in this table by looking at
the first few rows. Note that there are several columns so you may have to scroll right to
see all of the data:
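SELECT * FROM employees LIMIT 3;  -- any small LIMIT will do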
Each employee has two TIMESTAMP columns, one for their birth_date and one for their
hire_date. You can use all of the ORDERing, GROUPing and other functions we learned
for other columns on DATE columns as well. Try getting a list of the 4 youngest employees
in the company.
To format a date into a particular string output, PostgreSQL provides the TO_CHAR function,
used as TO_CHAR([date type], [pattern]),
where [date type] is a column or value of any of the above listed date/time data types, and
[pattern] is a string indicating how to format the output date. The main symbols you'll
want to use to create your format patterns are here:
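A partial list of the standard PostgreSQL pattern symbols (see the docs for the full set):

Pattern Description
YYYY 4-digit year
MM month number (01-12)
DD day of month (01-31)
HH24 hour of day (00-23)
MI minute (00-59)
SS second (00-59)
Month full month name
Mon abbreviated month name
Day full day name
Dy abbreviated day name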
The above patterns can be strung together to get the format you eventually want. Some
common outputs are shown below.
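A few sketches using the employees table introduced above:

SELECT TO_CHAR(hire_date, 'MM/DD/YYYY') FROM employees;
SELECT TO_CHAR(hire_date, 'Month DD, YYYY') FROM employees;
SELECT TO_CHAR(hire_date, 'Dy, Mon DD YYYY') FROM employees;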
You don’t have to memorize these (it’s hard to!). It’s just good to get familiar with how it
works and then reference back to it when you need it in the future.
Number formatting
There are a couple of extra tools you can use on patterns that output numbers.
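For example, the th suffix adds an ordinal ending and the FM prefix suppresses padding and leading zeros:

SELECT TO_CHAR(hire_date, 'DDth') FROM employees;  -- e.g. 14th
SELECT TO_CHAR(hire_date, 'FMDD') FROM employees;  -- e.g. 14, with no zero padding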
String Formatting
For string outputs, most of the patterns above support different casing output based on the
case you use for the pattern. Some examples using different casings of “Day”:
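SELECT TO_CHAR(hire_date, 'Day') FROM employees;  -- e.g. Monday
SELECT TO_CHAR(hire_date, 'day') FROM employees;  -- e.g. monday
SELECT TO_CHAR(hire_date, 'DAY') FROM employees;  -- e.g. MONDAY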
And you can see the following common date format in UPPERCASE, Capitalized and
lowercase formats:
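SELECT TO_CHAR(hire_date, 'MON DD, YYYY') FROM employees;  -- e.g. AUG 14, 2002
SELECT TO_CHAR(hire_date, 'Mon DD, YYYY') FROM employees;  -- e.g. Aug 14, 2002
SELECT TO_CHAR(hire_date, 'mon dd, yyyy') FROM employees;  -- e.g. aug 14, 2002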
Note that the case for numeric values doesn’t change. Still use DD for the day # of the
month and YYYY for year.
We're going to move on in the tutorial, but if you'd like more details check out the full list of
PostgreSQL date formatting functions.
PostgreSQL also provides handy keywords for fetching the date and time at the moment a query is run:
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
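SELECT CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP;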
GROUPing BY DATE
In analytic queries, it’s very common to group things by dates. For example you may want
to see new users by year, month, week or day. To do so, you’ll want to use the TO_CHAR
function to convert the dates into a truncated string before you GROUP BY it. You don’t
want to simply GROUP BY the raw date as those are accurate down to the millisecond so
grouping by the unaltered date would be like making GROUPs for each millisecond.
The following examples are using the hire_date field from the employees table and show a
lot of common formats you can use for these groups. These are what we use at Chartio for
our date group formatting standards.
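For example, truncating to coarser granularities:

TO_CHAR(hire_date, 'YYYY')       -- group by year
TO_CHAR(hire_date, 'YYYY-MM')    -- group by month
TO_CHAR(hire_date, 'YYYY-WW')    -- group by week of year
TO_CHAR(hire_date, 'YYYY-MM-DD') -- group by day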
Feel free to try out any of the above formats on the query below:
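SELECT TO_CHAR(hire_date, 'YYYY') AS year, COUNT(*)
FROM employees
GROUP BY year
ORDER BY year;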
There are only 8 employees in our database so we're not dealing with too many groups
there. You can get a little more granular with the invoices table and its invoice_date
column, which has 250 rows.
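SELECT TO_CHAR(invoice_date, 'YYYY') AS year, COUNT(*)
FROM invoices
GROUP BY year
ORDER BY year;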
The above query returns the number of invoices created per year. Can you modify it to get
a SUM of the total amount invoiced by month?
Mid Level SQL Practice Grounds
You’ve covered the majority of the main use cases of SQL! You know the stuff, but now
you’ve got some practicing to do to become really fluent and skilled at it. Here we’ve
constructed a large list of challenges to give you that practice. If you forgot the rules of our
practice playgrounds you can review them in the Basic SQL Practice page.
Good luck!
Q. Fetch all the tracks that are over 300000 milliseconds long.
references: where
Q. Get all the tracks that were composed by just Miles Davis
references: operators
Q. Get all the tracks that Miles Davis had a part in composing.
references: operators
Q. Fetch the names of the tracks with the word 'wild' in them, regardless of
case
references: operators
Q. How many unique composers are there in the tracks table with the
genre_id of 1?
references: aggregate
Q. What is the average length for tracks with genre_ids of either 5, 7 or 10?
references: group-by
Q. Take the above query, but order the album_id in descending order,
keeping genre_id ordered the same
Q. Take the above query with the same ordering, but group by album_id and
then genre_id, and change the order of the results to reflect that switch.
Q. Get the first_names and birth_dates of each of the employees in the format:
January 01, 1976
references: dates
Q. Get the first_names and birth_dates of each of the employees in the format:
Jan 1st, 1976
references: dates
Q. Get the first_names and birth_dates of each of the employees in the format:
09/23/1987
references: dates
Q. Get the year of the invoice_date in the format Y2012 and total number of
invoices per year.
references: dates
Q. Get the year and month of invoices and the total amount that was invoiced
for that year and month.
UNION
UNION concatenates the result sets of two SELECT statements into one, performing a
deduplication step so that exact duplicate rows appear only once.
UNION ALL
If we were to now perform a UNION ALL on the same data set, the query would skip the
deduplication step and return the results shown.
*Note: In both of these examples, the field names from the first SELECT statement are
retained and used as the field names in the result set. These can be changed later if
desired.
UNION-ing data
UNION or UNION ALL have the same basic requirements of the data being combined:
1. There must be the same number of columns retrieved in each SELECT statement to
be combined.
2. The columns retrieved must be in the same order in each SELECT statement.
3. The columns retrieved must be of similar data types.
The next 2 examples show that we would return results whether we used UNION or
UNION ALL, since all required criteria are met.
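A sketch using two tables from our music database that share compatible columns:

SELECT first_name, last_name FROM employees
UNION
SELECT first_name, last_name FROM customers;

SELECT first_name, last_name FROM employees
UNION ALL
SELECT first_name, last_name FROM customers;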
This final example would fail. While we have the correct number of columns, they are now
queried in the wrong order in the second SELECT statement and thus the data types also
do not match up. This will result in an error being returned.
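For example (the second SELECT flips the column order, pairing an integer against text, so the UNION errors out):

SELECT id, name FROM artists
UNION
SELECT name, id FROM artists;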
Summary
We have seen that UNION and UNION ALL are useful to concatenate data sets and to
manage whether or not we retain duplicates. UNION performs a deduplication step before
returning the final results; UNION ALL retains all duplicates and returns the full,
concatenated results. For either to succeed, the number of columns, data types, and column
order in each SELECT must match.
Exclude a Column
In some cases, you may have a table with many fields and desire to write a query that
selects nearly all of them. In a situation like this, it would be nice to be able to write a
query that combines a SELECT all with a shorter list of exclusions.
Unfortunately, since SQL is a declarative language, this cannot be done. When we use
SQL, we must specify what we want, not what we do not want. The 2 best viable ways to
approach this problem are as follows:
Omit Columns
List out all columns in your query, omit the undesired fields by:
Not including
Deleting
Commenting out
Not including columns or deleting columns you don't want in your SELECT statement is
straightforward. However, if you want to show that you are leaving out certain
columns intentionally, you can comment them out by using two dashes (--):
SELECT
column1,
--column2,
column4
FROM
table_name;
The SQL DESCRIBE statement is useful here to obtain the full list of the fields in a table,
especially if there are a great number. (Note that PostgreSQL does not support DESCRIBE;
in psql, the equivalent is the \d [table name] meta-command.)
DESCRIBE table_name;
It will produce a table with all column names from the table being described and some
other meta information.
Create View
If you will often be querying a table and retrieving most of its columns, then it may make
sense to create a view. The view would persist as a “virtual table,” against which SELECT *
queries could be run.
Here is how it would look if we wanted to end up with all of the columns except column2:
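A sketch of the view definition, reusing the placeholder names from above (assume the table's columns are column1 through column4):

CREATE VIEW view_name AS
SELECT column1, column3, column4
FROM table_name;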
SELECT *
FROM view_name;
In theory this is a very good idea, however as the view definition becomes more extensive
and/or complex, there could be a negative performance impact. Since the view is a “virtual
table” the data does not reside within the view. Each time a query is made against the view,
the view’s definition query against the original database tables must also run. The
convenience gained by being able to SELECT * should be weighed against the time and
resources needed to support the view.
Additional Practice
For practicing we will be using an online music store database. Here is the entity
relationship diagram of the schema.
Feel free to explore the data by using SELECT * FROM [table name] in the SQL editor
below:
Select Questions
Select all columns and rows from the albums table in the SQL box below:
Select all columns from the albums table where the album title is ‘Let There Be Rock’ in
the SQL box below:
Join Questions
Join the Artist and the Album table with an inner join in the SQL box below:
Join the Artist and the Album table with a left join in the SQL box below:
Join the Artist and the Album table with an outer join in the SQL box below:
The INNER JOIN found every instance where the albums.artist_id equalled an artists.id
and joined the data together to create a row in the final table.
The LEFT JOIN performed an INNER JOIN and then also added rows to the final table
where the left table (albums) did not have matches.
The OUTER JOIN performed both an INNER and LEFT JOIN and then also added rows to
the final table where the right table (artists) did not have matches.
OR is used to find where at least one out of multiple conditions is true.
To get more technical, boolean logic is a way of representing how bits in a computer are
processed. Let’s explore more about these conditional statements (e.g. if-else, where, or
case-when statements) with truth tables to understand how precisely boolean logic works.
Truth Tables
For example, let’s look at the following conditional:
If: A and B
Then: C
This returns the value C when the values A and B are true. We can represent this using
something called a truth table. A truth table is a way of representing every possible input
and its corresponding output. The truth table for this AND statement looks like this:
A B C
1 1 1
1 0 0
0 1 0
0 0 0
In the truth table, a 1 represents true while a 0 represents false. From looking at this table
it is evident that the only time C is true, is when both A and B are true.
If: A or B
Then: C
Truth table:
A B C
1 1 1
1 0 1
0 1 1
0 0 0
This truth table might be a little different than you were expecting. This is because an OR
statement is only false when both input values (A and B) are false.
If: (A or B) and C
Then: D
The first step to building a truth table is to decide how many rows we need. The way to
decide this is to check how many inputs we have and raise two to that number. In this case
we have 3 inputs so we need 2^3 or 8 rows.
Next we need to decide how many columns to use. In this case we will have one column for
each input, one for the output, and one for the intermediate value of (A or B). The truth
table will look like this:
A B C (A or B) D
1 1 1 1 1
1 1 0 1 0
1 0 1 1 1
1 0 0 1 0
0 1 1 1 1
0 1 0 1 0
0 0 1 0 0
0 0 0 0 0
When the table is filled out, you can see that D is true exactly when C is true and at least
one of A or B is true.
Look back at the AND truth table from earlier:
A B C
1 1 1
1 0 0
0 1 0
0 0 0
Notice that, when A is False (0), C is also always False. This is because C is only true when
both inputs are true, therefore a single false means C is false.
If a computer is using an AND condition and the first input is false, then the second input,
B, will never be checked. OR will evaluate as true without checking the second input when
the first input is true. This ability for the computer to invalidate later boolean logic steps
can save a lot of unneeded processing power for your query.
Examples in SQL
Boolean logic appears throughout SQL, most visibly in WHERE conditions and in CASE
expressions (which evaluate conditions with WHEN ... THEN ... END).
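Here is a sketch of each against the tutorial's music dataset, first a compound WHERE condition, then a CASE expression:

SELECT * FROM tracks
WHERE (composer = 'Green Day' OR composer = 'AC/DC')
AND milliseconds > 240000;

SELECT name,
CASE WHEN milliseconds > 240000 THEN 'long' ELSE 'short' END AS length
FROM tracks;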
Summary
Boolean logic (AND, OR, NOT) underlies SQL's conditional statements. Truth tables are a
handy way to reason about compound conditions, and short-circuit evaluation lets the
database skip work it doesn't need to do.
Copying Data Between Tables
- Note: "WITH NO DATA" specifies that the new table should only copy the table structure
with no data
Copying data between tables is just as easy as querying data however it will take a bit
longer to run than a normal query. It can be used to update an inventory, create a table
that has different permissions than the original, and much more.
Example:
Take for example a shopkeeper who needs to create a master list of all the items in his
store to conduct a store-wide audit. However the data he needs exist in separate tables
containing the inventories of each department:
In order to create a master list that contains all of the store’s items and prices the
shopkeeper needs to create the table for all items and copy the data from each of the
departments into the new table.
The shopkeeper needs to first make a new table to contain the data. The master list needs
to have the same table structure (columns, data-types, etc.). The easiest way to create a
table with the same table structure as a different table is to use:
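CREATE TABLE [Table to copy To] AS TABLE [Table to copy From] WITH NO DATA;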
Once filled out, this command will create a new table with the same table structure, but
without any data. The shopkeeper can use this to create his master list:
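For example, basing the structure on the hardware department's table:

CREATE TABLE masterlist AS TABLE hardware WITH NO DATA;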
With this done, the shopkeeper now has the following tables:
INSERT INTO command
Now that the shopkeeper’s master list has been created and structured, the data needs to
be inserted into the table. This can be done using the INSERT command. This command
inserts specified values into a specified table. It is often used to insert single values into
tables by running the command as such:
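A sketch (the column names are assumptions about the shopkeeper's tables):

INSERT INTO masterlist (item, price) VALUES ('hammer', 9.99);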
When using INSERT INTO with the VALUES command it is possible to add entries by
hand, however a query can also be used in place of the VALUES command. For example to
copy all items from the table “hardware” to the table “masterlist” the following query can
be run:
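INSERT INTO masterlist
SELECT * FROM hardware;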
This query uses a subquery to find all values in “hardware” and then adds them to the
“masterlist”. In order to copy data from all the tables, the shopkeeper can use UNION to
merge the tables together in the subquery:
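A sketch, with kitchen and garden standing in as placeholder names for the other department tables:

INSERT INTO masterlist
SELECT * FROM hardware
UNION
SELECT * FROM kitchen
UNION
SELECT * FROM garden;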
This gives the shopkeeper the desired result so that he can begin his audit:
Adding Conditions
Copying data with INSERT INTO can also be done with conditions. For example, if the
shopkeeper is only interested in items over $50 these values can be copied by using:
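INSERT INTO masterlist
SELECT * FROM hardware WHERE price > 50
UNION
SELECT * FROM kitchen WHERE price > 50
UNION
SELECT * FROM garden WHERE price > 50;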
Each SELECT statement can also have its own where statement for table specific
conditions. After the table is created and filled it can be manipulated, added to or removed
from without affecting the tables the data was copied from.
Video example
Copy Data and Table Structures to Other Tables in PostgreSQL 11.4
Summary
To create a pre-structured copy of a table:
CREATE TABLE [Table to copy To] AS TABLE [Table to copy From] WITH NO DATA;
The new table will be pre-structured to handle data from the 'table to copy from'
Copy into pre-existing table:
INSERT INTO [Table to copy To] SELECT [Columns to Copy] FROM [Table to
copy From] WHERE [Optional Condition];
This will create an independent copy of the data in the new table
References
1. https://www.postgresql.org/docs/9.5/sql-insert.html
2. https://stackoverflow.com/questions/25969/insert-into-values-select-from/25971
Export to CSV with \copy
The Commands:
In order to export a table or query to csv use one of the following commands:
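\copy [Table/Query] to '[Relative Path]' csv header;
COPY [Table/Query] TO '[Absolute Path]' CSV HEADER;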
Key words:
csv: this tells the copy command that the file being created should be a CSV file.
header: this tells copy command to include the headers at the top of the document.
CSV Files
Comma Separated Value (CSV) files are a useful format for storing data. Many tools
support importing data from CSV files because it is an easy to read format that is plain text
and not metadata dependent.
In psql there are two commands that can do this, both slightly different.
The first is the \copy meta-command which is used to generate a client CSV file. This
command takes the specified table or query results and writes them to the client’s
computer.
The second command, COPY, generates a CSV file on the server where the database is
running.
The \copy meta-command is used for exporting to a client computer. It is useful for
copying a database that may have somewhat restricted access and for creating a personal
copy of the data. For example, a user may want to generate a csv so that they can analyse
financial data in excel. The format of a \copy to csv is as follows:
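\copy [Table/Query] to '[Relative Path]' csv header;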
The [Table/Query] section can be filled with a table or query. For example to copy all
entries from a table, the table name can be put here. To copy all entries that contain “saw”
in their names from the table of tools to a csv, the following commands could be run:
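\copy (SELECT * FROM tools WHERE name LIKE '%saw%') to 'myTools.csv' csv header;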
The [Relative Path] is the path from where psql is currently saving files to where you want
to save the file. The location that psql is currently saving can be found by using the \! pwd
command.
Note: The \! meta-command takes whatever arguments it is given and runs them as a bash
command within psql.
The pwd command prints the current working directory. The meta-command \! pwd and
\! ls are shown being used below:
This means that if the file name “myTools.csv” is used as the [Relative Path], it will be
saved in /Users/matt/ as can be seen below:
The file can also be saved elsewhere by entering a specific relative path. For example, if
‘/Desktop/[Filename].csv’ is entered as the path, the file will be saved to the desktop.
Following the Relative Path in the command is the text ‘csv header;’ This text does two
things. The ‘csv’ specifies that the data should be stored in the CSV format. Other possible
formats are ‘text’ and ‘Binary.’
The ‘header’ specifies that, when the data is copied to a csv file, the names of each column
should be saved on the first line as shown here:
Summary
To copy a table or query to a csv file, use either the \copy command or the COPY
command.
\copy should be used for a copy to local systems
\copy uses a relative path
COPY should be used to create a csv on the server’s side.
COPY uses an absolute path.
References
1. https://www.postgresql.org/docs/9.2/app-psql.html#APP-PSQL-META-COMMANDS-COPY
2. https://www.postgresql.org/docs/9.2/sql-copy.html
3. https://tableplus.io/blog/2018/04/postgresql-how-to-export-table-to-csv-file-with-header.html
PostgreSQL Generate_Series
Generate a Series in Postgres
generate_series([start], [stop], [{optional}step/interval]);
The function requires either 2 or 3 inputs. The first input, [start], is the starting point for
generating your series. [stop] is the value that the series will stop at; the series will stop
once the values pass the [stop] value. The third value determines how much the series will
increment for each step; the default is 1 for number series.
For example:
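SELECT generate_series(1, 5);  -- returns 1, 2, 3, 4, 5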
Let’s look at what happens when we start with a number that has a decimal value:
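SELECT generate_series(0.5, 4);  -- returns 0.5, 1.5, 2.5, 3.5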
Note that the value starts at 0.5, but still increments by 1. In order to change the
increment, we have to state explicitly how much to increment by as a third option in the
function:
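SELECT generate_series(0.5, 4, 0.5);  -- returns 0.5, 1.0, 1.5, ... 4.0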
Timestamps
Generate_series() will also work on the timestamp datatype. This may need an explicit
cast to work.
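SELECT generate_series('2019-01-01'::timestamp, '2019-01-02'::timestamp, '6 hours');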
Note the ::timestamp. This is an explicit cast to the timestamp data type. The reason for
this is because without the cast the data type is too ambiguous. This results in an error
being thrown when the query is run:
This error can be avoided by adding the typecast. This will only happen on certain inputs
which are ambiguous in terms of data type.
Interval Format
Notice the use of ‘6 hours’ for the third option in the image above. When generating a time
series there are additional options for how you define the way the series increments.
The 3rd input, the interval, follows the format of [quantity] [type] [{optional}
direction].
[quantity] => 6
[type] => hours
In the case of 6 hours, the quantity is 6, the type is hours, and the direction is omitted so
it defaults to positive. If you want the same list but opposite order you can change the
interval to ‘6 hours ago’.
Adding ago specifies that you want the timestamps to change by 6 hours in the negative
direction. This will however return 0 rows unless you reorder your start and stop values.
The interval can also be created using a shorthand form. Some of the time types can be
abbreviated as shown by this table:
Type Abbreviations
Millennium -
Century -
Decade -
Year Y
Month M
Week W
Day D
Hour H
Minute M
Second S
Millisecond -
Microsecond -
In order to use the abbreviations we can create the interval using a shorthand notation,
which follows this format:
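P [quantity][unit]... T [quantity][unit]...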
The P is used to show that the interval is starting and the T indicates that the date
(year/month/day) portion of the interval is over and this is now the time
(hours/minutes/seconds) portion of the interval
For example, an interval of 5 days and 3 hours would be:
P5DT3H
An interval of 9 years 8 months 7 days 6 hours 5 minutes and 4 seconds would be:
P9Y8M7DT6H5M4S
And the 6 hours interval from the earlier example would be simply:
PT6H
While this shorthand is much faster to write, it does sacrifice some of its readability to
achieve this.
Summary
Standard form: generate_series([start], [stop],
[{optional}step/interval]);
generate_series() can take several different sets of inputs
Can be Numeric or Timestamp data types
If start/stop are timestamps:
Use an explicit type cast
Use an interval (e.g. 6 hours or 1 week ago)
Step defaults to 1 for numeric unless otherwise specified.
Time interval can be written in shorthand:
Format: P [quantity] [unit] … T [quantity] [unit] …;
P5DT6H7M = 5 days 6 hours 7 minutes
Resources
http://www.postgresqltutorial.com/postgresql-interval/
https://www.postgresql.org/docs/current/functions-srf.html
How to Create a Copy of a Database in PostgreSQL
To create a copy of a database, run the following command in psql:
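CREATE DATABASE [Database to create]
WITH TEMPLATE [Database to copy]
OWNER [Your username];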
The first step to copying a database is to open psql (the postgreSQL command line). On a
macOS this can be done when you start the server.
Now that a connection has been established, we can begin writing queries. You can switch
to other databases by typing “\c [Database Name]”. To look at all the databases, the \list or
\l meta-command can be used:
Replace the bracketed portions with your database names and username. This query will
generate a copy of the database as long as the “Database to copy” is not currently being
accessed. If the “Database to copy” is being accessed by a user, that connection will have to
be terminated before copying the database. To do this, run the following command:
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = '[Database to copy]'
AND pid <> pg_backend_pid();
This query will terminate any open connections to the “Database to copy”, and will cause
brief interruptions to anyone accessing the “Database to copy”. It will disconnect users
from the database, however psql will automatically reconnect a user whenever they run
their next query as shown below:
Once they reconnect they can then run queries again against the database.
Note: They will not be able to reconnect until the database is completely copied.
Once you terminate the connections, create the copy using the first command to CREATE
a copy of the specified database. Due to the fact that people can reconnect between the
time you terminate and the time you copy, you may want to structure your commands like
so:
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = '[Database to copy]'
AND pid <> pg_backend_pid();
CREATE DATABASE [Database to create]
WITH TEMPLATE [Database to copy]
OWNER [Your username];
When structured and run like this, the CREATE DATABASE command will run
immediately after terminating connections. This will help ensure no connections form
between terminating connections and copying the database.
How to Export PostgreSQL Data to a CSV or Excel File
PostgreSQL has some nice commands to help you export data to a Comma Separated
Values (CSV) format, which can then be opened in Excel or your favorite text editor.
To copy data out first connect to your PostgreSQL via command line or another tool like
PGAdmin.
Note, PostgreSQL requires you to use the full path for the file.
For example, the following query exports all the blues (genre #6) tracks from a table.
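A sketch (the output path is an assumption; use a full path valid on your machine):

COPY (SELECT * FROM tracks WHERE genre_id = 6) TO '/Users/you/blues_tracks.csv' CSV HEADER;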
Opening
After you have run the copy command you can then open the .CSV file(s) with Excel or
your favorite text editor.
Did you know that you can also import data from CSV or Excel files into PostgreSQL?
How to Replace Nulls with 0s in SQL
UPDATE [table]
SET [column]=0
WHERE [column] IS NULL;
Null Values can be replaced in SQL by using UPDATE, SET, and WHERE to search a
column in a table for nulls and replace them. In the example above it replaces them with 0.
Cleaning data is important for analytics because messy data can lead to incorrect analysis.
Null values can be a common form of messy data. In aggregation functions they are
ignored from the calculation so you need to make sure this is the behavior you are
expecting, otherwise you need to replace null values with relevant values.
UPDATE takes a table and uses the SET keyword to control what row to change and what
value to set it to. The WHERE keyword checks a condition and, if true, the SET portion is
run and that row is set to the new value. If false, it is not set to the new value.
UPDATE [table]
SET [column]=[column]+1;
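This increments the value of [column] by one for every row in the table (add a WHERE clause to update only certain rows).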
UPDATE [table]
SET [column]=(1+random()*9)::int;
Generates a random double precision (float8) type number from [0,1), multiplies it by 9,
adds 1, and casts the result to an integer type for each row. (The parentheses matter:
without them, the ::int cast would bind only to the 9.)
UPDATE [table]
SET [column]=MOD([column],2);
Uses MOD to set the column values to the remainder of the column values divided by 2.
Summary
To replace Nulls with 0s use the UPDATE command.
Can use filters to only edit certain rows within a column
Update can also be used for other problems like:
Generating random data
Adding one to every row in a column (or where a condition is true)
Setting Values based on if a column is even or odd
Etc.
How to Start a PostgreSQL Server on Mac OS X
There are two main ways to install PostgreSQL on Mac OS X.
Using Homebrew
Homebrew can be installed by running the following command in a terminal:
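The install command from brew.sh (check there for the most current version):

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Once Homebrew is installed, it's good practice to make sure it's up to date and healthy: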
brew update
brew doctor
Homebrew is a powerful package manager with many uses, including installing and
running postgreSQL. This can be done by typing the following command into a terminal:
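brew install postgresql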
Now that postgres is installed the default server can be started by running the command:
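pg_ctl -D /usr/local/var/postgres start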
This will start up a postgres server hosted locally on port 5432. The server will be run out
of the directory /usr/local/var/postgres.
psql postgres
This will connect to the server and access the postgres database. Once this is done:
This shows that the server has been started and can be connected to.
(Optional) Creating a Custom Data Directory
A custom data directory can also be used for a server. To do this, first create a directory to
be used as the server location. For example, create a directory called myData in the home
directory:
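mkdir ~/myData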
Once the directory is created, the server can be initialized. This means that we configure
the directory and add the necessary files to run the server. To do this run the initdb
command as shown:
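initdb -D ~/myData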
This will fill the myData directory with the files necessary to run the server.
Now that the server is initialized, you can start the server from this directory and
specify a log file for it to write to. To do this, use the following command, substituting
in the specified values:
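pg_ctl -D [Data Directory] -l [Log file] start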
The “Data Directory” refers to the directory that was just initialized (in this case myData).
The “Log file” is a file that will record server events for later analysis. Generally log files are
formatted to contain the date in the file name (e.g. “2018-05-27.log” or “myData-logfile-
2018-05-27.log”) and should be stored outside of the database that they are logging so as
to avoid unnecessary risks. Log files can be dense to read but are very useful for security
and debugging purposes.
The command above will create the specified log file, start the server, and tie the log
file to the server. If a log file is not specified, events will be logged to the terminal instead.
The server will only start if the port is free. If the default server is running it must first be
stopped using the pg_ctl -D /usr/local/var/postgres stop command:
Once the custom server is running, it can be connected to in the same way as before:
psql postgres
Using the Postgres App
The other main way to run PostgreSQL on Mac OS X is the Postgres app, which can be
downloaded from https://postgresapp.com. Download the app, move it into the
Applications folder, and run it; clicking "Start" will start the server. Details on the
server can be found by opening the server settings:
This interface shows all the essential information regarding the server. It also allows the
port to be changed very easily. This is useful because multiple PostgreSQL servers can
be run at the same time by giving each one its own port.
Note: To change the port from the terminal, the postgresql.conf file (which can be found
in the data directory) must be edited. The relevant setting looks like the following:
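#port = 5432
In a default data directory this line is typically commented out; uncomment it, change the
value, and restart the server to move it to a new port.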
The app also comes with PostgreSQL's command line tools. For example, the 'postgres'
database on the server can be connected to using the psql tool with postgres as an
argument:
/Applications/Postgres.app/Contents/Versions/latest/bin/psql postgres
Rather than typing out the full path each time, however, the path can be added to a
configuration file, allowing significantly easier access to the tools: they can then be run
from any directory on the computer. To do this, the following command can be run in the
terminal:
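The Postgres.app documentation suggests a command along these lines (it registers the
tools directory with the shell's search path; open a new terminal window afterward for the
change to take effect):
sudo mkdir -p /etc/paths.d &&
echo /Applications/Postgres.app/Contents/Versions/latest/bin | sudo tee /etc/paths.d/postgresapp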
Once this is done, the ‘postgres’ database can be accessed by simply typing:
psql postgres
Summary
Homebrew:
Download/update Homebrew
Use Homebrew to install postgres
(Optional) Create New Data Directory
initdb
Start Server
App:
Download app and move to Applications
Run App
(Optional) Set different port for multiple servers
Start Server
(Optional) Add path so that command line tools are easy to access
References
1. https://www.postgresql.org/docs/10/app-initdb.html
2. https://postgresapp.com
3. https://www.postgresql.org/docs/10/app-pg-ctl.html
Importing Data from CSV in
PostgreSQL
Importing from CSV in PSQL
As mentioned in this article on exporting data to CSV files, CSV files are a useful format
for storing data: they are usually human readable and widely supported. As such, it is
important to be able to read data from CSV files and store it in tables. This can be done
in psql with a few commands.
Syntax:
COPY [Table Name](Optional Columns)
FROM '[Absolute Path to File]'
DELIMITER '[Delimiter Character]' CSV [HEADER];
Key Details:
There are a few things to keep in mind before copying data from a CSV file into a table:
1. Make a Table: There must be a table to hold the data being imported. In order to
copy the data, a table must be created with the proper table structure (number of
columns, data types, etc.)
2. Determine the Delimiter: While CSV files usually separate values using commas, this
is not always the case. Values can be separated by '|' characters or tabs (\t), among
others. (NOTE: tab-delimited files, also known as TSV files, still use the CSV keyword
in the command; specify the delimiter as DELIMITER E'\t', where the E allows the \t
escape sequence to be interpreted as a tab character.)
3. Does the Data Have a Header: Some CSV files will have headers while others will
not. A header is a first line in the file containing the column names. If a header is
present, include HEADER at the end of the query. If there is not a header in the data,
do not include HEADER.
Example:
Take this list of items as an example:
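The file might look something like this (the values here are only illustrative):
name,price
Chair,$15.99
Bookshelf,$38.50
Lamp,$12.25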
This data contains two columns: 'name' and 'price.' Name appears to be a VARCHAR due
to its varying lengths, and price appears to be MONEY. This will help in creating the table
to load the CSV file into.
The first step, as stated before, is to create the table. It must have at least two columns, one
a VARCHAR type and the other a MONEY type:
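A minimal version of that table, with column names taken from the file's header:
CREATE TABLE items (
  name VARCHAR,
  price MONEY
);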
Note: It is also possible to import the csv data to a table with more than 2 columns,
however the columns that should be copied to will need to be specified in the query (e.g.
COPY items(item, value) FROM…).
Now that a table, ‘items,’ has been created to house the data from the csv file, we can start
writing our query. The second step in copying data from CSV is to check the delimiter and
the third step is to check for a header. In this case, the delimiter is ',' and there is a header
in the file. Since the header and the delimiter are known, the query can now be written
using the syntax shown above.
So in order to import the csv we will fill out the necessary parts of the query:
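A sketch of the finished query (the absolute path is an example; use the file's real
location):
COPY items(name, price)
FROM '/Users/you/items.csv'
DELIMITER ',' CSV HEADER;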
The message COPY 31 indicates that 31 rows were successfully copied from the CSV file to
the specified table.
Summary
Make a table to store the data
Determine what delimiter was used
Verify whether a header exists
Copy from the csv file
Meta commands in PSQL
Meta commands are a feature of psql that allows the user to perform powerful operations
without querying a database. There are lots of meta commands. Here is a list of
some of the more common meta commands along with a very brief description:
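\? : help with psql meta commands
\l : list databases
\c [database] : connect to a database
\d [table] : describe a table
\dt : list tables
\di : list indexes
\x : toggle expanded output
\o [filename] : send query results to a file
\! [command] : run a shell command
\q : quit psql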
Multiple meta commands can be used in one line. For example, you could use \dt\di+ to
list all tables and then list all indexes with additional technical information on the indexes.
Extra Details
Adding + to the end of the meta command that lists items will provide a small amount of
extra technical information. This will work on any \d commands as well as some others.
Common Error
Meta commands are delimited by a new line as opposed to a ;. This means that you would
never see a meta command look like this: \x; The semicolon at the end is unnecessary
and will throw an error.
There are quite a few more meta commands that were not listed as they were for relatively
niche usages and not as commonly used. For a full list of meta commands use \?. This will
bring up the meta command help page with a full list of every meta command and a brief
description of its functionality.
In total there are 102 meta commands.
Summary
Meta commands are useful commands that can be run from a psql client.
All metacommands begin with \
Adding + will provide extra detail on certain metacommands
For a full list of meta commands use \? in psql
Do not end meta commands with ;
Outputting Query Results to Files
with \o
\o [filename].txt
[Query or Queries to write to file];
\o
Outputting query results to a file instead of the terminal allows the data to be saved for
later analysis. The results can be shared easily and provide a snapshot of the data at the
time of the query.
In order to output to a file, several methods can be employed. In this article the \o method
of writing to files will be explored. One other method is using either \copy or COPY which
are discussed in this article.
The \o metacommand
\o is a metacommand. This means that it is delimited by a new line in the terminal rather
than being part of the query. Simply write the metacommand and then press enter/return
to run the command.
\o [filename].txt
This will start writing the results of subsequent queries and certain
metacommands to the specified file.
[Query or Queries to write to file];
Since these lines are after \o [filename], these queries will be logged to the
file.
Depending on version, the results of \d as well as \di, \dt, etc will be
printed to the new file. (see example below)
\! Commands will not be printed to the file.
If the output of a command is logged on the console, this means that it
was not written to the file. If the result is not shown on the console, then
the result was sent to the file.
\o
Using \o again will close the file. This means that after running \o the file is
done being written to and cannot be reopened for appending with \o [filename].txt.
Running \o [filename].txt again with the same filename will overwrite
the file.
The file is also closed if the session is ended with \q.
Example use
Let’s look at an example of \o being used:
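A sketch of such a session, using the albums table from this tutorial's data set (the
filename is arbitrary):
\o results.txt
\dt
SELECT * FROM albums LIMIT 5;
\o
\! cat results.txt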
As you can see, the output of \dt and the SELECT query are not shown on the console.
This indicates that they are being logged in the file. We can confirm this if we check the
file. This can be done manually outside of psql or through psql using the \! meta
command:
\! allows the user to use terminal commands and see the results without leaving the psql
environment. As such, the file contents can be checked quickly using commands like cat
which displays the contents of the text file to the screen.
Summary
\o can be used to write query results to a file instead of the console:
\o [filename].txt
[Query or Queries to log to file];
\o
Can write the results of certain meta commands to the file.
Can be checked using: \! cat [filename].txt
How To Generate Random Data in
PostgreSQL
There are occasionally reasons to use random data, or even random sequences of data.
PostgreSQL supports this with the random SQL function. The following are some nice
examples of how to use this.
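In its simplest form, random() returns a double precision value in the range [0,1):
SELECT random();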
If you’d like to scale it to be between 0 and 20 for example you can simply multiply it by
your chosen amplitude:
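SELECT random() * 20;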
And if you’d like it to have some different offset you can simply subtract or add that. The
following will return values between -10 and 10:
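SELECT random() * 20 - 10;
To make the result repeatable, the random seed can be set first with the setseed
function (two queries run in the same connection):
SELECT setseed(.123);
SELECT random();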
Notice that it returns a random result as expected, but unlike above, it’s the same random
result every time. Change the seed value (.123) in the setseed function above and notice
that it will now choose a different random value but maintain that on multiple runs. To get
the answer correct to the above SQLBox, set the seed to .42.
To understand what’s happening, imagine that there is a long list of random numbers that
the computer chooses from. Setting the seed is like telling PostgreSQL to always start at
the same spot every time.
A quick tip: some SQL interfaces (like Chartio's) won't let you run/return multiple
queries in a connection, which is necessary to set the seed. This can be worked around by
using a WITH clause as shown here:
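A sketch of that workaround; selecting from the CTE causes the setseed call to be
evaluated before the outer random():
WITH seeded AS (
  SELECT setseed(.42)
)
SELECT random()
FROM seeded;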
Random Sequences
If you’d like full sequences of random data you can use the generate_series function
to generate a series of dates.
The following example gets a random value for each day between February 2017 and April
2017.
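One way to write this (generate_series with a one-day step produces the dates):
SELECT day, random() AS value
FROM generate_series(
  '2017-02-01'::timestamp,
  '2017-04-01'::timestamp,
  '1 day'
) AS day;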
We’ve visualized the sequence with Chartio here to make it more clear what’s going on
with the data.
(Chart: "Random Sequence" showing daily values between 0 and 1 from early February
through late March 2017.)
The above results are all between 0 and 1, as again that is what's returned from random().
As above, to add an amplitude and minimum offset to it we can simply multiply and add to
the random value. The following makes a random sequence with values in the range of 10
to 17:
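-- amplitude 7 and offset 10 give values in [10, 17)
SELECT day, random() * 7 + 10 AS value
FROM generate_series(
  '2017-02-01'::timestamp,
  '2017-04-01'::timestamp,
  '1 day'
) AS day;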
(Chart: "Random Sequence" showing daily values in the 10 to 17 range over the same
period.)
Multiplying the row number by our random value makes our data linearly increase, as you
can see in the chart:
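-- the expected value grows linearly with the row number
SELECT day, random() * (row_number() over()) AS value
FROM generate_series(
  '2017-02-01'::timestamp,
  '2017-04-01'::timestamp,
  '1 day'
) AS day;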
(Chart: randomly varying values increasing linearly over the same period.)
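To get exponential growth instead, raise a base greater than 1 to the power of the row
number. A sketch (the 1.15 base is an arbitrary illustrative choice):
SELECT day, random() * (1.15 ^ (row_number() over())) AS value
FROM generate_series(
  '2017-02-01'::timestamp,
  '2017-04-01'::timestamp,
  '1 day'
) AS day;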
(Chart: randomly varying values increasing exponentially over the same period.)
Similarly, to get an exponential decay we can take the power of a number less than 1 (see
(.9^(row_number() over()))):
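For example (the 1000 amplitude is an arbitrary scale):
SELECT day, 1000 * random() * (.9 ^ (row_number() over())) AS value
FROM generate_series(
  '2017-02-01'::timestamp,
  '2017-04-01'::timestamp,
  '1 day'
) AS day;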
(Chart: randomly varying values decaying exponentially over the same period.)
And PostgreSQL also has a log function we can use to model random logarithmic growth:
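For example (again with an arbitrary amplitude; log here is the base-10 logarithm):
SELECT day, 1000 * random() * log(row_number() over()) AS value
FROM generate_series(
  '2017-02-01'::timestamp,
  '2017-04-01'::timestamp,
  '1 day'
) AS day;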
(Chart: randomly varying values with logarithmic growth over the same period.)
There are a lot of great things you can do with PostgreSQL's random() function combined
with generated series to get sequences. Feel free to play around with a few yourself in the
SQLBox below, or use Chartio if you'd like to visualize them as well.
Using ALTER in PostgreSQL
In SQL, tables, databases, schemas, groups, indexes, servers, and more can be modified
using the ALTER command. This command enables the user to modify a specific aspect of
the table, database, group, etc. while leaving the rest of the data untouched.
There are many alterable things in PostgreSQL, discussed at length in the PostgreSQL
Documentation. This article will only focus on a few main uses of ALTER (ALTER TABLE
and ALTER DATABASE). For a comprehensive list, check the documentation here.
Warning: Altering tables and databases alters critical parts of their structure. As such,
queries that ran on tables/databases that were altered may no longer work and may need
to be rewritten.
ALTER TABLE
Altering tables is a very common use of ALTER. Using ALTER TABLE is very useful for
adding, removing, and editing columns:
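In placeholder form, those three operations look like this:
ALTER TABLE [table] ADD COLUMN [column] [datatype];
ALTER TABLE [table] DROP COLUMN [column];
ALTER TABLE [table] ALTER COLUMN [column] SET DATA TYPE [datatype];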
ALTER can also be used to change the datatype of a pre-existing column. For example,
you can change a boolean to a char:
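For a boolean column named belts (the table name is left as a placeholder):
ALTER TABLE [table]
ALTER COLUMN belts SET DATA TYPE char
USING belts::char;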
This usage of ALTER takes a column and converts it into a different type using a specified
conversion method (in this case the cast belts::char).
Table Constraints
Another usage of ALTER TABLE is to add table constraints. For example, if a column
should be unique:
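ALTER TABLE [table]
ADD CONSTRAINT [constraint name] UNIQUE ([column]);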
This command can also be used to add a constraint to the whole table.
NOTE: An error will be thrown if a constraint is added to a column that already breaks
that constraint (e.g. adding the UNIQUE constraint to a non-unique column will throw an
error).
Common constraints include: NOT NULL, PRIMARY KEY, and UNIQUE (a full list is included
in the documentation). A constraint can also be dropped using the same command with
DROP CONSTRAINT instead:
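ALTER TABLE [table]
DROP CONSTRAINT [constraint name];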
ALTER TABLE can also be used to rename the table or column that is being accessed. To
do this, use the rename command:
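ALTER TABLE [table]
RENAME TO [new table name];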
Or
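ALTER TABLE [table]
RENAME COLUMN [column] TO [new column name];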
ALTER DATABASE
Databases can also be modified using the ALTER command. There are fewer things that
can be modified in a database, but the changes have very serious effects, so they often
require elevated permissions to execute. Among the things that can be changed using
ALTER DATABASE are:
Allow Connections: Whether the database allows connections to itself. NOTE: setting this
to false will block all connections, even connections from localhost. It will need to be
set back to true before the database can be connected to again.
Owner: Can set the owner of the database. Only the current database owner and
superusers can change the owner.
Example:
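In placeholder form, the two options above look like this:
ALTER DATABASE [database] ALLOW_CONNECTIONS false;
ALTER DATABASE [database] OWNER TO [new owner];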
References
https://www.postgresql.org/docs/11/sql-alterdatabase.html
https://www.postgresql.org/docs/11/ddl-constraints.html
https://www.postgresql.org/docs/11/sql-altertable.html