Chapter 5 - Introduction To SQL: JJM/IT/IT-Portal/2011/DBS
Chapter 5 Introduction to SQL
Structured Query Language (SQL) is a high-level language that allows users to manipulate relational data. One of the strengths of SQL is that users need only specify the information they need without having to know how to retrieve it. The database management system is responsible for developing the access path needed to retrieve the data. SQL works at a set level, meaning that it is designed to retrieve rows of one or more tables.
SQL has three categories based on the functionality involved:
DDL - Data definition language, used to define, change, or drop database objects
DML - Data manipulation language, used to read and modify data
DCL - Data control language, used to grant and revoke authorizations
In this chapter, you will learn the history of SQL and how to work with this powerful language. We will focus on four basic SQL operations commonly used by most applications: Create, Read, Update, and Delete (CRUD).
Database Fundamentals
The statement above creates a table with the name myTable, having one column with the name col1 that can store data of type integer. This table will accept any integer or NULL value as a valid value for col1. NULL values are described later in this section.
5.2.2.1 Default Values
When data is inserted into a table, you may want to automatically generate default values for a few columns. For example, when users register on your Web site and leave the profession field empty, the corresponding column in the USERS table defaults to Student. This can be achieved with the following statement:
CREATE TABLE USERS (NAME CHAR(20), AGE INTEGER, PROFESSION VARCHAR(30) with default 'Student')
To define a column that generates each department number by incrementing the last department number, we can use the following statement:
CREATE TABLE DEPT (DEPTNO SMALLINT NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 500, INCREMENT BY 1), DEPTNAME VARCHAR(36) NOT NULL, MGRNO CHAR(6), ADMRDEPT SMALLINT NOT NULL, LOCATION CHAR(30))
The SQL statement above creates a table DEPT, where the column DEPTNO will have default values generated starting from 500 and incremented by one. When you insert rows into this table, do not provide a value for DEPTNO, and the database will automatically generate the value as described, incrementing the value for this column for each row inserted.
5.2.2.2 NULL values
A NULL represents an unknown state. For example, a table that stores the course marks of students can allow for NULL values. This could mean to the teacher that the student did not submit an assignment, or did not take an exam. It is different from a mark of zero, where a student did take the exam, but failed on all the questions. There are situations when you don't want a NULL to be allowed. For example, if the country field is required for your application, ensure you prevent NULL values as follows:
create table myTable (name varchar(30), country varchar(20) NOT NULL)
The statement above indicates that NULL values are not allowed for the country column; however, duplicate values are accepted.
5.2.2.3 Constraints
Constraints allow you to define rules for the data in your table. There are different types of constraints:
A UNIQUE constraint prevents duplicate values in a table. This is implemented using unique indexes and is specified in the CREATE TABLE statement using the keyword UNIQUE. A NULL is part of the UNIQUE data values domain.
A PRIMARY KEY constraint is similar to a UNIQUE constraint; however, it excludes NULL as valid data. A primary key always has an index associated with it.
A REFERENTIAL constraint is used to support referential integrity, which allows you to manage relationships between tables. This is discussed in more detail in the next section.
A CHECK constraint ensures the values you enter into a column are within the rules specified in the constraint.
The following example shows a table definition with several CHECK constraints and a PRIMARY KEY defined:
CREATE TABLE EMPLOYEE (ID INTEGER NOT NULL PRIMARY KEY, NAME VARCHAR(9), DEPT SMALLINT CHECK (DEPT BETWEEN 10 AND 100), JOB CHAR(5) CHECK (JOB IN ('Sales','Mgr','Clerk')), HIREDATE DATE, SALARY DECIMAL(7,2), CONSTRAINT YEARSAL CHECK ( YEAR(HIREDATE) > 1986 OR SALARY > 40500 ) )
For this table, four constraints must be satisfied before any data can be inserted into the table. These constraints are:
PRIMARY KEY constraint on the column ID - No duplicate values or NULLs can be inserted.
CHECK constraint on the DEPT column - Only allows inserting data if the values are between 10 and 100.
CHECK constraint on the JOB column - Only allows inserting data if the values are 'Sales', 'Mgr', or 'Clerk'.
CHECK constraint on the combination of the HIREDATE and SALARY columns - Only allows inserting data if the hire date year is greater than 1986 or the SALARY is greater than 40500.
5.2.2.4 Referential integrity
As discussed in Chapter 2, referential integrity establishes relationships between tables. Using a combination of primary keys and foreign keys, it can enforce the validity of your data. Referential integrity reduces application code complexity by eliminating the need to place data-level referential validation at the application level. A table whose column values depend on the values of other tables is called the dependent, or child, table; a table that is being referenced is called the base, or parent, table. Only tables that have columns defined as UNIQUE or PRIMARY KEY can be referenced in other tables as foreign keys for referential integrity. Referential integrity can be defined during table definition or after the table has been created, as shown in the example below, where three different syntaxes are illustrated:
Syntax 1:
CREATE TABLE DEPENDANT_TABLE (ID INTEGER REFERENCES BASE_TABLE(UNIQUE_OR_PRIMARY_KEY), NAME VARCHAR(9), : : : );
Syntax 2:
CREATE TABLE DEPENDANT_TABLE (ID INTEGER, NAME VARCHAR(9), : : :, CONSTRAINT constraint_name FOREIGN KEY (ID) REFERENCES BASE_TABLE(UNIQUE_OR_PRIMARY_KEY) );
Syntax 3:
ALTER TABLE DEPENDANT_TABLE ADD CONSTRAINT constraint_name FOREIGN KEY (ID) REFERENCES BASE_TABLE(UNIQUE_OR_PRIMARY_KEY);
In the above sample code, when the constraint name is not specified, the DB2 system will generate the name automatically. This generated string is 15 characters long, for example CC1288717696656. What happens when an application needs to delete a row from the base table but there are still references from dependent tables? As discussed in Chapter 2, there are different rules to handle deletes and updates, and the behavior depends on the following constructs used when defining the tables:
CASCADE - As the name suggests, with the cascade option the operation is cascaded to all rows in the dependent tables that reference the row or value to be modified or deleted in the base table.
SET NULL - With this option, all the referring cells in dependent tables are set to NULL.
NO ACTION - With this option, no action is performed as long as referential integrity is maintained before and after the statement execution.
RESTRICT - With this option, the update or delete of rows that are referenced from dependent tables is not allowed to continue.
The statement below shows where the delete and update rules are specified:
ALTER TABLE DEPENDANT_TABLE ADD CONSTRAINT constraint_name FOREIGN KEY (column_name) REFERENCES BASE_TABLE(UNIQUE_OR_PRIMARY_KEY) ON DELETE <delete_action_type> ON UPDATE <update_action_type> ;
A delete action type can be CASCADE, SET NULL, NO ACTION, or RESTRICT. An update action type can be NO ACTION or RESTRICT.
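The delete rules described above can be tried out with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for DB2 (an assumption made only so the example runs anywhere), and the dept and employee tables, their columns, and the constraint name fk_emp_dept are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which DB2 does not need.

```python
import sqlite3

# SQLite stands in for DB2; all table, column, and constraint names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for FK enforcement

conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, deptname TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " deptno INTEGER,"
    " CONSTRAINT fk_emp_dept FOREIGN KEY (deptno)"
    " REFERENCES dept (deptno) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO dept VALUES (500, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 500)")

# A dependent row that references a missing base row is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Bob', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the base row cascades to the dependent rows.
conn.execute("DELETE FROM dept WHERE deptno = 500")
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

With ON DELETE RESTRICT in place of CASCADE, the DELETE on dept would be expected to fail instead of removing the employee row.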
To create a table with the above schema, explicitly include it in the CREATE TABLE statement as follows:
create table mySchema.myTable (col1 integer)
When the schema is not specified, DB2 uses an implicit schema, which is typically the user ID used to connect to the database. You can also change the implicit schema for your current session with the SET CURRENT SCHEMA command as follows:
set current schema mySchema
Once the view is created, you can use it just like any table. For example, you can issue a simple SELECT statement as follows:
SELECT * FROM MYVIEW
Views allow you to hide data or limit access to a select number of columns; therefore, they can also be used for security purposes.
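As a minimal sketch of using a view to hide a column, the following example runs against SQLite through Python's sqlite3 module (a stand-in for DB2). The employee table and its columns are hypothetical; only the view name MYVIEW comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", 52000), (2, "Bob", 41000)],
)

# The view exposes id and name only; salary stays hidden from users of the view.
conn.execute("CREATE VIEW myview AS SELECT id, name FROM employee")

cur = conn.execute("SELECT * FROM myview ORDER BY id")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
```

Granting users access to the view rather than to the underlying table is what turns this into a security mechanism.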
Similarly, other modifications to the table like adding or dropping a column, defining or dropping a primary key, and so on, can be achieved using the appropriate alter table syntax. The ALTER statement can also be used with other database objects.
Where the object type can be for example, a table, table space, or index. Not all database objects can be renamed after they are created. To rename a column, the ALTER TABLE SQL statement should be used in conjunction with RENAME. For example:
ALTER TABLE <table name> RENAME COLUMN <column name> TO <new name>
The special character * represents all the columns from the table. Using * in a query is not recommended unless specifically required, because you may be asking for more information than you really need. Typically, not all columns of a table are required, in which case a selective list of columns should be specified. For example,
select col1, col2 from myTable
retrieves col1 and col2 for all rows of the table myTable, where col1 and col2 are the names of the columns to retrieve data from.
5.3.1.1 Ordering the result set
A SELECT statement returns its result set in no particular order. Issuing the same SELECT statement several times may return the same set of rows, but in a different order. To guarantee the result set is displayed in the same order all the time, either in ascending or descending order of a column or set of columns, use the ORDER BY clause. For example, this statement returns the result set based on the order of col1 in ascending order:
SELECT col1 FROM myTable ORDER BY col1 ASC
ASC stands for ascending, which is the default. Descending order can be specified using DESC as shown below:
SELECT col1 FROM myTable ORDER BY col1 DESC
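The ordering behavior can be checked with a short sketch, run here against SQLite through Python's sqlite3 module as a stand-in for DB2. The sample rows and the second column col2 are assumptions added for illustration; the last query also shows ordering on more than one column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b"), (2, "a")],
)

# ASC is the default, so it can be omitted.
ascending = [r[0] for r in conn.execute("SELECT col1 FROM myTable ORDER BY col1")]
descending = [r[0] for r in conn.execute("SELECT col1 FROM myTable ORDER BY col1 DESC")]

# Ties on col1 are broken by col2.
multi = conn.execute(
    "SELECT col1, col2 FROM myTable ORDER BY col1 DESC, col2 ASC"
).fetchall()
```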
5.3.1.2 Cursors
A cursor is a result set holding the result of a SELECT statement. The syntax to declare, open, fetch, and close a cursor is shown below:
DECLARE <cursor name> CURSOR [WITH RETURN <return target>] FOR <SELECT statement>;
OPEN <cursor name>;
FETCH <cursor name> INTO <variables>;
CLOSE <cursor name>;
Rather than returning all the rows of an SQL statement to an application at once, a cursor allows the application to process rows one at a time. Using FETCH statements within a loop in the application, developers can navigate through each row pointed to by the cursor and apply some logic to the row or based on the row contents. For example, the following code snippet sums all the salaries of employees using a cursor.
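...
DECLARE p_sum INTEGER;
DECLARE p_sal INTEGER;
-- In DB2 SQL PL, variables (including SQLSTATE) are declared before cursors
DECLARE SQLSTATE CHAR(5) DEFAULT '00000';
DECLARE c CURSOR FOR SELECT SALARY FROM EMPLOYEE;
SET p_sum = 0;
OPEN c;
FETCH FROM c INTO p_sal;
-- SQLSTATE stays '00000' while a FETCH succeeds and changes at end of data
WHILE (SQLSTATE = '00000') DO
  SET p_sum = p_sum + p_sal;
  FETCH FROM c INTO p_sal;
END WHILE;
CLOSE c;
...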
Cursors are the most widely used method for fetching multiple rows from a table and processing them inside applications.
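The same fetch loop can be sketched from the application side. The example below uses the cursor object of Python's sqlite3 module against SQLite, which is an assumption standing in for a DB2 client; the sample rows are made up. Here fetchone() plays the role of FETCH, and a return value of None plays the role of the end-of-data SQLSTATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", 40000), ("Bob", 52000), ("Cy", 61000)],
)

cur = conn.cursor()
cur.execute("SELECT salary FROM employee")  # comparable to DECLARE + OPEN

p_sum = 0
row = cur.fetchone()            # comparable to FETCH ... INTO
while row is not None:          # None signals that no rows are left
    p_sum += row[0]
    row = cur.fetchone()
cur.close()                     # comparable to CLOSE

# For comparison only: the set-level way to get the same total.
total = conn.execute("SELECT SUM(salary) FROM employee").fetchone()[0]
```

When the per-row logic is just an aggregate, as here, the single SUM query is usually the simpler choice; the loop is worth it when each row needs its own processing.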
In this first example, the statements insert one row at a time into the table myTable.
insert into myTable values (1); insert into myTable values (1, 'myName', '2010-01-01');
In this second example, the statements insert multiple (three) rows into the table myTable.
insert into myTable values (1),(2),(3); insert into myTable values (1, 'myName1', '2010-01-01'), (2, 'myName2', '2010-02-01'), (3, 'myName3', '2010-03-01');
Finally, in this third example, the statement inserts all the rows of the sub-query select * from myTable2 into the table myTable.
insert into myTable (select * from myTable2)
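The three forms of INSERT can be exercised together in a short sketch. It runs against SQLite through Python's sqlite3 module as a stand-in for DB2, and the column definitions of myTable and myTable2 are assumptions, since the text does not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column layout shared by both tables.
conn.execute("CREATE TABLE myTable (id INTEGER, name TEXT, created TEXT)")
conn.execute("CREATE TABLE myTable2 (id INTEGER, name TEXT, created TEXT)")

# One row per statement.
conn.execute("INSERT INTO myTable VALUES (1, 'myName1', '2010-01-01')")

# Several rows in one statement.
conn.execute(
    "INSERT INTO myTable2 VALUES"
    " (2, 'myName2', '2010-02-01'),"
    " (3, 'myName3', '2010-03-01')"
)

# All rows returned by a sub-query.
conn.execute("INSERT INTO myTable SELECT * FROM myTable2")

row_count = conn.execute("SELECT COUNT(*) FROM myTable").fetchone()[0]
ids = [r[0] for r in conn.execute("SELECT id FROM myTable ORDER BY id")]
```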
Note that care should be taken when issuing a delete statement. If the WHERE clause is not used, the DELETE statement will delete all rows from the table.
Note that care should be taken when issuing an update statement without the WHERE clause. In such cases, all the rows in the table will be updated.
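The effect of leaving out the WHERE clause is easy to see in a small experiment. The sketch below runs against SQLite through Python's sqlite3 module as a stand-in for DB2; the table contents and the helper function count are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])


def count(where="1 = 1"):
    """Return the number of rows in myTable that match the given condition."""
    return conn.execute(f"SELECT COUNT(*) FROM myTable WHERE {where}").fetchone()[0]


# With WHERE, only the matching row changes.
conn.execute("UPDATE myTable SET col2 = 'x' WHERE col1 = 2")
after_filtered_update = count("col2 = 'x'")

# Without WHERE, every row is updated.
conn.execute("UPDATE myTable SET col2 = 'y'")
after_unfiltered_update = count("col2 = 'y'")

# With WHERE, only the matching row is deleted.
conn.execute("DELETE FROM myTable WHERE col1 = 1")
after_filtered_delete = count()

# Without WHERE, the table is emptied.
conn.execute("DELETE FROM myTable")
after_unfiltered_delete = count()
```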
5.4.1.2 Natural join
A natural join is an improved version of an equi-join where the joining column does not require specification. The system automatically selects the columns with the same name in the tables and applies the equality operation to them. A natural join removes all duplicate attributes. Below is an example.
SELECT * FROM STUDENT NATURAL JOIN ENROLLMENT
Natural joins bring more doubt and ambiguity than the convenience they provide. For example, there can be problems when the tables to be joined have more than one column with the same name, or when the tables do not have the same name for the joining column. Several commercial databases do not support natural joins.
5.4.1.3 Cross join
A cross join is simply a Cartesian product of the tables to be joined. For example:
SELECT * FROM STUDENT, ENROLLMENT
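To make the difference between a cross join and a natural join concrete, here is a small sketch run against SQLite through Python's sqlite3 module (SQLite happens to support NATURAL JOIN; it is used here only as a stand-in). The STUDENT and ENROLLMENT column layouts and the sample rows are assumptions, apart from the ENROLLMENT_NO column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (enrollment_no INTEGER, name TEXT)")
conn.execute("CREATE TABLE enrollment (enrollment_no INTEGER, course TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Raul"), (2, "Jin"), (3, "Ian")])
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [(1, "Math"), (1, "Physics"), (4, "English")],
)

# Cross join: every student row paired with every enrollment row (3 x 3).
cross_count = conn.execute("SELECT COUNT(*) FROM student, enrollment").fetchone()[0]

# Natural join: rows matched on the shared column name, which appears once.
cur = conn.execute("SELECT * FROM student NATURAL JOIN enrollment ORDER BY course")
natural_columns = [d[0] for d in cur.description]
natural_rows = cur.fetchall()
```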
In the next sections, we describe each of these types in more detail. For a better understanding of each case, examples are provided using the tables shown in Figure 5.2.
Figure 5.2 - Input tables to use in outer-join examples
5.4.2.1 Left outer join
In a left outer join, the result set is a union of the results of an equi-join, including any nonmatching rows from the LEFT table. For example, the following statement would return the rows shown in Figure 5.3.
SELECT * FROM STUDENT LEFT OUTER JOIN ENROLLMENT ON STUDENT.ENROLLMENT_NO = ENROLLMENT.ENROLLMENT_NO
Figure 5.3 - Output of a left outer join
5.4.2.2 Right outer join
In a right outer join, the result set is the union of results of an equi-join, including any nonmatching rows from the RIGHT table. For example, the following statement would return the rows shown in Figure 5.4.
SELECT * FROM STUDENT RIGHT OUTER JOIN ENROLLMENT ON STUDENT.ENROLLMENT_NO = ENROLLMENT.ENROLLMENT_NO
Figure 5.4 - Output of a right outer join
5.4.2.3 Full outer join
In a full outer join, the result set is the union of results of an equi-join, including any nonmatching rows of the LEFT and the RIGHT table. For example, the following statement would return the rows shown in Figure 5.5.
SELECT * FROM STUDENT FULL OUTER JOIN ENROLLMENT ON STUDENT.ENROLLMENT_NO = ENROLLMENT.ENROLLMENT_NO
Figure 5.5 - Output of a full outer join
Different outer joins return different data sets; therefore, the type of join should be chosen explicitly according to the business requirements. For example, if we need a list of students who have enrolled in any subject as well as those who have not yet enrolled, then what we probably need is a left outer join.
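The following sketch shows how nonmatching rows surface as NULLs in an outer join. It runs against SQLite through Python's sqlite3 module as a stand-in for DB2, and the sample rows and the NAME and COURSE columns are assumptions (the figures are not reproduced here). Because older SQLite versions lack RIGHT OUTER JOIN, the right-join result is obtained by swapping the table order in a left join, which is an equivalent formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (enrollment_no INTEGER, name TEXT)")
conn.execute("CREATE TABLE enrollment (enrollment_no INTEGER, course TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Raul"), (2, "Jin"), (3, "Ian")])
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [(1, "Math"), (1, "Physics"), (4, "English")],
)

# Left outer join: every student appears; missing courses come back as NULL (None).
left_rows = conn.execute(
    "SELECT s.name, e.course"
    " FROM student s LEFT OUTER JOIN enrollment e"
    " ON s.enrollment_no = e.enrollment_no"
    " ORDER BY s.enrollment_no, e.course"
).fetchall()

# Same rows as STUDENT RIGHT OUTER JOIN ENROLLMENT, written with the tables swapped.
right_rows = conn.execute(
    "SELECT s.name, e.course"
    " FROM enrollment e LEFT OUTER JOIN student s"
    " ON s.enrollment_no = e.enrollment_no"
    " ORDER BY e.enrollment_no, e.course"
).fetchall()
```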
5.5.1 Union
The UNION operator can be used to combine two data sets that have the same column definitions in the same order. The UNION operator removes any duplicate rows from the resulting data set. For example, the following statement returns the rows shown in Figure 5.6.
SELECT * FROM student_table_a UNION SELECT * FROM student_table_b
Figure 5.6 - Example of a Union operator
In Figure 5.6, note that the union operator removed duplicate rows. There may be situations where duplicate removal is not required. In such a case, UNION ALL should be used instead of UNION, as follows:
SELECT * from student_table_a UNION ALL SELECT * from student_table_b
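A quick way to see the difference between UNION and UNION ALL is to count the rows each returns. This sketch runs against SQLite through Python's sqlite3 module as a stand-in for DB2; the single name column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_table_a (name TEXT)")
conn.execute("CREATE TABLE student_table_b (name TEXT)")
conn.executemany("INSERT INTO student_table_a VALUES (?)", [("Ann",), ("Bob",), ("Cy",)])
conn.executemany("INSERT INTO student_table_b VALUES (?)", [("Bob",), ("Dee",)])

# UNION removes the duplicate 'Bob'.
union_rows = conn.execute(
    "SELECT name FROM student_table_a UNION SELECT name FROM student_table_b ORDER BY name"
).fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT name FROM student_table_a UNION ALL SELECT name FROM student_table_b ORDER BY name"
).fetchall()
```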
5.5.2 Intersection
The intersection operator INTERSECT returns a result set common to both data sets as shown in the following statement:
select * from student_table_a INTERSECT select * from student_table_b
Figure 5.8 shows the sample output for the above statement.
Figure 5.8 - Sample output for the INTERSECT operator
The INTERSECT operator returns the rows that exist in both tables A and B; however, each common row is listed only once, even if it appears multiple times in either table A or B. To keep duplicates in the result set, use the INTERSECT ALL operator. For example:
select * from student_table_a INTERSECT ALL select * from student_table_b
Figure 5.9 - Sample output for the EXCEPT operator
The EXCEPT operator returns the rows that exist in table A but not in table B; however, each such row is listed only once, even if it appears multiple times in table A. To keep duplicates in the result set, use the EXCEPT ALL operator. For example:
select * from student_table_a EXCEPT ALL select * from student_table_b
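The duplicate-removing behavior of INTERSECT and EXCEPT can be checked with the sketch below, run against SQLite through Python's sqlite3 module as a stand-in for DB2. The name column and the sample rows are assumptions. SQLite does not implement INTERSECT ALL or EXCEPT ALL, so only the plain operators are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_table_a (name TEXT)")
conn.execute("CREATE TABLE student_table_b (name TEXT)")
# 'Bob' appears twice in table A to show that duplicates collapse.
conn.executemany(
    "INSERT INTO student_table_a VALUES (?)",
    [("Ann",), ("Ann",), ("Bob",), ("Bob",), ("Cy",)],
)
conn.executemany("INSERT INTO student_table_b VALUES (?)", [("Bob",), ("Dee",)])

# Rows present in both tables, each listed once.
intersect_rows = conn.execute(
    "SELECT name FROM student_table_a INTERSECT SELECT name FROM student_table_b ORDER BY name"
).fetchall()

# Rows present in A but not in B, each listed once.
except_rows = conn.execute(
    "SELECT name FROM student_table_a EXCEPT SELECT name FROM student_table_b ORDER BY name"
).fetchall()
```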
students and the courses that they are enrolled in. Each student can enroll in multiple courses and each course has multiple students enrolled in the course. To get a count of all students we can simply execute the following SQL statement:
select count(*) from students_enrollment
However, if we need a count of the students enrolled in each course offered, then we need to group the table data by course and count the students in each group. This can be achieved by using the GROUP BY operator as follows:
select course_enrolled, count(*) from students_enrollment group by course_enrolled
---------------Resultset---------------
COURSE_ENROLLED           STUDENT_COUNT
------------------------- -------------
English                   10
Mathematics               30
Physics                   60
Grouping can also be performed over multiple columns together. In that case, the order of grouping is done from the leftmost column specified to the right.
To filter out scalar data sets, a WHERE clause can be used; however, it cannot be used for the grouped data set.
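In standard SQL, conditions on grouped data go in a HAVING clause, while WHERE filters individual rows before grouping. The sketch below illustrates both, running against SQLite through Python's sqlite3 module as a stand-in for DB2; the student_name column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students_enrollment (student_name TEXT, course_enrolled TEXT)")
conn.executemany(
    "INSERT INTO students_enrollment VALUES (?, ?)",
    [
        ("Ann", "English"),
        ("Ann", "Mathematics"), ("Bob", "Mathematics"),
        ("Ann", "Physics"), ("Bob", "Physics"), ("Cy", "Physics"),
    ],
)

# One output row per course.
per_course = conn.execute(
    "SELECT course_enrolled, COUNT(*) AS student_count"
    " FROM students_enrollment"
    " GROUP BY course_enrolled ORDER BY course_enrolled"
).fetchall()

# HAVING filters whole groups, after the counts are computed.
popular = conn.execute(
    "SELECT course_enrolled, COUNT(*) AS student_count"
    " FROM students_enrollment"
    " GROUP BY course_enrolled HAVING COUNT(*) >= 2 ORDER BY course_enrolled"
).fetchall()

# WHERE filters rows before grouping, so Ann is not counted at all.
without_ann = conn.execute(
    "SELECT course_enrolled, COUNT(*) AS student_count"
    " FROM students_enrollment WHERE student_name <> 'Ann'"
    " GROUP BY course_enrolled ORDER BY course_enrolled"
).fetchall()
```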
5.7 Sub-queries
When a query is applied within a query, the outer query is referred to as the main query or parent query and the internal query is referred to as the sub-query or inner query. This sub-query may return a scalar value, single or multiple tuples, or a NULL data set. Sub-queries are executed first, and then the parent query is executed utilizing data returned by the sub-queries.
The above query returns a list of students who are the youngest among all students. The sub-query SELECT min(age) FROM students returns a scalar value that indicates the minimum age among all students. The parent query returns a list of all students whose age is equal to the value returned by the sub-query.
Here the sub-query returns a list of all courses that are offered in the Computer Science department and the outer query lists all students enrolled in the courses of the sub-query result set. Note that there may be multiple ways to retrieve the same result set. The examples provided in this chapter demonstrate various methods of usage, not the most optimal SQL statement.
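The two kinds of sub-query described above, one returning a scalar value and one returning a list, can be sketched as follows. The example runs against SQLite through Python's sqlite3 module as a stand-in for DB2. Apart from the sub-query SELECT min(age) FROM students quoted in the text, the tables, columns, and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE courses (course_name TEXT, dept TEXT)")
conn.execute("CREATE TABLE enrollment (name TEXT, course_name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [("Ann", 19), ("Bob", 18), ("Cy", 18)])
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [("Databases", "Computer Science"), ("Algebra", "Mathematics")],
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [("Ann", "Databases"), ("Bob", "Algebra"), ("Cy", "Databases")],
)

# Scalar sub-query: a single value compared with =.
youngest = conn.execute(
    "SELECT name FROM students"
    " WHERE age = (SELECT min(age) FROM students) ORDER BY name"
).fetchall()

# List sub-query: a set of values tested with IN.
cs_students = conn.execute(
    "SELECT DISTINCT name FROM enrollment"
    " WHERE course_name IN"
    " (SELECT course_name FROM courses WHERE dept = 'Computer Science')"
    " ORDER BY name"
).fetchall()
```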
SELECT dept, name, marks FROM final_result a WHERE marks = ( SELECT max(marks) FROM final_result WHERE dept = a.dept )
The above statement returns the students, with their departments, who have been awarded the maximum marks in each department. For each row of the outer query's table, the sub-query finds max(marks) for the department of the current row; if the value of marks in the current row is equal to the sub-query result, the row is added to the outer query result set.
The above query uses a sub-query in the FROM clause. The sub-query returns maximum, minimum and average marks for each department. The outer query uses this data and filters the data further by adding filter conditions in the WHERE clause of the outer query.
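Both a correlated sub-query and a sub-query in the FROM clause can be sketched over the same table. The example runs against SQLite through Python's sqlite3 module as a stand-in for DB2. The final_result table and its columns come from the statement shown earlier, while the sample rows, the column aliases, and the filter max_marks >= 90 are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_result (dept TEXT, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO final_result VALUES (?, ?, ?)",
    [("CS", "Ann", 90), ("CS", "Bob", 75), ("Math", "Cy", 80), ("Math", "Dee", 85)],
)

# Correlated sub-query: re-evaluated for the department of each outer row.
top_per_dept = conn.execute(
    "SELECT dept, name, marks FROM final_result a"
    " WHERE marks = (SELECT max(marks) FROM final_result WHERE dept = a.dept)"
    " ORDER BY dept"
).fetchall()

# Sub-query in the FROM clause: per-department statistics, filtered by the outer query.
strong_depts = conn.execute(
    "SELECT dept, max_marks, min_marks, avg_marks FROM ("
    " SELECT dept, max(marks) AS max_marks, min(marks) AS min_marks,"
    " avg(marks) AS avg_marks FROM final_result GROUP BY dept"
    ") AS stats WHERE max_marks >= 90"
).fetchall()
```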
Object-relational mapping (ORM) libraries such as Hibernate are popular because they provide a framework for this mapping between the object-oriented world and the relational world. pureQuery, a new technology from IBM, provides further support and performance improvements in this area. For more information about pureQuery, refer to the free eBook Getting started with pureQuery, which is part of this book series.
Relation variable, R Relation Tuple Attribute, A1, A2, etc. A pair of primary key foreign key Primary key
Unique identifier
Primary key
The transformation from the logical model to the physical model is straightforward. From the logical model you have all the relations and associations you need to create the Library Management System database. All you have to do now is specify the sub-domain (data type) for every attribute domain you encounter within every table, and the corresponding constraint. Every constraint name has its own prefix. We suggest the following prefixes for constraint names:
PRIMARY KEY: pk_
UNIQUE: uq_
DEFAULT: df_
CHECK: ck_
FOREIGN KEY: fk_
Let's take a look at each relation again, adding the sub-domain and constraint:
BORROWER relation
Attribute name  Domain  Sub-domain   Optional  Constraints
BORROWER_ID     Text    CHAR(5)      No        Pk_
FIRST_NAME      Text    VARCHAR(30)  No
LAST_NAME       Text    VARCHAR(30)  No
EMAIL           Text    VARCHAR(40)  Yes
PHONE           Text    VARCHAR(15)  Yes
ADDRESS         Text    VARCHAR(75)  Yes
CITY            Text    CHAR(3)      No
COUNTRY         Text    DATE         No
AUTHOR relation
Attribute name  Domain  Sub-domain   Optional  Constraints
AUTHOR_ID       Text    CHAR(5)      No        Pk_
FIRST_NAME      Text    VARCHAR(30)  No
LAST_NAME       Text    VARCHAR(30)  No
EMAIL           Text    VARCHAR(40)  Yes
PHONE           Text    VARCHAR(15)  Yes
ADDRESS         Text    VARCHAR(75)  Yes
CITY            Text    VARCHAR(40)  Yes
COUNTRY         Text    VARCHAR(40)  Yes
BOOK relation
Attribute name  Domain   Sub-domain    Optional  Constraints
BOOK_ID         Text     CHAR(5)       No        Pk_
TITLE           Text     VARCHAR(40)   No
EDITION         Numeric  INTEGER       Yes
YEAR            Numeric  INTEGER       Yes
PRICE           Numeric  DECIMAL(7,2)  Yes
ISBN            Text     VARCHAR(20)   Yes
PAGES           Numeric  INTEGER       Yes
AISLE           Text     VARCHAR(10)   Yes
DESCRIPTION     Text     VARCHAR(100)  Yes
LOAN relation
Attribute name  Domain  Sub-domain   Optional  Constraints
BORROWER_ID     Text    CHAR(5)      No        Pk_, fk_
COPY_ID         Text    VARCHAR(30)  No        Pk_, fk_
LOAN_DATE       Text    DATE         No        < RETURN_DATE
RETURN_DATE     Text    DATE         No
COPY relation
Attribute name  Domain  Sub-domain   Optional  Constraints
COPY_ID         Text    CHAR(5)      No        Pk_
BOOK_ID         Text    VARCHAR(30)  No        Fk_
STATUS          Text    VARCHAR(30)  No
AUTHOR_LIST relation
Attribute name  Domain  Sub-domain   Optional  Constraints
AUTHOR_ID       Text    CHAR(5)      No        Pk_, fk_
BOOK_ID         Text    VARCHAR(30)  No        Pk_, fk_
ROLE            Text    VARCHAR(30)  No
InfoSphere Data Architect can automatically transform the logical model into a physical model, and also generate the DDL for you.
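To show how the suggested prefixes might look in hand-written DDL, here is a minimal sketch of the LOAN relation with its two parent tables. It is not the DDL generated by any tool. It runs against SQLite through Python's sqlite3 module as a stand-in for DB2; the full constraint names (pk_loan, fk_loan_borrower, and so on) are hypothetical, the parent tables are cut down to a few columns, and COPY_ID is declared CHAR(5) in both tables so that the foreign key matches its parent, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for FK enforcement

conn.execute(
    "CREATE TABLE borrower ("
    " borrower_id CHAR(5) NOT NULL,"
    " first_name VARCHAR(30) NOT NULL,"
    " CONSTRAINT pk_borrower PRIMARY KEY (borrower_id))"
)
conn.execute(
    "CREATE TABLE copy ("
    " copy_id CHAR(5) NOT NULL,"
    " status VARCHAR(30) NOT NULL,"
    " CONSTRAINT pk_copy PRIMARY KEY (copy_id))"
)
conn.execute(
    "CREATE TABLE loan ("
    " borrower_id CHAR(5) NOT NULL,"
    " copy_id CHAR(5) NOT NULL,"
    " loan_date DATE NOT NULL,"
    " return_date DATE NOT NULL,"
    " CONSTRAINT pk_loan PRIMARY KEY (borrower_id, copy_id),"
    " CONSTRAINT fk_loan_borrower FOREIGN KEY (borrower_id) REFERENCES borrower (borrower_id),"
    " CONSTRAINT fk_loan_copy FOREIGN KEY (copy_id) REFERENCES copy (copy_id),"
    " CONSTRAINT ck_loan_dates CHECK (loan_date < return_date))"
)

conn.execute("INSERT INTO borrower VALUES ('B0001', 'Ann')")
conn.execute("INSERT INTO copy VALUES ('C0001', 'AVAILABLE')")
conn.execute("INSERT INTO copy VALUES ('C0002', 'AVAILABLE')")

# A loan whose dates satisfy the CHECK constraint is accepted.
conn.execute("INSERT INTO loan VALUES ('B0001', 'C0001', '2011-03-01', '2011-03-15')")

# A loan that is returned before it starts violates ck_loan_dates.
try:
    conn.execute("INSERT INTO loan VALUES ('B0001', 'C0002', '2011-03-15', '2011-03-01')")
    bad_dates_rejected = False
except sqlite3.IntegrityError:
    bad_dates_rejected = True

loan_count = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
```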
5.9 Summary
In this chapter we provided a high-level overview of SQL and some of its features. In addition to the ISO/ANSI SQL standard requirements, various vendors implement additional features and functionalities. These features leverage internal product design and architecture and can therefore provide enhanced performance in comparison to standard SQL functions. One example of such a feature is the indexing mechanism in databases. Basic index behavior is the same in all databases; however, vendors provide additional features on top of the default ones to enhance data reads and writes via their proprietary algorithms. For detailed information and an exhaustive list of SQL commands, keywords, and statement syntax, please refer to the SQL Reference Guide [5.3].
5.10 Exercises
1. Create a table with columns of type integer, date, and char having default values.
2. Insert 10 rows in this table using one INSERT SQL statement.
3. Write an SQL statement with a sub-query with INNER JOIN.
4. Write an SQL statement with a correlated sub-query with a GROUP BY clause.
5. Write an SQL statement with aggregate functions and WHERE, HAVING clauses.
6. Write an SQL statement with ORDER BY on multiple columns.
D. All the above
E. None of the above
5. Which of the following functions are specialized for date and time manipulations?
A. Year
B. Dayname
C. Second
D. All the above
E. None of the above
6. What is the default sorting mode in SQL?
A. Ascending
B. Descending
C. Randomly selected order
D. None of the above
E. All of the above
7. An INSERT statement cannot be used to insert multiple rows in a single statement.
A. True
B. False
8. Which of the following are valid types of inner join?
A. Equi-join
B. Natural join
C. Cross join
D. All the above
E. None of the above
9. Which of the following are valid types of outer join?
A. Left outer join
B. Right outer join
C. Full outer join
D. All the above
E. None of the above
10. The Union operator keeps duplicate values in the result set.
A. True