Geoff Noel
OH MY!!!
"You may ask yourself, well, how did I get here?"
David Byrne, "Once in a Lifetime"
Databases Overview
Databases come in all different shapes and sizes. They can be single-file desktop
data stores (like Access or Q&A) or complex binary tree structures (Oracle or Sybase).
In any form, a database is a data store, or a place that holds data.
Something has to manage that data. That is the job of the database management
system, or DBMS. Some DBMSs are relational. Those are RDBMSs. The relational part
refers to the fact that separate collections of data within the reach of the RDBMS
can be related to one another and queried together. The RDBMS is also responsible
for ensuring the integrity of the database. When things threaten to get out of
whack, the RDBMS keeps all that data in line.
What Is a Database?
What are a database and a database management system?
A database is a collection of related data.
A database management system (DBMS) is a collection of programs that enables
users to create and maintain a database.
The evolution of relational data storage began in 1970 with the work of Dr.
E. F. Codd, who proposed the relational model for organizing data and the
relationships between pieces of data. In 1985 he published a set of 12 rules
that define what a relational system must do. Codd's work formed the basis
for the development of systems to manage data. Today, Relational Database
Management Systems (RDBMS) are the result of Codd's vision.
The two-tier model appeared with the advent of server technology. Communication-protocol development and
extensive use of local and wide area networks allowed the database developer to create an application front
end that accessed data through a connection (socket) to the back-end server. In a two-tier database design,
the client software is connected to the database through a socket connection.
Client programs (which supply the user interface) send SQL requests to the database server. The server returns
the appropriate results, and the client is responsible for the formatting and display of the data. Clients still
use a vendor-provided library of functions that manage the communication between client and server. Most
of these libraries are written in either the C language or Perl.
Commercial database vendors realized the potential for adding intelligence to the database server. They
created proprietary techniques that allowed the database designer to develop macro programs for simple
data manipulation. These macros, called stored procedures, can cause problems relating to version control
and maintenance. Because a stored procedure is an executable program living on the database, it is
possible for the stored procedure to attempt to access named columns of a database table after the table
has been changed.
For example, if a column with the name id is changed to cust_id, the meaning of the original stored
procedure is lost. The advent of triggers, which are stored procedures executed automatically when some
action (such as insert) happens with a particular table or tables, can compound these difficulties when the
data returned from a query are not expected. Again, this can be the result of the trigger reading a table
column that has been altered.
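As a minimal sketch of how a trigger runs on its own, the following uses Python's built-in sqlite3 module rather than any of the commercial servers named above. The customer and customer_log tables and the trigger name are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical tables for illustration: a customer table and an audit log.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_log (customer_id INTEGER, action TEXT);

    -- The trigger fires automatically after every insert on customer.
    -- It reads the column "id" by name, so it depends on that name.
    CREATE TRIGGER log_new_customer AFTER INSERT ON customer
    BEGIN
        INSERT INTO customer_log (customer_id, action)
        VALUES (NEW.id, 'insert');
    END;
""")

# The application only inserts a customer; the log row is written by the trigger.
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Wendy Jones')")
log_rows = conn.execute("SELECT customer_id, action FROM customer_log").fetchall()
print(log_rows)
```

Because the trigger body names the id column directly, it is exactly the kind of stored code that has to be revisited whenever the table definition changes.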
Limitations of Two-Tier Database Design
All of the intelligence associated with using and manipulating the data is
implemented in the client application, creating large client-side runtimes. This
drives up the cost of each client set.
Three-Tier Database Design
In a multi-tier design, the client communicates with an intermediate server that provides a
layer of abstraction from the RDBMS. The intermediate layer is designed to handle multiple
client requests and manage the connection to one or more database servers. There does not
have to be just three tiers, but conceptually this is the next step.
Oracle
7.x
8.0.x
8.1.x (aka 8i)
9.2
10g
Other Databases
DB2/UDB
NCR Teradata
Ingres
Informix
Sybase
MySQL
Gupta/Centura SQLBase
dBase
Paradox
... many others
Various Methods of Connection
ODBC
JDBC
Native
OLE/ADO
BCP
SQL*Net
SQL*Loader
HPL
Tools
city, state, high, and low are the columns. The rows contain the
data for this table:

city        state        high   low
San Diego   California   77     60

In a one-to-many relationship, the primary key has the "one" value, and the
foreign key has the "many" values. The trick to remembering this is to keep in
mind that the primary key must be unique.

87    Wendy Jones   -
954   fearful       10
979   jealous       30
991   furious       20
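The one-to-many idea can be sketched with Python's built-in sqlite3 module. The customer and purchase tables below are hypothetical and are not the tables from the slide. They only show one primary-key value matched by many foreign-key values, and a duplicate primary key being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,                      -- the "one" side, must be unique
        name TEXT
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),   -- the "many" side
        item        TEXT
    );
    INSERT INTO customer VALUES (1, 'Ann');
    INSERT INTO purchase VALUES (10, 1, 'pen'), (11, 1, 'ink'), (12, 1, 'paper');
""")

# Many purchase rows can point at the same customer row.
many_count = conn.execute(
    "SELECT COUNT(*) FROM purchase WHERE customer_id = 1"
).fetchone()[0]

# A second customer with the same primary key value is refused.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(many_count, duplicate_rejected)
```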
It Ain’t nothing – it’s NULL
Null means either "don't know" or "not applicable". It's not really the same as zero or
any other default value, but more importantly, null is treated quite differently from other
values in SQL, because it literally has no value.

Here's an example of a "don't know" null:

Student   TestResult
Joe       87
Bill      73
Mary      56
Fred      null
Sam       92

As you can see, Fred's value is null, which you could interpret as meaning that Fred didn't take
the test (maybe he has a medical exemption and will take the test another day). It would be
wrong to assign a zero, because that would be interpreted as Fred having taken the test and
not getting a single answer right!
Now consider a query that takes the average of the TestResult column.
Aggregate functions like AVG() and SUM() ignore nulls, so this query will return (87+73+56+92)/4=77,
which is certainly better than the (87+73+56+0+92)/5=61.6 you'd get using a zero default. Often a default
value is just wrong for a column where you expect to take aggregates.
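The arithmetic above can be checked with a small sketch using Python's built-in sqlite3 module. The table name test_results and its column names are assumptions made for this example, since the original query is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the values come from the Student/TestResult table.
conn.execute("CREATE TABLE test_results (student TEXT, test_result INTEGER)")
conn.executemany(
    "INSERT INTO test_results VALUES (?, ?)",
    [("Joe", 87), ("Bill", 73), ("Mary", 56), ("Fred", None), ("Sam", 92)],
)

# AVG() skips Fred's null, so it divides by 4 rather than 5.
avg_with_null = conn.execute(
    "SELECT AVG(test_result) FROM test_results"
).fetchone()[0]

# Forcing the null to zero pulls the average down, as a zero default would.
avg_with_zero = conn.execute(
    "SELECT AVG(COALESCE(test_result, 0)) FROM test_results"
).fetchone()[0]

print(avg_with_null, avg_with_zero)
```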
An example of a column that would take a "not applicable" null is Date Terminated in a human resources
database, where the value would be null for all active employees. To test for nulls, you can filter them out in
the WHERE clause with IS NOT NULL, for example when computing length of service as DateTerminated minus
the hire date. That would give results only for terminated employees. If you didn't have the WHERE clause, the
query would return null for every active employee, because any expression involving a null yields a null
result.
Alternatively, you can use the COALESCE function to supply a non-null value. An expression such as
COALESCE(DateTerminated, CURRENT_DATE) returns today's date when DateTerminated is null and
therefore provides an accurate measure for the length of service of active
employees. So for terminated employees, DateTerminated is not null, and the calculation is the same as
above, while for active employees, DateTerminated is null so COALESCE uses today's date instead.
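Both approaches can be sketched with Python's built-in sqlite3 module. The employee table, its column names, and the sample dates are hypothetical. "Today" is passed in as a fixed value so the result is reproducible, where a real query might use CURRENT_DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical HR table; dates are stored as ISO text, which SQLite's julianday() understands.
conn.execute("CREATE TABLE employee (name TEXT, date_hired TEXT, date_terminated TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Ann", "2020-01-01", "2020-01-31"),  # terminated
    ("Bob", "2020-01-01", None),          # still active, so date_terminated is null
])

# Filter the nulls out: only terminated employees are returned.
terminated = conn.execute("""
    SELECT name, julianday(date_terminated) - julianday(date_hired)
    FROM employee
    WHERE date_terminated IS NOT NULL
""").fetchall()

# COALESCE falls back to "today" when date_terminated is null.
today = "2020-03-01"  # fixed stand-in for the current date
service = conn.execute("""
    SELECT name, julianday(COALESCE(date_terminated, ?)) - julianday(date_hired)
    FROM employee
    ORDER BY name
""", (today,)).fetchall()

print(terminated)
print(service)
```

The second query returns a length of service in days for every employee, active or not, without any null results.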
What Is SQL?
SQL or SEQUEL?
In any case, SQL is a database query language that was adopted as an industry standard
in 1986. It has undergone two important revisions, SQL2 (also called SQL-92), and SQL3
(also called SQL-99).
Selecting Data
The select statement is used to query the database and retrieve selected data that
match the criteria that you specify. Here is the format of a simple select statement:

select "column1" [,"column2",etc] from "tablename" [where "condition"];

The column names that follow the select keyword determine which columns will be
returned in the results. You can select as many column names as you'd like, or you
can use a "*" to select all columns.
The table name that follows the keyword from specifies the table that will be
queried to retrieve the desired results.
The where clause (optional) specifies which data values or rows will be returned
or displayed, based on the criteria described after the keyword where.
Conditional selections used in the where clause:

=      Equal
>      Greater than
<      Less than
>=     Greater than or equal
<=     Less than or equal
<>     Not equal to
LIKE   Special operator for pattern matching
Sample Table: empinfo

first      last      id     age  city       state
John       Jones     99980  45   Payson     Arizona
Mary       Jones     99982  25   Payson     Arizona
Eric       Edwards   88232  32   San Diego  California
Mary Ann   Edwards   88233  32   Phoenix    Arizona
Sebastian  Smith     92001  23   Gila Bend  Arizona
Gus        Gray      22322  35   Bagdad     Arizona
Mary Ann   May       32326  52   Tucson     Arizona
Erica      Williams  32327  60   Show Low   Arizona
Elroy      Cleaver   32382  22   Globe      Arizona

select first, last, city from empinfo;
select last, city, age from empinfo where age > 30;
select first, last, city, state from empinfo where first LIKE 'J%';

A condition such as first = 'Eric' will only select rows where the first name equals
'Eric' exactly:

first  last     id     age  city       state
Eric   Edwards  88232  32   San Diego  California
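To try these selects, here is a small sketch that loads the empinfo rows shown on the slide into an in-memory SQLite database using Python's built-in sqlite3 module. The choice of SQLite and the column types are assumptions. The queries follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE empinfo (first TEXT, last TEXT, id INTEGER, age INTEGER, city TEXT, state TEXT)"
)
conn.executemany("INSERT INTO empinfo VALUES (?, ?, ?, ?, ?, ?)", [
    ("John", "Jones", 99980, 45, "Payson", "Arizona"),
    ("Mary", "Jones", 99982, 25, "Payson", "Arizona"),
    ("Eric", "Edwards", 88232, 32, "San Diego", "California"),
    ("Mary Ann", "Edwards", 88233, 32, "Phoenix", "Arizona"),
    ("Sebastian", "Smith", 92001, 23, "Gila Bend", "Arizona"),
    ("Gus", "Gray", 22322, 35, "Bagdad", "Arizona"),
    ("Mary Ann", "May", 32326, 52, "Tucson", "Arizona"),
    ("Erica", "Williams", 32327, 60, "Show Low", "Arizona"),
    ("Elroy", "Cleaver", 32382, 22, "Globe", "Arizona"),
])

# Comparison operator in the where clause: employees older than 30.
over_30 = conn.execute(
    "select last, city, age from empinfo where age > 30"
).fetchall()

# LIKE with the % wildcard: first names that start with J.
starts_with_j = conn.execute(
    "select first, last, city, state from empinfo where first LIKE 'J%'"
).fetchall()

# Equality matches 'Eric' exactly, so 'Erica' is not returned.
exact_eric = conn.execute(
    "select * from empinfo where first = 'Eric'"
).fetchall()

print(len(over_30), starts_with_j, exact_eric)
```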
The JOIN concept
JOIN is a query clause that can be used with the SELECT, UPDATE, and DELETE
data query statements to simultaneously affect rows from multiple tables. There
are several distinct types of JOIN statements that return different data result sets.
The tables being joined must share at least one field that contains
comparable data. For example, if you want to join a Customer table and a
Transaction table, they both must contain a common element, such as a
CustomerID column, to serve as a key on which the data can be matched. Tables
can be joined on multiple columns so long as the columns have the potential to
supply matching information. Column names across tables don't have to be the
same, although for readability this standard is generally preferred.
When you do use like column names in multiple tables, you must use fully
qualified column names. This is a “dot” notation that combines the names of
tables and columns. For example, if I have two tables, Customer and Transaction,
and they both contain the column CustomerID, I’d use the dot notation, as in
Customer.CustomerID and Transaction.CustomerID, to let the database know
which column from which table I’m referring.
Now that we’ve examined the basic theory, let’s take a look at the various types of
joins and examples of each.
The basic JOIN statement
A basic JOIN statement names the two tables and the condition that matches them, in the form
SELECT columns FROM table1 JOIN table2 ON table1.key = table2.key.
In practice, you'd rarely write a bare JOIN like this because the type of join is not specified. In this case,
SQL Server assumes an INNER JOIN. You get the equivalent query by writing INNER JOIN in place of
JOIN.
Another addition to your SQL toolbox
Although the JOIN statement is often perceived as a
complicated concept, you will see that it’s a powerful timesaving resource that’s relatively easy to
understand. Use this functionality to get related information from multiple tables with a single query
and to skillfully reference normalized data. Once you’ve mastered JOINs, you can elegantly maneuver
within even the most complex database.
Inner join
In relational databases, a join operation matches
records in two tables. The two tables must be joined
by at least one common field. That is, the join field is
a member of both tables. Typically, a join operation
is part of a SELECT query.
The CROSS JOIN has earned a bad reputation because it’s very resource
intensive and returns results of questionable usefulness. When you use the CROSS JOIN,
you're given a result set containing every possible combination of the rows returned from
each table. Take, for example, a CROSS JOIN of the Customer and Transaction tables.
With the CROSS JOIN, you aren’t actually free to limit the results, but you can use the ORDER BY
clause to control the way they are returned. If the tables joined in this example contained only five
rows each, you would get 25 rows of results. Every CustomerName would be listed as associated with
every TransDate and TransAmt.
I really did try to come up with examples where this function was useful, and they were all very
contrived. However, I’m sure someone out there is generating lists of all their products in all possible
colors or something similar, or we wouldn’t have this wonderful but dangerous feature.
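The five-by-five case can be sketched with Python's built-in sqlite3 module. The sample rows are made up, and SQLite stands in for SQL Server here. The table name is written in double quotes because TRANSACTION is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE "Transaction" (CustomerID INTEGER, TransDate TEXT, TransAmt REAL);
''')
# Five made-up rows in each table.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [(i, f"Customer {i}") for i in range(1, 6)],
)
conn.executemany(
    'INSERT INTO "Transaction" VALUES (?, ?, ?)',
    [(i, f"2004-01-0{i}", 10.0 * i) for i in range(1, 6)],
)

# Every customer is paired with every transaction, related or not.
rows = conn.execute('''
    SELECT CustomerName, TransDate, TransAmt
    FROM Customer CROSS JOIN "Transaction"
    ORDER BY CustomerName, TransDate
''').fetchall()

print(len(rows))
```

With 5 rows on each side the result has 5 x 5 = 25 rows, and the count grows as the product of the two table sizes.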
The INNER JOIN drops rows
When you perform an INNER JOIN, only rows that match up are returned. Any
time a row from either table doesn’t have corresponding values from the other
table, it is disregarded. Because stray rows aren’t included, you don’t have any of
the “left” and “right” nonsense to deal with and the order in which you present
tables matters only if you have more than two to compare. Since this is a simple
concept, consider a simple example: an INNER JOIN of the Customer and Transaction
tables on CustomerID.
If a row in the Transaction table contains a CustomerID that’s not listed in the
Customer table, that row will not be returned as part of the result set. Likewise, if
the Customer table has a CustomerID with no corresponding rows in the Transaction
table, the row from the Customer table won’t be returned.
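Here is a minimal sketch of that behavior using Python's built-in sqlite3 module, with made-up rows. SQLite is used instead of SQL Server, and the Transaction table name is quoted because TRANSACTION is a keyword in SQLite. The sketch also runs the same query with a bare JOIN, which SQLite likewise treats as an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE "Transaction" (CustomerID INTEGER, TransDate TEXT, TransAmt REAL);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');        -- Bob has no transactions
    INSERT INTO "Transaction" VALUES (1, '2004-01-05', 25.0),
                                     (3, '2004-01-06', 40.0);  -- customer 3 does not exist
''')

inner_sql = '''
    SELECT CustomerName, TransDate, TransAmt
    FROM Customer INNER JOIN "Transaction"
      ON Customer.CustomerID = "Transaction".CustomerID
'''
# Only the matched pair (Ann and her transaction) survives the join.
inner_rows = conn.execute(inner_sql).fetchall()

# The same query written with a bare JOIN.
bare_rows = conn.execute(inner_sql.replace("INNER JOIN", "JOIN")).fetchall()

print(inner_rows, bare_rows)
```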
The OUTER JOIN can include mismatched rows
OUTER JOINs, sometimes called “complex joins,” aren’t actually complicated. They are so-called
because SQL Server performs two functions for each OUTER JOIN.
The first function performed is an INNER JOIN. The second function includes the rows that the INNER
JOIN would have dropped. Which rows are included depends on the type of OUTER JOIN that is used
and the order the tables were presented.
There are three types of OUTER JOIN: LEFT, RIGHT, and FULL. As you’ve probably guessed, the
LEFT OUTER JOIN keeps the stray rows from the “left” table (the one listed first in your query
statement). In the result set, columns from the other table that have no corresponding data are filled
with NULL values. Similarly, the RIGHT OUTER JOIN keeps stray rows from the right table, filling
columns from the left table with NULL values. The FULL OUTER JOIN keeps all stray rows as part of
the result set. Here is an example:
SELECT CustomerName, TransDate, TransAmt FROM Customer LEFT OUTER JOIN Transaction ON
Customer.CustomerID = Transaction.CustomerID;
Customer names that have no associated transactions will still be displayed. However, transactions
with no corresponding customers will not, because we used a LEFT OUTER JOIN and the Customer
table was listed first.
The clauses LEFT JOIN, RIGHT JOIN, and FULL JOIN are equivalent to LEFT OUTER JOIN, RIGHT OUTER
JOIN, and FULL OUTER JOIN, respectively.
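The LEFT OUTER JOIN query above can be tried with Python's built-in sqlite3 module. The rows are made up, SQLite stands in for SQL Server, and the Transaction table name is quoted because TRANSACTION is a keyword in SQLite. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE "Transaction" (CustomerID INTEGER, TransDate TEXT, TransAmt REAL);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');        -- Bob has no transactions
    INSERT INTO "Transaction" VALUES (1, '2004-01-05', 25.0),
                                     (3, '2004-01-06', 40.0);  -- customer 3 does not exist
''')

left_sql = '''
    SELECT CustomerName, TransDate, TransAmt
    FROM Customer LEFT OUTER JOIN "Transaction"
      ON Customer.CustomerID = "Transaction".CustomerID
    ORDER BY Customer.CustomerID
'''
# Bob is kept, with NULL (None in Python) in the transaction columns.
# The stray transaction for customer 3 is dropped.
left_rows = conn.execute(left_sql).fetchall()

# LEFT JOIN is the short spelling of LEFT OUTER JOIN.
short_rows = conn.execute(left_sql.replace("LEFT OUTER JOIN", "LEFT JOIN")).fetchall()

print(left_rows)
```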
SQL Subquery
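Say we have two tables, one holding sales by store and one mapping each store to a region,
and we want to use a subquery to find the sales of all stores in the West region.
To do so, we sum the sales in an outer query and restrict it with a subquery that selects the West-region stores.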
In this example, instead of joining the two tables directly and then adding up only the
sales amount for stores in the West region, we first use the subquery to find out which
stores are in the West region, and then we sum up the sales amount for these stores.
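A sketch of such a subquery, using Python's built-in sqlite3 module, follows. The table names store_sales and geography, their columns, and all of the figures are hypothetical, since the original tables and statement are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and data for illustration only.
    CREATE TABLE store_sales (store_name TEXT, sales INTEGER);
    CREATE TABLE geography (region_name TEXT, store_name TEXT);
    INSERT INTO geography VALUES ('East', 'Boston'), ('East', 'New York'),
                                 ('West', 'Los Angeles'), ('West', 'San Diego');
    INSERT INTO store_sales VALUES ('Los Angeles', 1500), ('San Diego', 250),
                                   ('Los Angeles', 300), ('Boston', 700);
""")

# The inner query finds the West-region stores.
# The outer query sums the sales for just those stores.
west_sales = conn.execute("""
    SELECT SUM(sales)
    FROM store_sales
    WHERE store_name IN (
        SELECT store_name FROM geography WHERE region_name = 'West'
    )
""").fetchone()[0]

print(west_sales)
```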
select 'ALTER '||OBJECT_TYPE||' '||OWNER||'.'||OBJECT_NAME||' compile;' from
dba_objects where status='INVALID';
'ALTER'||OBJECT_TYPE||''||OWNER||'.'||OBJECT_NAME||'COMPILE;'
ALTER PROCEDURE MAXDATA.P_REPLACE_PROD compile;
ALTER PROCEDURE MAXDATA.P_SETPROTOTYPE compile;
ALTER PROCEDURE MAXDATA.UPDATE_LV10MAST compile;
ALTER PROCEDURE MAXAPP.P_COMPPROC compile;
Microsoft SQL Server Reporting Services
Originally planned to ship with the Yukon release of SQL Server, Reporting Services was released
early by Microsoft because of the customer excitement they heard. Why the excitement?
Reporting Services fills a need that many organizations face: the need to build
business intelligence and reporting solutions. Until now, developers were required to embed
reports into their applications, or organizations were required to purchase expensive and
sometimes problematic third-party reporting solutions. Now, Reporting Services offers a complete
solution for distributing reports across the enterprise, enabling businesses to make decisions
better and faster.
Overview of Reporting Services
Reporting Services is a scalable, secure, robust reporting solution for SQL Server. It supports the
complete reporting lifecycle by including tools for report creation, execution, distribution, and
management. New users can have Reporting Services installed and new reports published within
a matter of hours instead of days or weeks.
Reporting Services consists of the following key components:
Report Designer: Supports the report creation phase of the report lifecycle. It is an add-on tool for
any edition of Visual Studio .NET 2003, suitable for both programmers and non-programmers.
Report Server: Provides services for execution and distribution of reports.
Report Manager: A Web-based administration tool for managing the Report Server.
Report Designer
Report Designer is a Visual Studio .NET 2003 add-on and is included with Reporting Services
(see Figure A). As the name implies, it provides developers and non-developers an intuitive tool
to create sophisticated reports. Users get standard reporting functionality such as grouping,
sorting, and report formatting. This should be sufficient for most reporting needs. For more
advanced reports, the Report Designer has full VB.NET support. Plus, designers can add ActiveX
controls to their reports to create rich, live, interactive reports.
One of the more compelling features of Report Designer is the ability to have dynamic, query-based
parameters. This spares the administrator from having to maintain parameter lists for all the reports (e.g.,
department names, office locations, employee names, etc.). You simply have to create a new dataset
and tie the results to the parameter. It even allows cascading parameters.
At the heart of Reporting Services architecture is the Report Definition Language (RDL), which is an
XML-based standard for defining reports. RDL is key to the success of Reporting Services because it allows
third parties to publish reports to the Report Server. There are already product offerings from
independent software vendors (ISVs) today.
Though Reporting Services requires SQL Server as its repository, Report Designer can connect to all
types of data sources including OLE DB, ODBC, Oracle, SQL Server, and others. It also has many
rendering options such as HTML, Microsoft Excel, PDF, CSV, XML, and others. The list can also be
extended by third parties or by using the Reporting Services extension library.
Report Server
The Report Server provides the repository, management, execution, and delivery functions. It is
scalable and secure, and can support the most demanding reporting needs. It consists of several
subcomponents, including:
Request Handler: Handles all inbound server requests and routes them to the appropriate component.
Scheduling and Delivery Processor: Provides the scheduling and delivering functionality, and it can
be extended to deliver reports to other devices such as fax machines or printers.
Report Processor: Provides the execution functionality and it, too, can be extended to render to new
output formats such as a Microsoft Word document.
Report Server Database: All the data required by Reporting Services is stored in the Report Server
Database, which must be a SQL Server. This includes everything from server settings to report
definitions, even to cached data from a report execution.
Report Manager
Report Manager provides administrators an easy-to-use tool for configuring the server and managing
the reports. Using Report Manager, administrators can configure security, change server settings,
schedule reports for execution, and maintain the structure of the report folders (see Figure B).
Administrators have a variety of options for executing reports, including on-demand, cached reporting
with an adjustable expiration period, and flexible report scheduling. All of these options are
configurable at the report level through Report Manager.
Report Manager supports both push and pull distribution options. For e-mail delivery, Report Manager
can include a link to the report, attach the report to the message, or embed reports directly into the
message using Web archive. This eliminates one extra step for the reader of the report.
Here are a few typical scenarios where Reporting Services will be invaluable:
Existing applications:
Because of the complexity and time required to embed reports into applications, reports are often
created outside of the application using third-party tools and distributed manually or through batch
jobs. By using Reporting Services, these applications can easily be extended to include a complete
reporting solution, embedded within the application.
Executive dashboard:
Executive dashboard is the buzzword for providing executives a comprehensive view of their
business, commonly in an Enterprise Portal. Reporting Services includes several key features for an executive
dashboard, such as My Reports, My Subscriptions, push/pull delivery, numerous rendering options,
and support for Web services.
Though very powerful, there are a couple of scenarios that would not be suitable for Reporting
Services: