
Recursive SQL

The document describes a session on recursive SQL and its power and capabilities. Recursive SQL allows achieving things with a single SQL statement that previously required procedural logic. The session will cover recursion basics, mathematical and business case studies, performance aspects, and pitfalls and recommendations for recursive SQL. The speaker is Suresh Sane from DST Systems and has extensive experience with DB2 for z/OS.
Copyright
© Attribution Non-Commercial (BY-NC)

Session: G13

Recursive SQL –
Unleash the Power!

Suresh Sane
DST Systems Inc.

May 22, 2008 • 08:00 a.m. – 09:00 a.m.


Platform: DB2 for z/OS

Recursive SQL is one of the most fascinating and powerful (and dangerous!)
features offered in DB2 for z/OS Version 8. In this session, we will introduce
the feature and show numerous examples of how it can be used to achieve
things you would not have imagined being possible with SQL – all in one SQL
statement! Fasten your seat belts and come join us in this exciting journey!

1
Session Outline

1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations

Recursion basics
•Theory and introduction
Case studies - mathematical
•String of numbers
•Factorial
•Primes
•Fibonacci Series
Case studies - business
•Org chart
•Generating test data
•Missing data
•Rollup
•Allocation
•Weighted allocation
•RI children
•Cheapest fare
•Account linking
Performance aspects
•Comparison to procedural logic
•Org chart
•Rollup
•Allocate
Pitfalls and recommendations
•Best practices

2
About the Instructor

Suresh Sane
♦ Co-author-IBM Redbooks
  - SG24-6418, May 2002
  - SG24-7083, March 2004
  - SG24-7111, July 2006
♦ Educational seminars and presentations at IDUG North
America, Asia Pacific, Canada and Europe
♦ IDUG Solutions Journal article – Winter 2000
♦ Numerous DB2 courses at various locations
♦ IBM Certified Solutions Expert for both platforms for
Application Development and Database Administration

Suresh Sane works as a Database Architect at DST Systems in Kansas City,
MO and manages the Database Consulting Group, which provides strategic
direction for deployment of database technology at DST and has overall
responsibility for the DB2 curriculum for about 1,500 technical associates.

Contact Information:

sssane@dstsystems.com

Suresh Sane
DST Systems, Inc.
1055 Broadway
Kansas City, MO 64105
USA

(816) 435-3803

3
About DST Systems

http://www.dstsystems.com
♦Leading provider of computer software solutions and
services, NYSE listed – “DST”
♦Revenue $2.24 billion
♦110 million+ shareowner accounts

♦24,000 MIPS
♦150 TB DASD
♦145,000 workstations
♦462,000 DB2 objects
♦Non-mainframe: 600 servers (DB2, Oracle, Sybase) with
2.3 million objects

If you have ever invested in a mutual fund, have had a prescription filled, or
are a cable or satellite television subscriber, you may have already had
dealings with our company.

DST Systems, Inc. is a publicly traded company (NYSE: DST) with
headquarters in Kansas City, MO. Founded in 1969, it employs about 12,000
associates domestically and internationally.

The three operating segments - Financial Services, Output Solutions and
Customer Management - are further enhanced by DST’s advanced technology
and e-commerce solutions.

4
It’s cool but I will never use it…

♦ Of pure academic interest … “cool stuff” but
♦ … does it have any business value?
♦ THINK AGAIN!

“…400,000 customers in a hierarchy of about 4,000 nodes
resulting in over 2.5 million queries, the application took days
to calculate total sales.”
“…a single recursive query processed …in about 5 minutes”

Dan Luksetich, IDUG Solutions Journal, May 2004.

See ref #7 for details on this article. I have to admit my first reaction was that
this feature was “cool” but of little business value. Dan’s article piqued my
interest in this fascinating area of SQL.

5
Where Are We?

1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations

We will begin with a simple explanation that introduces the concept of
recursion in general (not just for SQL).

6
Recursion Simplified

What is 4! ?       It is 4 * 3!    = 4*6 = 24
But what is 3! ?   It is 3 * 2!    = 3*2 = 6
But what is 2! ?   It is 2 * 1!    = 2*1 = 2
But what is 1! ?   1

A simple example to illustrate the concept.
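The same idea can be sketched outside SQL. This minimal Python version illustrates recursion in general and is not DB2 code. It calls itself on a smaller problem until it reaches the base case 1! = 1, then multiplies its way back up, exactly as the slide does.

```python
def factorial(n: int) -> int:
    """Return n! by recursion: n! = n * (n - 1)!."""
    if n <= 1:
        # Base case (1! = 1): stops the chain of calls.
        return 1
    # Recursive case: solve the smaller problem, then finish this step.
    return n * factorial(n - 1)


# factorial(4) -> 4 * factorial(3) -> 4 * 3 * factorial(2) -> 4 * 3 * 2 * 1 = 24
```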

7
Case 1 – String of Numbers

WITH NUMBERS (LEVEL, NEXTONE) AS          -- CTE
(
  SELECT 1, 1                             -- prime the pump
  FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT LEVEL + 1, LEVEL + 1             -- pump more
  FROM NUMBERS                            -- what makes this recursive
  WHERE LEVEL < 100
)
SELECT NEXTONE                            -- use the result
FROM NUMBERS
ORDER BY NEXTONE

Result: 1, 2, 3, 4, 5, …

Coloring scheme used throughout this presentation to denote each of the 4 parts
8

The best way to introduce recursive SQL? Look at a simple example. Let’s
dive right in.

The definition – WITH NUMBERS – is an example of a Common Table
Expression (CTE), which is similar to the Nested Table Expression (NTE) most
of us are already familiar with. A CTE is required for using recursive SQL.

Also note the coloring scheme used consistently throughout this presentation
to denote the 4 parts of the SQL.
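You can try the four parts without a DB2 subsystem by running the same query through Python's built-in sqlite3 module. This is a sketch that relies on three differences in the SQLite dialect:
- SQLite writes the construct as WITH RECURSIVE.
- SQLite needs no SYSIBM.SYSDUMMY1, because a SELECT may omit FROM.
- SQLite accepts a bound parameter for the depth limit.

The function name is illustrative.

```python
import sqlite3

NUMBERS_SQL = """
    WITH RECURSIVE numbers(level, nextone) AS (
        SELECT 1, 1                          -- prime the pump
        UNION ALL
        SELECT level + 1, level + 1          -- pump more
        FROM numbers                         -- the self-reference makes it recursive
        WHERE level < ?                      -- depth control
    )
    SELECT nextone FROM numbers ORDER BY nextone   -- use the result
"""


def string_of_numbers(limit: int = 100) -> list:
    """Return 1..limit generated by a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(NUMBERS_SQL, (limit,))]
    finally:
        conn.close()
```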

8
Recursion Basics

♦ A technique where a function calls itself to perform some part of
the task
♦ Requires:
  - An initialization select (“priming the pump”)
  - An iterative select (“pump more”) – from the previous row, how do I get
    the next one?
  - A main select (“use the result”)
♦ Must use a common table expression (CTE)
♦ Allows for looping “procedural” logic – all in one statement!
  - DO WHILE, REPEAT etc. from SQL Procedures can sometimes be
    replaced by one SQL statement

Some of the basics. The “pump” terminology is borrowed from Tink Tysor
(see ref #1), who provides a very thorough introduction to this fascinating
topic.

9
Rules for CTE

♦ The first fullselect of the first UNION must not reference the CTE
♦ No select within the CTE can use DISTINCT
  - This is a major limitation when cycles are present, because
    DISTINCT on the outer query is expensive
  - What is needed is the ability to specify DISTINCT without the level
♦ No select within the CTE can use GROUP BY or HAVING
♦ Include only 1 reference to the CTE
♦ Initialization select and iterative select columns must match (data
types, lengths, CCSIDs)
♦ UNION must be a UNION ALL
♦ Outer joins cannot be part of any recursion cycle
♦ Subqueries cannot be part of any recursion cycle

10

Some of the restrictions on what can be coded within the common table
expression (CTE).

10
Where Are We?

1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations

11

Let us look at some mathematical applications, starting with some simple
examples. We will look at Factorial, Generating primes and Fibonacci series.

If mathematics scares you, do not get discouraged – these examples actually
make it easier to illustrate the concept. We will build on this foundation and
cover several business cases in the next section.

11
Case 2 – Factorial

WITH NUMBERS (LEVEL, FACTO) AS
(
  SELECT 1, 1
  FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT LEVEL + 1, FACTO * (LEVEL + 1)
  FROM NUMBERS
  WHERE LEVEL < 12
)
SELECT LEVEL AS NUMBER, FACTO AS FACTORIAL
FROM NUMBERS
ORDER BY LEVEL

NUMBER FACTORIAL
1 1
2 2
3 6
4 24
5 120
…

12

Just a little bit more complex – we simply generate the numbers and report
them in the “use the results” section of the SQL.
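The same factorial CTE can be checked in SQLite. This is a sketch in the SQLite dialect, and the function name is illustrative.

The stopping point of 12 happens to matter in DB2. 12! = 479,001,600 is the largest factorial that fits in a 32-bit INTEGER (maximum 2,147,483,647). SQLite's 64-bit integers would go as far as 20!.

```python
import sqlite3

FACTORIAL_SQL = """
    WITH RECURSIVE numbers(level, facto) AS (
        SELECT 1, 1
        UNION ALL
        SELECT level + 1, facto * (level + 1)   -- next factorial from the previous one
        FROM numbers
        WHERE level < ?                         -- depth control
    )
    SELECT level, facto FROM numbers ORDER BY level
"""


def factorials(limit: int = 12) -> list:
    """Return (n, n!) pairs for n = 1..limit."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(FACTORIAL_SQL, (limit,)).fetchall()
    finally:
        conn.close()
```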

12
Case 3 – Generating Primes - SQL

WITH NUMBERS (LEVEL, PRIME) AS
( SELECT 1, 1
  FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT LEVEL + 1, LEVEL + 1
  FROM NUMBERS
  WHERE LEVEL < 5000 )

SQL Continued on next slide


13

SQL to generate prime numbers.

13
Case 3 – Generating Primes – SQL (cont)
SELECT X.PRIME
FROM
  (SELECT NUMBERS.LEVEL,
          NUMBERS.PRIME
   FROM NUMBERS ) AS X
WHERE NOT EXISTS
  (SELECT 1                              -- has a factor
   FROM NUMBERS Y
   WHERE Y.PRIME BETWEEN 2 AND
         SQRT(X.PRIME)
     AND MOD(X.PRIME, Y.PRIME) = 0)
ORDER BY X.PRIME

Prime: 1, 2, 3, 5, 7, 11, …
Non-prime: 4, 6, 8, 9, 10, …
14

Actual SQL to generate prime numbers, continued.

Note that we can stop checking for factors once we cross SQRT of that number
(a big gain in performance – for example, instead of checking 5000, we can
stop at 70).

On a theoretical note – the answer to the question “Is 1 a prime?” depends on
the definition. Let’s leave that discussion to math forums. Here, I assume it is
a prime.
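Here is a sketch of the same prime test in SQLite, assuming Python's bundled sqlite3. It makes two substitutions:
- It replaces SQRT(X.PRIME) with the equivalent test y * y <= x, because SQLite's sqrt() exists only when the library is built with its math functions.
- It replaces MOD with the % operator.

As on the slide, 1 is reported as a prime.

```python
import sqlite3

PRIMES_SQL = """
    WITH RECURSIVE numbers(level, prime) AS (
        SELECT 1, 1
        UNION ALL
        SELECT level + 1, level + 1 FROM numbers WHERE level < ?
    )
    SELECT x.prime
    FROM numbers AS x
    WHERE NOT EXISTS (
        SELECT 1
        FROM numbers AS y
        WHERE y.prime >= 2
          AND y.prime * y.prime <= x.prime     -- stop at the square root
          AND x.prime % y.prime = 0            -- has a factor
    )
    ORDER BY x.prime
"""


def primes_up_to(limit: int) -> list:
    """Return the numbers 1..limit with no factor between 2 and their square root."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(PRIMES_SQL, (limit,))]
    finally:
        conn.close()
```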

14
Case 4 – Fibonacci Series

♦ The first two numbers in the series are one and one.
♦ To obtain each number of the series, you simply add the two
numbers that came before it. In other words, each number of the
series is the sum of the two numbers preceding it.

[Figures: the Fibonacci spiral, and the Parthenon with the “Golden ratio”; series shown: 1 1 2 3 5 8 13 21]
15

Some historical background information about the Fibonacci series (made
popular by the recent “Da Vinci Code”):

•A sequence of numbers introduced to Western Europe by Leonardo Fibonacci
in his 1202 book Liber Abaci.
•It is a deceptively simple series, but its ramifications and applications are
nearly limitless.
•The number 1.618..., or Phi, is the value that the ratio of each number to the
previous one approaches as the series grows. Called the “Golden ratio”, it has
implications in art, architecture and numerous other disciplines.

15
Case 4 – Fibonacci Series - SQL

WITH NUMBERS (LEVEL, NEXTNUM, TOTAL) AS
(SELECT 1, 1, 1
 FROM SYSIBM.SYSDUMMY1
 UNION ALL
 SELECT LEVEL + 1
      , (NEXTNUM + TOTAL)
      , (NEXTNUM + TOTAL + TOTAL)
 FROM NUMBERS
 WHERE LEVEL = LEVEL
   AND LEVEL <= 22 )

LEVEL NEXTNUM TOTAL
1 1 1
2 2 3
3 5 8
4 13 21
5 34 55

SQL Continued on next slide


16

SQL to generate Fibonacci series.

16
Case 4 – Fibonacci Series – SQL (cont)

SELECT A.NEXTNUM
FROM NUMBERS A
WHERE A.LEVEL <= 22
UNION ALL
SELECT B.TOTAL
FROM NUMBERS B
WHERE B.LEVEL <= 22
ORDER BY 1

Series: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, …

17

Actual SQL to generate Fibonacci series, continued.
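Why does each row yield two terms? Each CTE row holds a consecutive pair (a, b) = (NEXTNUM, TOTAL). The next two terms are a + b and b + (a + b) = a + 2b, which is exactly what the iterative select computes.

Here is a sketch of the same query in SQLite. The function name is illustrative, and the SQL uses the SQLite dialect.

```python
import sqlite3

FIB_SQL = """
    WITH RECURSIVE numbers(level, nextnum, total) AS (
        SELECT 1, 1, 1
        UNION ALL
        SELECT level + 1,
               nextnum + total,            -- a + b
               nextnum + total + total     -- b + (a + b) = a + 2b
        FROM numbers
        WHERE level < ?                    -- depth control
    )
    SELECT nextnum FROM numbers
    UNION ALL
    SELECT total FROM numbers
    ORDER BY 1
"""


def fibonacci(levels: int = 22) -> list:
    """Return the first 2 * levels Fibonacci numbers (two per CTE row)."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(FIB_SQL, (levels,))]
    finally:
        conn.close()
```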

17
Where Are We?

1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations

18

Now let us start to explore recursive SQL for business applications.

18
Case studies - Business

♦ Case 5 – Org chart
♦ Case 6 – Generating test data
♦ Case 7 – Missing data
♦ Case 8 – Rollup
♦ Case 9 – Even allocation
♦ Case 10 – Weighted allocation
♦ Case 11 – RI children
♦ Case 12 – Cheapest fare
♦ Case 13 – Account linking

19

A list of case studies we explore in this section.

19
Case 5 - Org Chart

1-Wolf
  2-Sontag
    5-Hamman
    6-Jacoby
  3-Truscott
    7-Wei
  4-Rodwell
    8-Kelsey
    9-Reese
      10-Schapiro

20

A case showing how recursive SQL can be used effectively to traverse
hierarchical structures.

20
Case 5 – RECASE05 Table

RECASE05
EMPID EMPNAME MGRID
1 Wolf -
2 Sontag 1
3 Truscott 1
4 Rodwell 1
5 Hamman 2
6 Jacoby 2
7 Wei 3
8 Kelsey 4
9 Reese 4
10 Schapiro 9
21

Contents of the table RECASE05 that defines the hierarchy.

21
Case 5 - SQL
WITH OC (LEVEL, MGRID, MGRNAME, EMPID,
         EMPNAME) AS
( SELECT 0, 0, ' ', EMPID, EMPNAME
  FROM RECASE05
  WHERE MGRID IS NULL
  UNION ALL
  SELECT BOSS.LEVEL + 1, SUB.MGRID,
         BOSS.EMPNAME
       , SUB.EMPID, SUB.EMPNAME
  FROM OC BOSS, RECASE05 SUB
  WHERE BOSS.EMPID = SUB.MGRID           -- my direct reports
    AND BOSS.LEVEL < 5 )
SELECT OC.LEVEL, OC.MGRNAME, OC.EMPNAME
FROM OC
WHERE LEVEL > 0
ORDER BY OC.LEVEL, OC.MGRID, OC.EMPID
22

Actual SQL to generate the org chart.

The initialization select (WHERE MGRID IS NULL) could instead have been
written as WHERE EMPID = 1.
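For experimentation, here is a sketch of the org chart query in SQLite, loaded with the RECASE05 rows from the earlier slide. The table and column names are kept from the slide. The constant names and the org_chart loader function are hypothetical.

```python
import sqlite3

# RECASE05 rows from the slide: (EMPID, EMPNAME, MGRID); MGRID None = top of the tree
RECASE05 = [(1, "Wolf", None), (2, "Sontag", 1), (3, "Truscott", 1), (4, "Rodwell", 1),
            (5, "Hamman", 2), (6, "Jacoby", 2), (7, "Wei", 3), (8, "Kelsey", 4),
            (9, "Reese", 4), (10, "Schapiro", 9)]

ORG_CHART_SQL = """
    WITH RECURSIVE oc(level, mgrid, mgrname, empid, empname) AS (
        SELECT 0, 0, ' ', empid, empname
        FROM recase05
        WHERE mgrid IS NULL                              -- prime the pump: the top boss
        UNION ALL
        SELECT boss.level + 1, sub.mgrid, boss.empname, sub.empid, sub.empname
        FROM oc AS boss
        JOIN recase05 AS sub ON boss.empid = sub.mgrid   -- my direct reports
        WHERE boss.level < 5                             -- depth control
    )
    SELECT level, mgrname, empname
    FROM oc
    WHERE level > 0
    ORDER BY level, mgrid, empid
"""


def org_chart() -> list:
    """Load the sample hierarchy into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recase05 (empid INTEGER PRIMARY KEY,"
                     " empname TEXT, mgrid INTEGER)")
        conn.executemany("INSERT INTO recase05 VALUES (?, ?, ?)", RECASE05)
        return conn.execute(ORG_CHART_SQL).fetchall()
    finally:
        conn.close()
```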

22
Case 5 - Intermediate Result

Common Table Expression OC
LEVEL MGRID MGRNAME EMPID EMPNAME
0 0 (blank) 1 WOLF
1 1 WOLF 2 SONTAG
1 1 WOLF 3 TRUSCOTT
1 1 WOLF 4 RODWELL
2 2 SONTAG 5 HAMMAN
2 2 SONTAG 6 JACOBY
2 3 TRUSCOTT 7 WEI
2 4 RODWELL 8 KELSEY
2 4 RODWELL 9 REESE
3 9 REESE 10 SCHAPIRO

23

Contents of the common table expression OC as the SQL is executed.

23
Case 5 - Result

LEVEL MGRNAME EMPNAME
1 WOLF SONTAG
1 WOLF TRUSCOTT
1 WOLF RODWELL
2 SONTAG HAMMAN
2 SONTAG JACOBY
2 TRUSCOTT WEI
2 RODWELL KELSEY
2 RODWELL REESE
3 REESE SCHAPIRO

24

… and the result.

24
Case 6 – Generating Test Data

RECASE06
COLUMN TYPE TEST VALUES
EMPID SMALLINT 1 thru 10000
FNAME CHAR(20) 3-7 chars
LNAME CHAR(20) 3-10 chars
SALARY DEC(7,2) 1000 to 5000
HIREDATE DATE 1 to 21 years ago

25

Table structure for a table containing various data types, which needs to be
populated with test data.

If you needed 10,000 rows of test data in this table, you would typically use a
table editor (or SPUFI) to insert some rows and then copy them repeatedly. How
“random” would this data really be? In real life, not very random at all.
Recursive SQL lets you generate much more varied data quite easily.

25
Case 6 – Generating Test Data - SQL
INSERT INTO RECASE06
WITH NUMBERS (LEVEL, NEXTONE) AS
(SELECT 1, 1
 FROM SYSIBM.SYSDUMMY1
 UNION ALL
 SELECT LEVEL + 1, LEVEL + 1
 FROM NUMBERS
 WHERE LEVEL < 10 )                        -- rows to generate (LEVEL < 10000 for 10,000)
SELECT INTEGER(ROUND(RAND()*9999,0)) + 1    -- EMPID: 1 thru 10,000
     , LEFT(SUBSTR('BCDFGHJKLMNPRSTVWZ',
                   INTEGER(ROUND(RAND()*17,0))+1, 1)
            CONCAT
            SUBSTR('AEIOUY', INTEGER(ROUND(RAND()*5,0))+1, 1)
            <<< REPEAT 5 TIMES>>>           -- 5 sets of letters + vowels
          , INTEGER(ROUND(RAND()*4,0)) + 3) -- FNAME: min 3, max 7
SQL Continued on next slide
26

SQL used for this purpose.

26
Case 6 – Generating Test Data – SQL (cont)

     , LEFT(SUBSTR('BCDFGHJKLMNPRSTVWZ',
                   INTEGER(ROUND(RAND()*17,0))+1, 1)
            CONCAT SUBSTR('AEIOUY',
                   INTEGER(ROUND(RAND()*5,0))+1, 1)
            <<< REPEAT 5 TIMES>>>           -- same for last name
          , INTEGER(ROUND(RAND()*7,0)) + 3) -- LNAME: min 3, max 10
     , DECIMAL((1000.00 + RAND()*4000),7,2) -- SALARY: min 1000, max 5000
     , CURRENT DATE - 1 YEAR
                    - INTEGER(20*365*RAND()) DAYS   -- HIREDATE: 1 thru 21 years ago
FROM NUMBERS

27

SQL continued.
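A comparable generator can be sketched in SQLite. This sketch differs from the DB2 version in four ways:
- SQLite's random() returns a signed 64-bit integer rather than a fraction, so each RAND()*n becomes a modulo expression.
- Date arithmetic uses date() modifiers.
- EMPID is simply the level number, so it is guaranteed unique.
- Python builds the repeated consonant/vowel expression instead of writing it out five times.

The helper names are hypothetical.

```python
import sqlite3

CONSONANTS = "BCDFGHJKLMNPRSTVWZ"  # same 18 letters as the slide
VOWELS = "AEIOUY"                   # same 6 vowels


def rand_pos(n: int) -> str:
    """SQL for a pseudo-random position 1..n (random() may be negative, hence the double modulo)."""
    return f"(1 + ((random() % {n}) + {n}) % {n})"


def name_expr(min_len: int, max_len: int) -> str:
    """SQL for five consonant+vowel pairs cut to a random length in [min_len, max_len]."""
    pair = (f"substr('{CONSONANTS}', {rand_pos(len(CONSONANTS))}, 1) || "
            f"substr('{VOWELS}', {rand_pos(len(VOWELS))}, 1)")
    letters = " || ".join([pair] * 5)
    return f"substr({letters}, 1, {min_len - 1} + {rand_pos(max_len - min_len + 1)})"


def generate_rows(count: int = 10) -> list:
    """Return `count` rows of (EMPID, FNAME, LNAME, SALARY, HIREDATE) test data."""
    sql = f"""
        WITH RECURSIVE numbers(level) AS (
            SELECT 1
            UNION ALL
            SELECT level + 1 FROM numbers WHERE level < ?
        )
        SELECT level,                                   -- EMPID: sequential, so unique
               {name_expr(3, 7)},                       -- FNAME: 3-7 chars
               {name_expr(3, 10)},                      -- LNAME: 3-10 chars
               1000 + (((random() % 400001) + 400001) % 400001) / 100.0,  -- SALARY
               date('now', '-1 years',
                    '-' || (((random() % 7300) + 7300) % 7300) || ' days') -- HIREDATE
        FROM numbers
        ORDER BY level
    """
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql, (count,)).fetchall()
    finally:
        conn.close()
```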

27
Case 6 – Generating Test Data - Result

EMPID FNAME LNAME SALARY HIREDATE
1369 ZITO RYDI 2250.93 2003-02-03
2714 KUMU PUDIN 3112.80 1997-02-02
3485 FOCAC VOGYNO 1155.08 1991-09-11
5351 CINIFI DELOV 2651.32 2003-02-16
6297 CIKEZ TAGINIMUW 1252.73 1990-10-29
6474 COLO VOLECECU 2948.95 1993-01-28
6629 LIHU DACACOMILY 4721.00 1989-06-07
8463 VIVO VOLEZO 1634.31 2002-01-15
8494 FALIHED SUPO 1542.59 1988-06-06
8787 MUDOD SIC 3874.86 1999-06-04

28

…and the result from one of my test runs (we come up with some interesting
names!). The letter sets could be adjusted to reflect regional demographics.

28
Case 7 – Missing Data Non-recursive

EX7EMPL
EMPID EMPNAME
1 Alan Sontag
2 Bobby Wolf
3 Dorothy Truscott

EX7TIME
EMPID FRIDAY_DATE HOURS
1 2005-03-04 41.0
1 2005-03-11 42.0
1 2005-03-18 43.0
1 2005-03-25 44.0
2 2005-03-04 39.0
2 2005-03-18 46.0

For the month, notice that EMPID 3 has not reported any time, and
EMPID 2 has reported only partially.
29

A simple case involving a set of two tables with time reported by week. Some
employees may fail to report their time.

29
Case 7 – Missing Data Non-recursive

Required Report
EMPNAME FRIDAY_DATE TOTHOURS
Alan Sontag 2005-03-04 41.0
Alan Sontag 2005-03-11 42.0
Alan Sontag 2005-03-18 43.0
Alan Sontag 2005-03-25 44.0
Bobby Wolf 2005-03-04 39.0
Bobby Wolf 2005-03-11 0.0
Bobby Wolf 2005-03-18 46.0
Bobby Wolf 2005-03-25 0.0
Dorothy Truscott 2005-03-04 0.0
Dorothy Truscott 2005-03-11 0.0
Dorothy Truscott 2005-03-18 0.0
Dorothy Truscott 2005-03-25 0.0
List all employees and the reported time for March 2005. If
an employee failed to report time, show it as zeroes.
30

We need a report that includes any missing information. Since we are dealing
with optional data, our first thought would be a Left Outer Join.

However, how do you create a “left” table that has all the data when rows
themselves are missing? The next slide shows how we can generate such a
“left” table.

30
Case 7 – Missing Data Non-recursive

SELECT
  CART.EMPNAME, CART.FRIDAY_DATE,
  SUM(COALESCE(TIME.HOURS,0.0)) AS TOTHOURS
FROM
  ( SELECT
      EMPL.EMPID, EMPL.EMPNAME,
      FRIDAYS.FRIDAY_DATE
    FROM EX7EMPL EMPL                       -- employees
    INNER JOIN                              -- all Fridays
      ( SELECT DISTINCT FRIDAY_DATE
        FROM EX7TIME
        WHERE FRIDAY_DATE BETWEEN
              '2005-03-01' AND '2005-03-31') AS FRIDAYS
    ON 1=1                                  -- Cartesian product
  ) AS CART
SQL Continued on next slide
31

CART is a nested table expression (NTE) of all employee-Friday
combinations; it is the “left” table mentioned in the previous slide.

31
Case 7 – Missing Data Non-recursive (cont)

LEFT OUTER JOIN
  EX7TIME TIME                              -- … and their time, if any
  ON CART.EMPID = TIME.EMPID
  AND CART.FRIDAY_DATE = TIME.FRIDAY_DATE
GROUP BY
  CART.EMPNAME
, CART.FRIDAY_DATE
ORDER BY
  CART.EMPNAME
, CART.FRIDAY_DATE

32

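We now perform a Left Outer Join of CART with the TIME table to show the
time reported, if any.

Notice that this query only knows about Fridays for which at least one
employee reported time. A Friday on which no one reports time disappears
from the report, and if no one enters any time at all, the report is empty!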

32
Case 7 – Missing Data - Recursive

WITH
DATES (LEVEL, NEXTDAY) AS
(
SELECT 1, DATE('2005-03-01')
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT LEVEL + 1, NEXTDAY + 1 DAY
FROM DATES
WHERE LEVEL < 32
)

SQL Continued on next slide


33

An alternative solution to the same problem using recursive SQL.

Also note that unlike the previous solution that requires at least one employee
to enter the time, this solution works irrespective of who has entered the time.

33
Case 7 – Missing Data – Recursive (cont)

SELECT EMPL.EMPNAME
, DATES.NEXTDAY AS FRIDAY_DATE
, COALESCE(TIME.HOURS,0.0) AS TOTHOURS
FROM DATES
INNER JOIN
EX7EMPL EMPL
ON 1 = 1                                    -- Cartesian product
LEFT OUTER JOIN
EX7TIME TIME
ON DATES.NEXTDAY = TIME.FRIDAY_DATE
AND EMPL.EMPID = TIME.EMPID
WHERE NEXTDAY BETWEEN '2005-03-01' AND '2005-03-31'
AND DAYOFWEEK(NEXTDAY) = 6                  -- is a Friday
ORDER BY EMPL.EMPNAME
, DATES.NEXTDAY
34

SQL to generate the missing data, continued.

Alternatively, we could generate just the Friday dates by pushing these
constructs into the CTE itself. However, care must be taken to start with a
Friday (2005-03-04); otherwise the initial select adds zero rows to the CTE.
You must also increment by 7 days, since the first non-Friday will terminate
the loop (this is left as an exercise for the reader).
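Here is one possible answer to that exercise, sketched in SQLite rather than DB2. The CTE starts on Friday 2005-03-04 and steps a whole week at a time. Every generated date is therefore a Friday, and no day-of-week filter is needed. The table contents are the ones on the earlier slide; the function name is hypothetical.

```python
import sqlite3

EX7EMPL = [(1, "Alan Sontag"), (2, "Bobby Wolf"), (3, "Dorothy Truscott")]
EX7TIME = [(1, "2005-03-04", 41.0), (1, "2005-03-11", 42.0), (1, "2005-03-18", 43.0),
           (1, "2005-03-25", 44.0), (2, "2005-03-04", 39.0), (2, "2005-03-18", 46.0)]

REPORT_SQL = """
    WITH RECURSIVE fridays(level, friday) AS (
        SELECT 1, '2005-03-04'                        -- must start on a Friday
        UNION ALL
        SELECT level + 1, date(friday, '+7 days')     -- step a whole week
        FROM fridays
        WHERE level < 10                              -- depth control
          AND date(friday, '+7 days') <= '2005-03-31'
    )
    SELECT e.empname, f.friday, COALESCE(t.hours, 0.0)
    FROM fridays AS f
    CROSS JOIN ex7empl AS e                           -- Cartesian product
    LEFT JOIN ex7time AS t                            -- ... and their time, if any
      ON t.friday_date = f.friday AND t.empid = e.empid
    ORDER BY e.empname, f.friday
"""


def march_report() -> list:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE ex7empl (empid INTEGER, empname TEXT)")
        conn.execute("CREATE TABLE ex7time (empid INTEGER, friday_date TEXT, hours REAL)")
        conn.executemany("INSERT INTO ex7empl VALUES (?, ?)", EX7EMPL)
        conn.executemany("INSERT INTO ex7time VALUES (?, ?, ?)", EX7TIME)
        return conn.execute(REPORT_SQL).fetchall()
    finally:
        conn.close()
```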

34
Case 8 – Rollup

1-CORP (500)
  2-I.T. (400)
    4-DB SVC (1000)
    5-PROG (5000)
  3-H.R. (100)
    6-EMPL REL (50)
    7-BENEFITS (30)

35

Case of a hierarchy where the salary amounts must be rolled up to the
higher-level cost center (e.g. Database Services and Programming are to be
rolled up to the IT salary budget).

35
Case 8 – Rollup SQL

WITH ROLLUP (ACCT_NUM, ACCT_NAME,
             PARENT_ACCT_NUM, TOTAL_BUDGET) AS
(SELECT ACCT_NUM, ACCT_NAME,                -- me
        PARENT_ACCT_NUM, BUDGET_AMT
 FROM RECASE08
 UNION ALL
 SELECT A.ACCT_NUM, A.ACCT_NAME,            -- … and my boss
        A.PARENT_ACCT_NUM, B.TOTAL_BUDGET
 FROM RECASE08 A
    , ROLLUP B
 WHERE A.ACCT_NUM = B.PARENT_ACCT_NUM
   AND B.PARENT_ACCT_NUM IS NOT NULL
)
SQL Continued on next slide
36

SQL used for this purpose.

The table RECASE08 contains:

ACCT_NUM INTEGER NOT NULL (PK)
ACCT_NAME CHAR(20) NOT NULL
PARENT_ACCT_NUM INTEGER
BUDGET_AMT DEC(9,2) NOT NULL

36
Case 8 – Rollup SQL (cont)

SELECT ACCT_NUM, ACCT_NAME,
       SUM(TOTAL_BUDGET) AS TOTAL_BUDGET
FROM ROLLUP
GROUP BY ACCT_NUM, ACCT_NAME
ORDER BY ACCT_NUM

37

SQL continued.
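Here is a sketch of the same rollup in SQLite. The CTE is renamed rollup_cte, and the table contents are taken from the hierarchy diagram. Each account's own budget is copied once for the account itself and once for every ancestor, so summing by account yields the subtree total. Like the DB2 version, it has no level limit and relies on the hierarchy being free of cycles to terminate.

```python
import sqlite3

# (ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, BUDGET_AMT) from the Case 8 diagram
RECASE08 = [(1, "CORP", None, 500), (2, "I.T.", 1, 400), (3, "H.R.", 1, 100),
            (4, "DB SVC", 2, 1000), (5, "PROG", 2, 5000),
            (6, "EMPL REL", 3, 50), (7, "BENEFITS", 3, 30)]

ROLLUP_SQL = """
    WITH RECURSIVE rollup_cte(acct_num, acct_name, parent_acct_num, total_budget) AS (
        SELECT acct_num, acct_name, parent_acct_num, budget_amt          -- me
        FROM recase08
        UNION ALL
        SELECT a.acct_num, a.acct_name, a.parent_acct_num, b.total_budget
        FROM recase08 AS a
        JOIN rollup_cte AS b ON a.acct_num = b.parent_acct_num          -- ... and my boss
    )
    SELECT acct_num, SUM(total_budget)
    FROM rollup_cte
    GROUP BY acct_num
    ORDER BY acct_num
"""


def rollup_totals() -> dict:
    """Return {ACCT_NUM: subtree budget total}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recase08 (acct_num INTEGER PRIMARY KEY, acct_name TEXT,"
                     " parent_acct_num INTEGER, budget_amt NUMERIC)")
        conn.executemany("INSERT INTO recase08 VALUES (?, ?, ?, ?)", RECASE08)
        return dict(conn.execute(ROLLUP_SQL).fetchall())
    finally:
        conn.close()
```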

37
Case 8 – Rollup Result

ACCT_NUM ACCT_NAME BUDGET_AMT TOTAL_BUDGET
1 CORP 500 7080
2 I.T. 400 6400
3 H.R. 100 180
4 DB SVC 1000 1000
5 PROG 5000 5000
6 EMPL REL 50 50
7 BENEFITS 30 30

38

… and the result.

38
Case 9 – Even Allocation

1-CORP (6000)
  2-I.T. (4000)
    4-DB SVC (5000)
    5-PROG (1000)
  3-H.R. (1000)
    6-EMPL REL (100)
    7-BENEFITS (50)

39

Case of a hierarchy where bonus amounts are to be allocated evenly across all
subordinates. For example, the $6,000 for CORP is to be split into 2 equal
parts ($3,000 each) for IT and HR. We must allocate from the top down.

39
Case 9 – Even Allocation SQL

WITH ALLOC (ACCT_NUM, ACCT_NAME,
            PARENT_ACCT_NUM, TOTAL_BONUS) AS
(SELECT ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM,   -- me
        BONUS_AMT FROM RECASE10
 UNION ALL
 SELECT A.ACCT_NUM, A.ACCT_NAME,                -- … and my children
        A.PARENT_ACCT_NUM, B.TOTAL_BONUS /
        (SELECT COUNT(*) FROM RECASE10 X        -- divide evenly
         WHERE X.PARENT_ACCT_NUM =
               A.PARENT_ACCT_NUM)
 FROM RECASE10 A
    , ALLOC B
 WHERE B.ACCT_NUM = A.PARENT_ACCT_NUM)

SQL Continued on next slide


40

SQL used for this purpose.

The table RECASE10 contains:

ACCT_NUM INTEGER NOT NULL (PK)
ACCT_NAME CHAR(20) NOT NULL
PARENT_ACCT_NUM INTEGER
BONUS_AMT DEC(9,2) NOT NULL

40
Case 9 – Even Allocation SQL (cont)

SELECT ACCT_NUM, ACCT_NAME,
       SUM(TOTAL_BONUS) AS TOTAL_BONUS
FROM ALLOC
GROUP BY ACCT_NUM, ACCT_NAME
ORDER BY ACCT_NUM

41

SQL continued.
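Here is a sketch of the even allocation in SQLite, using the amounts from the Case 9 diagram. The amounts are stored as REAL so that SQLite does not perform integer division. The COUNT(*) subquery reads only the base table, just as on the slide. Names not taken from the slide are hypothetical.

```python
import sqlite3

# (ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, BONUS_AMT) from the Case 9 diagram
RECASE10 = [(1, "CORP", None, 6000.0), (2, "I.T.", 1, 4000.0), (3, "H.R.", 1, 1000.0),
            (4, "DB SVC", 2, 5000.0), (5, "PROG", 2, 1000.0),
            (6, "EMPL REL", 3, 100.0), (7, "BENEFITS", 3, 50.0)]

ALLOC_SQL = """
    WITH RECURSIVE alloc(acct_num, parent_acct_num, total_bonus) AS (
        SELECT acct_num, parent_acct_num, bonus_amt                  -- me
        FROM recase10
        UNION ALL
        SELECT a.acct_num, a.parent_acct_num,
               b.total_bonus / (SELECT COUNT(*) FROM recase10 AS x   -- divide evenly
                                WHERE x.parent_acct_num = a.parent_acct_num)
        FROM recase10 AS a
        JOIN alloc AS b ON b.acct_num = a.parent_acct_num            -- ... and my children
    )
    SELECT acct_num, SUM(total_bonus) FROM alloc GROUP BY acct_num
"""


def even_allocation() -> dict:
    """Return {ACCT_NUM: own bonus plus everything allocated from above}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recase10 (acct_num INTEGER PRIMARY KEY, acct_name TEXT,"
                     " parent_acct_num INTEGER, bonus_amt REAL)")
        conn.executemany("INSERT INTO recase10 VALUES (?, ?, ?, ?)", RECASE10)
        return dict(conn.execute(ALLOC_SQL).fetchall())
    finally:
        conn.close()
```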

41
Case 9 – Even Allocation Result

ACCT_NUM ACCT_NAME BONUS_AMT TOTAL_BONUS
1 CORP 6000 6000
2 I.T. 4000 7000
3 H.R. 1000 4000
4 DB SVC 5000 8500
5 PROG 1000 4500
6 EMPL REL 100 2100
7 BENEFITS 50 2050

42

…and the result.

42
Case 10 – Weighted Allocation

1-CORP (5000, R=null)
  2-I.T. (4000, R=4)
    4-DB SVC (5000, R=5)
    5-PROG (1000, R=3)
  3-H.R. (1000, R=1)
    6-EMPL REL (100, R=3)
    7-BENEFITS (50, R=2)

43

Similar hierarchy as before, but this time we need to accomplish a weighted
allocation based on rating. For example, the $5,000 bonus for CORP is to be
divided using the ratings of 4 and 1 for IT and HR: IT will obtain 4 / (4+1) =
4/5 of the $5,000. As before, we must process top down.

43
Case 10 – Weighted Allocation SQL

WITH ALLOC (ACCT_NUM, ACCT_NAME,
            PARENT_ACCT_NUM, TOTAL_BONUS) AS
(SELECT ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM,
        BONUS_AMT
 FROM RECASE11
 UNION ALL
 SELECT A.ACCT_NUM, A.ACCT_NAME,
        A.PARENT_ACCT_NUM, B.TOTAL_BONUS * A.RATING   -- divide based on rating
        / (SELECT SUM(RATING) FROM RECASE11 X
           WHERE X.PARENT_ACCT_NUM = A.PARENT_ACCT_NUM)
 FROM RECASE11 A                                      -- my children
    , ALLOC B
 WHERE B.ACCT_NUM = A.PARENT_ACCT_NUM
)
SQL Continued on next slide
44

SQL for this purpose.

The table RECASE11 contains:

ACCT_NUM INTEGER NOT NULL (PK)
ACCT_NAME CHAR(20) NOT NULL
RATING SMALLINT
PARENT_ACCT_NUM INTEGER
BONUS_AMT DEC(9,2) NOT NULL

44
Case 10 – Weighted Allocation SQL (cont)

SELECT ACCT_NUM, ACCT_NAME,
       SUM(TOTAL_BONUS) AS TOTAL_BONUS
FROM ALLOC
GROUP BY ACCT_NUM, ACCT_NAME
ORDER BY ACCT_NUM

45

SQL continued.
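Here is a sketch of the weighted allocation in SQLite, using the amounts and ratings from the Case 10 diagram. Each allocated amount is scaled by the child's rating divided by the sum of its siblings' ratings. REAL amounts avoid integer division. Names not taken from the slide are hypothetical.

```python
import sqlite3

# (ACCT_NUM, ACCT_NAME, RATING, PARENT_ACCT_NUM, BONUS_AMT) from the Case 10 diagram
RECASE11 = [(1, "CORP", None, None, 5000.0), (2, "I.T.", 4, 1, 4000.0),
            (3, "H.R.", 1, 1, 1000.0), (4, "DB SVC", 5, 2, 5000.0),
            (5, "PROG", 3, 2, 1000.0), (6, "EMPL REL", 3, 3, 100.0),
            (7, "BENEFITS", 2, 3, 50.0)]

WEIGHTED_SQL = """
    WITH RECURSIVE alloc(acct_num, parent_acct_num, total_bonus) AS (
        SELECT acct_num, parent_acct_num, bonus_amt
        FROM recase11
        UNION ALL
        SELECT a.acct_num, a.parent_acct_num,
               b.total_bonus * a.rating                       -- divide based on rating
               / (SELECT SUM(rating) FROM recase11 AS x
                  WHERE x.parent_acct_num = a.parent_acct_num)
        FROM recase11 AS a                                    -- my children
        JOIN alloc AS b ON b.acct_num = a.parent_acct_num
    )
    SELECT acct_num, SUM(total_bonus) FROM alloc GROUP BY acct_num
"""


def weighted_allocation() -> dict:
    """Return {ACCT_NUM: own bonus plus rating-weighted allocations from above}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recase11 (acct_num INTEGER PRIMARY KEY, acct_name TEXT,"
                     " rating INTEGER, parent_acct_num INTEGER, bonus_amt REAL)")
        conn.executemany("INSERT INTO recase11 VALUES (?, ?, ?, ?, ?)", RECASE11)
        return dict(conn.execute(WEIGHTED_SQL).fetchall())
    finally:
        conn.close()
```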

45
Case 10 – Weighted Allocation Result

ACCT_NUM ACCT_NAME BONUS_AMT TOTAL_BONUS RATING
1 CORP 5000 5000 -
2 I.T. 4000 8000 4
3 H.R. 1000 2000 1
4 DB SVC 5000 10000 5
5 PROG 1000 4000 3
6 EMPL REL 100 1300 3
7 BENEFITS 50 850 2

46

… and the result.

46
Case 11 – RI Children

[Figure: RI hierarchy of tables A through J: B, C and D on the second level; E, F, G, H and I on the third]

47

An example of a hierarchy that shows the referential integrity (RI) implemented
using foreign keys. These relationships are visible in the catalog table
SYSIBM.SYSRELS.

Notice the self-referencing relationship on B, as well as the circular
definition of constraints from D to H to J and back to D.

47
Case 11 – RI Children SQL

WITH OC (LEVEL, PARENTTAB, CHILDTAB) AS
(SELECT 0 , 'Z', 'your start table'
 FROM SYSIBM.SYSDUMMYU                      -- Unicode, like the catalog
 UNION ALL
 SELECT PARENT.LEVEL + 1
      , SUBSTR(CHILD.REFTBNAME,1,1)
      , SUBSTR(CHILD.TBNAME,1,1)            -- my children
 FROM OC PARENT, SYSIBM.SYSRELS CHILD
 WHERE PARENT.CHILDTAB = CHILD.REFTBNAME
   AND CHILD.CREATOR = …
   AND CHILD.REFTBCREATOR = …
   AND CHILD.REFTBNAME <> CHILD.TBNAME      -- eliminate self-reference
   AND PARENT.LEVEL < 10 )

SQL Continued on next slide


48

SQL used for this purpose.

48
Case 11 – RI Children SQL (cont)

SELECT DISTINCT OC.PARENTTAB,
       OC.CHILDTAB
FROM OC
WHERE OC.LEVEL > 0
ORDER BY OC.PARENTTAB, OC.CHILDTAB

49

SQL continued.

Notice that the CTE generates much more data than necessary (e.g. node D is
revisited multiple times), and the DISTINCT is needed to eliminate the
duplicates. A very handy SQL enhancement would be for DB2 to allow us to
specify DISTINCT (without the level) within the CTE itself. Perhaps in a future
version?
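To see how much extra work the cycle causes, here is a sketch in SQLite. A small table named sysrels stands in for SYSIBM.SYSRELS; its name is hypothetical. Its contents reproduce the Case 11 relationships, including B's self-reference. The sketch returns both the raw number of CTE rows and the DISTINCT parent/child pairs.

```python
import sqlite3

# (REFTBNAME = parent, TBNAME = child), including B's self-reference
# and the D -> H -> J -> D cycle from the Case 11 diagram
RELS = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "B"), ("B", "E"), ("B", "F"),
        ("C", "G"), ("C", "J"), ("D", "H"), ("D", "I"), ("H", "J"), ("J", "D")]

OC_CTE = """
    WITH RECURSIVE oc(level, parenttab, childtab) AS (
        SELECT 0, 'Z', ?                              -- start table
        UNION ALL
        SELECT p.level + 1, c.reftbname, c.tbname     -- my children
        FROM oc AS p
        JOIN sysrels AS c ON p.childtab = c.reftbname
        WHERE c.reftbname <> c.tbname                 -- eliminate self-reference
          AND p.level < ?                             -- without this, the cycle never ends
    )
"""


def ri_children(start: str = "A", max_level: int = 10):
    """Return (raw CTE row count, distinct parent/child pairs)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sysrels (reftbname TEXT, tbname TEXT)")
        conn.executemany("INSERT INTO sysrels VALUES (?, ?)", RELS)
        params = (start, max_level)
        raw = conn.execute(OC_CTE + "SELECT COUNT(*) FROM oc WHERE level > 0",
                           params).fetchone()[0]
        pairs = conn.execute(OC_CTE + "SELECT DISTINCT parenttab, childtab FROM oc"
                             " WHERE level > 0 ORDER BY 1, 2", params).fetchall()
        return raw, pairs
    finally:
        conn.close()
```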

49
Case 11 – RI Children Result

PARENTTAB CHILDTAB

A B
A C
A D
B E
B F
C G
C J
D H
D I
H J
J D

50

… and the result.

50
Case 12 – Cheapest Fare

[Figure: route map from the start city, Kansas City (MCI), through Chicago (ORD), St Louis (STL) and Houston (IAH) to the end city, Dallas (DFW), with the fare shown on each leg]
51

A case study to list and evaluate multiple paths.

51
Case 12 – RECASE12 Table

RECASE12

FROM_CITY TO_CITY FARE
MCI ORD 100
MCI STL 200
MCI DFW 500
ORD STL 50
ORD IAH 100
STL IAH 25
STL DFW 150
IAH DFW 50
52

Contents of the table RECASE12 that defines the available routes and the
associated fares.

52
Case 12 – Cheapest Fare SQL
WITH
ROUTING (LEVEL, FROM_CITY, TO_CITY, CHAIN, FARE) AS
( SELECT 0
       , CAST('-->' AS CHAR(3) )
       , CAST('MCI' AS CHAR(3) )
       , CAST('-->MCI' AS VARCHAR(60) )
       , 0
  FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT PARENT.LEVEL + 1
       , CAST(SUBSTR(CHILD.FROM_CITY,1,3) AS CHAR(3) )
       , CAST(SUBSTR(CHILD.TO_CITY,1,3) AS CHAR(3) )
       , PARENT.CHAIN CONCAT '->' CONCAT              -- build itinerary
         CAST(SUBSTR(CHILD.TO_CITY,1,3) AS CHAR(3) )
       , PARENT.FARE + CHILD.FARE                     -- add fare
  FROM ROUTING PARENT, RECASE12 CHILD
  WHERE PARENT.TO_CITY = CHILD.FROM_CITY              -- from here to?
    AND PARENT.LEVEL < 10)
SQL Continued on next slide
53

SQL used for this purpose.

53
Case 12 – Cheapest Fare SQL (cont)

SELECT DISTINCT
SUBSTR(ROUTING.CHAIN,4,56) AS ROUTE
, ROUTING.FARE
FROM ROUTING
WHERE ROUTING.LEVEL > 0
AND ROUTING.CHAIN LIKE '-->MCI%'
AND ROUTING.CHAIN LIKE '%DFW'
ORDER BY ROUTE

54

SQL continued.

The length of the generated path (60 bytes, of which 56 are reported) is
arbitrary and truncated here to make the output readable.
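Here is a sketch of the route search in SQLite, using the RECASE12 fares. It makes two simplifications:
- The chain starts with the bare origin code instead of '-->MCI'.
- The final select filters on the destination column and orders by fare, so the cheapest itinerary comes first.

```python
import sqlite3

# RECASE12 legs: (FROM_CITY, TO_CITY, FARE)
RECASE12 = [("MCI", "ORD", 100), ("MCI", "STL", 200), ("MCI", "DFW", 500),
            ("ORD", "STL", 50), ("ORD", "IAH", 100), ("STL", "IAH", 25),
            ("STL", "DFW", 150), ("IAH", "DFW", 50)]

ROUTING_SQL = """
    WITH RECURSIVE routing(level, to_city, chain, fare) AS (
        SELECT 0, ?, ?, 0                                -- start at the origin
        UNION ALL
        SELECT p.level + 1,
               c.to_city,
               p.chain || '->' || c.to_city,             -- build itinerary
               p.fare + c.fare                           -- add fare
        FROM routing AS p
        JOIN recase12 AS c ON p.to_city = c.from_city    -- from here to?
        WHERE p.level < 10                               -- depth control
    )
    SELECT chain, fare
    FROM routing
    WHERE level > 0 AND to_city = ?
    ORDER BY fare, chain                                 -- cheapest first
"""


def routes(origin: str = "MCI", dest: str = "DFW") -> list:
    """Return every (route, total fare) from origin to dest, cheapest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recase12 (from_city TEXT, to_city TEXT, fare INTEGER)")
        conn.executemany("INSERT INTO recase12 VALUES (?, ?, ?)", RECASE12)
        return conn.execute(ROUTING_SQL, (origin, origin, dest)).fetchall()
    finally:
        conn.close()
```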

54
Case 12 – Cheapest Fare Result

ROUTE FARE

MCI->DFW 500
MCI->ORD->IAH->DFW 250
MCI->ORD->STL->DFW 300
MCI->ORD->STL->IAH->DFW 225
MCI->STL->DFW 350
MCI->STL->IAH->DFW 275

55

… and the result.

A further enhancement (including the flight departure and arrival days/times)
would allow us to select connecting flights (e.g. the next flight must leave
1 to 4 hours after the previous one arrives) and prepare a true itinerary like
Travelocity or other search engines (… all with no pop-up ads!!).

55
Case 13 – Account Linking

[Figure: accounts 1 to 11, each labeled with the customer numbers it holds (e.g. 10,20,30); accounts that share a customer are linked, and one account is marked as the start]
56

A case study to group accounts that share at least one customer. Such linked
accounts may receive favorable pricing.

56
Case 13 – Account Linking SQL
WITH LINKING (LEVEL, ACCT_NUM, CUST_NUM) AS
( SELECT 0
       , ACCT_NUM
       , CUST_NUM
  FROM RECASE13
  WHERE ACCT_NUM = 4                          -- starting acct num
  UNION ALL
  SELECT PARENT.LEVEL + 1
       , CHILD2.ACCT_NUM
       , CHILD2.CUST_NUM
  FROM LINKING PARENT
     , RECASE13 CHILD1                        -- link by customer
     , RECASE13 CHILD2                        -- all customers for that acct
  WHERE PARENT.CUST_NUM = CHILD1.CUST_NUM
    AND PARENT.ACCT_NUM <> CHILD1.ACCT_NUM    -- diff acct
    AND CHILD1.ACCT_NUM = CHILD2.ACCT_NUM
    AND PARENT.LEVEL < 5)
SELECT DISTINCT ACCT_NUM
FROM LINKING
57

SQL used for this purpose.
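The diagram's account data did not survive extraction, so this SQLite sketch uses a small hypothetical RECASE13. Accounts 1, 2 and 3 are chained through shared customers, and account 4 stands on its own. The level cap also limits how many hops away a linked account can be found.

```python
import sqlite3

# Hypothetical (ACCT_NUM, CUST_NUM) rows: 1-2 share customer 20, 2-3 share 30;
# account 4 shares no customer with anyone.
RECASE13 = [(1, 10), (1, 20), (2, 20), (2, 30), (3, 30), (3, 40), (4, 50)]

LINKING_SQL = """
    WITH RECURSIVE linking(level, acct_num, cust_num) AS (
        SELECT 0, acct_num, cust_num
        FROM recase13
        WHERE acct_num = ?                                   -- starting account
        UNION ALL
        SELECT p.level + 1, child2.acct_num, child2.cust_num
        FROM linking AS p
        JOIN recase13 AS child1                              -- link by customer
          ON p.cust_num = child1.cust_num
         AND p.acct_num <> child1.acct_num                   -- a different account
        JOIN recase13 AS child2                              -- all customers of that account
          ON child1.acct_num = child2.acct_num
        WHERE p.level < ?                                    -- depth control
    )
    SELECT DISTINCT acct_num FROM linking ORDER BY acct_num
"""


def linked_accounts(start: int, max_level: int = 5) -> list:
    """Return every account reachable from `start` through shared customers."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recase13 (acct_num INTEGER, cust_num INTEGER)")
        conn.executemany("INSERT INTO recase13 VALUES (?, ?)", RECASE13)
        return [row[0] for row in conn.execute(LINKING_SQL, (start, max_level))]
    finally:
        conn.close()
```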

57
Where Are We?

1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations

58

OK, it is complex and compact, but how does recursive SQL look from a
performance perspective? We will explore this issue by comparing it with
procedural logic (a COBOL program) that accomplishes the same tasks.

58
Hierarchy (levels 0 through 8)
Level 0 : 1
Level 1 : 4
Level 2 : 16
Level 3 : 64
Level 4 : 256
Level 5 : 1,024
Level 6 : 4,096
Level 7 : 16,384
Level 8 : 65,536
(total 87,381 nodes)
59

The hierarchical structure used for the case study – each node has 4 children,
8 levels deep.
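The total on the slide is a geometric series, which gives a quick check of the node count:

```latex
\sum_{k=0}^{8} 4^{k} = \frac{4^{9}-1}{4-1} = \frac{262\,144-1}{3} = 87\,381
```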

59
Table Structure

RECASE14
ACCT_NUM INTEGER Index K0 (unique)
ACCT_NAME CHAR(20)
PARENT_ACCT_NUM INTEGER Index K1
AMOUNT DEC(15,2)
ACCT_LEVEL INTEGER Index K2 (derived column)

60

Table structure and the indexes available.

60
Procedural Logic Setting Level

SELECT C.PARENT_ACCT_NUM , C.ACCT_NUM
FROM RECASE14 P
   , RECASE14 C                             -- my children
WHERE P.ACCT_LEVEL = :WS-LOOP-LEVEL
  AND P.ACCT_NUM = C.PARENT_ACCT_NUM
ORDER BY PARENT_ACCT_NUM
       , ACCT_NUM

SET LEVEL = 0 FOR ROOT
FOR EACH LEVEL (0 TO 7 BY 1)
  FOR EACH ROW OF CHILD-CURSOR
    UPDATE CHILD WITH LEVEL = PARENT + 1
  NEXT CHILD
NEXT LEVEL
61

SQL for setting the level, which is used later in all 3 cases.
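Here is a small-scale sketch of the level-setting step in SQLite. It compares the slide's level-by-level loop with a single recursive query. The tree builder, table layout and depth are hypothetical: a complete 4-way tree, much shallower than the 87,381-node benchmark. Only the shape of the logic mirrors the slides.

```python
import sqlite3


def build_tree(conn, fanout: int, depth: int) -> None:
    """Hypothetical RECASE14-like table holding a complete tree (`fanout` children per node)."""
    conn.execute("CREATE TABLE recase14 (acct_num INTEGER PRIMARY KEY,"
                 " parent_acct_num INTEGER, acct_level INTEGER)")
    conn.execute("CREATE INDEX k1 ON recase14 (parent_acct_num)")
    conn.execute("INSERT INTO recase14 VALUES (1, NULL, NULL)")
    frontier, next_num = [1], 2
    for _ in range(depth):
        children = []
        for parent in frontier:
            for _ in range(fanout):
                conn.execute("INSERT INTO recase14 VALUES (?, ?, NULL)", (next_num, parent))
                children.append(next_num)
                next_num += 1
        frontier = children


def set_levels_procedurally(conn, depth: int) -> None:
    """The slide's loop: root = 0, then one pass per level updating each child row."""
    conn.execute("UPDATE recase14 SET acct_level = 0 WHERE parent_acct_num IS NULL")
    for level in range(depth):
        children = conn.execute(
            "SELECT c.acct_num FROM recase14 AS p JOIN recase14 AS c"
            " ON p.acct_num = c.parent_acct_num WHERE p.acct_level = ?", (level,)).fetchall()
        for (child,) in children:                # one UPDATE per cursor row
            conn.execute("UPDATE recase14 SET acct_level = ? WHERE acct_num = ?",
                         (level + 1, child))


LEVELS_SQL = """
    WITH RECURSIVE lvl(acct_num, level) AS (
        SELECT acct_num, 0 FROM recase14 WHERE parent_acct_num IS NULL
        UNION ALL
        SELECT c.acct_num, p.level + 1
        FROM lvl AS p JOIN recase14 AS c ON c.parent_acct_num = p.acct_num
        WHERE p.level < 20                       -- depth control
    )
    SELECT acct_num, level FROM lvl
"""


def compare_levels(fanout: int = 4, depth: int = 3):
    """Return (levels stored by the loop, levels computed by one recursive query)."""
    conn = sqlite3.connect(":memory:")
    try:
        build_tree(conn, fanout, depth)
        set_levels_procedurally(conn, depth)
        procedural = dict(conn.execute("SELECT acct_num, acct_level FROM recase14").fetchall())
        recursive = dict(conn.execute(LEVELS_SQL).fetchall())
        return procedural, recursive
    finally:
        conn.close()
```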

61
Procedural Logic for Org Chart

<<< using the pre-set level-number >>>

SELECT BOSS.ACCT_LEVEL
     , BOSS.ACCT_NAME
     , SUB.ACCT_NAME
FROM RECASE14 BOSS
   , RECASE14 SUB                           -- my children
WHERE SUB.PARENT_ACCT_NUM = BOSS.ACCT_NUM
ORDER BY BOSS.ACCT_LEVEL
       , BOSS.ACCT_NUM, SUB.ACCT_NUM

62

Logic for Org chart.

62
Procedural Logic for Rollup

<<< using the pre-set level-number >>>

FOR EACH LEVEL 8 TO 1 BY -1
  FOR EACH ROW OF CHILD-CURSOR
    ADD AMOUNT TO PARENT AND UPDATE
  NEXT CHILD
NEXT LEVEL

SHOW ALL ROWS

63

Logic for rollup.
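The rollup loop can be sketched in plain Python to make the processing order explicit. An in-memory dictionary is a hypothetical stand-in for the cursor and UPDATE, and the amounts come from the Case 8 diagram. Because the deepest level is processed first, each child's total is complete before it is added to its parent.

```python
# Hypothetical in-memory stand-in for RECASE14: {ACCT_NUM: (PARENT_ACCT_NUM, AMOUNT)},
# filled with the Case 8 budgets.
ACCOUNTS = {1: (None, 500), 2: (1, 400), 3: (1, 100), 4: (2, 1000),
            5: (2, 5000), 6: (3, 50), 7: (3, 30)}


def levels_of(accounts: dict) -> dict:
    """Level numbers (root = 0), derived here by walking up to the root."""
    levels = {}
    for acct in accounts:
        depth, parent = 0, accounts[acct][0]
        while parent is not None:
            depth, parent = depth + 1, accounts[parent][0]
        levels[acct] = depth
    return levels


def procedural_rollup(accounts: dict) -> dict:
    """Bottom-up rollup: deepest level first, adding each child's total into its parent."""
    totals = {acct: amount for acct, (_, amount) in accounts.items()}
    levels = levels_of(accounts)
    for level in range(max(levels.values()), 0, -1):   # FOR EACH LEVEL n TO 1 BY -1
        for acct, acct_level in levels.items():        # FOR EACH ROW OF CHILD-CURSOR
            if acct_level == level:
                parent = accounts[acct][0]
                totals[parent] += totals[acct]         # ADD AMOUNT TO PARENT AND UPDATE
    return totals
```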

63
Procedural Logic for Allocate

<<< using the pre-set level-number >>>

FOR EACH LEVEL 0 TO 7 BY 1
  FOR EACH ROW OF PARENT-CURSOR
    DETERMINE # OF CHILDREN
    FOR EACH ROW OF CHILD-CURSOR
      ADD PRO-RATED AMOUNT
      AND UPDATE CHILD
    NEXT CHILD
  NEXT PARENT
NEXT LEVEL

SHOW ALL ROWS

64

Logic for allocation.
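Likewise, here is a plain-Python sketch of the top-down even allocation. An in-memory dictionary is a hypothetical stand-in for the cursors, and the amounts come from the Case 9 diagram. Processing parents before children guarantees that a parent's total already includes everything allocated to it from above before it is divided among its children.

```python
# Hypothetical in-memory stand-in: {ACCT_NUM: (PARENT_ACCT_NUM, AMOUNT)}, Case 9 bonuses.
ACCOUNTS = {1: (None, 6000.0), 2: (1, 4000.0), 3: (1, 1000.0), 4: (2, 5000.0),
            5: (2, 1000.0), 6: (3, 100.0), 7: (3, 50.0)}


def procedural_allocate(accounts: dict) -> dict:
    """Top-down even allocation: each parent's total is split equally among its children."""
    totals = {acct: amount for acct, (_, amount) in accounts.items()}
    children = {}
    for acct, (parent, _) in accounts.items():
        children.setdefault(parent, []).append(acct)
    level = children.get(None, [])              # level 0: the root(s)
    while level:                                # FOR EACH LEVEL, top down
        next_level = []
        for parent in level:                    # FOR EACH ROW OF PARENT-CURSOR
            kids = children.get(parent, [])     # DETERMINE # OF CHILDREN
            for kid in kids:                    # FOR EACH ROW OF CHILD-CURSOR
                totals[kid] += totals[parent] / len(kids)   # ADD PRO-RATED AMOUNT
            next_level.extend(kids)
        level = next_level
    return totals
```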

64
Benchmarks

Comparison of procedural logic with recursive logic

[Chart: CPU seconds (axis 0 to 35) for procedural vs. recursive logic, by function: Org chart, Roll up, Allocate]

65

…and a comparison of how they stack up.

Unlike the Dan Luksetich case mentioned earlier, my results are not dramatically
in favor of recursive SQL – it is better, but not by an order of magnitude.

Admittedly, other variables will also impact the benchmarks. For example, if
the procedural logic is needlessly complex or inefficient, recursive SQL can
look much better. I have avoided a biased approach: the best SQL and the
best procedural logic are being compared. Other variables that could affect
the results are the presence of indexes, the depth of the hierarchy and how
balanced the hierarchy is (very evenly balanced in this “lab” exercise). The
size and number of SORTWORK data sets for DSNDB07 may also affect performance.

65
Where Are We?

1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations

66

Let’s summarize what we have learned and point out some pitfalls.

66
Limiting the Depth

♦ Prone to infinite cycles unless controlled properly
♦ DB2 imposes no limit on the depth of recursion but issues a
warning (SQLSTATE = 01605, SQLCODE = +347) for an SQL
statement that does not use a control variable to limit the depth
  - Warning is essentially useless in SPUFI (displayed too late)
♦ Warning is based on the absence of:
  - An integer column that increments by a constant
  - A predicate of the type LEVEL < constant or LEVEL < :hv
  - LEVEL < (subselect query) is allowed but issues a warning
  - The SQL parser logic is quite primitive – for example:
    • a loop from 10 to 1 is not infinite but will still generate a warning
    • subtracting -1 (same as adding 1) will still generate a warning

67

The potential to create an infinite loop is ever present since DB2 will not limit
it to a specified depth.
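Here is a minimal SQLite sketch of why the control predicate matters. It walks a hypothetical three-table cycle (D to H to J and back to D, as in Case 11) with a LEVEL cap. Each level adds exactly one row, so the cap alone decides when the recursion stops; without it, the query would never finish.

```python
import sqlite3

WALK_SQL = """
    WITH RECURSIVE walk(level, node) AS (
        SELECT 0, 'D'
        UNION ALL
        SELECT w.level + 1, r.child
        FROM walk AS w
        JOIN rels AS r ON w.node = r.parent
        WHERE w.level < ?              -- remove this and the walk never ends
    )
    SELECT COUNT(*), MAX(level) FROM walk
"""


def rows_generated(max_level: int) -> tuple:
    """Walk the cycle with a LEVEL cap; return (rows produced, deepest level)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE rels (parent TEXT, child TEXT)")
        conn.executemany("INSERT INTO rels VALUES (?, ?)",
                         [("D", "H"), ("H", "J"), ("J", "D")])
        return conn.execute(WALK_SQL, (max_level,)).fetchone()
    finally:
        conn.close()
```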

67
Gotchas

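[Figure: the Case 11 RI hierarchy of tables A through J: B, C and D on the second level; E, F, G, H and I on the third]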

68

Danger lurks around the corner! Let’s revisit Case 11.

68
Causing Infinite Loops
WITH OC (LEVEL, PARENTTAB,
         CHILDTAB) AS
(SELECT 0 , 'Z', 'your start table'
 FROM SYSIBM.SYSDUMMYU
 UNION ALL
 SELECT PARENT.LEVEL + 1
      , SUBSTR(CHILD.REFTBNAME,1,1)
      , SUBSTR(CHILD.TBNAME,1,1)
 FROM OC PARENT, SYSIBM.SYSRELS
      CHILD
 WHERE PARENT.CHILDTAB =
       CHILD.REFTBNAME
   AND CHILD.CREATOR = …
   AND CHILD.REFTBCREATOR = …
   AND CHILD.REFTBNAME <>
       CHILD.TBNAME
-- AND PARENT.LEVEL < 10                    <== removed
 )

DSNT404I SQLCODE = 347, WARNING: THE RECURSIVE COMMON
         TABLE EXPRESSION OC MAY CONTAIN AN INFINITE LOOP
DSNT418I SQLSTATE = 01605 SQLSTATE RETURN CODE
DSNT415I SQLERRP = DSNXODML SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD = 0 0 0 1209090530 0 0 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD = X'00000000' X'00000000' X'00000000' X'481141E2'
         X'00000000' X'00000000' SQL DIAGNOSTIC INFORMATION

LEVEL PARENTTAB CHILDTAB
69

Without the extra protection offered by the LEVEL predicate, DB2 issues the
warning, and the query may loop indefinitely.

69
Recommendations

♦ Use controls to limit the depth of recursion
♦ Consider possible future changes that could cause infinite loops
(e.g. recursive or circular references)
♦ Very useful for reporting data that does not exist!
♦ Allows looping logic that would otherwise require procedural
code or SQL Procedures
♦ In most cases, simplifies processing logic
♦ In specific instances, can lead to a huge performance gain

70

Powerful but dangerous in the wrong hands.

70
But it seems too hard…

♦ Seems too complex for “the average programmer”?
♦ Perhaps, but consider what Prof. Dijkstra had to say…

“Don't blame me for the fact that competent programming,
as I view it as an intellectual possibility, will be too difficult
for 'the average programmer', you must not fall into the trap
of rejecting a surgical technique because it is beyond the
capabilities of the barber in his shop around the corner.”

E. W. Dijkstra

71

Don't let the fact that it is not for the masses stop you.

71
Summary

1. Recursion basics
  - Introducing the new feature
2. Case studies - mathematical
  - String of numbers, factorial, primes and Fibonacci
  - It’s not that hard!
3. Case studies - business
  - Org chart, Generating test data, Missing data, Rollup, Even
    allocation, Weighted allocation, RI children, Cheapest fare, Account
    linking
  - A little complex but compact
4. Performance aspects
  - Org chart, rollup, allocate
  - Compact and a better performer
5. Pitfalls and recommendations
  - Powerful but dangerous!

72

I trust this session has empowered you with the knowledge to exploit
Recursive SQL fully. Good Luck!

72
References

1. Recursive SQL for Dummies – B.L. “Tink” Tysor – IDUG NA 2005 – Session A1
2. Recursive SQL – Unleash the Power! – Suresh Sane – IDUG EU 2007 – Session E5
3. DB2 UDB for z/OS V8 – Everything You Ever Wanted to Know, … and More – SG24-6079
4. DB2 UDB for z/OS Version 8 Performance Topics – SG24-6465
5. Having Fun with Complex SQL – Suresh Sane – IDUG AP 2005 – Session B5
6. Parlez-Vous Klingon – Alexander Kopac – IDUG NA 2007 – Session ALT
7. “Rinse, Lather, Repeat: Utilizing Recursive SQL on DB2 UDB for z/OS” – Daniel L. Luksetich – IDUG Solutions Journal, May 2004
73

Some of the many useful references.

73
Recursive SQL –
Unleash the Power!

74

Not really…in the unlikely event that we actually have time ….I doubt it!

74
Session G13
Recursive SQL –
Unleash the Power!

Suresh Sane
DST Systems Inc.
sssane@dstsystems.com

75

Thank you and good luck with Recursive SQL!

75
