Recursive SQL –
Unleash the Power!
Suresh Sane
DST Systems Inc.
Recursive SQL is one of the most fascinating and powerful (and dangerous!)
features offered in DB2 for z/OS Version 8. In this session, we will introduce
the feature and show numerous examples of how it can be used to achieve
things you might never have imagined possible with SQL – all in one SQL
statement! Fasten your seat belts and join us on this exciting journey!
Session Outline
1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations
Recursion basics
•Theory and introduction
Case studies - mathematical
•String of numbers
•Factorial
•Primes
•Fibonacci Series
Case studies - business
•Org chart
•Generating test data
•Missing data
•Rollup
•Allocation
•Weighted allocation
•RI children
•Cheapest fare
•Account linking
Performance aspects
•Comparison to procedural logic
•Org chart
•Rollup
•Allocate
Pitfalls and recommendations
•Best practices
About the Instructor
Suresh Sane
♦ Co-author, IBM Redbooks
  - SG24-6418, May 2002
  - SG24-7083, March 2004
  - SG24-7111, July 2006
♦ Educational seminars and presentations at IDUG North
America, Asia Pacific, Canada and Europe
♦ IDUG Solutions Journal article – Winter 2000
♦ Numerous DB2 courses at various locations
♦ IBM Certified Solutions Expert for both platforms for
Application Development and Database Administration
Contact Information:
sssane@dstsystems.com
Suresh Sane
DST Systems, Inc.
1055 Broadway
Kansas City, MO 64105
USA
(816) 435-3803
About DST Systems
http://www.dstsystems.com
♦ Leading provider of computer software solutions and
services, NYSE listed – “DST”
♦ Revenue $2.24 billion
♦ 110 million+ shareowner accounts
♦ 24,000 MIPS
♦ 150 TB DASD
♦ 145,000 workstations
♦ 462,000 DB2 objects
♦ Non-mainframe: 600 servers (DB2, Oracle, Sybase) with
2.3 million objects
If you have ever invested in a mutual fund, have had a prescription filled, or
are a cable or satellite television subscriber, you may have already had
dealings with our company.
It’s cool but I will never use it…
See ref #7 for details on this article. I have to admit my first reaction was that
this feature was “cool” but of little business value. Dan’s article provoked my
interest in this fascinating area of SQL.
Where Are We?
1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations
Recursion Simplified
What is 4!?        It is 4 * 3!    4 * 6 = 24
But what is 3!?    It is 3 * 2!    3 * 2 = 6
But what is 2!?    It is 2 * 1!    2 * 1 = 2
But what is 1!?    1
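An ordinary recursive function follows the same walk-through. This is a minimal Python illustration of the idea only (it is not DB2 code). Each call hands a smaller problem to the next call until it reaches the base case 1! = 1. The multiplications then happen as the calls return.

```python
def factorial(n: int) -> int:
    """Return n! by recursion, mirroring the 4! -> 3! -> 2! -> 1! walk-through."""
    if n <= 1:                    # base case: stop recursing at 1! = 1
        return 1
    return n * factorial(n - 1)   # e.g. 4! = 4 * 3!, resolved on the way back


print(factorial(4))  # 24
```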
Case 1 – String of Numbers
The best way to introduce recursive SQL? Look at a simple example. Let’s
dive right in.
Also note the color scheme, used consistently throughout this presentation, that
denotes the four parts of the SQL.
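The Case 1 SQL itself is not reproduced in this text. The following is a minimal sketch of the same idea, run through Python's built-in sqlite3 module so it can be tried without DB2. It assumes SQLite syntax, which differs from DB2 for z/OS in two ways: SQLite spells the clause WITH RECURSIVE, and it accepts a SELECT without FROM in place of SYSIBM.SYSDUMMY1. The comments label what we take to be the four parts: the CTE name and columns, the initialization select, the iterative select, and the select that uses the results. The function name string_of_numbers is hypothetical.

```python
import sqlite3
from contextlib import closing

NUMBERS_SQL = """
WITH RECURSIVE numbers(level) AS (        -- 1. CTE name and column list
    SELECT 1                              -- 2. initialization select (seed row)
    UNION ALL
    SELECT level + 1                      -- 3. iterative select, fed by the previous row
    FROM numbers
    WHERE level < ?                       --    stop condition
)
SELECT level FROM numbers ORDER BY level  -- 4. use the results
"""


def string_of_numbers(limit: int) -> list[int]:
    """Return the integers 1..limit generated by a recursive CTE."""
    with closing(sqlite3.connect(":memory:")) as conn:
        return [row[0] for row in conn.execute(NUMBERS_SQL, (limit,))]


print(string_of_numbers(10))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```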
Recursion Basics
Some of the basics. The “pump” terminology is borrowed from Tink Tysor
(see ref #1). Tink provides a very thorough introduction to this fascinating
topic.
Rules for CTE
♦ The first fullselect of the first UNION must not reference the CTE
♦ No select within the CTE can use DISTINCT
  - This is a major limitation when cycles are present, because
    DISTINCT on the outer query is expensive
  - What is needed is the ability to specify DISTINCT without the level
♦ No select within the CTE can use GROUP BY or HAVING
♦ Include only one reference to the CTE
♦ Initialization select and iterative select columns must match (data
types, lengths, CCSIDs)
♦ UNION must be a UNION ALL
♦ Outer joins cannot be part of any recursion cycle
♦ Subqueries cannot be part of any recursion cycle
Some of the restrictions on what can be coded within the common table
expression (CTE).
Where Are We?
1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations
If mathematics scares you, do not get discouraged – these examples actually make it
easier to illustrate the concept. We will build on this foundation and cover several
business cases in the next section.
Case 2 – Factorial
WITH NUMBERS (LEVEL, FACTO) AS
(
SELECT 1, 1
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT LEVEL + 1, FACTO * (LEVEL + 1)
FROM NUMBERS
WHERE LEVEL < 12
)
SELECT LEVEL AS NUMBER, FACTO AS FACTORIAL
FROM NUMBERS
ORDER BY LEVEL

Result:
NUMBER  FACTORIAL
     1          1
     2          2
     3          6
     4         24
     5        120
     …
Just a little more complex: we generate each number together with its factorial and
report both in the “use the results” section of the SQL.
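You can try Case 2 without a mainframe by running the same query through Python's sqlite3 module. This sketch assumes only the SQLite spelling: WITH RECURSIVE, and no SYSIBM.SYSDUMMY1. The LEVEL < 12 cap is convenient for DB2. 12! = 479,001,600 fits in a 4-byte INTEGER, while 13! = 6,227,020,800 would exceed the INTEGER maximum of 2,147,483,647. SQLite integers are 64-bit, so the cap matters less there.

```python
import sqlite3
from contextlib import closing

# Case 2 in SQLite spelling; the DB2 original selects its seed row FROM SYSIBM.SYSDUMMY1.
FACTORIAL_SQL = """
WITH RECURSIVE numbers(level, facto) AS (
    SELECT 1, 1                                -- 1! = 1
    UNION ALL
    SELECT level + 1, facto * (level + 1)      -- n! = (n-1)! * n
    FROM numbers
    WHERE level < 12                           -- stop after 12!
)
SELECT level AS number, facto AS factorial
FROM numbers
ORDER BY level
"""


def factorials() -> list[tuple[int, int]]:
    """Return (n, n!) pairs for n = 1..12."""
    with closing(sqlite3.connect(":memory:")) as conn:
        return conn.execute(FACTORIAL_SQL).fetchall()


rows = factorials()
print(rows[:5])  # [(1, 1), (2, 2), (3, 6), (4, 24), (5, 120)]
print(rows[-1])  # (12, 479001600)
```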
Case 3 – Generating Primes - SQL
Case 3 – Generating Primes – SQL (cont)
SELECT X.PRIME
FROM
(SELECT NUMBERS.LEVEL,
NUMBERS.PRIME
FROM NUMBERS ) AS X
WHERE NOT EXISTS
(SELECT 1
FROM NUMBERS Y
WHERE Y.PRIME BETWEEN 2 AND SQRT(X.PRIME)
AND MOD(X.PRIME, Y.PRIME) = 0)
ORDER BY X.PRIME

Result:
Prime: 1, 2, 3, 5, 7, 11
Non-prime (has a factor): 4, 6, 8, 9, 10
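Note that we can stop checking for factors once we pass the square root of the
number (a big gain in performance: for example, to test 5000 we need only try
divisors up to 70 instead of every number up to 5000). Note also that 1 passes the
NOT EXISTS test, since it has no factor between 2 and its square root, so it appears
in the output even though it is not prime; adding X.PRIME > 1 excludes it.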
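The NUMBERS CTE behind Case 3 is on a slide not reproduced here, so this sketch supplies a hypothetical one that counts from 1 to :max_n. It then applies the same NOT EXISTS test in SQLite through Python's sqlite3 module, with two deliberate substitutions. First, y * y <= x replaces BETWEEN 2 AND SQRT(X.PRIME), because SQRT is not available in every SQLite build. Second, the % operator replaces MOD. The sketch also adds x.prime > 1 so that 1 is not reported.

```python
import sqlite3
from contextlib import closing

PRIMES_SQL = """
WITH RECURSIVE numbers(level, prime) AS (      -- hypothetical: candidates 1..:max_n
    SELECT 1, 1
    UNION ALL
    SELECT level + 1, prime + 1
    FROM numbers
    WHERE level < :max_n
)
SELECT x.prime
FROM numbers AS x
WHERE x.prime > 1                              -- 1 is not prime
  AND NOT EXISTS (
      SELECT 1
      FROM numbers AS y
      WHERE y.prime >= 2
        AND y.prime * y.prime <= x.prime       -- y <= SQRT(x), without calling SQRT
        AND x.prime % y.prime = 0              -- MOD(x, y) = 0: y is a factor
  )
ORDER BY x.prime
"""


def primes_up_to(max_n: int) -> list[int]:
    """Return the primes between 2 and max_n using the NOT EXISTS factor test."""
    with closing(sqlite3.connect(":memory:")) as conn:
        return [row[0] for row in conn.execute(PRIMES_SQL, {"max_n": max_n})]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```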
Case 4 – Fibonacci Series
♦ The first two numbers in the series are one and one.
♦ To obtain each number of the series, you simply add the two
numbers that came before it. In other words, each number of the
series is the sum of the two numbers preceding it.
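The Fibonacci SQL slides are not reproduced in this text. One way to express the rule as a recursive CTE is to carry the last two numbers in each row, because the iterative select sees only the row produced in the previous step. This sketch does that in SQLite via Python's sqlite3 module. The column names fib_n and fib_next are hypothetical.

```python
import sqlite3
from contextlib import closing

FIB_SQL = """
WITH RECURSIVE fib(level, fib_n, fib_next) AS (
    SELECT 1, 1, 1                                -- first two numbers: 1 and 1
    UNION ALL
    SELECT level + 1, fib_next, fib_n + fib_next  -- slide the pair forward one step
    FROM fib
    WHERE level < :how_many
)
SELECT fib_n FROM fib ORDER BY level
"""


def fibonacci(how_many: int) -> list[int]:
    """Return the first `how_many` Fibonacci numbers."""
    with closing(sqlite3.connect(":memory:")) as conn:
        return [row[0] for row in conn.execute(FIB_SQL, {"how_many": how_many})]


print(fibonacci(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```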
Case 4 – Fibonacci Series - SQL
Case 4 – Fibonacci Series – SQL (cont)
Where Are We?
1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations
Case studies - Business
Case 5 - Org Chart
[Figure: org chart of employees, including 1-Wolf and 10-Schapiro]
A case showing how recursive SQL can be used effectively to traverse hierarchical
structures.
Case 5 – RECASE05 Table
Case 5 - SQL
WITH OC (LEVEL, MGRID, MGRNAME, EMPID, EMPNAME) AS
( SELECT 0, 0, ' ', EMPID, EMPNAME
FROM RECASE05
WHERE MGRID IS NULL
UNION ALL
SELECT BOSS.LEVEL + 1, SUB.MGRID, BOSS.EMPNAME
, SUB.EMPID, SUB.EMPNAME
FROM OC BOSS, RECASE05 SUB
WHERE BOSS.EMPID = SUB.MGRID                -- my direct reports
AND BOSS.LEVEL < 5 )
SELECT OC.LEVEL, OC.MGRNAME, OC.EMPNAME
FROM OC
WHERE LEVEL > 0
ORDER BY OC.LEVEL, OC.MGRID, OC.EMPID
The initialization select (WHERE MGRID IS NULL) could instead have been written
as (WHERE EMPID = 1).
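The RECASE05 contents are not reproduced in this text. This sketch therefore runs the Case 5 query in SQLite, via Python's sqlite3 module, against a small hypothetical table: Wolf at the top and four invented employees below. Apart from the SQLite spelling (WITH RECURSIVE and an explicit JOIN), the CTE is the same. The initialization select finds the employee with no manager, and each iteration finds the direct reports of the previous level.

```python
import sqlite3
from contextlib import closing

ORG_SQL = """
WITH RECURSIVE oc(level, mgrid, mgrname, empid, empname) AS (
    SELECT 0, 0, ' ', empid, empname
    FROM recase05
    WHERE mgrid IS NULL                               -- the top of the chart
    UNION ALL
    SELECT boss.level + 1, sub.mgrid, boss.empname, sub.empid, sub.empname
    FROM oc AS boss
    JOIN recase05 AS sub ON boss.empid = sub.mgrid    -- my direct reports
    WHERE boss.level < 5                              -- depth guard
)
SELECT level, mgrname, empname
FROM oc
WHERE level > 0
ORDER BY level, mgrid, empid
"""

# Hypothetical rows: (empid, empname, mgrid)
EMPLOYEES = [(1, "Wolf", None), (2, "Adams", 1), (3, "Baker", 1),
             (4, "Clark", 2), (5, "Davis", 4)]


def org_chart():
    """Load the hypothetical rows and return (level, manager, employee) tuples."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE recase05 (empid INTEGER PRIMARY KEY, empname TEXT, mgrid INTEGER)")
        conn.executemany("INSERT INTO recase05 VALUES (?, ?, ?)", EMPLOYEES)
        return conn.execute(ORG_SQL).fetchall()


for row in org_chart():
    print(row)
```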
Case 5 - Intermediate Result
Case 5 - Result
Case 6 – Generating Test Data
RECASE06
Table structure for a table containing various data types that needs to be
populated with test data.
If you needed 10,000 rows of test data in this table, you might use a table
editor (or SPUFI) to insert a few rows and then replicate them. How “random” would
this data really be? In real life, not random at all. This technique lets you
generate varied data quite easily.
Case 6 – Generating Test Data - SQL
INSERT INTO RECASE06
WITH NUMBERS (LEVEL, NEXTONE) AS
(SELECT 1, 1
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT LEVEL + 1, LEVEL + 1
FROM NUMBERS
WHERE LEVEL < 10 )
SELECT INTEGER(ROUND(RAND()*9999,0)) + 1                  -- 1 thru 10,000
, LEFT(SUBSTR('BCDFGHJKLMNPRSTVWZ',
INTEGER(ROUND(RAND()*17,0))+1, 1)                         -- 5 sets of letters + vowels
CONCAT
SUBSTR('AEIOUY', INTEGER(ROUND(RAND()*5,0))+1, 1)
<<< REPEAT 5 TIMES>>>
, INTEGER(ROUND(RAND()*4,0)) + 3)                         -- min 3, max 7
SQL continued on next slide
Case 6 – Generating Test Data – SQL (cont)
, LEFT(SUBSTR('BCDFGHJKLMNPRSTVWZ',
INTEGER(ROUND(RAND()*17,0))+1, 1)                         -- same for last name
CONCAT SUBSTR('AEIOUY',
INTEGER(ROUND(RAND()*5,0))+1, 1)
<<< REPEAT 5 TIMES>>>
, INTEGER(ROUND(RAND()*7,0)) + 3)                         -- min 3, max 10
, DECIMAL((1000.00 + RAND()*4000),7,2)                    -- min 1000, max 5000
, CURRENT DATE - 1 YEAR
- INTEGER(20*365*RAND()) DAYS                             -- 1 year thru 21 years ago
FROM NUMBERS
SQL continued.
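The same generator can be sketched in SQLite via Python's sqlite3 module. The recase06 column names below are hypothetical, because the real RECASE06 layout appears only on a slide not reproduced here. In the SQLite version, random() returns a signed 64-bit integer. The sketch masks it to a non-negative 31-bit value and takes a remainder to pick a position in a letter pool or a value in a range. This stands in for DB2's INTEGER(ROUND(RAND()*n,0)). Names are built, as on the slide, from five consonant+vowel pairs cut to a random length.

```python
import sqlite3
from contextlib import closing

CONSONANTS = "BCDFGHJKLMNPRSTVWZ"
VOWELS = "AEIOUY"
# random() returns a signed 64-bit value; masking keeps a non-negative 31-bit number.
RND = "(random() & 2147483647)"


def letter(pool: str) -> str:
    """SQL expression choosing one random letter from pool (substr is 1-based)."""
    return f"substr('{pool}', {RND} % {len(pool)} + 1, 1)"


def name(min_len: int, max_len: int) -> str:
    """SQL expression: five consonant+vowel pairs cut to a random length in [min_len, max_len]."""
    pairs = " || ".join(f"{letter(CONSONANTS)} || {letter(VOWELS)}" for _ in range(5))
    return f"substr({pairs}, 1, {RND} % {max_len - min_len + 1} + {min_len})"


def generate_test_data(conn: sqlite3.Connection, rows: int) -> None:
    """Create a hypothetical recase06 table and fill it with `rows` random rows."""
    conn.execute("CREATE TABLE recase06 (id INTEGER, first_name TEXT, "
                 "last_name TEXT, salary REAL, hire_date TEXT)")
    conn.execute(f"""
        WITH RECURSIVE numbers(level) AS (
            SELECT 1
            UNION ALL
            SELECT level + 1 FROM numbers WHERE level < ?
        )
        INSERT INTO recase06
        SELECT {RND} % 10000 + 1,                                    -- 1 thru 10,000
               {name(3, 7)},                                         -- first name, 3 to 7
               {name(3, 10)},                                        -- last name, 3 to 10
               round(1000.0 + {RND} / 2147483647.0 * 4000, 2),       -- 1000 to 5000
               date('now', '-1 year', '-' || ({RND} % 7301) || ' days')  -- 1 to 21 years ago
        FROM numbers
    """, (rows,))


with closing(sqlite3.connect(":memory:")) as conn:
    generate_test_data(conn, 5)
    for row in conn.execute("SELECT * FROM recase06"):
        print(row)
```

Each row's values differ from run to run, so no expected output is shown.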
Case 6 – Generating Test Data - Result
…and the result from one of my test runs (it comes up with some interesting
names!). The letter pools could be adjusted to reflect regional demographics.
Case 7 – Missing Data Non-recursive
For the month, notice that empid 3 has not reported any time and
empid 2 has reported only partially.
A simple case involving a set of two tables with time reported by week. Some
employees may fail to report their time.
Case 7 – Missing Data Non-recursive
Required Report
EMPNAME            FRIDAY_DATE   TOTHOURS
Alan Sontag        2005-03-04        41.0
Alan Sontag        2005-03-11        42.0
Alan Sontag        2005-03-18        43.0
Alan Sontag        2005-03-25        44.0
Bobby Wolf         2005-03-04        39.0
Bobby Wolf         2005-03-11         0.0
Bobby Wolf         2005-03-18        46.0
Bobby Wolf         2005-03-25         0.0
Dorothy Truscott   2005-03-04         0.0
Dorothy Truscott   2005-03-11         0.0
Dorothy Truscott   2005-03-18         0.0
Dorothy Truscott   2005-03-25         0.0
List all employees and the reported time for March 2005. If
an employee failed to report time, show it as zeroes.
We need a report that also accounts for missing entries. Since we are dealing
with optional data, our first thought would be a Left Outer Join.
However, how do you create a “left” table that has all the data when the rows
themselves are missing? The next slide shows how we can generate such a
“left” table.
Case 7 – Missing Data Non-recursive
SELECT
CART.EMPNAME, CART.FRIDAY_DATE,
SUM(COALESCE(TIME.HOURS,0.0)) AS TOTHOURS
FROM
( SELECT
EMPL.EMPID, EMPL.EMPNAME,
FRIDAYS.FRIDAY_DATE
FROM EX7EMPL EMPL                            -- employees
INNER JOIN
( SELECT DISTINCT FRIDAY_DATE                -- all Fridays
FROM EX7TIME
WHERE FRIDAY_DATE BETWEEN
'2005-03-01' AND '2005-03-31') AS FRIDAYS
ON 1=1                                       -- Cartesian product
) AS CART
SQL continued on next slide
Case 7 – Missing Data Non-recursive (cont)
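We now perform a Left Outer Join of CART with the TIME table to show the
time reported, if any.
Notice that this query derives the Fridays from the time entries themselves: a week
for which no one enters time drops out of the report, and if no one enters any time
at all (if everyone “cooperates”), the query returns nothing!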
Case 7 – Missing Data - Recursive
WITH
DATES (LEVEL, NEXTDAY) AS
(
SELECT 1, DATE('2005-03-01')
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT LEVEL + 1, NEXTDAY + 1 DAY
FROM DATES
WHERE LEVEL < 32
)
Also note that, unlike the previous solution, which requires at least one employee
to enter time each week, this solution works regardless of who has entered time.
Case 7 – Missing Data – Recursive (cont)
SELECT EMPL.EMPNAME
, DATES.NEXTDAY AS FRIDAY_DATE
, COALESCE(TIME.HOURS,0.0) AS TOTHOURS
FROM DATES
INNER JOIN
EX7EMPL EMPL
ON 1 = 1                                     -- Cartesian product
LEFT OUTER JOIN
EX7TIME TIME
ON DATES.NEXTDAY = TIME.FRIDAY_DATE
AND EMPL.EMPID = TIME.EMPID
WHERE NEXTDAY BETWEEN '2005-03-01' AND '2005-03-31'
AND DAYOFWEEK(NEXTDAY) = 6                   -- 6 is a Friday
ORDER BY EMPL.EMPNAME
, DATES.NEXTDAY
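This sketch runs the recursive solution in SQLite via Python's sqlite3 module, loaded with the hours from the required report. The empid assignments (Alan Sontag 1, Bobby Wolf 2, Dorothy Truscott 3) are inferred from the earlier slide, and the table layouts are assumptions. There are two SQLite substitutions. First, date(nextday, '+1 day') replaces NEXTDAY + 1 DAY. Second, strftime('%w', ...) = '5' replaces DAYOFWEEK(...) = 6, because SQLite numbers the weekdays from 0 (Sunday) while DB2's DAYOFWEEK starts at 1 (Sunday).

```python
import sqlite3
from contextlib import closing

SCHEMA = """
CREATE TABLE ex7empl (empid INTEGER PRIMARY KEY, empname TEXT);
CREATE TABLE ex7time (empid INTEGER, friday_date TEXT, hours REAL);
INSERT INTO ex7empl VALUES (1, 'Alan Sontag'), (2, 'Bobby Wolf'), (3, 'Dorothy Truscott');
INSERT INTO ex7time VALUES
  (1, '2005-03-04', 41.0), (1, '2005-03-11', 42.0),
  (1, '2005-03-18', 43.0), (1, '2005-03-25', 44.0),
  (2, '2005-03-04', 39.0), (2, '2005-03-18', 46.0);
"""

REPORT_SQL = """
WITH RECURSIVE dates(level, nextday) AS (
    SELECT 1, date('2005-03-01')
    UNION ALL
    SELECT level + 1, date(nextday, '+1 day')      -- NEXTDAY + 1 DAY in DB2
    FROM dates
    WHERE level < 32
)
SELECT empl.empname, dates.nextday AS friday_date,
       COALESCE(t.hours, 0.0) AS tothours
FROM dates
CROSS JOIN ex7empl AS empl                         -- Cartesian product
LEFT OUTER JOIN ex7time AS t
  ON dates.nextday = t.friday_date AND empl.empid = t.empid
WHERE dates.nextday BETWEEN '2005-03-01' AND '2005-03-31'
  AND strftime('%w', dates.nextday) = '5'          -- Friday (DB2: DAYOFWEEK = 6)
ORDER BY empl.empname, dates.nextday
"""


def march_report():
    """Return (empname, friday_date, tothours) for every employee and every Friday."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.executescript(SCHEMA)
        return conn.execute(REPORT_SQL).fetchall()


for row in march_report():
    print(row)
```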
Case 8 – Rollup
[Figure: cost-center hierarchy with 1-CORP (500) at the top and 2-I.T. (400) and 3-H.R. (100) beneath it]
A hierarchy where the salary amounts must be rolled up to the higher-level
cost center (e.g. Database Services and Programming are rolled up into
the IT salary budget).
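The Rollup SQL slides are not reproduced in this text. Below is one common way to express a rollup with recursion, not necessarily the author's approach, sketched in SQLite via Python's sqlite3 module. The CTE pairs every cost center with each of its descendants. The SUM then happens outside the CTE, since the Rules for CTE slide disallows GROUP BY inside it. The table cost_center, its amounts, and the placement of EMPL REL and BENEFITS under H.R. are hypothetical.

```python
import sqlite3
from contextlib import closing

# Hypothetical cost centers: (id, name, parent_id, own salary amount).
SCHEMA = """
CREATE TABLE cost_center (id INTEGER PRIMARY KEY, name TEXT,
                          parent_id INTEGER, amount INTEGER);
INSERT INTO cost_center VALUES
  (1, 'CORP', NULL, 0),
  (2, 'I.T.', 1, 100), (3, 'H.R.', 1, 40),
  (4, 'DB SVC', 2, 200), (5, 'PROG', 2, 100),
  (6, 'EMPL REL', 3, 30), (7, 'BENEFITS', 3, 30);
"""

ROLLUP_SQL = """
WITH RECURSIVE tree(level, root_id, node_id) AS (
    SELECT 0, id, id FROM cost_center          -- every node starts as its own root
    UNION ALL
    SELECT t.level + 1, t.root_id, c.id        -- walk down to each descendant
    FROM tree AS t
    JOIN cost_center AS c ON c.parent_id = t.node_id
    WHERE t.level < 10                         -- depth guard
)
SELECT r.id, r.name, SUM(n.amount) AS rolled_up
FROM tree
JOIN cost_center AS r ON r.id = tree.root_id
JOIN cost_center AS n ON n.id = tree.node_id
GROUP BY r.id, r.name                          -- aggregate outside the CTE
ORDER BY r.id
"""


def rollup():
    """Return (id, name, own amount plus all descendants' amounts)."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.executescript(SCHEMA)
        return conn.execute(ROLLUP_SQL).fetchall()


for row in rollup():
    print(row)
```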
Case 8 – Rollup SQL
Case 8 – Rollup SQL (cont)
SQL continued.
Case 8 – Rollup Result
Case 9 – Even Allocation
[Figure: hierarchy with 1-CORP (6000) at the top and 2-I.T. (4000) and 3-H.R. (1000) beneath it]
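A hierarchy where bonus amounts are to be allocated evenly across all
subordinates, from the top down. For example, the $6,000 for DST is split into 2
parts ($3,000 each) for IT and HR. IT then adds that $3,000 to its own $4,000 and
splits the resulting $7,000 evenly between Database Services and Programming
($3,500 each).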
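The Even Allocation SQL slides are not reproduced in this text. Below is one way to express the top-down split, not necessarily the author's, sketched in SQLite via Python's sqlite3 module. A separate non-recursive CTE counts each node's direct subordinates. Each iteration then adds an even share of the parent's total to the child's own amount. The amounts come from the result slide, the parent links are inferred from its arithmetic, and the table name org is hypothetical. This sketch has not been checked against DB2 V8, so DB2 may not accept exactly this shape.

```python
import sqlite3
from contextlib import closing

SCHEMA = """
CREATE TABLE org (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER, amount INTEGER);
INSERT INTO org VALUES
  (1, 'CORP', NULL, 6000),
  (2, 'I.T.', 1, 4000), (3, 'H.R.', 1, 1000),
  (4, 'DB SVC', 2, 5000), (5, 'PROG', 2, 1000),
  (6, 'EMPL REL', 3, 100), (7, 'BENEFITS', 3, 50);
"""

EVEN_SQL = """
WITH RECURSIVE
kids(parent_id, n) AS (                          -- number of direct subordinates
    SELECT parent_id, COUNT(*) FROM org
    WHERE parent_id IS NOT NULL
    GROUP BY parent_id
),
alloc(level, id, total) AS (
    SELECT 0, id, amount FROM org WHERE parent_id IS NULL   -- start at the top
    UNION ALL
    SELECT a.level + 1, o.id,
           o.amount + a.total * 1.0 / k.n        -- own amount + even share of parent's total
    FROM alloc AS a
    JOIN kids AS k ON k.parent_id = a.id
    JOIN org AS o ON o.parent_id = a.id
    WHERE a.level < 10                           -- depth guard
)
SELECT id, total FROM alloc ORDER BY id
"""


def even_allocation():
    """Return (id, own amount plus the share allocated from above)."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.executescript(SCHEMA)
        return conn.execute(EVEN_SQL).fetchall()


print(even_allocation())
# [(1, 6000), (2, 7000.0), (3, 4000.0), (4, 8500.0), (5, 4500.0), (6, 2100.0), (7, 2050.0)]
```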
Case 9 – Even Allocation SQL
Case 9 – Even Allocation SQL (cont)
SQL continued.
Case 9 – Even Allocation Result
ID  NAME       AMOUNT   AMOUNT + SHARE FROM PARENT
1   CORP         6000
2   I.T.         4000   7000
3   H.R.         1000   4000
4   DB SVC       5000   8500
5   PROG         1000   4500
6   EMPL REL      100   2100
7   BENEFITS       50   2050
Case 10 – Weighted Allocation
[Figure: hierarchy with 1-CORP (5000, R=null) at the top and 2-I.T. (4000, R=4) and 3-H.R. (1000, R=1) beneath it; R is the allocation weight]
Case 10 – Weighted Allocation SQL
Case 10 – Weighted Allocation SQL (cont)
SQL continued.
Case 10 – Weighted Allocation Result
ID  NAME       AMOUNT   AMOUNT + SHARE FROM PARENT   RATIO (R)
1   CORP         5000
2   I.T.         4000    8000                        4
3   H.R.         1000    2000                        1
4   DB SVC       5000   10000                        5
5   PROG         1000    4000                        3
6   EMPL REL      100    1300                        3
7   BENEFITS       50     850                        2
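Weighted allocation differs from even allocation in one place: each child receives a share of its parent's total proportional to its weight R. The sketch below shows this in SQLite via Python's sqlite3 module, and it is one possible formulation, not necessarily the author's. The amounts and weights come from the Case 10 figure and result slide. The parent links are inferred from the result arithmetic, and the table name org is hypothetical. For example, I.T. receives 5000 × 4 / (4 + 1) = 4000 from CORP.

```python
import sqlite3
from contextlib import closing

# Amounts and weights (R) from the Case 10 slides; parent links inferred from the results.
SCHEMA = """
CREATE TABLE org (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER,
                  amount INTEGER, ratio INTEGER);
INSERT INTO org VALUES
  (1, 'CORP', NULL, 5000, NULL),
  (2, 'I.T.', 1, 4000, 4), (3, 'H.R.', 1, 1000, 1),
  (4, 'DB SVC', 2, 5000, 5), (5, 'PROG', 2, 1000, 3),
  (6, 'EMPL REL', 3, 100, 3), (7, 'BENEFITS', 3, 50, 2);
"""

WEIGHTED_SQL = """
WITH RECURSIVE
weights(parent_id, ratio_sum) AS (               -- total weight of each node's children
    SELECT parent_id, SUM(ratio) FROM org
    WHERE parent_id IS NOT NULL
    GROUP BY parent_id
),
alloc(level, id, total) AS (
    SELECT 0, id, amount FROM org WHERE parent_id IS NULL
    UNION ALL
    SELECT a.level + 1, o.id,
           o.amount + a.total * o.ratio * 1.0 / w.ratio_sum  -- share in proportion to R
    FROM alloc AS a
    JOIN weights AS w ON w.parent_id = a.id
    JOIN org AS o ON o.parent_id = a.id
    WHERE a.level < 10                           -- depth guard
)
SELECT id, total FROM alloc ORDER BY id
"""


def weighted_allocation():
    """Return (id, own amount plus the weighted share allocated from above)."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.executescript(SCHEMA)
        return conn.execute(WEIGHTED_SQL).fetchall()


print(weighted_allocation())
# [(1, 5000), (2, 8000.0), (3, 2000.0), (4, 10000.0), (5, 4000.0), (6, 1300.0), (7, 850.0)]
```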
Case 11 – RI Children
[Figure: RI parent/child relationships among tables, with B, C, D on one level and E, F, G, H, I on the next]
Notice the recursive relationship between B and itself, as well as the circular
definition of constraints from D to H to J and back to D.
Case 11 – RI Children SQL
Case 11 – RI Children SQL (cont)
SQL continued.
Notice that the CTE generates much more data than necessary (e.g. node D is
revisited multiple times), and DISTINCT is needed to eliminate the
duplicates. A very handy SQL enhancement would be for DB2 to allow us to
specify DISTINCT (without the level) within the CTE itself. Perhaps in a future version?
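The RI Children SQL slides are not reproduced in this text. The sketch below re-creates the traversal in SQLite via Python's sqlite3 module. It uses a hypothetical rels table whose reftbname (parent) and tbname (dependent) columns mirror SYSIBM.SYSRELS. Its edges are taken from the result slide, plus the B-to-B self-reference. The iterative select skips self-references, LEVEL < 10 stops the D, H, J cycle, and DISTINCT appears only in the final select, where the Rules for CTE slide allows it.

```python
import sqlite3
from contextlib import closing

# Hypothetical RI edges (parent table, dependent table), including the B -> B self-reference.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "B"), ("B", "E"), ("B", "F"),
         ("C", "G"), ("C", "J"), ("D", "H"), ("D", "I"), ("H", "J"), ("J", "D")]

RI_SQL = """
WITH RECURSIVE oc(level, parenttab, childtab) AS (
    SELECT 0, 'Z', :start_tab                      -- dummy parent for the start table
    UNION ALL
    SELECT p.level + 1, r.reftbname, r.tbname
    FROM oc AS p
    JOIN rels AS r ON p.childtab = r.reftbname
    WHERE r.reftbname <> r.tbname                  -- skip self-references such as B -> B
      AND p.level < 10                             -- stops the D -> H -> J -> D cycle
)
SELECT DISTINCT parenttab, childtab                -- duplicates removed outside the CTE
FROM oc
WHERE level > 0
ORDER BY parenttab, childtab
"""


def ri_children(start_tab: str):
    """Return the distinct (parent, child) RI pairs reachable from start_tab."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE rels (reftbname TEXT, tbname TEXT)")
        conn.executemany("INSERT INTO rels VALUES (?, ?)", EDGES)
        return conn.execute(RI_SQL, {"start_tab": start_tab}).fetchall()


for parent, child in ri_children("A"):
    print(parent, child)
```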
Case 11 – RI Children Result
PARENTTAB CHILDTAB
A B
A C
A D
B E
B F
C G
C J
D H
D I
H J
J D
Case 12 – Cheapest Fare
[Figure: route network with a fare on each leg, running through Chicago and Houston (IAH) to Dallas (DFW), the end city]
Case 12 – RECASE12 Table
RECASE12
Contents of the table RECASE12, which defines the available routes and the
associated fares.
Case 12 – Cheapest Fare SQL
WITH
ROUTING (LEVEL, FROM_CITY, TO_CITY, CHAIN, FARE) AS
( SELECT 0
, CAST('-->' AS CHAR(3) )
, CAST('MCI' AS CHAR(3) )
, CAST('-->MCI' AS VARCHAR(60) )
, 0
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT PARENT.LEVEL + 1
, CAST(SUBSTR(CHILD.FROM_CITY,1,3) AS CHAR(3) )
, CAST(SUBSTR(CHILD.TO_CITY,1,3) AS CHAR(3) )
, PARENT.CHAIN CONCAT '->' CONCAT                  -- build itinerary
CAST(SUBSTR(CHILD.TO_CITY,1,3) AS CHAR(3) )
, PARENT.FARE + CHILD.FARE                         -- add fare
FROM ROUTING PARENT, RECASE12 CHILD
WHERE PARENT.TO_CITY = CHILD.FROM_CITY             -- from here to?
AND PARENT.LEVEL < 10)
SQL continued on next slide
Case 12 – Cheapest Fare SQL (cont)
SELECT DISTINCT
SUBSTR(ROUTING.CHAIN,4,56) AS ROUTE
, ROUTING.FARE
FROM ROUTING
WHERE ROUTING.LEVEL > 0
AND ROUTING.CHAIN LIKE '-->MCI%'
AND ROUTING.CHAIN LIKE '%DFW'
ORDER BY ROUTE
SQL continued.
The length of the generated path (60 bytes, of which 56 are reported) is
arbitrary and truncated here to make the output readable.
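Here is a sketch of the same itinerary builder in SQLite via Python's sqlite3 module. The real RECASE12 contents are not reproduced in this text. The leg fares below are therefore hypothetical, chosen so that the route totals match the result slide. The sketch differs from the DB2 version in four ways:
- SQLite concatenates with ||.
- It needs no CASTs.
- It filters on to_city = :dest instead of CHAIN LIKE '%DFW'.
- It orders by fare, so the cheapest route comes first.

```python
import sqlite3
from contextlib import closing

# Hypothetical legs (from_city, to_city, fare) chosen to reproduce the slide's totals.
LEGS = [("MCI", "DFW", 500), ("MCI", "ORD", 100), ("MCI", "STL", 200),
        ("ORD", "IAH", 100), ("ORD", "STL", 50), ("STL", "DFW", 150),
        ("STL", "IAH", 25), ("IAH", "DFW", 50)]

ROUTE_SQL = """
WITH RECURSIVE routing(level, from_city, to_city, chain, fare) AS (
    SELECT 0, '-->', :origin, '-->' || :origin, 0
    UNION ALL
    SELECT p.level + 1, c.from_city, c.to_city,
           p.chain || '->' || c.to_city,             -- build itinerary
           p.fare + c.fare                           -- add fare
    FROM routing AS p
    JOIN recase12 AS c ON p.to_city = c.from_city    -- from here to?
    WHERE p.level < 10                               -- depth guard
)
SELECT DISTINCT substr(chain, 4) AS route, fare      -- drop the leading '-->'
FROM routing
WHERE level > 0 AND to_city = :dest
ORDER BY fare, route
"""


def routes(origin: str, dest: str):
    """Return (route, total fare) pairs from origin to dest, cheapest first."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE recase12 (from_city TEXT, to_city TEXT, fare INTEGER)")
        conn.executemany("INSERT INTO recase12 VALUES (?, ?, ?)", LEGS)
        return conn.execute(ROUTE_SQL, {"origin": origin, "dest": dest}).fetchall()


print(routes("MCI", "DFW")[0])  # ('MCI->ORD->STL->IAH->DFW', 225)
```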
Case 12 – Cheapest Fare Result
ROUTE FARE
MCI->DFW 500
MCI->ORD->IAH->DFW 250
MCI->ORD->STL->DFW 300
MCI->ORD->STL->IAH->DFW 225
MCI->STL->DFW 350
MCI->STL->IAH->DFW 275
A further enhancement (including the flight departure and arrival days/times) would
allow us to select connecting flights (e.g. the next flight must leave 1 to 4 hours
after the previous flight arrives) and prepare a true itinerary like
Travelocity or other search engines (… all with no pop-up ads!!).
Case 13 – Account Linking
[Figure: accounts and the customer numbers on each, linked wherever two accounts share a customer; the walk begins at the account marked Start]
A case study to group accounts that share at least one customer. Such linked
accounts may receive favorable pricing.
Case 13 – Account Linking SQL
WITH LINKING (LEVEL, ACCT_NUM, CUST_NUM) AS
( SELECT 0
, ACCT_NUM
, CUST_NUM
FROM RECASE13
WHERE ACCT_NUM = 4                          -- starting acct num
UNION ALL
SELECT PARENT.LEVEL + 1
, CHILD2.ACCT_NUM
, CHILD2.CUST_NUM
FROM LINKING PARENT
, RECASE13 CHILD1                           -- link by customer
, RECASE13 CHILD2
WHERE PARENT.CUST_NUM = CHILD1.CUST_NUM
AND PARENT.ACCT_NUM <> CHILD1.ACCT_NUM      -- diff acct
AND CHILD1.ACCT_NUM = CHILD2.ACCT_NUM       -- all customers for that acct
AND PARENT.LEVEL < 5)
SELECT DISTINCT ACCT_NUM
FROM LINKING
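This sketch runs the Case 13 query in SQLite via Python's sqlite3 module against hypothetical data, because the figure's account and customer numbers are not fully recoverable here. In the data, account 4 shares customer 10 with account 1, which shares customer 20 with account 2, which shares customer 30 with account 3. Accounts 5 and 6 form a separate group. Note that the LEVEL < 5 guard also caps how many links are followed. It must therefore exceed the longest chain of linked accounts you expect.

```python
import sqlite3
from contextlib import closing

# Hypothetical (acct_num, cust_num) pairs: 4-1-2-3 are linked, 5-6 form a separate group.
HOLDINGS = [(1, 10), (1, 20), (2, 20), (2, 30), (3, 30),
            (4, 40), (4, 10), (5, 50), (6, 60), (6, 50)]

LINK_SQL = """
WITH RECURSIVE linking(level, acct_num, cust_num) AS (
    SELECT 0, acct_num, cust_num
    FROM recase13
    WHERE acct_num = :start_acct                  -- starting account
    UNION ALL
    SELECT p.level + 1, child2.acct_num, child2.cust_num
    FROM linking AS p
    JOIN recase13 AS child1
      ON p.cust_num = child1.cust_num             -- link by customer
     AND p.acct_num <> child1.acct_num            -- to a different account
    JOIN recase13 AS child2
      ON child1.acct_num = child2.acct_num        -- all customers of that account
    WHERE p.level < 5                             -- depth guard (also caps chain length)
)
SELECT DISTINCT acct_num FROM linking ORDER BY acct_num
"""


def linked_accounts(start_acct: int) -> list[int]:
    """Return every account linked to start_acct through shared customers."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE recase13 (acct_num INTEGER, cust_num INTEGER)")
        conn.executemany("INSERT INTO recase13 VALUES (?, ?)", HOLDINGS)
        return [row[0] for row in conn.execute(LINK_SQL, {"start_acct": start_acct})]


print(linked_accounts(4))  # [1, 2, 3, 4]
```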
Where Are We?
1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations
OK, it is complex and compact, but how does recursive SQL perform? We will
explore this question by comparing it with procedural logic (a COBOL program)
that accomplishes the same tasks.
Hierarchy
[Figure: balanced hierarchy in which every node has 4 children]
Level 0 : 1
Level 1 : 4
Level 2 : 16
Level 3 : 64
Level 4 : 256
Level 5 : 1,024
Level 6 : 4,096
Level 7 : 16,384
Level 8 : 65,536
(total 87,381 nodes)
The hierarchical structure used for the case study: each node has 4 children,
and the tree runs 8 levels below the root (levels 0 through 8).
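As a quick check of the node count, the level sizes form a geometric series with ratio 4, so the total is:

```latex
N = \sum_{k=0}^{8} 4^{k} = \frac{4^{9} - 1}{4 - 1} = \frac{262144 - 1}{3} = 87381
```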
Table Structure
RECASE14
Derived column
Procedural Logic: Setting Level
Procedural Logic for Org Chart
SELECT BOSS.LEVEL_NUM
, BOSS.ACCT_NAME
, SUB.ACCT_NAME
FROM RECASE14 BOSS
, RECASE14 SUB
WHERE SUB.PARENT_ACCT_NUM = BOSS.ACCT_NUM          -- my children
ORDER BY BOSS.LEVEL_NUM
, BOSS.ACCT_NUM, SUB.ACCT_NUM
Procedural Logic for Rollup
Procedural Logic for Allocate
Benchmarks
[Chart: CPU seconds for the procedural and recursive solutions]
Unlike the Dan Luksetich case mentioned earlier, my results are not dazzlingly
in favor of recursive SQL: it is better, but not by an order of magnitude.
Admittedly, other variables will also affect the benchmarks. For example, if
the procedural logic is needlessly complex or inefficient, recursive SQL can
look much better. I have avoided a biased approach: the best SQL and the
best procedural logic are being compared. Other variables that could affect the
results are the presence of indexes, the depth of the hierarchy and how balanced
the hierarchy is (very evenly balanced in this “lab” exercise). The size and number
of sort work datasets in DSNDB07 may also affect performance.
Where Are We?
1. Recursion basics
2. Case studies - mathematical
3. Case studies - business
4. Performance aspects
5. Pitfalls and recommendations
Let’s summarize what we have learned and point out some pitfalls.
Limiting the Depth
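The potential for an infinite loop is ever present, since DB2 will not limit
recursion to a specified depth. The iterative select must enforce its own limit,
such as the LEVEL < n predicates used throughout this presentation.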
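Here is a minimal sketch of why the guard matters, run in SQLite via Python's sqlite3 module. SQLite raises no warning comparable to DB2's SQLCODE +347. In the example, two hypothetical tables, X and Y, reference each other. Without the LEVEL predicate, the iterative select would keep producing rows and the query would never finish. With the predicate, the walk stops after max_depth steps.

```python
import sqlite3
from contextlib import closing

# Two hypothetical tables that reference each other: X -> Y -> X -> ...
LOOP_SQL = """
WITH RECURSIVE walk(level, node) AS (
    SELECT 0, 'X'
    UNION ALL
    SELECT w.level + 1, e.child
    FROM walk AS w
    JOIN edges AS e ON e.parent = w.node
    WHERE w.level < :max_depth      -- without this line the query never finishes
)
SELECT level, node FROM walk ORDER BY level
"""


def walk(max_depth: int):
    """Follow the X <-> Y cycle for at most max_depth steps."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
        conn.executemany("INSERT INTO edges VALUES (?, ?)", [("X", "Y"), ("Y", "X")])
        return conn.execute(LOOP_SQL, {"max_depth": max_depth}).fetchall()


print(walk(3))  # [(0, 'X'), (1, 'Y'), (2, 'X'), (3, 'Y')]
```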
Gotchas
[Figure: the RI relationships from Case 11, with tables B, C, D on one level and E, F, G, H, I on the next]
Causing Infinite Loops
WITH OC (LEVEL, PARENTTAB, CHILDTAB) AS
(SELECT 0, 'Z', 'your start table'
FROM SYSIBM.SYSDUMMYU
UNION ALL
SELECT PARENT.LEVEL + 1
, SUBSTR(CHILD.REFTBNAME,1,1)
, SUBSTR(CHILD.TBNAME,1,1)
FROM OC PARENT, SYSIBM.SYSRELS CHILD
WHERE PARENT.CHILDTAB = CHILD.REFTBNAME
AND CHILD.CREATOR = …
AND CHILD.REFTBCREATOR = …
AND CHILD.REFTBNAME <> CHILD.TBNAME
-- AND PARENT.LEVEL < 10                    <== removed
)

Result:
DSNT404I SQLCODE = 347, WARNING: THE RECURSIVE COMMON TABLE EXPRESSION OC MAY CONTAIN AN INFINITE LOOP
DSNT418I SQLSTATE = 01605 SQLSTATE RETURN CODE
DSNT415I SQLERRP = DSNXODML SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD = 0 0 0 1209090530 0 0 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD = X'00000000' X'00000000' X'00000000' X'481141E2' X'00000000' X'00000000' SQL DIAGNOSTIC INFORMATION
LEVEL PARENTTAB CHILDTAB
Without the extra protection offered by the LEVEL predicate, DB2 issues this warning,
and the query may run in an infinite loop.
Recommendations
But it seems too hard…
E. F. Dijkstra
Just because it is not for the masses, do not let that stop you.
Summary
1. Recursion basics
  - Introducing the new feature
2. Case studies - mathematical
  - String of numbers, factorial, primes and Fibonacci
  - It’s not that hard!
3. Case studies - business
  - Org chart, Generating test data, Missing data, Rollup, Even
allocation, Weighted allocation, RI children, Cheapest fare, Account
linking
  - A little complex but compact
4. Performance aspects
  - Org chart, rollup, allocate
  - Compact and a better performer
5. Pitfalls and recommendations
  - Powerful but dangerous!
I trust this session has empowered you with the knowledge to exploit
Recursive SQL fully. Good Luck!
References
Recursive SQL –
Unleash the Power!
Not really… in the unlikely event that we actually have time… I doubt it!
Session G13
Recursive SQL –
Unleash the Power!
Suresh Sane
DST Systems Inc.
sssane@dstsystems.com