Simpson - DynamicProgramminginExcelVer1.0Withabstract
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/262281816
3 authors:
Catherine Stringfellow
Midwestern State University
All content following this page was uploaded by Richard Simpson on 13 August 2015.
DYNAMIC PROGRAMMING FROM AN EXCEL PERSPECTIVE
ABSTRACT
Many students have difficulty understanding the concept of dynamic programming, a problem
solving approach appropriate to use when a problem can be broken down into overlapping sub-
problems. In addition, computer science majors seldom take a course in which a spreadsheet
application is taught or used, although spreadsheet applications can be a great tool in visualizing
and solving problems. This paper attempts to address these issues by providing examples of
problems that can be solved in a spreadsheet environment using dynamic programming in
various computer science courses.
INTRODUCTION
Dynamic Programming (DP) is a well-known approach to solving certain classes of problems. Its
unusual name reportedly arose because the Secretary of Defense in the 1950s hated
mathematics [2]. It is primarily a bottom-up method, in use since the 1940s, for evaluating
solutions that are naturally expressed as top-down recursions. It is appropriate for use
with problems that can be solved by breaking the problem down into overlapping sub-problems.
The popularity of this approach is largely due to its efficiency. A solution having an exponential
complexity for a top-down recursive algorithm may have a polynomial complexity with the
associated dynamic programming method. When students are first introduced to this approach
as a problem solving method, they are often uncomfortable or at least have trouble visualizing
the process. The teaching method discussed in this paper attempts to address these educational
speed bumps. A collection of problems, ranging from very simple to somewhat complicated, is solved in
a spreadsheet environment using the dynamic programming approach to demonstrate its
value for algorithm visualization. A positive side effect is that computer science majors become
better acquainted with spreadsheet programming.
BACKGROUND
There has been previous work combining MS Excel and Dynamic Programming (DP),
particularly in the area of operations research modeling. Jensen's website, Operations Research
Models and Methods, is a useful site from which to download Excel add-ins that solve a variety
of problems using dynamic programming. This software supports his book of the same
name [4]. Professor Mark Gerstein, in his Yale bioinformatics lab, has been using similar
techniques for visualizations of DNA sequence matching (basically a variation of the longest
common subsequence problem) as well as other problems [3]. Consequently, many of the
techniques discussed here may well have been used in a variety of educational settings,
although their use does not appear to be widespread in teaching dynamic programming. It is the purpose of this
paper to demonstrate educational applications of Excel to DP. It is not the purpose of this paper
to explain the dynamic programming algorithmic approach.
SIMPLE EXAMPLES
Algorithm designers generally consider problems suitable for dynamic programming if the
problem can be modeled by top-down recursive representations of sub-problems, which have
an optimal substructure. The efficiency of the approach is greatly increased if the sub-problems
have a large amount of overlap. This will become clear within the context of some of the
examples.
We begin with some very simple examples that many may not even consider to be DP due to
their simplicity. Skipping what is probably the simplest, the calculation of factorial(n), we
start with the calculation of the nth Fibonacci number. Many computer science students have
written this function recursively, since it is one of the traditional problems presented when introducing
the concept of recursion. Because of its exponential complexity, one would not usually use the
recursive algorithm shown in Figure 1, so we turn our attention to the algorithm in Figure 2. This
algorithm is a bottom-up solution to the top-down recursion shown in Figure 1. In
other words, it is DP in a very simple form. The resulting complexity is $O(n)$ rather than $O(c^n)$, a
significant improvement.
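Figures 1 and 2 are not reproduced in this text. As a rough illustration of the contrast they describe, the following Python sketch pairs a top-down recursive Fibonacci function with a bottom-up one. The function names and the call counter are illustrative assumptions, not the authors' code, and the sequence is assumed to start with F(1) = F(2) = 1.

```python
def fib_recursive(n, counter):
    """Top-down recursion; counter[0] tallies calls to expose the repeated work."""
    counter[0] += 1
    if n <= 2:  # assumed base cases: F(1) = F(2) = 1
        return 1
    return fib_recursive(n - 1, counter) + fib_recursive(n - 2, counter)


def fib_bottom_up(n):
    """Bottom-up DP: each value is computed once from the two values before it."""
    prev, curr = 1, 1  # F(1) and F(2)
    for _ in range(n - 2):
        prev, curr = curr, prev + curr
    return curr


calls = [0]
print(fib_recursive(20, calls), "computed with", calls[0], "recursive calls")
print(fib_bottom_up(20), "computed with", 20 - 2, "additions")
```

The recursive version recomputes the same sub-problems many times, so its call count grows exponentially with n, while the bottom-up loop performs one addition per new value.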
Switching from C++ to a spreadsheet environment (MS Excel, in this case) one can quickly build a
simple visualization of the Fibonacci process. Place the value 1 in cells A1 and B1 and the
formula =A1+B1 in cell C1. Copying C1 to the cells to the right immediately constructs the
Fibonacci sequence in a bottom-up DP manner. This is, of course, a very simple example in
which the value in each cell is determined by the sum of the previous two cells.
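Let $X = x_1 \dots x_m$ and $Y = y_1 \dots y_n$ be two sequences, let $Z = z_1 \dots z_k$ be any longest
common subsequence (LCS) of $X$ and $Y$, and let $X_i$, $Y_j$ and $Z_l$ denote prefixes of these sequences. Then:
1. If $x_m = y_n$, then $z_k = x_m = y_n$ and $Z_{k-1}$ is an LCS of $X_{m-1}$ and $Y_{n-1}$.
2. If $x_m \ne y_n$, then $z_k \ne x_m$ implies that $Z$ is an LCS of $X_{m-1}$ and $Y$.
3. If $x_m \ne y_n$, then $z_k \ne y_n$ implies that $Z$ is an LCS of $X$ and $Y_{n-1}$.
These properties give the recurrence
$$c[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0, \\ c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\ \max(c[i,j-1],\, c[i-1,j]) & \text{if } i, j > 0 \text{ and } x_i \ne y_j, \end{cases}$$
where $c[i,j]$ is defined to be the length of the LCS of the prefixes $X_i$ and $Y_j$. Note that the recurrence defines the length of the
LCS, not the actual sequence.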
The above definition specifies the value of the cells in a 2-dimensional table. For example,
placing two strings, e.g. BDCABA and ABCBDAB in the indicated row and column respectively,
the initial table is built as shown in Figure 4. This table is filled with the two strings and the
initial 0 values in row 3 and column C. The cells in the D4:I10 rectangular area represent the
values for c[i,j]. In particular, c[1,1] is stored in cell D4, c[2,3] is stored in F5, and so forth. To
evaluate this table, enter an Excel formula in cell D4 and copy it to the remaining cells. This
formula is the previously defined recurrence for c[i,j]; that is, =IF(D$2=$B4,C3+1,MAX(D3,C4)). (It is
important to note that many computer science students have little experience with Excel. They
occasionally build tables and charts for papers and such, but seldom take a course that trains them
in MS Office tools. Thus, a simple review of absolute addressing and functions such as IF is
probably called for.) Placing this formula in cell D4 and copying to the remaining cells generates
the table on the right of Figure 4. From the lower right hand cell in the table one can see that
the length of the longest common subsequence is 4. The length of the LCS is increased by 1
every time two of the characters from the strings match. Note that cell F6 is obtained by
incrementing E5 by 1, a result of the fact that both F2 and B6 contain a C. The sequence of
characters that generate this behavior is the LCS, i.e. BCBA in this case.
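The same table can be sketched outside the spreadsheet. The Python fragment below is a minimal illustration, not the authors' code: it fills c[i][j] with the rule that the Excel formula encodes, then walks back through the table to recover one LCS. The function names are hypothetical, and the tie-breaking rule in the walk-back (move up when the cell above is at least as large as the cell to the left) is an assumption; a different rule may recover a different LCS of the same length.

```python
def lcs_table(x, y):
    """Return c, where c[i][j] is the LCS length of the prefixes x[:i] and y[:j]."""
    c = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]  # row 0 and column 0 stay 0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1            # diagonal cell plus 1 (C3+1)
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])  # larger of above and left (MAX(D3,C4))
    return c


def lcs_from_table(x, y, c):
    """Walk back from the lower right corner, collecting the matched characters."""
    i, j = len(x), len(y)
    matched = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            matched.append(x[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:  # assumed tie-break: prefer moving up
            i -= 1
        else:
            j -= 1
    return "".join(reversed(matched))


column_string = "ABCBDAB"  # placed down column B in the spreadsheet
row_string = "BDCABA"      # placed across row 2 in the spreadsheet
table = lcs_table(column_string, row_string)
print(table[-1][-1])                                      # length of the LCS
print(lcs_from_table(column_string, row_string, table))   # one LCS
```

With the two strings used in the text, the lower right entry is 4 and the walk-back yields BCBA, which agrees with the spreadsheet result described above.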
As a final example we show that it is possible to push this concept too far. There are problems
for which a DP solution is complex enough that retrofitting the algorithm within Excel becomes
overly complicated. An example of this is the well-known Optimal Binary Search Tree problem
[1]. The problem is to construct a binary search tree in such a way as to optimize search time,
given the probability of searching for each key. In other words, the tree should produce the
minimal expected search cost. In the interest of brevity, let us consider the DP recurrence
relation that results from this problem and omit the reasoning process that generated the
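relation. Solving this problem requires the construction of four tables: e, weight (w), root,
and probability (Figure 5). Cell e[i,j] of the e array contains the expected cost of searching an optimal
subtree containing nodes i to j. The weight array contains the sums of the probabilities of the
indicated nodes. The final answer is contained in the root array, which indicates the structure of
the optimal tree. Following [1], the recurrence for e is
$$e[i,j] = \begin{cases} q_{i-1} & \text{if } j = i - 1, \\ \min_{i \le r \le j} \{\, e[i,r-1] + e[r+1,j] + w(i,j) \,\} & \text{if } i \le j, \end{cases}$$
where
$$w(i,j) = \sum_{l=i}^{j} p_l + \sum_{l=i-1}^{j} q_l$$
and $p$ and $q$ are the access probabilities stored in the p,q array.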
From the recurrence relation, the tables shown in Figure 5 are built. The w array and the p,q
array, which contains the initial set of probabilities of accessing the nodes, are easily
established, leaving the e array and the root array to be calculated. DP is applied to the e array,
and the root array is filled in during the process.
To clarify the problem, first note that the e array is constructed from the lower right to the upper
left, with the last step being the determination of cell(5,1), which is 2.75. Each cell is evaluated
by using the previously calculated values below and to the right of the cell, together with the
corresponding weight cell from the w array. For example, cell(6,2), which is 1.2, can be
determined as follows (Figure 6).
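Figures 5 and 6 are not reproduced in this text. The following Python sketch fills the e, w and root tables using the formulation in [1]; it is an illustration under stated assumptions rather than the authors' spreadsheet. The probabilities are assumed to be those of the textbook example in [1], which is consistent with the values 2.75 and 1.2 quoted above. They are entered in hundredths so that the arithmetic is exact, and the function name is hypothetical.

```python
def optimal_bst(p, q):
    """Fill the e, w and root tables for an optimal binary search tree.

    p[1..n] are key probabilities (p[0] is unused) and q[0..n] are the
    probabilities of unsuccessful searches, following the notation of [1].
    """
    n = len(p) - 1
    e = [[0] * (n + 1) for _ in range(n + 2)]     # e[i][j] for 1 <= i <= n+1, 0 <= j <= n
    w = [[0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 2):                     # empty subtrees: j = i - 1
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]
    for length in range(1, n + 1):                # grow subtrees one node at a time
        for i in range(1, n - length + 2):
            j = i + length - 1
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            best = None
            for r in range(i, j + 1):             # try each node as the subtree root
                cost = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if best is None or cost < best:   # keep the first minimum found
                    best = cost
                    root[i][j] = r
            e[i][j] = best
    return e, w, root


# Assumed example data from [1], in hundredths (15 means 0.15).
p = [0, 15, 10, 5, 10, 20]
q = [5, 10, 5, 5, 5, 10]
e, w, root = optimal_bst(p, q)
print(e[1][5] / 100)   # expected search cost of the optimal tree
print(root[1][5])      # root of the optimal tree
```

In this sketch e[1][5] corresponds to the final cell described above, and e[2][4] to the 1.2 entry; the mapping of these indices to spreadsheet cells depends on the layout of Figure 5 and is not assumed here.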
CONCLUSION
Over the years it has become clear to the authors that tools such as Excel (or other spreadsheet
software) are not appropriately appreciated by many computer science students. Most never
take a course in Excel and hence are quite unaware of its advanced features beyond simple
applications.
We make a conscious attempt to embed Excel into many of our courses so that the students will
at least be exposed to spreadsheet concepts. In addition, Excel is a good tool for visualizing table
construction algorithms. In particular, it has been used in Data Structures to graph data, in
Introduction to Computing for Science Majors to build tables such as the Fibonacci sequence and Pascal's triangle, and in
Bioinformatics in which the Longest Common Subsequence Algorithm is always investigated. In
other words, since most computer science programs do not include a course in spreadsheets, it
is desirable to include them throughout a computer science program. Dynamic programming
techniques, with their associated visualizations, are among the most interesting and
educational applications of a spreadsheet in a computer science program.
REFERENCES
[1] Cormen T, Leiserson C, Rivest R, Stein C. Introduction to Algorithms. 3rd ed. Cambridge, Mass: MIT
Press; 2009.
[2] Dreyfus S. Richard Bellman on the Birth of Dynamic Programming. Operations Research.
2002;50(1):48-51.
[3] Gerstein M. Gerstein Lab. [Internet]. 2011 [cited 2012 November 1]. Available from:
http://www.gersteinlab.org/.
[4] Jensen PA, Bard JF. Operations Research Models and Methods. Hoboken, NJ: John Wiley & Sons, Inc;
2003.