Analysis of algorithms
Cost models
Time efficiency estimates depend on what we define to be a step. For the analysis to correspond usefully to the actual run-time, the time required to perform a step must be guaranteed to be bounded above by a constant. One must be careful here; for instance, some analyses count an addition of two numbers as one step. This assumption may not be warranted in certain contexts. For example, if the numbers involved in a computation may be arbitrarily large, the time required by a single addition can no longer be assumed to be constant.

[Figure: graphs of functions commonly used in the analysis of algorithms, showing the number of operations N versus input size n for each function.]
Two cost models are generally used:

The uniform cost model, also called the unit-cost model (and similar variations), assigns a constant cost to every machine operation, regardless of the size of the numbers involved.
The logarithmic cost model, also called logarithmic-cost measurement (and similar variations), assigns a cost to every machine operation proportional to the number of bits involved.
The latter is more cumbersome to use, so it is only employed when necessary, for example in the analysis
of arbitrary-precision arithmetic algorithms, like those used in cryptography.
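The distinction matters once operands can grow. The following small sketch (an illustration, not taken from the article) counts the cost of summing a list of integers under both models, charging one unit per addition in the uniform model and a number of units proportional to the operands' bit lengths in the logarithmic model; the exact charging rule is an assumption, the point is only that the logarithmic cost scales with the number of bits.

    # Illustrative sketch (not from the article): cost of summing a list of
    # integers under the two cost models.

    def summation_cost(values):
        uniform_cost = 0       # unit-cost model: one unit per addition
        logarithmic_cost = 0   # logarithmic model: units proportional to operand bit length
        total = 0
        for v in values:
            uniform_cost += 1
            logarithmic_cost += max(total.bit_length(), v.bit_length(), 1)
            total += v
        return total, uniform_cost, logarithmic_cost

    # With arbitrarily large operands the two models diverge sharply:
    _, uniform, logarithmic = summation_cost([2**1000, 2**1000, 2**1000])
    print(uniform, logarithmic)   # 3 additions vs. roughly 3,000 bit-level units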
A key point which is often overlooked is that published lower bounds for problems are often given for a model of computation that is more restricted than the set of operations available in practice, and therefore there are algorithms that are faster than what would naively be thought possible.[8]
Run-time analysis
Run-time analysis is a theoretical classification that estimates and anticipates the increase in running time
(or run-time or execution time) of an algorithm as its input size (usually denoted as n) increases. Run-
time efficiency is a topic of great interest in computer science: A program can take seconds, hours, or
even years to finish executing, depending on which algorithm it implements. While software profiling
techniques can be used to measure an algorithm's run-time in practice, they cannot provide timing data
for all infinitely many possible inputs; the latter can only be achieved by the theoretical methods of run-
time analysis.
Take as an example a program that looks up a specific entry in a sorted list of size n. Suppose this
program were implemented on Computer A, a state-of-the-art machine, using a linear search algorithm,
and on Computer B, a much slower machine, using a binary search algorithm. Benchmark testing on the
two computers running their respective programs might look something like the following:
n (list size)    Computer A run-time (in nanoseconds)    Computer B run-time (in nanoseconds)
16               8                                       100,000
63               32                                      150,000
Based on these metrics, it would be easy to jump to the conclusion that Computer A is running an
algorithm that is far superior in efficiency to that of Computer B. However, if the size of the input-list is
increased to a sufficient number, that conclusion is dramatically demonstrated to be in error:
n (list size)    Computer A run-time (in nanoseconds)    Computer B run-time (in nanoseconds)
63               32                                      150,000
...              ...                                     ...
1,000,000        500,000                                 500,000
4,000,000        2,000,000                               550,000
16,000,000       8,000,000                               600,000
Computer A, running the linear search program, exhibits a linear growth rate. The program's run-time is
directly proportional to its input size. Doubling the input size doubles the run-time, quadrupling the input
size quadruples the run-time, and so forth. On the other hand, Computer B, running the binary search
program, exhibits a logarithmic growth rate. Quadrupling the input size only increases the run-time by a
constant amount (in this example, 50,000 ns). Even though Computer A is ostensibly a faster machine,
Computer B will inevitably surpass Computer A in run-time because it is running an algorithm with a
much slower growth rate.
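For reference, minimal implementations of the two strategies compared above might look like the following sketch (illustrative; the article does not give the benchmark code):

    # Minimal sketches of the two algorithms compared above.

    def linear_search(sorted_list, target):
        """O(n): examine each entry in turn."""
        for index, value in enumerate(sorted_list):
            if value == target:
                return index
        return -1

    def binary_search(sorted_list, target):
        """O(log n): halve the remaining search range at each step."""
        low, high = 0, len(sorted_list) - 1
        while low <= high:
            mid = (low + high) // 2
            if sorted_list[mid] == target:
                return mid
            if sorted_list[mid] < target:
                low = mid + 1
            else:
                high = mid - 1
        return -1

    data = list(range(0, 1_000_000, 2))
    print(linear_search(data, 999_998), binary_search(data, 999_998))

On a sorted list of n entries, the linear scan inspects up to n elements while the binary search inspects about log2 n of them, which is what produces the linear versus logarithmic growth rates seen in the benchmarks.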
Orders of growth
Informally, an algorithm can be said to exhibit a growth rate on the order of a mathematical function if
beyond a certain input size n, the function f(n) times a positive constant provides an upper bound or limit
for the run-time of that algorithm. In other words, for a given input size n greater than some n0 and a
constant c, the run-time of that algorithm will never be larger than c × f(n). This concept is frequently
expressed using Big O notation. For example, since the run-time of insertion sort grows quadratically as
its input size increases, insertion sort can be said to be of order O(n^2).
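Stated symbolically, with T(n) denoting the algorithm's run-time on inputs of size n (a standard formulation of the definition sketched above, not quoted from the source):

    T(n) \in O(f(n)) \iff \exists\, c > 0,\ \exists\, n_0 :\ T(n) \le c \cdot f(n) \ \text{for all } n \ge n_0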
Big O notation is a convenient way to express the worst-case scenario for a given algorithm, although it
can also be used to express the average-case — for example, the worst-case scenario for quicksort is
O(n^2), but the average-case run-time is O(n log n).
From the benchmark figures above, it is clearly seen that the first algorithm (linear search) exhibits a linear order of growth, following the power rule t ≈ kn^a with exponent a ≈ 1; assuming such a power rule, the local order of growth can be estimated from two measurements as a = log(t2/t1) / log(n2/n1). The corresponding empirical estimates for the second algorithm (binary search) diminish rapidly, suggesting that it follows another rule of growth and, in any case, has much lower local orders of growth (and improving further still), empirically, than the first one.
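Evaluating run-time complexity

The run-time complexity for the worst-case scenario of a given algorithm can sometimes be evaluated by examining the structure of the algorithm and making some simplifying assumptions. The analysis below refers to a seven-step algorithm built around a doubly nested loop. The original pseudocode is not reproduced here; the following Python sketch is a reconstruction consistent with the step numbering used in the analysis (the printed messages are illustrative):

    # Reconstruction (assumed, not verbatim); step numbers match the analysis below.
    n = int(input())                          # step 1: get a positive integer n from input
    if n > 10:                                # step 2
        print("This might take a while...")   # step 3
    for i in range(1, n + 1):                 # step 4: outer loop, i = 1 to n
        for j in range(1, i + 1):             # step 5: inner loop, j = 1 to i
            print(i * j)                      # step 6: inner loop body
    print("Done!")                            # step 7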
A given computer will take a discrete amount of time to execute each of the instructions involved with
carrying out this algorithm. Say that the actions carried out in step 1 are considered to consume time at
most T1, step 2 uses time at most T2, and so forth.
In the algorithm above, steps 1, 2 and 7 will only be run once. For a worst-case evaluation, it should be assumed that step 3 will be run as well. Thus the total amount of time to run steps 1-3 and step 7 is:

    T1 + T2 + T3 + T7
The loops in steps 4, 5 and 6 are trickier to evaluate. The outer loop test in step 4 will execute ( n + 1 )
times,[10] which will consume T4( n + 1 ) time. The inner loop, on the other hand, is governed by the
value of j, which iterates from 1 to i. On the first pass through the outer loop, j iterates from 1 to 1: The
inner loop makes one pass, so running the inner loop body (step 6) consumes T6 time, and the inner loop
test (step 5) consumes 2T5 time. During the next pass through the outer loop, j iterates from 1 to 2: the
inner loop makes two passes, so running the inner loop body (step 6) consumes 2T6 time, and the inner
loop test (step 5) consumes 3T5 time.
Altogether, the total time required to run the inner loop body can be expressed as an arithmetic progression:

    T6 + 2T6 + 3T6 + ... + (n-1)T6 + nT6

which can be factored as[11]

    T6 [1 + 2 + 3 + ... + (n-1) + n] = T6 [(1/2)(n^2 + n)]
The total time required to run the inner loop test can be evaluated similarly:

    2T5 + 3T5 + 4T5 + ... + nT5 + (n+1)T5

which can be factored as

    T5 [2 + 3 + 4 + ... + n + (n+1)] = T5 [(1/2)(n^2 + 3n)]

Therefore, the total run-time for this algorithm is:

    f(n) = T1 + T2 + T3 + (n+1)T4 + [(1/2)(n^2 + 3n)]T5 + [(1/2)(n^2 + n)]T6 + T7
As a rule of thumb, one can assume that the highest-order term in any given function dominates its rate of growth and thus defines its run-time order. In this example, n^2 is the highest-order term, so one can conclude that f(n) = O(n^2). Formally this can be proven as follows:

Prove that

    f(n) = T1 + T2 + T3 + (n+1)T4 + [(1/2)(n^2 + 3n)]T5 + [(1/2)(n^2 + n)]T6 + T7 ≤ cn^2 for all n ≥ n0

Let k be a constant greater than or equal to each of [T1..T7]. Then, for n ≥ 1,

    f(n) ≤ 4k + (n+1)k + [(1/2)(n^2 + 3n)]k + [(1/2)(n^2 + n)]k
         = k(n^2 + 3n + 5)
         ≤ k(n^2 + 3n^2 + 5n^2)
         = 9kn^2

Therefore

    f(n) ≤ cn^2 for all n ≥ n0, with c = 9k and n0 = 1; hence f(n) = O(n^2)
A more elegant approach to analyzing this algorithm would be to declare that [T1..T7] are all equal to one unit of time, in a system of units chosen so that one unit is greater than or equal to the actual times for these steps. This would mean that the algorithm's run-time breaks down as follows:[12]

    4 + (1 + 2 + ... + n) ≤ 4 + (n + n + ... + n) = 4 + n^2 ≤ 5n^2 (for n ≥ 1), which is O(n^2)
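Growth rate analysis of other resources

The methodology of run-time analysis can also be utilized for predicting other growth rates, such as consumption of memory space. As an example, consider a program that manages its memory reservation based on the size of a file it handles, reserving twice as much memory for every fixed increment of file growth. A minimal sketch of such a policy follows (a reconstruction; the function name and the 100,000-kilobyte increment are illustrative assumptions):

    # Sketch of a memory-reservation policy that doubles the reservation for
    # every fixed increment of file growth (threshold and names are assumptions).

    def reserved_memory_kb(file_size_kb: int, increment_kb: int = 100_000,
                           initial_kb: int = 1) -> int:
        """Memory reserved once the managed file has grown to file_size_kb."""
        doublings = file_size_kb // increment_kb    # one doubling per full increment
        return initial_kb * (2 ** doublings)        # exponential in the number of increments

    # Each additional increment doubles the reservation:
    for size_kb in (0, 100_000, 200_000, 300_000, 400_000):
        print(size_kb, reserved_memory_kb(size_kb))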
In this instance, as the file size n increases, memory will be consumed at an exponential growth rate,
which is order O(2^n). This is an extremely rapid and most likely unmanageable growth rate for
consumption of memory resources.
Relevance
Algorithm analysis is important in practice because the accidental or unintentional use of an inefficient
algorithm can significantly impact system performance. In time-sensitive applications, an algorithm
taking too long to run can render its results outdated or useless. An inefficient algorithm can also end up
requiring an uneconomical amount of computing power or storage in order to run, again rendering it
practically useless.
Constant factors
Analysis of algorithms typically focuses on the asymptotic performance, particularly at the elementary
level, but in practical applications constant factors are important, and real-world data is in practice always
limited in size. The limit is typically the size of addressable memory, so on 32-bit machines 2^32 = 4 GiB (greater if segmented memory is used) and on 64-bit machines 2^64 = 16 EiB. Thus given a limited size, an
order of growth (time or space) can be replaced by a constant factor, and in this sense all practical
algorithms are O(1) for a large enough constant, or for small enough data.
This interpretation is primarily useful for functions that grow extremely slowly: (binary) iterated logarithm (log*) is less than 5 for all practical data (2^65536 bits); (binary) log-log (log log n) is less than 6 for virtually all practical data (2^64 bits); and binary log (log n) is less than 64 for virtually all practical data (2^64 bits). An algorithm with non-constant complexity may nonetheless be more efficient than an algorithm with constant complexity on practical data if the overhead of the constant-time algorithm results in a larger constant factor: for example, a constant cost K exceeds a cost of k log log n so long as K/k > 6 and n < 2^(2^6) = 2^64.
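As a quick illustration of how slowly these functions grow for practically sized inputs, the following sketch computes floor values of log, log-log, and the iterated logarithm:

    # Illustrative: floor(log2), log-log, and iterated log for practically sized inputs.

    def floor_log2(n: int) -> int:
        return n.bit_length() - 1               # floor(log2(n)) for n >= 1

    def iterated_log2(n: int) -> int:
        """Number of times log2 must be applied before the value drops to 1 or below."""
        count = 0
        while n > 1:
            n = floor_log2(n)
            count += 1
        return count

    practical = 2**64                            # an upper bound on practically addressable data
    print(floor_log2(practical))                 # 64
    print(floor_log2(floor_log2(practical)))     # log log: 6
    print(iterated_log2(2**65536))               # iterated log: 5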
For large data linear or quadratic factors cannot be ignored, but for small data an asymptotically inefficient algorithm may be more efficient. This is particularly used in hybrid algorithms, like Timsort, which use an asymptotically efficient algorithm (here merge sort, with time complexity O(n log n)), but switch to an asymptotically inefficient algorithm (here insertion sort, with time complexity O(n^2)) for small data, as the simpler algorithm is faster on small data.
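A minimal sketch of such a hybrid strategy follows (not Timsort itself, only the merge sort/insertion sort switch it illustrates; the cutoff of 32 elements is an arbitrary illustrative choice):

    # Hybrid sort sketch: asymptotically efficient merge sort (O(n log n)) that
    # falls back to insertion sort (O(n^2)) below a small cutoff, where the
    # simpler algorithm's lower constant factors win. Not Timsort itself.

    CUTOFF = 32  # arbitrary illustrative threshold

    def insertion_sort(items):
        for i in range(1, len(items)):
            key = items[i]
            j = i - 1
            while j >= 0 and items[j] > key:
                items[j + 1] = items[j]
                j -= 1
            items[j + 1] = key
        return items

    def hybrid_sort(items):
        """Merge sort that hands small subproblems to insertion sort."""
        if len(items) <= CUTOFF:
            return insertion_sort(list(items))
        mid = len(items) // 2
        left = hybrid_sort(items[:mid])
        right = hybrid_sort(items[mid:])
        # Merge the two sorted halves.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]

    print(hybrid_sort([5, 3, 8, 1, 9, 2, 7]))

Below the cutoff the recursion bottoms out in insertion sort, whose low overhead wins on tiny inputs; above it, merge sort's O(n log n) behavior dominates.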
See also
Amortized analysis
Analysis of parallel algorithms
Asymptotic computational complexity
Best, worst and average case
Big O notation
Computational complexity theory
Master theorem (analysis of algorithms)
NP-Complete
Numerical analysis
Polynomial time
Program optimization
Profiling (computer programming)
Scalability
Smoothed analysis
Termination analysis — the subproblem of checking whether a program will terminate at all
Time complexity — includes table of orders of growth for common algorithms
Information-based complexity
Notes
1. "Knuth: Recent News" (https://web.archive.org/web/20160828152021/http://www-cs-faculty.
stanford.edu/~uno/news.html). 28 August 2016. Archived from the original (http://www-cs-fac
ulty.stanford.edu/~uno/news.html) on 28 August 2016.
2. Cormen, Thomas H., ed. (2009). Introduction to algorithms (https://www.worldcat.org/title/31
1310321) (3rd ed.). Cambridge, Mass: MIT Press. pp. 44–52. ISBN 978-0-262-03384-8.
OCLC 311310321 (https://search.worldcat.org/oclc/311310321).
3. Alfred V. Aho; John E. Hopcroft; Jeffrey D. Ullman (1974). The design and analysis of
computer algorithms (https://archive.org/details/designanalysisof00ahoarich). Addison-
Wesley Pub. Co. ISBN 9780201000290., section 1.3
4. Juraj Hromkovič (2004). Theoretical computer science: introduction to Automata,
computability, complexity, algorithmics, randomization, communication, and cryptography (ht
tps://books.google.com/books?id=KpNet-n262QC&pg=PA177). Springer. pp. 177–178.
ISBN 978-3-540-14015-3.
5. Giorgio Ausiello (1999). Complexity and approximation: combinatorial optimization problems
and their approximability properties (https://books.google.com/books?id=Yxxw90d9AuMC&p
g=PA3). Springer. pp. 3–8. ISBN 978-3-540-65431-5.
6. Wegener, Ingo (2005), Complexity theory: exploring the limits of efficient algorithms (https://
books.google.com/books?id=u7DZSDSUYlQC&pg=PA20), Berlin, New York: Springer-
Verlag, p. 20, ISBN 978-3-540-21045-0
7. Robert Endre Tarjan (1983). Data structures and network algorithms (https://books.google.c
om/books?id=JiC7mIqg-X4C&pg=PA3). SIAM. pp. 3–7. ISBN 978-0-89871-187-5.
8. Examples of the price of abstraction? (https://cstheory.stackexchange.com/q/608),
cstheory.stackexchange.com
9. How To Avoid O-Abuse and Bribes (http://rjlipton.wordpress.com/2009/07/24/how-to-avoid-o
-abuse-and-bribes/) Archived (https://web.archive.org/web/20170308175036/https://rjlipton.
wordpress.com/2009/07/24/how-to-avoid-o-abuse-and-bribes/) 2017-03-08 at the Wayback
Machine, at the blog "Gödel's Lost Letter and P=NP" by R. J. Lipton, professor of Computer
Science at Georgia Tech, recounting idea by Robert Sedgewick
10. an extra step is required to terminate the for loop, hence n + 1 and not n executions
11. It can be proven by induction that 1 + 2 + 3 + ... + (n-1) + n = n(n+1)/2.
12. This approach, unlike the above approach, neglects the constant time consumed by the
loop tests which terminate their respective loops, but it is trivial to prove that such omission
does not affect the final result
References
Sedgewick, Robert; Flajolet, Philippe (2013). An Introduction to the Analysis of Algorithms
(2nd ed.). Addison-Wesley. ISBN 978-0-321-90575-8.
Greene, Daniel A.; Knuth, Donald E. (1982). Mathematics for the Analysis of Algorithms
(Second ed.). Birkhäuser. ISBN 3-7643-3102-X.
Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. & Stein, Clifford (2001).
Introduction to Algorithms. Chapter 1: Foundations (Second ed.). Cambridge, MA: MIT
Press and McGraw-Hill. pp. 3–122. ISBN 0-262-03293-7.
Sedgewick, Robert (1998). Algorithms in C, Parts 1-4: Fundamentals, Data Structures,
Sorting, Searching (https://archive.org/details/algorithmsinc00sedg) (3rd ed.). Reading, MA:
Addison-Wesley Professional. ISBN 978-0-201-31452-6.
Knuth, Donald. The Art of Computer Programming. Addison-Wesley.
Goldreich, Oded (2010). Computational Complexity: A Conceptual Perspective. Cambridge
University Press. ISBN 978-0-521-88473-0.
External links
Media related to Analysis of algorithms at Wikimedia Commons