SEARCHING AND SORTING
CSE 110 (Intro to Computer Science)
Stony Brook University
Searching and sorting are fundamental operations in computer programming
Searching and sorting go hand in hand
Sorted data is much easier to search
LINEAR SEARCH
Examine each element in turn to see if it's the one you're looking for
On average, you have to examine about half of the data set to find an item that is present
Works even if the list is unsorted
The time required grows in a linear fashion: the time needed increases at the same rate as the size of the data set
LINEAR SEARCH EXAMPLE
Original list of numbers: 34, 6, 22, 10, 48, 63, 59, 77
It takes 4 comparisons to find 10 in this list: we look at 34, 6, 22, and finally 10
It takes 8 comparisons to show that 11 is not in this list: we need to look at every element in sequence
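The linear search described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name linear_search and the comparison counter are assumptions added here so that the counts from the example can be reproduced.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1          # one comparison per element examined
        if value == target:
            return index, comparisons
    return -1, comparisons        # every element was examined


numbers = [34, 6, 22, 10, 48, 63, 59, 77]
print(linear_search(numbers, 10))  # (3, 4): found after 4 comparisons
print(linear_search(numbers, 11))  # (-1, 8): all 8 elements examined
```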
BINARY SEARCH
Method: choose an element (usually the middle one) and decide whether to search the left or right half
At each decision point, the space to be searched is cut in half
Requires sorted data to work properly
Very efficient: only needs about log2(n) comparisons
Twice as much data only needs one more step
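One way to write the method above is as a loop that keeps shrinking the range. The sketch below assumes the input list is already sorted in ascending order. The name binary_search and the rounds counter are illustrative additions, and the midpoint is rounded down (a "left bias" when the range has an even number of elements).

```python
def binary_search(sorted_items, target):
    """Return (index, rounds); index is -1 if target is absent."""
    low, high = 0, len(sorted_items) - 1
    rounds = 0
    while low <= high:
        rounds += 1
        mid = (low + high) // 2          # rounds down: left bias
        if sorted_items[mid] == target:
            return mid, rounds
        if sorted_items[mid] < target:
            low = mid + 1                # discard the left half
        else:
            high = mid - 1               # discard the right half
    return -1, rounds


data = [4, 10, 13, 14, 22, 29, 35, 37, 44, 51, 63]
print(binary_search(data, 51))  # (9, 3): found in the third round
```

Each round discards half of the remaining range, which is why doubling the amount of data adds only about one more round.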
BINARY SEARCH EXAMPLE
Ex. Find 51
Start: [4, 10, 13, 14, 22, 29, 35, 37, 44, 51, 63]
Round 1: middle element is 29, so ignore what's to the left
Round 2: new middle element is 44, so go right again
Round 3: middle element is 51 (left bias): FOUND
PSEUDOCODE
If (range contains only one element):
    Check whether it is the desired value
Else:
    1. Get midpoint of range (if it holds the desired value, stop)
    2. Determine which half of the range could contain the desired value
    3. Repeat the binary search on just that half of the range
SORTING TECHNIQUES
Many sorting techniques exist: bubble sort, insertion sort, selection sort, mergesort, quicksort, shell sort, radix sort, etc.
These techniques differ in their efficiency
Different sorting techniques take different amounts of time (and memory/disk space) to sort the same data
Some sorting algorithms are better (faster) than others for larger data sets
BUBBLE SORT
Method: compare pairs of adjacent items, and swap them if they are out of order
Elements "bubble" to their proper places
At the end of each pass, the largest remaining element is in its proper place
This is trivial to implement, but very inefficient
Each pass may only sort a single value
For N values, we need (N-1) passes
PSEUDOCODE
Let N be the number of elements in the data set
Repeat N-1 times:
    For x = 0 through N-2:
        if A[x] > A[x+1], swap them
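The bubble sort pseudocode translates almost line for line into Python. The sketch below is one possible rendering. The name bubble_sort is an assumption, and the function sorts a copy rather than the caller's list, a choice made here only to keep the example free of side effects.

```python
def bubble_sort(values):
    """Return a new list sorted in ascending order using bubble sort."""
    a = list(values)                  # work on a copy
    n = len(a)
    for _ in range(n - 1):            # N-1 passes
        for x in range(n - 1):        # compare each adjacent pair
            if a[x] > a[x + 1]:
                a[x], a[x + 1] = a[x + 1], a[x]
    return a


print(bubble_sort([29, 10, 14, 37, 13]))  # [10, 13, 14, 29, 37]
```

Because the largest remaining element is in place after each pass, a common refinement is to stop the inner loop one position earlier on every pass. The version above keeps the simpler fixed-length loop.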
BUBBLE SORT EXAMPLE
Start: [29, 10, 14, 37, 13]
Round 1:
Compare 29 and 10, swap their order
Compare 29 and 14, swap their order
Compare 29 and 37, leave them as-is
Compare 37 and 13, swap their order
BUBBLE SORT, ROUND 2
Start: [10, 14, 29, 13, 37]
Round 2:
Compare 10 and 14, leave them as-is
Compare 14 and 29, leave them as-is
Compare 29 and 13, swap their order
Compare 29 and 37, leave them as-is
BUBBLE SORT, ROUND 3
Start: [10, 14, 13, 29, 37]
Round 3:
Compare 10 and 14, leave them as-is
Compare 14 and 13, swap their order
Compare 14 and 29, leave them as-is
Compare 29 and 37, leave them as-is
INSERTION SORT
Method: select one element at a time and insert it into its proper sorted position
Begin by dividing the array into two regions: sorted and unsorted
For each pass, move the first unsorted item into its proper position in the sorted region
Slightly more efficient than bubble sort, since it swaps fewer elements per round
PSEUDOCODE
1. A[0] is sorted; A[1]-A[N-1] are unsorted
2. Repeat N-1 times:
    1. nextItem = first unsorted element
    2. Shift sorted elements > nextItem over one position (A[x] = A[x-1])
    3. Insert nextItem into correct position
INSERTION SORT EXAMPLE
Start: [ 29 ][ 10, 14, 37, 13 ]
Move 10: [ 10, 29 ][ 14, 37, 13 ]
Move 14: [ 10, 14, 29 ][ 37, 13 ]
Move 37: [ 10, 14, 29, 37 ][ 13 ]
Move 13: [ 10, 13, 14, 29, 37 ][ ]
End: [ 10, 13, 14, 29, 37 ]
Blue values are sorted; black values are unsorted (here, the first bracketed region is sorted and the second is unsorted)
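The insertion sort described above might be written in Python as below. This is a sketch under the same assumptions as the pseudocode (A[0] starts out as the sorted region). The names insertion_sort, first_unsorted, and next_item are illustrative.

```python
def insertion_sort(values):
    """Return a new list sorted in ascending order using insertion sort."""
    a = list(values)                          # work on a copy
    for first_unsorted in range(1, len(a)):   # a[0] is already "sorted"
        next_item = a[first_unsorted]
        x = first_unsorted
        # Shift sorted elements larger than next_item one position right.
        while x > 0 and a[x - 1] > next_item:
            a[x] = a[x - 1]
            x -= 1
        a[x] = next_item                      # insert into the gap
    return a


print(insertion_sort([29, 10, 14, 37, 13]))  # [10, 13, 14, 29, 37]
```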
SELECTION SORT
Repeatedly search for the largest unsorted item, and put it into its sorted position
Again, we divide the data set into unsorted and sorted regions (the sorted region goes at the end)
On each pass, swap the largest unsorted item with the last unsorted element
This is more efficient than bubble and insertion sort in the number of exchanges; it only needs one exchange per round
PSEUDOCODE
1. Repeat N-1 times:
    1. Find the largest (unsorted) element
    2. Swap A[last] with A[largest]
    3. Mark the unsorted region as being one element smaller
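A minimal Python sketch of the selection sort steps above follows. The names selection_sort, last, and largest are assumptions chosen to mirror the pseudocode. The unsorted region is a[0..last], and it shrinks by one element per pass.

```python
def selection_sort(values):
    """Return a new list sorted in ascending order using selection sort."""
    a = list(values)                       # work on a copy
    for last in range(len(a) - 1, 0, -1):  # N-1 passes
        largest = 0
        for i in range(1, last + 1):       # find the largest unsorted item
            if a[i] > a[largest]:
                largest = i
        # One exchange per pass: move the largest item to the end
        # of the unsorted region.
        a[largest], a[last] = a[last], a[largest]
    return a


print(selection_sort([29, 10, 14, 37, 13]))  # [10, 13, 14, 29, 37]
```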
SELECTION SORT EXAMPLE
Start: [ 29, 10, 14, 37, 13 ] [ ]
After 1st pass: [ 29, 10, 14, 13 ] [ 37 ]
After 2nd pass: [ 13, 10, 14 ] [ 29, 37 ]
After 3rd pass: [ 13, 10 ] [ 14, 29, 37 ]
After 4th pass: [ 10 ] [ 13, 14, 29, 37 ]
End: [ ] [ 10, 13, 14, 29, 37 ]
Blue values are unsorted; black values are sorted (here, the first bracketed region is unsorted and the second is sorted)
RADIX SORT
Method: form groups (based on digits in the same place), then combine those groups
e.g., all items with 3 in the tens place
This requires d iterations, where d is the number of digits in the largest element
Worst-case running time: O(dn)
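The group-then-combine method can be sketched in Python as below. This is an illustration only. It assumes non-negative integers, processes digits from the ones place upward, and takes the number of digits as a parameter. The names radix_sort, num_digits, and place are hypothetical.

```python
def radix_sort(values, num_digits):
    """Sort non-negative integers by grouping on each digit, ones place first."""
    a = list(values)                           # work on a copy
    place = 1                                  # 1 = ones, 10 = tens, ...
    for _ in range(num_digits):                # d passes
        groups = [[] for _ in range(10)]       # one group per digit 0-9
        for value in a:
            digit = (value // place) % 10
            groups[digit].append(value)        # keep arrival order within a group
        # Combine: group 0 + group 1 + ... + group 9
        a = [value for group in groups for value in group]
        place *= 10
    return a


data = [123, 2154, 222, 4, 283, 1560, 1061, 2150]
print(radix_sort(data, 1))  # [1560, 2150, 1061, 222, 123, 283, 2154, 4]
print(radix_sort(data, 4))  # [4, 123, 222, 283, 1061, 1560, 2150, 2154]
```

Each pass touches every element once, so d passes over n elements gives the O(dn) running time. Appending to the end of each group preserves the order from the previous pass, which is what lets later passes build on earlier ones.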
PSEUDOCODE
for (J = d down to 1):
    1. Initialize 10 groups to empty
    2. for (I = 0 through N-1):
        1. K = Jth digit of A[I] (J = d is the ones place)
        2. Place A[I] at the end of group K
        3. Increment Kth counter
    3. Replace A with group 0 + group 1 + etc.
AN ILLUSTRATION
Start: 0123, 2154, 0222, 0004, 0283, 1560, 1061, 2150
Pass 1: group based on the ones place
Digit  Group
0      1560, 2150
1      1061
2      0222
3      0123, 0283
4      2154, 0004
5-9    (empty)
RADIX SORT, ROUND 2
Start: 1560, 2150, 1061, 0222, 0123, 0283, 2154, 0004
Pass 2: group based on the tens place
Digit  Group
0      0004
1      (empty)
2      0222, 0123
3-4    (empty)
5      2150, 2154
6      1560, 1061
7      (empty)
8      0283
9      (empty)
RADIX SORT, ROUND 3
Start: 0004, 0222, 0123, 2150, 2154, 1560, 1061, 0283
Pass 3: group based on the hundreds place
Digit  Group
0      0004, 1061
1      0123, 2150, 2154
2      0222, 0283
3-4    (empty)
5      1560
6-9    (empty)
RADIX SORT, ROUND 4
Start: 0004, 1061, 0123, 2150, 2154, 0222, 0283, 1560
Pass 4: group based on the thousands place
Digit  Group
0      0004, 0123, 0222, 0283
1      1061, 1560
2      2150, 2154
3-9    (empty)
End result: a completely sorted list of 4-digit numbers in only 4 passes
EXTRACTING DIGITS
How do we extract the dth digit of an integer?
Use a combination of / (integer division) and %
Ones digit: n % 10
Tens digit: (n / 10) % 10
Hundreds digit: (n / 100) % 10
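The three formulas above follow one pattern: divide by a power of ten, then take the remainder modulo 10. A short Python sketch of that pattern is shown below. The helper name digit_at and its zero-based place parameter (0 = ones, 1 = tens, 2 = hundreds) are assumptions made for this example. Note that in Python the integer-division operator is //, since / produces a floating-point result.

```python
def digit_at(n, place):
    """Return the digit of non-negative integer n at the given place.

    place 0 is the ones digit, 1 the tens digit, 2 the hundreds digit, ...
    """
    return (n // 10 ** place) % 10


n = 2154
print(digit_at(n, 0))  # 4 (ones)
print(digit_at(n, 1))  # 5 (tens)
print(digit_at(n, 2))  # 1 (hundreds)
print(digit_at(n, 3))  # 2 (thousands)
```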