Optimizing Conway's Game of Life
Chapter 17
The Game of Life: The Triumph of Algorithmic Optimization in a Cellular Automata Game
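I've spent a lot of my time in this book discussing assembly language optimization, which I con-
sider to be an important and underappreciated topic. However, I'd like to take this
opportunity to point out that there is much, much more to optimization than as-
sembly language. Assembly is essential for absolute maximum performance, but
it's not the only ingredient; it's necessary but not sufficient, if you catch my drift, and
not even necessary if you're looking for improved but not maximum performance.
You've heard it a thousand times: Optimize your algorithm first. Devise new ap-
proaches. Or, as Knuth said, Premature optimization is the root of all evil.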
This is, of course, old hat, stuff you know like the back of your hand. Or is it? As Jeff
Duntemann pointed out to me the other day, performance programmers are made,
not born. While I'm merrily gallivanting around in this book optimizing 486
pipelining and turning simple tasks into horribly complicated and terrifyingly fast
state machines, many of you are still developing your basic optimization skills. I don't
want to shortchange those of you in the latter category, so in this chapter, we'll dis-
cuss some high-level language optimizations that can be applied by mere mortals
within a reasonable period of time. We're going to examine a complete optimization
process, from start to finish, and what we will find is that it's possible to get a 50-times
speed-up without using one byte of assembly! It's all a matter of perspective: how you
look at your code and data.
Conway's Game
The program that we're going to optimize is Conway's famous Game of Life, long-
ago favorite of the hackers at MIT's AI Lab. If you've never seen it, let me assure you:
Life is neat, and more than a little hypnotic. Fractals have been the hot graphics topic
in recent years, but for eye-catching dazzle, Life is hard to beat.
Of course, eye-catching dazzle requires real-time performance (lots of pixels help
too), and there's the rub. When there are, say, 40,000 cells to process and display, a
simple, straightforward implementation just doesn't cut it, even on a 33 MHz 486.
Happily, though, there are many, many ways to speed up Life, and they illustrate a
variety of important optimization principles, as this chapter will show.
First, I'll describe the ground rules of Life, implement a very straightforward version
in C++, and then speed that version up by about eight times without using any dras-
tically different approaches or any assembly. This may be a little tame for some of
you, but be patient, for after that, we'll haul out the big guns and move into the 30 to
40 times speed-up range. Then in the next chapter, I'll show you how several pro-
grammers really floored it in taking me up on my second Optimization Challenge,
which involved the Game of Life.
324 Chapter 17
original cellmap. This keeps us from corrupting the current generation's cellmap
before we're done using it to calculate the next generation.
All in all, Listing 17.1 is a clean, compact, and elegant implementation of the Game
of Life. Were it not that the code is as slow as molasses, we could stop right here.
/* Controls the size of the cell map. Must be within the capabilities
   of the display mode, and must be limited to leave room for text
   display at right. */
unsigned int cellmap_width = 96;
unsigned int cellmap_height = 96;
/* Width & height in pixels of each cell as displayed on screen. */
unsigned int magnifier = 2;

void main()
{
   unsigned long generation = 0;
   char gen_text[80];
   long bios_time = 0, start_bios_time = 0;
   unsigned int init_length, x, y, seed;
   cellmap current_map(cellmap_height, cellmap_width);
   cellmap next_map(cellmap_height, cellmap_width);

   // Get the seed; seed randomly if 0 entered
   cout << "Seed (0 for random seed): ";
   cin >> seed;
   if (seed == 0) seed = (unsigned) time(NULL);

   // Randomly initialize the initial cell map
   cout << "Initializing...";
   srand(seed);
   init_length = (cellmap_height * cellmap_width) / 2;
   do {
      x = random(cellmap_width);
      y = random(cellmap_height);
      next_map.set_cell(x, y);
   } while (--init_length);
   current_map.copy_cells(next_map); // put init map in current_map

   enter_display_mode();

   // Keep recalculating and redisplaying generations until a key
   // is pressed
   do {
      generation++;
      sprintf(gen_text, "%10lu", generation);
      show_text(1, GENERATION_LINE, gen_text);
      // Recalculate and draw the next generation
      current_map.next_generation(next_map);
      // Make current_map current again
      current_map.copy_cells(next_map);
#if LIMIT_18_HZ
      // Limit to a maximum of 18.2 frames per second, for visibility
      do {
         _bios_timeofday(_TIME_GETCLOCK, &bios_time);
      } while (start_bios_time == bios_time);
      start_bios_time = bios_time;
#endif
   } while (!kbhit());
   getch();    // clear keypress
   exit_display_mode();
   cout << "Total generations: " << generation << "\nSeed: " <<
      seed << "\n";
}
/* cellmap destructor. */
cellmap::~cellmap(void)
{
   delete[] cells;
}

/* Copies one cellmap's cells to another cellmap. Both cellmaps are
   assumed to be the same size. */
void cellmap::copy_cells(cellmap &sourcemap)
{
   memcpy(cells, sourcemap.cells, length_in_bytes);
}

/* Turns cell on. */
void cellmap::set_cell(unsigned int x, unsigned int y)
{
   unsigned char *cell_ptr =
      cells + (y * width_in_bytes) + (x / 8);

   *(cell_ptr) |= 0x80 >> (x & 0x07);
}

/* Returns cell state (1=on or 0=off), wrapping around to the
   opposite edge if WRAP_EDGES is set. */
int cellmap::cell_state(int x, int y)
{
   unsigned char *cell_ptr;

#if WRAP_EDGES
   while (x < 0) x += width;        // wrap, if necessary
   while (x >= width) x -= width;
   while (y < 0) y += height;
   while (y >= height) y -= height;
#else
   if ((x < 0) || (x >= width) || (y < 0) || (y >= height))
      return 0;   // return 0 for off edges if no wrapping
#endif
   cell_ptr = cells + (y * width_in_bytes) + (x / 8);
   return (*cell_ptr & (0x80 >> (x & 0x07))) ? 1 : 0;
}
/* Calculates the next generation of a cellmap and stores it in
   next_map. */
void cellmap::next_generation(cellmap& next_map)
{
   unsigned int x, y, neighbor_count;

   for (y = 0; y < height; y++) {
      for (x = 0; x < width; x++) {
         // Figure out how many neighbors this cell has
         neighbor_count = cell_state(x-1, y-1) + cell_state(x, y-1) +
            cell_state(x+1, y-1) + cell_state(x-1, y) +
            cell_state(x+1, y) + cell_state(x-1, y+1) +
            cell_state(x, y+1) + cell_state(x+1, y+1);
         if (cell_state(x, y) == 1) {
            // The cell is on; does it stay on?
            if ((neighbor_count != 2) && (neighbor_count != 3)) {
               next_map.clear_cell(x, y);   // turn it off
               draw_pixel(x, y, OFF_COLOR);
            }
         } else {
            // The cell is off; does it turn on?
            if (neighbor_count == 3) {
               next_map.set_cell(x, y);     // turn it on
               draw_pixel(x, y, ON_COLOR);
            }
         }
      }
   }
}
#define TEXT_X_OFFSET 27
#define SCREEN_WIDTH_IN_BYTES 320

/* Width & height in pixels of each cell. */
extern unsigned int magnifier;

/* Draws a cell at the specified cellmap coordinates, as a
   magnifier-by-magnifier block of pixels. */
void draw_pixel(unsigned int x, unsigned int y, unsigned int color)
{
   unsigned char far *screen_ptr;
   int i, j;

   FP_SEG(screen_ptr) = SCREEN_SEGMENT;
   FP_OFF(screen_ptr) =
      y * magnifier * SCREEN_WIDTH_IN_BYTES + x * magnifier;
   for (i = 0; i < magnifier; i++) {
      for (j = 0; j < magnifier; j++) {
         *(screen_ptr + j) = color;
      }
      screen_ptr += SCREEN_WIDTH_IN_BYTES;
   }
}
/* Mode 13h mode-set function. */
void enter_display_mode()
{
   union REGS regset;

   regset.x.ax = 0x0013;
   int86(0x10, &regset, &regset);
}

/* Text mode mode-set function. */
void exit_display_mode()
{
   union REGS regset;

   regset.x.ax = 0x0003;
   int86(0x10, &regset, &regset);
}

/* Text display function. Offsets text to non-graphics area of
   screen. */
void show_text(int x, int y, char *text)
{
   gotoxy(TEXT_X_OFFSET + x, y);
   puts(text);
}
It's worth noting, though, that one reason draw_pixel() doesn't much affect perfor-
mance is that in Listing 17.1, we're smart enough to redraw pixels only when their
states change, rather than during every generation. Detecting and eliminating re-
dundant operations is part of knowing the nature of your data, and is a potent
optimization technique that will be extremely useful a little later in this chapter.
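The redraw-only-on-change idea can be shown on its own, outside the cellmap class. The function below is a hypothetical helper, not part of Listing 17.1 (which calls draw_pixel() inside next_generation() at the moment a cell flips). It compares two generations stored one byte per cell (an assumption of this sketch) and invokes a drawing callback, standing in for draw_pixel(), only where the state differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: redraws only the cells whose state differs between
// two generations stored one byte per cell (0 = off, 1 = on), row-major.
// The draw callback stands in for draw_pixel(); it receives x, y and the
// new state. Returns how many cells were redrawn.
std::size_t redraw_changed(const std::vector<std::uint8_t>& before,
                           const std::vector<std::uint8_t>& after,
                           unsigned width,
                           const std::function<void(unsigned, unsigned, bool)>& draw)
{
    assert(before.size() == after.size());
    assert(width != 0 && after.size() % width == 0);
    std::size_t drawn = 0;
    for (std::size_t i = 0; i < after.size(); ++i) {
        if (before[i] != after[i]) {           // state changed: redraw it
            draw(static_cast<unsigned>(i % width),
                 static_cast<unsigned>(i / width), after[i] != 0);
            ++drawn;
        }
    }
    return drawn;                               // unchanged cells cost nothing
}
```

For a blinker flipping from a vertical to a horizontal row of three, only four of the nine cells in its 3x3 box change, so only four draws are issued.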
However, if you never look beneath the surface of the abstract model at the implemen-
tation details, you have no idea of what the true performance cost of various operations
is, and, without that, you have largely surrendered control over performance.
Having said that, let me hasten to add that algorithmic improvements can make a
big difference even when working at a purely abstract level. For a large unordered
data set, a high-level Quicksort will beat the pants off the best-implemented inser-
tion sort you can imagine. Still, you can optimize your algorithm from here 'til
doomsday, and if you have a fast algorithm running on top of a highly abstract pro-
gramming model, you'll almost certainly end up with a slow program. In Listing
17.1, the abstraction that's killing us is that of looking at the eight neighbors with
eight completely independent operations, requiring eight calls to cell_state() and
eight calculations of cell address and cell mask. In fact, given the nature of cell stor-
age, the eight neighbors are in a fixed relationship to one another, and the addresses
and masks of all eight can generally be found very easily via hard-wired offsets and
shifts once the address and mask of any one is known.
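To make that fixed relationship concrete, here is a sketch that finds the address and mask of the upper-left neighbor once and reaches the other seven through fixed row offsets and one-bit mask shifts. It assumes a layout modeled on the padded bit-per-cell format of Listing 17.3 (eight cells per byte, most significant bit first, one padding byte at each end of a row, one padding row above and below), with the padding simply left off, so there is no edge wrapping. BitCellmap is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical padded bit-per-cell map: 8 cells per byte, MSB first,
// one padding byte at each end of a row, one padding row above and below.
// The padding stays zero here, so cells beyond the edges count as off.
struct BitCellmap {
    unsigned width, height, width_in_bytes;
    std::vector<std::uint8_t> cells;

    BitCellmap(unsigned w, unsigned h)
        : width(w), height(h), width_in_bytes((w + 7) / 8 + 2),
          cells(static_cast<std::size_t>(width_in_bytes) * (h + 2), 0) {}

    // Cell (x, y) lives in padded row y + 1, padded byte x / 8 + 1.
    void set_cell(unsigned x, unsigned y) {
        assert(x < width && y < height);
        cells[(y + 1) * width_in_bytes + x / 8 + 1] |=
            static_cast<std::uint8_t>(0x80 >> (x & 7));
    }

    // Locates the upper-left neighbor once (padded bit column x + 7,
    // padded row y), then reaches the rest by fixed offsets of one and
    // two rows and by shifting the mask one bit right per column.
    unsigned count_neighbors(unsigned x, unsigned y) const {
        assert(x < width && y < height);
        const std::uint8_t* p = &cells[y * width_in_bytes + (x + 7) / 8];
        std::uint8_t mask = static_cast<std::uint8_t>(0x80 >> ((x + 7) & 7));
        unsigned count = 0;
        for (int col = 0; col < 3; ++col) {
            if (p[0] & mask) ++count;                     // row above
            if (col != 1 && (p[width_in_bytes] & mask))   // same row, skipping
                ++count;                                  // the cell itself
            if (p[2 * width_in_bytes] & mask) ++count;    // row below
            if ((mask >>= 1) == 0) { mask = 0x80; ++p; }  // next bit column
        }
        return count;
    }
};
```

Only the first address and mask involve a multiply and a divide; crossing a byte boundary is handled by resetting the mask and bumping the pointer, which is the same trick Listing 17.3 uses.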
There's a kicker here, though, and that's the counting of neighbors for cells at the edge of
the cellmap. When cellmap wrapping is enabled (so that the cellmap becomes essentially a
toroid, with each edge joined seamlessly to the opposite edge, as opposed to having a
border of off-cells), neighbors that reside on the other edge of the cellmap can't be
accessed by the standard fixed offset, as shown in Figure 17.1. So, in general, we could
improve performance by hard-wiring our neighbor counting for the bit-per-cell cellmap
Figure 17.1 Edge-wrapping complications. (Figure labels: Cellmap; Padding Cells; the other side of the cellmap.)
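One way around the problem in Figure 17.1 is to surround the cellmap with padding and, before each generation, copy each edge into the padding on the opposite side, so the same fixed offsets work for every cell. The sketch below shows that idea with one byte per cell instead of one bit, purely to keep the copying simple; PaddedTorus and its members are hypothetical names. The copy order (left and right columns first, then whole top and bottom rows, which also fills the corners) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: a toroidal cellmap stored one byte per cell (0 or 1), with a
// one-cell border of padding all around. refresh_padding() copies each
// edge into the padding on the opposite side, so the eight fixed neighbor
// offsets work for edge cells exactly as they do for interior cells.
struct PaddedTorus {
    unsigned w, h, stride;              // stride = w + 2 padded columns
    std::vector<std::uint8_t> cells;    // (h + 2) rows of stride bytes

    PaddedTorus(unsigned width, unsigned height)
        : w(width), h(height), stride(width + 2),
          cells(static_cast<std::size_t>(width + 2) * (height + 2), 0) {}

    std::uint8_t& at(unsigned x, unsigned y) {      // cell coordinates
        assert(x < w && y < h);
        return cells[(y + 1) * stride + (x + 1)];
    }

    void refresh_padding() {
        for (unsigned r = 1; r <= h; ++r) {
            cells[r * stride] = cells[r * stride + w];          // right edge -> left pad
            cells[r * stride + w + 1] = cells[r * stride + 1];  // left edge -> right pad
        }
        for (unsigned c = 0; c < stride; ++c) {                 // rows include corners
            cells[c] = cells[h * stride + c];                   // bottom row -> top pad
            cells[(h + 1) * stride + c] = cells[stride + c];    // top row -> bottom pad
        }
    }

    // Same eight offsets for every cell, edge or not.
    unsigned count_neighbors(unsigned x, unsigned y) const {
        assert(x < w && y < h);
        const std::size_t i = (y + 1) * stride + (x + 1);
        const std::size_t s = stride;
        return static_cast<unsigned>(
            cells[i - s - 1] + cells[i - s] + cells[i - s + 1] +
            cells[i - 1] + cells[i + 1] +
            cells[i + s - 1] + cells[i + s] + cells[i + s + 1]);
    }
};
```

The price is the copy before every generation, which touches only the border, and a small amount of extra memory.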
LISTING 17.3
/* cellmap class definition, constructor, copy_cells(), set_cell(),
   clear_cell(), cell_state(), count_neighbors(), and
   next_generation() for fast, hard-wired neighbor count approach.
   Otherwise, the same as Listing 17.1 */
class cellmap {
private:
   unsigned char *cells;
   unsigned int width;
   unsigned int width_in_bytes;
   unsigned int height;
   unsigned int length_in_bytes;
public:
   cellmap(unsigned int h, unsigned int w);
   ~cellmap(void);
   void copy_cells(cellmap &sourcemap);
   void set_cell(unsigned int x, unsigned int y);
   void clear_cell(unsigned int x, unsigned int y);
   int cell_state(int x, int y);
   int count_neighbors(int x, int y);
   void next_generation(cellmap& dest_map);
};

/* cellmap constructor. Pads around cell storage area with 1 extra
   byte, used for handling edge wrapping. */
cellmap::cellmap(unsigned int h, unsigned int w)
{
   width = w;
   width_in_bytes = ((w + 7) / 8) + 2;         // pad each side with
                                               // 1 extra byte
   height = h;
   length_in_bytes = width_in_bytes * (h + 2); // pad top/bottom
                                               // with 1 extra byte
   cells = new unsigned char[length_in_bytes]; // cell storage
   memset(cells, 0, length_in_bytes);          // clear all cells, to start
}

/* Copies one cellmap's cells to another cellmap. If wrapping is
   enabled, copies edge (wrap) bytes into opposite padding bytes in
   source first, so that the padding bytes off each edge have the
   same values as would be found by wrapping around to the opposite
   edge. Both cellmaps are assumed to be the same size. */
void cellmap::copy_cells(cellmap &sourcemap)
{
   unsigned char *cell_ptr;
   int i;

#if WRAP_EDGES
   // Copy left and right edges into padding bytes on right and left
   cell_ptr = sourcemap.cells + width_in_bytes;
   for (i = 0; i < height; i++) {
      *cell_ptr = *(cell_ptr + width_in_bytes - 2);
      *(cell_ptr + width_in_bytes - 1) = *(cell_ptr + 1);
      cell_ptr += width_in_bytes;
   }
   // Copy top and bottom edges into padding bytes on bottom and top
   memcpy(sourcemap.cells, sourcemap.cells + length_in_bytes -
         (width_in_bytes * 2), width_in_bytes);
   memcpy(sourcemap.cells + length_in_bytes - width_in_bytes,
         sourcemap.cells + width_in_bytes, width_in_bytes);
#endif
   // Copy all cells to the destination
   memcpy(cells, sourcemap.cells, length_in_bytes);
}
/* Turns cell off. x and y are offset by 1 byte down and to the right,
   to compensate for the padding bytes around the cell map. */
void cellmap::clear_cell(unsigned int x, unsigned int y)
{
   unsigned char *cell_ptr =
      cells + ((y + 1) * width_in_bytes) + ((x / 8) + 1);

   *(cell_ptr) &= ~(0x80 >> (x & 0x07));
}

/* Counts the number of neighboring on-cells for the specified cell. */
int cellmap::count_neighbors(int x, int y)
{
   unsigned char *cell_ptr, mask;
   unsigned int neighbor_count;

   // Point to upper left neighbor
   cell_ptr = cells + ((y * width_in_bytes) + ((x + 7) / 8));
   mask = 0x80 >> ((x - 1) & 0x07);
   // Count upper left neighbor
   neighbor_count = (*cell_ptr & mask) ? 1 : 0;
   // Count left neighbor
   if ((*(cell_ptr + width_in_bytes) & mask)) neighbor_count++;
   // Count lower left neighbor
   if ((*(cell_ptr + (width_in_bytes * 2)) & mask)) neighbor_count++;

   // Point to upper neighbor
   if ((mask >>= 1) == 0) {
      mask = 0x80;
      cell_ptr++;
   }
   // Count upper neighbor
   if ((*cell_ptr & mask)) neighbor_count++;
   // Count lower neighbor
   if ((*(cell_ptr + (width_in_bytes * 2)) & mask))
      neighbor_count++;

   // Point to upper right neighbor
   if ((mask >>= 1) == 0) {
      mask = 0x80;
      cell_ptr++;
   }
   // Count upper right neighbor
   if ((*cell_ptr & mask)) neighbor_count++;
   // Count right neighbor
   if ((*(cell_ptr + width_in_bytes) & mask)) neighbor_count++;
   // Count lower right neighbor
   if ((*(cell_ptr + (width_in_bytes * 2)) & mask))
      neighbor_count++;

   return neighbor_count;
}
/* Calculates the next generation of current_map and stores it in
   next_map. */
void cellmap::next_generation(cellmap& next_map)
{
   unsigned int x, y, neighbor_count;

   for (y = 0; y < height; y++) {
      for (x = 0; x < width; x++) {
         neighbor_count = count_neighbors(x, y);
         if (cell_state(x, y) == 1) {
            if ((neighbor_count != 2) && (neighbor_count != 3)) {
               next_map.clear_cell(x, y);   // turn it off
               draw_pixel(x, y, OFF_COLOR);
            }
         } else {
            if (neighbor_count == 3) {
               next_map.set_cell(x, y);     // turn it on
               draw_pixel(x, y, ON_COLOR);
            }
         }
      }
   }
}
In Listing 17.3, note the padded cellmap edges, and the alteration of the member
functions to compensate for the padding. Also note that the width now has to be a
multiple of eight, to facilitate the process of copying the edges to the opposite padding
bytes. We have decreased the generality of our Game of Life implementation in ex-
change for better performance. That's a very common trade-off, as common as trading
memory for speed. As a rule, the more general a program is, the slower it is.
A corollary is that often (not always, but often), the more heavily optimized a pro-
gram is, the more complex and the more difficult to implement it is. You can often
improve performance a good deal by implementing only the level of generality you
need, but at the same time decreased generality makes it more difficult to change or
port the program at some later date. A Game of Life implementation, such as Listing
17.1, that's built on set_cell(), clear_cell(), and cell_state() is completely general; you
LISTING 17.4
/* Calculates the next generation of current_map and stores it in
   next_map. */
void cellmap::next_generation(cellmap& next_map)
{
   unsigned int x, y, neighbor_count;
   unsigned int width_in_bytesX2 = width_in_bytes << 1;
   unsigned char *cell_ptr, *current_cell_ptr, mask, current_mask;
   unsigned char *base_cell_ptr, *row_cell_ptr, base_mask;
   unsigned char *dest_cell_ptr = next_map.cells;

   // Process all cells in the current cellmap
   row_cell_ptr = cells;            // point to upper left neighbor of
                                    // first cell in cell map
   for (y = 0; y < height; y++) {   // repeat for each row of cells
      // Cell pointer and cell bit mask for first cell in row
      base_cell_ptr = row_cell_ptr; // to access upper left neighbor
      base_mask = 0x01;             // of first cell in row
      for (x = 0; x < width; x++) { // repeat for each cell in row
         // First, count neighbors
         // Point to upper left neighbor of current cell
         cell_ptr = base_cell_ptr;  // pointer and bit mask for
         mask = base_mask;          // upper left neighbor
         // Count upper left neighbor
         neighbor_count = (*cell_ptr & mask) ? 1 : 0;
         // Count left neighbor
         if ((*(cell_ptr + width_in_bytes) & mask)) neighbor_count++;
         // Count lower left neighbor
         if ((*(cell_ptr + width_in_bytesX2) & mask)) neighbor_count++;
         // Point to upper neighbor
         if ((mask >>= 1) == 0) {
            mask = 0x80;
            cell_ptr++;
         }
         // Remember where to find the current cell
         current_cell_ptr = cell_ptr + width_in_bytes;
         current_mask = mask;
         // Count upper neighbor
         if ((*cell_ptr & mask)) neighbor_count++;
         // Count lower neighbor
         if ((*(cell_ptr + width_in_bytesX2) & mask))
            neighbor_count++;
         // Point to upper right neighbor
         if ((mask >>= 1) == 0) {
            mask = 0x80;
            cell_ptr++;
         }
         // Count upper right neighbor
         if ((*cell_ptr & mask)) neighbor_count++;
         // Count right neighbor
         if ((*(cell_ptr + width_in_bytes) & mask))
            neighbor_count++;
         // Count lower right neighbor
         if ((*(cell_ptr + width_in_bytesX2) & mask))
            neighbor_count++;
         if (*current_cell_ptr & current_mask) {
            if ((neighbor_count != 2) && (neighbor_count != 3)) {
               *(dest_cell_ptr + (current_cell_ptr - cells)) &=
                  ~current_mask;    // turn off cell
               draw_pixel(x, y, OFF_COLOR);
            }
         } else {
            if (neighbor_count == 3) {
               *(dest_cell_ptr + (current_cell_ptr - cells)) |=
                  current_mask;     // turn on cell
               draw_pixel(x, y, ON_COLOR);
            }
         }
         // Advance to the next cell on row
         if ((base_mask >>= 1) == 0) {
            base_mask = 0x80;
            base_cell_ptr++;        // advance to the next cell byte
         }
      }
      row_cell_ptr += width_in_bytes;  // advance to the next row
   }
}
Listing 17.4 and Listing 17.3 are functionally the same; the only difference lies in
how next_generation() is implemented. (Only next_generation() is shown in Listing
17.4; the program is otherwise identical to Listing 17.3.) Listing 17.4 applies the
following optimizations to next_generation():
The neighbor-counting code is brought into next_generation(), eliminating many func-
tion calls and from-scratch address/mask calculations.
All multiplies are eliminated by using pointers and addition.
All cells are accessed directly via pointers and masks, eliminating all remaining
function calls and from-scratch address/mask calculations.
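The second of those optimizations, trading a per-cell multiply for pointer arithmetic, is easy to see in isolation. Both functions below are hypothetical and count the on-cells in a byte-per-cell map (0 = off, 1 = on); the first recomputes each address as y * width + x, the second just advances a pointer. A modern optimizing compiler may well perform this strength reduction on its own, so treat this as an illustration of the transformation rather than a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// From-scratch addressing: one multiply per cell visited.
unsigned count_on_indexed(const std::vector<std::uint8_t>& cells,
                          unsigned width, unsigned height)
{
    assert(cells.size() == static_cast<std::size_t>(width) * height);
    unsigned on = 0;
    for (unsigned y = 0; y < height; ++y)
        for (unsigned x = 0; x < width; ++x)
            on += cells[static_cast<std::size_t>(y) * width + x];
    return on;
}

// Pointer walking: the address of the next cell is the current one + 1,
// because rows are stored back to back.
unsigned count_on_pointer(const std::vector<std::uint8_t>& cells,
                          unsigned width, unsigned height)
{
    assert(cells.size() == static_cast<std::size_t>(width) * height);
    unsigned on = 0;
    const std::uint8_t* p = cells.data();
    for (unsigned y = 0; y < height; ++y)
        for (unsigned x = 0; x < width; ++x)
            on += *p++;                    // just an increment
    return on;
}
```

Both must agree on every input; the only difference is how each cell's address is produced.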
We're still not ready for assembly, though; what we need is a new perspective that
lends itself to vastly better performance in C++. The Life program in the next section
is three to seven times faster than Listing 17.4, and it's still in C++.
How is this possible? Here are some hints:
After a few dozen generations, most of the cellmap consists of cells in the off state.
There are many possible cellmap representations other than one bit-per-pixel.
Cells change state relatively infrequently.
adding padding bytes off the edges so that pointer arithmetic would always work, but
the major optimizations were moving the critical code into a single loop and using
pointers rather than member functions whenever possible. In other words, we took
what we already knew and made it more efficient.
Now it's time to re-examine the nature of this programming task from the ground
up, looking for things that we don't yet know. Let's take a moment to review what the
Game of Life consists of. The basic task is evolving a new generation, and that's done
by looking at the number of "on" neighbors a cell has and the cell's own state. If a
cell is on, and two or three neighbors are on, then the cell stays on; otherwise, an on-
cell is turned off. If a cell is off and exactly three neighbors are on, then the cell is
turned on; otherwise, an off-cell stays off. That's all there is to it. As any fool can see,
the trick is to arrange things so that we can count neighbors and check the cell state
as quickly as possible. Large lookup tables, oddly encoded cellmaps, and lots of bit-
twiddling assembly code spring to mind as possible approaches. Can't you just feel
your adrenaline start to pump?
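The rules just described fit in a few lines. This next_state() function is a hypothetical helper, not taken from any listing in this chapter; it returns a cell's state in the next generation given its current state and its number of on-neighbors.

```cpp
#include <cassert>

// The Life rule as a pure function of the cell's current state and its
// count of on-neighbors (0 through 8).
bool next_state(bool is_on, unsigned on_neighbors)
{
    assert(on_neighbors <= 8);
    if (is_on)
        return on_neighbors == 2 || on_neighbors == 3;  // survives
    return on_neighbors == 3;                           // birth
}
```

Every implementation in this chapter computes exactly this; they differ only in how quickly they get the neighbor count and in which cells they bother to evaluate.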
Relax. Step back. Try to divine the true nature of the problem. The object is not to
count neighbors and check cell states as quickly as possible; that's just one pos-
sible implementation. The object is to determine when the state of a cell must be
changed and to change it appropriately, and that's what we need to do as quickly as
possible.
What difference does that new perspective make? Let's approach it this way. What
does a typical cellmap look like? As it happens, after a few generations, the vast ma-
jority of cells are off. In fact, the vast majority of cells are not only off but are entirely
surrounded by off-cells. Also, cells change state infrequently; in any given genera-
tion after the first few, most cells remain in the same state as in the previous generation.
Do you see where I'm heading? Do you hear a whisper of inspiration from your right
brain? The original implementation stored cell states as 1-bits (on), or 0-bits (off).
For each generation and for each cell, it counted the states of the eight neighbors,
for an average of eight operations per cell per generation. Suppose, now, that on
average 10 percent of cells change state from one generation to the next. (The ac-
tual percentage is even lower, but this will do for illustration.) Suppose also that we
change the cell map format to store a byte rather than a bit for each cell, with the
byte storing not only the cell state but also the count of neighboring on-cells for that
cell. Figure 17.3 shows this format. Then, rather than counting neighbors each time,
we could just look at the neighbor count in the cell and operate directly from that.
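A sketch of that byte format, assuming the layout Listing 17.5 uses (its set_cell() sets bit 0 of the cell and adds 2 to each neighbor): bit 0 holds the cell's own state and the bits above it hold the on-neighbor count, 0 through 8. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit 0: the cell's own state. Bits 1-4: number of on-neighbors (0-8),
// so adding or subtracting 2 adjusts the count without touching bit 0.
inline bool cell_is_on(std::uint8_t cell)         { return cell & 0x01; }
inline unsigned neighbor_count(std::uint8_t cell) { return cell >> 1; }

inline std::uint8_t make_cell(bool on, unsigned count)
{
    assert(count <= 8);
    return static_cast<std::uint8_t>((count << 1) | (on ? 1u : 0u));
}

// The next state follows from the byte alone, with no neighbor scan.
inline bool next_state_from_byte(std::uint8_t cell)
{
    const unsigned count = neighbor_count(cell);
    assert(count <= 8);
    return cell_is_on(cell) ? (count == 2 || count == 3) : (count == 3);
}
```

Note that a zero byte means an off-cell with no on-neighbors, which cannot turn on in the next generation; that is what makes skipping runs of zero bytes safe.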
But what about the overhead needed to maintain the neighbor counts? Well, each
time a cell changes state, eight operations would be needed to update the counts in
the eight neighboring cells. But this happens only once every ten cells, on average
(eight updates for one cell in ten works out to about 0.8 operations per cell per
generation, versus eight), so the cost of this approach is only one-tenth that of the
original approach!
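The maintenance cost can be sketched directly. The hypothetical class below is a simplified, portable stand-in, not Listing 17.5 itself: it stores cells in the byte format just described, wraps at the edges with modular arithmetic rather than precomputed offsets, and counts every neighbor-count update, so each state change can be seen to cost exactly eight updates while unchanged cells cost nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

class CountingCellmap {
public:
    CountingCellmap(unsigned w, unsigned h)
        : w_(w), h_(h), cells_(static_cast<std::size_t>(w) * h, 0)
    {
        assert(w >= 3 && h >= 3);   // keeps the eight wrapped neighbors distinct
    }

    void set_cell(unsigned x, unsigned y)   { change(x, y, true); }
    void clear_cell(unsigned x, unsigned y) { change(x, y, false); }

    std::uint8_t raw(unsigned x, unsigned y) const { return cells_[index(x, y)]; }
    std::size_t updates() const { return updates_; }  // neighbor-count writes so far

private:
    std::size_t index(unsigned x, unsigned y) const
    {
        return static_cast<std::size_t>(y) * w_ + x;
    }

    void change(unsigned x, unsigned y, bool on)
    {
        assert(x < w_ && y < h_);
        std::uint8_t& cell = cells_[index(x, y)];
        assert(((cell & 0x01) != 0) != on);       // only real state changes
        cell = static_cast<std::uint8_t>(on ? (cell | 0x01) : (cell & 0xFE));
        // Offsets (size - 1), 0 and 1, taken modulo the size, mean
        // "previous", "same" and "next" with wraparound at the edges.
        for (unsigned oy : {h_ - 1, 0u, 1u})
            for (unsigned ox : {w_ - 1, 0u, 1u}) {
                if (ox == 0 && oy == 0) continue; // skip the cell itself
                std::uint8_t& n = cells_[index((x + ox) % w_, (y + oy) % h_)];
                n = static_cast<std::uint8_t>(on ? n + 2 : n - 2);
                ++updates_;
            }
    }

    unsigned w_, h_;
    std::vector<std::uint8_t> cells_;
    std::size_t updates_ = 0;
};
```

Turning on two cells and then turning the first back off performs 3 x 8 = 24 neighbor updates, regardless of how large the cellmap is.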
Know your data.
LISTING 17.5
/* C++ Game of Life implementation for any mode for which mode set
   and draw pixel functions can be provided. The cellmap stores the
   neighbor count for each cell as well as the state of each cell;
   this allows very fast next-state determination. Edges always wrap
   in this implementation.
   Tested with Borland C++. To run, link with Listing 17.2
   in the large model. */
#include <stdlib.h>
#include <stdio.h>
#include <iostream.h>
#include <conio.h>
#include <time.h>
#include <dos.h>
#include <bios.h>
#include <mem.h>

/* Controls the size of the cell map. Must be within the capabilities
   of the display mode, and must be limited to leave room for text
   display at right. */
unsigned int cellmap_width = 96;
unsigned int cellmap_height = 96;
/* Width & height in pixels of each cell. */
unsigned int magnifier = 2;
/* Randomizing seed */
unsigned int seed;

void main()
{
   unsigned long generation = 0;
   char gen_text[80];
   long bios_time = 0, start_bios_time = 0;
   cellmap current_map(cellmap_height, cellmap_width);

   current_map.init();   // randomly initialize cell map

   enter_display_mode();

   // Keep recalculating and redisplaying generations until a key
   // is pressed
   do {
      generation++;
      sprintf(gen_text, "%10lu", generation);
      show_text(1, GENERATION_LINE, gen_text);
      // Recalculate and draw the next generation
      current_map.next_generation();
#if LIMIT_18_HZ
      // Limit to a maximum of 18.2 frames per second, for visibility
      do {
         _bios_timeofday(_TIME_GETCLOCK, &bios_time);
      } while (start_bios_time == bios_time);
      start_bios_time = bios_time;
#endif
   } while (!kbhit());
   getch();    // clear keypress
   exit_display_mode();
   cout << "Total generations: " << generation << "\nSeed: " <<
      seed << "\n";
}
/* cellmap constructor. */
cellmap::cellmap(unsigned int h, unsigned int w)
{
   width = w;
   height = h;
   length_in_bytes = w * h;
   cells = new unsigned char[length_in_bytes];       // cell storage
   temp_cells = new unsigned char[length_in_bytes];  // temp cell storage
   if ( (cells == NULL) || (temp_cells == NULL) ) {
      printf("Out of memory\n");
      exit(1);
   }
   memset(cells, 0, length_in_bytes);  // clear all cells, to start
}
/* cellmap destructor. */
cellmap::~cellmap(void)
{
   delete[] cells;
   delete[] temp_cells;
}
/* Turns an off-cell on, incrementing the on-neighbor count for the
   eight neighboring cells. */
void cellmap::set_cell(unsigned int x, unsigned int y)
{
   unsigned int w = width, h = height;
   int xoleft, xoright, yoabove, yobelow;
   unsigned char *cell_ptr = cells + (y * w) + x;

   // Calculate the offsets to the eight neighboring cells,
   // accounting for wrapping around at the edges of the cell map
   if (x == 0)
      xoleft = w - 1;
   else
      xoleft = -1;
   if (y == 0)
      yoabove = length_in_bytes - w;
   else
      yoabove = -w;
   if (x == (w - 1))
      xoright = -(w - 1);
   else
      xoright = 1;
   if (y == (h - 1))
      yobelow = -(length_in_bytes - w);
   else
      yobelow = w;

   *(cell_ptr) |= 0x01;
   *(cell_ptr + yoabove + xoleft) += 2;
   *(cell_ptr + yoabove) += 2;
   *(cell_ptr + yoabove + xoright) += 2;
   *(cell_ptr + xoleft) += 2;
   *(cell_ptr + xoright) += 2;
   *(cell_ptr + yobelow + xoleft) += 2;
   *(cell_ptr + yobelow) += 2;
   *(cell_ptr + yobelow + xoright) += 2;
}
/* Turns an on-cell off, decrementing the on-neighbor count for the
   eight neighboring cells. */
void cellmap::clear_cell(unsigned int x, unsigned int y)
{
   unsigned int w = width, h = height;
   int xoleft, xoright, yoabove, yobelow;
   unsigned char *cell_ptr = cells + (y * w) + x;

   // Calculate the offsets to the eight neighboring cells,
   // accounting for wrapping around at the edges of the cell map
   if (x == 0)
      xoleft = w - 1;
   else
      xoleft = -1;
   if (y == 0)
      yoabove = length_in_bytes - w;
   else
      yoabove = -w;
   if (x == (w - 1))
      xoright = -(w - 1);
   else
      xoright = 1;
   if (y == (h - 1))
      yobelow = -(length_in_bytes - w);
   else
      yobelow = w;

   *(cell_ptr) &= ~0x01;
   *(cell_ptr + yoabove + xoleft) -= 2;
   *(cell_ptr + yoabove) -= 2;
   *(cell_ptr + yoabove + xoright) -= 2;
   *(cell_ptr + xoleft) -= 2;
   *(cell_ptr + xoright) -= 2;
   *(cell_ptr + yobelow + xoleft) -= 2;
   *(cell_ptr + yobelow) -= 2;
   *(cell_ptr + yobelow + xoright) -= 2;
}
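/* Returns cell state (1=on or 0=off). */
int cellmap::cell_state(int x, int y)
{
   unsigned char *cell_ptr;

   cell_ptr = cells + (y * width) + x;
   return *cell_ptr & 0x01;
}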
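/* Calculates and displays the next generation of current_map */
void cellmap::next_generation()
{
   unsigned int x, y, count;
   unsigned int h = height, w = width;
   unsigned char *cell_ptr;

   // Copy to temp map, so we can have an unaltered version from
   // which to work
   memcpy(temp_cells, cells, length_in_bytes);

   // Process all cells in the current cell map
   cell_ptr = temp_cells;      // first cell in cell map
   for (y = 0; y < h; y++) {   // repeat for each row of cells
      x = 0;
      do {                     // repeat for each cell in row
         // Zip quickly through as many off-cells with no
         // neighbors as possible
         while (*cell_ptr == 0) {
            cell_ptr++;        // advance to the next cell
            if (++x >= w) goto RowDone;
         }
         // Found a cell that's either on or has on-neighbors,
         // so see if its state needs to be changed
         count = *cell_ptr >> 1;  // # of neighboring on-cells
         if (*cell_ptr & 0x01) {
            // Cell is on; turn it off if it doesn't have
            // 2 or 3 neighbors
            if ((count != 2) && (count != 3)) {
               clear_cell(x, y);
               draw_pixel(x, y, OFF_COLOR);
            }
         } else {
            // Cell is off; turn it on if it has exactly 3 neighbors
            if (count == 3) {
               set_cell(x, y);
               draw_pixel(x, y, ON_COLOR);
            }
         }
         // Advance to the next cell
         cell_ptr++;           // advance to the next cell byte
      } while (++x < w);
RowDone:
      ;                        // a label must be followed by a statement
   }
}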
/* Randomly initializes the cellmap to about 50% on-pixels. */
void cellmap::init()
{
   unsigned int x, y, init_length;

   // Get the seed; seed randomly if 0 entered
   cout << "Seed (0 for random seed): ";
   cin >> seed;
   if (seed == 0) seed = (unsigned) time(NULL);

   // Randomly initialize the initial cell map
   cout << "Initializing...";
   srand(seed);
   init_length = (height * width) / 2;
   do {
      x = random(width);
      y = random(height);
      if (cell_state(x, y) == 0) {
         set_cell(x, y);
      }
   } while (--init_length);
}
The large model is actually not necessary for the 96x96 cellmap in Listing 17.5. How-
ever, I was actually more interested in seeing a fast 200x200 cellmap, and two 200x200
cellmaps (80,000 bytes in all) can't fit in a single 64K segment. (This can easily be
worked around in assembly language for cellmaps up to a segment in size; beyond that
size, cellmap scanning becomes pretty complex, although it can still be efficiently
implemented with some clever programming.)
Anyway, using the large model helps illustrate that it's the data representation and
the data processing approach you choose that matter most. Optimization details like
memory models and segments and in-line functions and assembly language are im-
portant but secondary. Let your mind roam creatively before you start coding.
Otherwise, you may find you're writing well-tuned slow code, which is by no means
the same thing as fast code.
Take a close look at Listing 17.5. You will see that it's quite a bit simpler than Listing
17.4. To some extent, that's because I decided to hard-wire the program to wrap
around from one edge of the cellmap to the other (it's much more interesting that
way), but the main reason is that it's a lot easier to work with the neighbor-count
model. There's no complex mask and pointer management, and the only thing that
really needs to be optimized is scanning for zero bytes. (And, in fact, I haven't opti-
mized even that because it's done in a C++ loop; it should really be REPZ SCASB.)
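In C++, the zero-byte scan can at least be handed to the standard library. This sketch assumes the byte-per-cell format of Listing 17.5 and uses std::find_if to skip to the next nonzero byte, which is the job REPZ SCASB (with AL = 0) would do in assembly; whether a given compiler turns it into anything that fast is not guaranteed. next_live() and cells_to_examine() are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first nonzero byte at or after pos, or
// cells.size() if everything from pos onward is zero.
std::size_t next_live(const std::vector<std::uint8_t>& cells, std::size_t pos)
{
    assert(pos <= cells.size());
    auto it = std::find_if(cells.begin() + static_cast<std::ptrdiff_t>(pos),
                           cells.end(),
                           [](std::uint8_t b) { return b != 0; });
    return static_cast<std::size_t>(it - cells.begin());
}

// Collects the indexes of the only cells that can change this
// generation: those that are on or have at least one on-neighbor.
std::vector<std::size_t> cells_to_examine(const std::vector<std::uint8_t>& cells)
{
    std::vector<std::size_t> hits;
    for (std::size_t i = next_live(cells, 0); i < cells.size();
         i = next_live(cells, i + 1))
        hits.push_back(i);
    return hits;
}
```

Since most bytes in a mature cellmap are zero, almost all of the work lands in the skip, which is why that one loop is the place to optimize.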
In truth, none of the code in Listing 17.5 is particularly well-optimized, and, as I
noted, the program must be compiled with the large model for large cellmaps. Also,
of course, the entire program is still in C++; note well that there's not a whit of
assembly here.
We've gotten more than a 30-times speedup simply by removing a little of the ab-
straction that C++ encourages, and by storing and processing the data in a manner
appropriate for the typical nature of the data itself. In other words, we've done
No doubt we could get another two to five times improvement with good assembly
code, but that's dwarfed by a 30-times improvement, so optimization at a concep-
tual level must come first.