Optimizing Conway's Game of Life

The document discusses algorithm optimization in the context of Conway's Game of Life, emphasizing that significant performance improvements can be achieved without resorting to assembly language. It outlines the basic rules of the game and presents a straightforward C++ implementation, highlighting the potential for optimization to achieve speed-ups of 50 times or more. The chapter aims to guide readers through a complete optimization process, illustrating key principles along the way.

Uploaded by

Nuno Gomes

Chapter 17

The Game of Life

The Triumph of Algorithmic Optimization in a Cellular Automata Game

I've spent a lot of [...] discussing assembly language optimization, which I consider [...] underappreciated topic. However, I'd like to take this [...] that there is much, much more to optimization than assembly [...] essential for absolute maximum performance, but [...] necessary but not sufficient, if you catch my drift, and [...] for improved but not maximum performance. [...] times: Optimize your algorithm first. Devise new approaches [...] Knuth said, Premature optimization is the root of all evil.

This is, of course, old hat, stuff you know like the back of your hand. Or is it? As Jeff Duntemann pointed out to me the other day, performance programmers are made, not born. While I'm merrily gallivanting around in this book optimizing 486 pipelining and turning simple tasks into horribly complicated and terrifyingly fast state machines, many of you are still developing your basic optimization skills. I don't want to shortchange those of you in the latter category, so in this chapter, we'll discuss some high-level language optimizations that can be applied by mere mortals within a reasonable period of time. We're going to examine a complete optimization process, from start to finish, and what we will find is that it's possible to get a 50-times speed-up without using one byte of assembly! It's all a matter of perspective: how you look at your code and data.

Conway's Game

The program that we're going to optimize is Conway's famous Game of Life, long-ago favorite of the hackers at MIT's AI Lab. If you've never seen it, let me assure you: Life is neat, and more than a little hypnotic. Fractals have been the hot graphics topic in recent years, but for eye-catching dazzle, Life is hard to beat.

Of course, eye-catching dazzle requires real-time performance (lots of pixels help too), and there's the rub. When there are, say, 40,000 cells to process and display, a simple, straightforward implementation just doesn't cut it, even on a 33 MHz 486. Happily, though, there are many, many ways to speed up Life, and they illustrate a variety of important optimization principles, as this chapter will show.

First, I'll describe the ground rules of Life, implement a very straightforward version in C++, and then speed that version up by about eight times without using any drastically different approaches or any assembly. This may be a little tame for some of you, but be patient; for after that, we'll haul out the big guns and move into the 30 to 40 times speed-up range. Then in the next chapter, I'll show you how several programmers really floored it in taking me up on my second Optimization Challenge, which involved the Game of Life.

The Rules of the Game


The Game of Life is ridiculously simple. There is a cellmap, consisting of a rectangular matrix of cells, each of which may initially be either on or off. Each cell has eight neighbors: two horizontally, two vertically, and four diagonally. For each succeeding generation of cells, the game logic determines whether each cell will be on or off according to the following rules:

If a cell is on and has either two or three neighbors that are on in the current generation, it stays on; otherwise, the cell turns off.

If a cell is off and has exactly three "on" neighbors in the current generation, it turns on; otherwise, it stays off.

That's all the rules there are, but they give rise to an astonishing variety of forms, including patterns that spin, march across the screen, and explode.
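
The two rules boil down to one small decision per cell. Here is a minimal sketch of that decision in C++; the function name next_state is an assumption made for illustration and does not appear in the chapter's listings.

```cpp
#include <cassert>

// Hypothetical helper (not part of the chapter's listings): returns true if
// a cell should be on in the next generation, given its current state and
// the number of its eight neighbors that are currently on.
bool next_state(bool is_on, int on_neighbors) {
    if (is_on) {
        // An on-cell survives only with two or three on-neighbors.
        return on_neighbors == 2 || on_neighbors == 3;
    }
    // An off-cell turns on only with exactly three on-neighbors.
    return on_neighbors == 3;
}
```

Everything else in a Life implementation is bookkeeping around this test: finding the eight neighbors, counting them, and storing the result.
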
It's only a little more complicated to implement the Game of Life than it is to describe it. Listing 17.1, together with the display functions in Listing 17.2, is a C++ implementation of the Game of Life, and it's very straightforward. A cellmap is an object that's accessible through member functions to set, clear, and test cell states, and through a member function to calculate the next generation. Calculating the next generation involves nothing more than using the other member functions to set each cell to the appropriate state, given the number of neighboring on-cells and the cell's current state. The only complication is that it's necessary to place the next generation's cells in another cellmap, and then copy the final result back to the original cellmap. This keeps us from corrupting the current generation's cellmap before we're done using it to calculate the next generation.

All in all, Listing 17.1 is a clean, compact, and elegant implementation of the Game of Life. Were it not that the code is as slow as molasses, we could stop right here.
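
To see why the second cellmap matters, here is a rough sketch of one generation step that reads only from the current map and writes only to the next one. It is deliberately simplified: it stores one byte per cell in a std::vector rather than using the bit-per-cell format of Listing 17.1, and the names Grid, at, and step are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-byte-per-cell grid; the chapter's cellmap class packs
// eight cells into each byte instead.
struct Grid {
    int width, height;
    std::vector<unsigned char> cells;
    Grid(int w, int h)
        : width(w), height(h), cells(static_cast<std::size_t>(w) * h, 0) {}
    // Wrapping read, analogous to cell_state() with WRAP_EDGES set to 1.
    int at(int x, int y) const {
        x = (x % width + width) % width;
        y = (y % height + height) % height;
        return cells[y * width + x];
    }
};

// Reads only from `current` and writes only to `next`, so no cell is
// overwritten before all of its neighbors have been counted.
void step(const Grid& current, Grid& next) {
    for (int y = 0; y < current.height; ++y) {
        for (int x = 0; x < current.width; ++x) {
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if (dx != 0 || dy != 0) n += current.at(x + dx, y + dy);
            bool on = current.at(x, y) != 0;
            next.cells[y * current.width + x] =
                on ? (n == 2 || n == 3) : (n == 3);
        }
    }
}
```

Unlike Listing 17.1, which writes only the cells that change state, this sketch overwrites every cell of the destination; that keeps it short but skips the chance to avoid redundant work.
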

LISTING 17.1 L17-1.CPP


/* C++ Game of Life implementation for any mode for which mode set
   and draw pixel functions can be provided.
   Tested with Borland C++ in the small model. */
#include <stdlib.h>
#include <stdio.h>
#include <iostream.h>
#include <conio.h>
#include <time.h>
#include <dos.h>
#include <bios.h>
#include <mem.h>

#define ON_COLOR  15       // on-cell pixel color
#define OFF_COLOR 0        // off-cell pixel color
#define MSG_LINE  10       // row for text messages
#define GENERATION_LINE 12 // row for generation # display
#define LIMIT_18_HZ  1     // set 1 for maximum frame rate = 18Hz
#define WRAP_EDGES   1     // set to 0 to disable wrapping around
                           // at cell map edges
class cellmap {
private:
   unsigned char *cells;
   unsigned int width;
   unsigned int width_in_bytes;
   unsigned int height;
   unsigned int length_in_bytes;
public:
   cellmap(unsigned int h, unsigned int v);
   ~cellmap(void);
   void copy_cells(cellmap &sourcemap);
   void set_cell(unsigned int x, unsigned int y);
   void clear_cell(unsigned int x, unsigned int y);
   int cell_state(int x, int y);
   void next_generation(cellmap& dest_map);
};

extern void enter_display_mode(void);
extern void exit_display_mode(void);
extern void draw_pixel(unsigned int X, unsigned int Y,
   unsigned int Color);
extern void show_text(int x, int y, char *text);

/* Controls the size of the cell map. Must be within the capabilities
   of the display mode, and must be limited to leave room for text
   display at right. */
unsigned int cellmap_width = 96;
unsigned int cellmap_height = 96;
/* Width & height in pixels of each cell as displayed on screen. */
unsigned int magnifier = 2;

void main()
{
   unsigned int init_length, x, y, seed;
   unsigned long generation = 0;
   char gen_text[80];
   long bios_time, start_bios_time;

   cellmap current_map(cellmap_height, cellmap_width);
   cellmap next_map(cellmap_height, cellmap_width);

   // Get the seed; seed randomly if 0 entered
   cout << "Seed (0 for random seed): ";
   cin >> seed;
   if (seed == 0) seed = (unsigned) time(NULL);

   // Randomly initialize the initial cell map
   cout << "Initializing...";
   srand(seed);
   init_length = (cellmap_height * cellmap_width) / 2;
   do {
      x = random(cellmap_width);
      y = random(cellmap_height);
      next_map.set_cell(x, y);
   } while (--init_length);
   current_map.copy_cells(next_map); // put init map in current_map

   enter_display_mode();

   // Keep recalculating and redisplaying generations until a key
   // is pressed
   show_text(0, MSG_LINE, "Generation: ");
   start_bios_time = _bios_timeofday(_TIME_GETCLOCK, &bios_time);
   do {
      generation++;
      sprintf(gen_text, "%10lu", generation);
      show_text(1, GENERATION_LINE, gen_text);
      // Recalculate and draw the next generation
      current_map.next_generation(next_map);
      // Make current_map current again
      current_map.copy_cells(next_map);
#if LIMIT_18_HZ
      // Limit to a maximum of 18.2 frames per second, for visibility
      do {
         _bios_timeofday(_TIME_GETCLOCK, &bios_time);
      } while (start_bios_time == bios_time);
      start_bios_time = bios_time;
#endif
   } while (!kbhit());
   getch();    // clear keypress
   exit_display_mode();
   cout << "Total generations: " << generation << "\nSeed: " <<
         seed << "\n";
}

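/* cellmap constructor. */
cellmap::cellmap(unsigned int h, unsigned int w)
{
   width = w;
   width_in_bytes = (w + 7) / 8;
   height = h;
   length_in_bytes = width_in_bytes * h;
   cells = new unsigned char[length_in_bytes];  // cell storage
   memset(cells, 0, length_in_bytes);  // clear all cells, to start
}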

/* cellmap destructor. */
cellmap::~cellmap(void)
{
   delete[] cells;
}

/* Copies one cellmap's cells to another cellmap. Both cellmaps are
   assumed to be the same size. */
void cellmap::copy_cells(cellmap &sourcemap)
{
   memcpy(cells, sourcemap.cells, length_in_bytes);
}

/* Turns cell on. */
void cellmap::set_cell(unsigned int x, unsigned int y)
{
   unsigned char *cell_ptr =
         cells + (y * width_in_bytes) + (x / 8);

   *(cell_ptr) |= 0x80 >> (x & 0x07);
}

/* Turns cell off. */
void cellmap::clear_cell(unsigned int x, unsigned int y)
{
   unsigned char *cell_ptr =
         cells + (y * width_in_bytes) + (x / 8);

   *(cell_ptr) &= ~(0x80 >> (x & 0x07));
}

/* Returns cell state (1=on or 0=off), optionally wrapping at the
   borders around to the opposite edge. */
int cellmap::cell_state(int x, int y)
{
   unsigned char *cell_ptr;

#if WRAP_EDGES
   while (x < 0) x += width;     // wrap, if necessary
   while (x >= width) x -= width;
   while (y < 0) y += height;
   while (y >= height) y -= height;
#else
   if ((x < 0) || (x >= width) || (y < 0) || (y >= height))
      return 0;   // return 0 for off edges if no wrapping
#endif
   cell_ptr = cells + (y * width_in_bytes) + (x / 8);
   return (*cell_ptr & (0x80 >> (x & 0x07))) ? 1 : 0;
}

/* Calculates the next generation of a cellmap and stores it in
   next_map. */
void cellmap::next_generation(cellmap& next_map)
{
   unsigned int x, y, neighbor_count;

   for (y=0; y<height; y++) {
      for (x=0; x<width; x++) {
         // Figure out how many neighbors this cell has
         neighbor_count = cell_state(x-1, y-1) + cell_state(x, y-1) +
               cell_state(x+1, y-1) + cell_state(x-1, y) +
               cell_state(x+1, y) + cell_state(x-1, y+1) +
               cell_state(x, y+1) + cell_state(x+1, y+1);
         if (cell_state(x, y) == 1) {
            // The cell is on; does it stay on?
            if ((neighbor_count != 2) && (neighbor_count != 3)) {
               next_map.clear_cell(x, y);    // turn it off
               draw_pixel(x, y, OFF_COLOR);
            }
         } else {
            // The cell is off; does it turn on?
            if (neighbor_count == 3) {
               next_map.set_cell(x, y);      // turn it on
               draw_pixel(x, y, ON_COLOR);
            }
         }
      }
   }
}

LISTING 17.2


/* VGA mode 13h functions for Game of Life.
   Tested with Borland C++. */
#include <stdio.h>
#include <conio.h>
#include <dos.h>

#define TEXT_X_OFFSET   27
#define SCREEN_WIDTH_IN_BYTES 320

/* Width & height in pixels of each cell. */
extern unsigned int magnifier;

/* Mode 13h draw pixel function. Pixels are of width & height
   specified by magnifier. */
void draw_pixel(unsigned int x, unsigned int y, unsigned int color)
{
#define SCREEN_SEGMENT  0xA000
   unsigned char far *screen_ptr;
   int i, j;

   FP_SEG(screen_ptr) = SCREEN_SEGMENT;
   FP_OFF(screen_ptr) =
         y * magnifier * SCREEN_WIDTH_IN_BYTES + x * magnifier;
   for (i=0; i<magnifier; i++) {
      for (j=0; j<magnifier; j++) {
         *(screen_ptr+j) = color;
      }
      screen_ptr += SCREEN_WIDTH_IN_BYTES;
   }
}

/* Mode 13h mode-set function. */
void enter_display_mode()
{
   union REGS regset;

   regset.x.ax = 0x0013;
   int86(0x10, &regset, &regset);
}

/* Text mode mode-set function. */
void exit_display_mode()
{
   union REGS regset;

   regset.x.ax = 0x0003;
   int86(0x10, &regset, &regset);
}

/* Text display function. Offsets text to non-graphics area of
   screen. */
void show_text(int x, int y, char *text)
{
   gotoxy(TEXT_X_OFFSET + x, y);
   puts(text);
}

Where Does the Time Go?


How slow is Listing 17.1? Table 17.1 shows that even on a 486, Listing 17.1 does fewer than three 96x96 generations per second. (The times in Table 17.1 are for 1,000 generations of a 96x96 cell map with seed=1, LIMIT_18_HZ=0, WRAP_EDGES=1, and magnifier=2, running on a 33 MHz 486.) Since my target is 18 generations per second with a 200x200 cellmap on a 20 MHz 386, Listing 17.1 is too slow by a rather wide margin: about 75 times too slow, in fact. You might say we have a little optimizing to do.

The first rule of optimization is: Only optimize where it matters. Use a profiler, or risk making a fool of yourself. Consider Listings 17.1 and 17.2. Where do you think the potential for significant speed-up lies? I'll tell you one place where I thought there was considerable potential: in draw_pixel(). As a programmer of high-speed graphics, I figured any drawing function that was not only written in C/C++ but also recalculated the target address from scratch for each pixel would be among the first optimization targets. I also expected to get major gains out of going to a Ping-Pong arrangement so that I didn't have to copy the new cellmap back to current_map after calculating the next generation.

I was wrong. Wrong, wrong, wrong. (But at least I was smart enough to use a profiler before actually writing any new code.) Table 17.1 shows where the time actually goes in Listings 17.1 and 17.2. As you can see, the time taken by draw_pixel(), copy_cells(), and anything other than calculating the next generation is nothing more than noise. We could optimize these routines right down to executing instantaneously, and you know what? It wouldn't make the slightest perceptible difference in how fast the program runs. Given the present state of our Game of Life implementation, the only areas worth looking at for possible optimizations are cell_state() and next_generation().

It's worth noting, though, that one reason draw_pixel() doesn't much affect performance is that in Listing 17.1, we're smart enough to redraw pixels only when their states change, rather than during every generation. Detecting and eliminating redundant operations is part of knowing the nature of your data, and is a potent optimization technique that will be extremely useful a little later in this chapter.

The Hazards and Advantages of Abstraction


How can we speed up cell_state() and next_generation()? I'll tell you how not to do it: By writing those member functions in assembly. It's tempting to say that cell_state() is taking all the time, so we need to speed it up with assembly, but what we really need to do is figure out why cell_state() is taking all the time, then address that aspect of the program directly.

Once you know where you need to optimize, the one word to keep in mind isn't assembly, it's...plastics. No, actually, it's abstraction. Well-written C and especially C++ programs are highly abstract models. For example, Listing 17.1 essentially creates a new programming language in which cells are tangible things, with built-in manipulation instructions. Given the cellmap member functions, you don't even need to know the cell storage format! This is a wonderful thing, in general; it saves programming time and bugs, and frees you to work on the application's needs, rather than implementation details.

However, if you never look beneath the surface of the abstract model at the implementation details, you have no idea of what the true performance cost of various operations is, and, without that, you have largely surrendered control over performance.

Having said that, let me hasten to add that algorithmic improvements can make a big difference even when working at a purely abstract level. For a large unordered data set, a high-level Quicksort will beat the pants off the best-implemented insertion sort you can imagine. Still, you can optimize your algorithm from here 'til doomsday, and if you have a fast algorithm running on top of a highly abstract programming model, you'll almost certainly end up with a slow program. In Listing 17.1, the abstraction that's killing us is that of looking at the eight neighbors with eight completely independent operations, requiring eight calls to cell_state() and eight calculations of cell address and cell mask. In fact, given the nature of cell storage, the eight neighbors are in a fixed relationship to one another, and the addresses and masks of all eight can generally be found very easily via hard-wired offsets and shifts once the address and mask of any one is known.
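
To make the "fixed relationship" concrete, here is a small sketch of moving from one cell to its right-hand neighbor in a bit-per-cell row without recomputing the address from x. It assumes the same convention as Listing 17.1 (the leftmost cell of each byte is bit 0x80); the names BitCursor, cursor_at, and step_right are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cursor over a bit-per-cell row: which byte the cell lives in,
// and which bit within that byte (most significant bit is the leftmost cell).
struct BitCursor {
    std::size_t byte_index;
    unsigned char mask;
};

// From-scratch calculation: a divide (or shift) and a variable shift per call.
BitCursor cursor_at(std::size_t x) {
    return { x / 8, static_cast<unsigned char>(0x80u >> (x & 7u)) };
}

// Incremental move to the right-hand neighbor: one shift, and only
// occasionally a reset of the mask and a byte increment.
void step_right(BitCursor& c) {
    c.mask >>= 1;
    if (c.mask == 0) {   // ran off the low end of this byte
        c.mask = 0x80;   // leftmost cell of the next byte
        ++c.byte_index;
    }
}
```

Moving up or down a row is simpler still: the mask stays the same, and the byte index changes by the row width in bytes.
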
There's a kicker here, though, and that's the counting of neighbors for cells at the edge of the cellmap. When cellmap wrapping is enabled (so that the cellmap becomes essentially a toroid, with each edge joined seamlessly to the opposite edge, as opposed to having a border of off-cells), neighbors that reside on the other edge of the cellmap can't be accessed by the standard fixed offset, as shown in Figure 17.1. So, in general, we could improve performance by hard-wiring our neighbor counting for the bit-per-cell cellmap format, but it seems we'd need a lot of conditional code to handle wrapping, and that would slow things back down again.

[Figure 17.1: Edge-wrapping complications. For a cell at the left edge of the cellmap, the left neighbors are not at the usual adjacent addresses, but are rather on the other side of the cellmap. For an interior cell, all neighbors are at the usual adjacent addresses.]

When a problem doesn't lend itself well to optimization, make it a practice to see if you can change the problem definition to one that allows for greater efficiency. In this case, we'll change the problem by putting padding bytes around the edge of the cellmap, and duplicating each edge of the cellmap in the padding bytes at the opposite side, as shown in Figure 17.2. That way, a hard-wired neighbor count will find exactly what it should (the opposite edge) without any special code at all.

But doesn't that extra copying of the edges take time? Sure, but only a little; we can build it into the cellmap copying function, and then frankly we won't even notice it. Avoiding tens or hundreds of thousands of calls to cell_state(), on the other hand, will be very noticeable. Listing 17.3 shows the alterations to Listing 17.1 required to implement a hard-wired neighbor-counting function. This is a minor change, in truth, implemented in about half an hour and not making the code significantly larger, but Listing 17.3 is 3.6 times faster than Listing 17.1, as shown in Table 17.1. We're up to about 10 generations per second on a 486; not where we want to be, but it is a vast improvement.
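
The padding idea is easier to see with one byte per cell than with packed bits, so here is a simplified sketch along those lines. It is not the chapter's implementation (Listing 17.3 pads with whole bytes of eight cells each); the names PaddedGrid, cell, refresh_padding, and count_neighbors as written here are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical byte-per-cell map with a one-cell border on every side.
struct PaddedGrid {
    int width, height, stride;
    std::vector<unsigned char> cells;
    PaddedGrid(int w, int h)
        : width(w), height(h), stride(w + 2),
          cells(static_cast<std::size_t>(w + 2) * (h + 2), 0) {}

    // x and y may range from -1 to width/height to reach the padding.
    unsigned char& cell(int x, int y) {
        return cells[(y + 1) * stride + (x + 1)];
    }

    // Copy each edge into the padding on the opposite side. Doing the
    // left/right columns first lets the row copies pick up the corners.
    void refresh_padding() {
        for (int y = 0; y < height; ++y) {
            cell(-1, y) = cell(width - 1, y);
            cell(width, y) = cell(0, y);
        }
        for (int x = -1; x <= width; ++x) {
            cell(x, -1) = cell(x, height - 1);
            cell(x, height) = cell(x, 0);
        }
    }

    // Fixed offsets only: no range checks and no wrap logic, even at edges.
    int count_neighbors(int x, int y) const {
        const unsigned char* p = &cells[(y + 1) * stride + (x + 1)];
        return p[-stride - 1] + p[-stride] + p[-stride + 1] +
               p[-1] + p[1] +
               p[stride - 1] + p[stride] + p[stride + 1];
    }
};
```

The padding has to be refreshed once per generation, before counting, which plays the same role as the edge copying that Listing 17.3 folds into copy_cells().
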

[Figure 17.2: The "padding cells" solution. Padding cells surround the boundary of the normal cellmap, so all neighbors of an edge cell are at the usual adjacent addresses.]

LISTING 17.3
/* cellmap class definition, constructor, copy_cells(), set_cell(),
   clear_cell(), cell_state(), count_neighbors(), and
   next_generation() for fast, hard-wired neighbor count approach.
   Otherwise, the same as Listing 17.1 */

class cellmap {
private:
   unsigned char *cells;
   unsigned int width;
   unsigned int width_in_bytes;
   unsigned int height;
   unsigned int length_in_bytes;
public:
   cellmap(unsigned int h, unsigned int v);
   ~cellmap(void);
   void copy_cells(cellmap &sourcemap);
   void set_cell(unsigned int x, unsigned int y);
   void clear_cell(unsigned int x, unsigned int y);
   int cell_state(int x, int y);
   int count_neighbors(int x, int y);
   void next_generation(cellmap& dest_map);
};

/* cellmap constructor. Pads around cell storage area with 1 extra
   byte, used for handling edge wrapping. */
cellmap::cellmap(unsigned int h, unsigned int w)
{
   width = w;
   width_in_bytes = ((w + 7) / 8) + 2; // pad each side with
                                       // 1 extra byte
   height = h;
   length_in_bytes = width_in_bytes * (h + 2);   // pad top/bottom
                                       // with 1 extra byte
   cells = new unsigned char[length_in_bytes];  // cell storage
   memset(cells, 0, length_in_bytes);  // clear all cells, to start
}

/* Copies one cellmap's cells to another cellmap. If wrapping is
   enabled, copies edge (wrap) bytes into opposite padding bytes in
   source first, so that the padding bytes off each edge have the
   same values as would be found by wrapping around to the opposite
   edge. Both cellmaps are assumed to be the same size. */
void cellmap::copy_cells(cellmap &sourcemap)
{
   unsigned char *cell_ptr;
   int i;

#if WRAP_EDGES
   // Copy left and right edges into padding bytes on right and left
   cell_ptr = sourcemap.cells + width_in_bytes;
   for (i=0; i<height; i++) {
      *cell_ptr = *(cell_ptr + width_in_bytes - 2);
      *(cell_ptr + width_in_bytes - 1) = *(cell_ptr + 1);
      cell_ptr += width_in_bytes;
   }
   // Copy top and bottom edges into padding bytes on bottom and top
   memcpy(sourcemap.cells, sourcemap.cells + length_in_bytes -
         (width_in_bytes * 2), width_in_bytes);
   memcpy(sourcemap.cells + length_in_bytes - width_in_bytes,
         sourcemap.cells + width_in_bytes, width_in_bytes);
#endif
   // Copy all cells to the destination
   memcpy(cells, sourcemap.cells, length_in_bytes);
}

/* Turns cell on. x and y are offset by 1 byte down and to the right,
   to compensate for the padding bytes around the cellmap. */
void cellmap::set_cell(unsigned int x, unsigned int y)
{
   unsigned char *cell_ptr =
         cells + ((y + 1) * width_in_bytes) + ((x / 8) + 1);

   *(cell_ptr) |= 0x80 >> (x & 0x07);
}

/* Turns cell off. x and y are offset by 1 byte down and to the right,
   to compensate for the padding bytes around the cell map. */
void cellmap::clear_cell(unsigned int x, unsigned int y)
{
   unsigned char *cell_ptr =
         cells + ((y + 1) * width_in_bytes) + ((x / 8) + 1);

   *(cell_ptr) &= ~(0x80 >> (x & 0x07));
}

/* Returns cell state (1=on or 0=off). x and y are offset by 1 byte
   down and to the right, to compensate for the padding bytes around
   the cell map. */
int cellmap::cell_state(int x, int y)
{
   unsigned char *cell_ptr =
         cells + ((y + 1) * width_in_bytes) + ((x / 8) + 1);

   return (*cell_ptr & (0x80 >> (x & 0x07))) ? 1 : 0;
}

/* Counts the number of neighboring on-cells for specified cell. */
int cellmap::count_neighbors(int x, int y)
{
   unsigned char *cell_ptr, mask;
   unsigned int neighbor_count;

   // Point to upper left neighbor
   cell_ptr = cells + ((y * width_in_bytes) + ((x + 7) / 8));
   mask = 0x80 >> ((x - 1) & 0x07);
   // Count upper left neighbor
   neighbor_count = (*cell_ptr & mask) ? 1 : 0;
   // Count left neighbor
   if ((*(cell_ptr + width_in_bytes) & mask)) neighbor_count++;
   // Count lower left neighbor
   if ((*(cell_ptr + (width_in_bytes * 2)) & mask)) neighbor_count++;

   // Point to upper neighbor
   if ((mask >>= 1) == 0) {
      mask = 0x80;
      cell_ptr++;
   }
   // Count upper neighbor
   if ((*cell_ptr & mask)) neighbor_count++;
   // Count lower neighbor
   if ((*(cell_ptr + (width_in_bytes * 2)) & mask))
         neighbor_count++;

   // Point to upper right neighbor
   if ((mask >>= 1) == 0) {
      mask = 0x80;
      cell_ptr++;
   }
   // Count upper right neighbor
   if ((*cell_ptr & mask)) neighbor_count++;
   // Count right neighbor
   if ((*(cell_ptr + width_in_bytes) & mask)) neighbor_count++;
   // Count lower right neighbor
   if ((*(cell_ptr + (width_in_bytes * 2)) & mask))
         neighbor_count++;

   return neighbor_count;
}

/* Calculates the next generation of current_map and stores it in
   next_map. */
void cellmap::next_generation(cellmap& next_map)
{
   unsigned int x, y, neighbor_count;

   for (y=0; y<height; y++) {
      for (x=0; x<width; x++) {
         neighbor_count = count_neighbors(x, y);
         if (cell_state(x, y) == 1) {
            if ((neighbor_count != 2) && (neighbor_count != 3)) {
               next_map.clear_cell(x, y);    // turn it off
               draw_pixel(x, y, OFF_COLOR);
            }
         } else {
            if (neighbor_count == 3) {
               next_map.set_cell(x, y);      // turn it on
               draw_pixel(x, y, ON_COLOR);
            }
         }
      }
   }
}

In Listing 17.3, note the padded cellmap edges, and the alteration of the member functions to compensate for the padding. Also note that the width now has to be a multiple of eight, to facilitate the process of copying the edges to the opposite padding bytes. We have decreased the generality of our Game of Life implementation in exchange for better performance. That's a very common trade-off, as common as trading memory for performance. As a rule, the more general a program is, the slower it is.

A corollary is that often (not always, but often), the more heavily optimized a program is, the more complex and the more difficult to implement it is. You can often improve performance a good deal by implementing only the level of generality you need, but at the same time decreased generality makes it more difficult to change or port the program at some later date. A Game of Life implementation, such as Listing 17.1, that's built on set_cell(), clear_cell(), and cell_state() is completely general; you can change the cell storage format simply by changing the constructor and those three functions. Listing 17.3 is harder to change because count_neighbors() would also have to be altered, and it's more complex than any of the other functions.

So, in Listing 17.3, we've gotten under the hood and changed the cellmap format a little, and gotten impressive results. But now count_neighbors() is hard-wired for optimized counting, and it's still taking up more than half the time. Maybe now it's time to go to assembly?

Not hardly.

Heavy-Duty C++ Optimization


Before we get to assembly, we still have to perform C++ optimization, then see if we can find an alternative approach that better fits the application. It would actually have made much more sense if we had looked for a new approach as our first optimization step, but I decided it would be better to cover straightforward C++ optimizations at this point, and the mind-bending stuff a little later. Right now, let's look at some C++ optimizations; Listing 17.4 is a C++-optimized version of Listing 17.3.

LISTING 17.4


/* next_generation(), implemented using fast, all-in-one hard-wired
   neighbor count/update/draw function. Otherwise, the same as
   Listing 17.3. */

/* Calculates the next generation of current_map and stores it in
   next_map. */
void cellmap::next_generation(cellmap& next_map)
{
   unsigned int x, y, neighbor_count;
   unsigned int width_in_bytesX2 = width_in_bytes << 1;
   unsigned char *cell_ptr, *current_cell_ptr, mask, current_mask;
   unsigned char *base_cell_ptr, *row_cell_ptr, base_mask;
   unsigned char *dest_cell_ptr = next_map.cells;

   // Process all cells in the current cellmap
   row_cell_ptr = cells;      // point to upper left neighbor of
                              // first cell in cell map
   for (y=0; y<height; y++) { // repeat for each row of cells
      // Cell pointer and cell bit mask for first cell in row
      base_cell_ptr = row_cell_ptr; // to access upper left neighbor
      base_mask = 0x01;             // of first cell in row
      for (x=0; x<width; x++) {     // repeat for each cell in row
         // First, count neighbors
         // Point to upper left neighbor of current cell
         cell_ptr = base_cell_ptr;  // pointer and bit mask for
         mask = base_mask;          // upper left neighbor
         // Count upper left neighbor
         neighbor_count = (*cell_ptr & mask) ? 1 : 0;
         // Count left neighbor
         if ((*(cell_ptr + width_in_bytes) & mask))
               neighbor_count++;
         // Count lower left neighbor
         if ((*(cell_ptr + width_in_bytesX2) & mask))
               neighbor_count++;
         // Point to upper neighbor
         if ((mask >>= 1) == 0) {
            mask = 0x80;
            cell_ptr++;
         }
         // Remember where to find the current cell
         current_cell_ptr = cell_ptr + width_in_bytes;
         current_mask = mask;
         // Count upper neighbor
         if ((*cell_ptr & mask)) neighbor_count++;
         // Count lower neighbor
         if ((*(cell_ptr + width_in_bytesX2) & mask))
               neighbor_count++;
         // Point to upper right neighbor
         if ((mask >>= 1) == 0) {
            mask = 0x80;
            cell_ptr++;
         }
         // Count upper right neighbor
         if ((*cell_ptr & mask)) neighbor_count++;
         // Count right neighbor
         if ((*(cell_ptr + width_in_bytes) & mask))
               neighbor_count++;
         // Count lower right neighbor
         if ((*(cell_ptr + width_in_bytesX2) & mask))
               neighbor_count++;
         if (*current_cell_ptr & current_mask) {
            if ((neighbor_count != 2) && (neighbor_count != 3)) {
               *(dest_cell_ptr + (current_cell_ptr - cells)) &=
                     ~current_mask; // turn off cell
               draw_pixel(x, y, OFF_COLOR);
            }
         } else {
            if (neighbor_count == 3) {
               *(dest_cell_ptr + (current_cell_ptr - cells)) |=
                     current_mask;  // turn on cell
               draw_pixel(x, y, ON_COLOR);
            }
         }
         // Advance to the next cell on row
         if ((base_mask >>= 1) == 0) {
            base_mask = 0x80;
            base_cell_ptr++;  // advance to the next cell byte
         }
      }
      row_cell_ptr += width_in_bytes;  // point to start of next row
   }
}

Listing 17.4 and Listing 17.3 are functionally the same; the only difference lies in
how next_generation() is implemented. (Only next_generation() is shown in Listing
17.4; the program is otherwise identical to Listing 17.3.) Listing 17.4 applies the
following optimizations to next_generation():
The neighbor-counting code is brought into next_generation(), eliminating many
function calls and from-scratch address/mask calculations; all multiplies are eliminated by
using pointers and addition; and all cells are accessed directly via pointers and masks,
eliminating all remaining function calls and from-scratch address/mask calculations.
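
To make the pointer-and-mask idea concrete, here is a minimal sketch of stepping through one packed row of cells without any multiplies. It is an illustration only, not code from Listing 17.4: the function name count_on_cells_in_row and its parameters are hypothetical, and it assumes cells are packed one bit per cell with the leftmost cell in the most significant bit, which is what the mask stepping in Listing 17.4 implies.

```cpp
#include <cstddef>

// Hypothetical helper (not from the listings). Counts the on-cells in one
// packed row of `width` cells, stored one bit per cell, most significant
// bit first. The current cell is tracked with a byte pointer and a bit
// mask, so no multiply or divide is needed to locate it.
unsigned count_on_cells_in_row(const unsigned char* row, std::size_t width)
{
    const unsigned char* cell_ptr = row;  // byte holding the current cell
    unsigned char mask = 0x80;            // bit selecting the current cell
    unsigned count = 0;

    for (std::size_t x = 0; x < width; ++x) {
        if (*cell_ptr & mask) {
            ++count;
        }
        // Step to the next cell: shift the mask right, and move on to the
        // next byte when the mask falls off the low end.
        mask >>= 1;
        if (mask == 0) {
            mask = 0x80;
            ++cell_ptr;
        }
    }
    return count;
}
```

Listing 17.4 applies the same stepping to move from one column of neighbors to the next, and reaches the other two rows by adding width_in_bytes or width_in_bytesX2 to the pointer.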

The net effect of these optimizations is that Listing 17.4 is more than twice as fast as
Listing 17.3; we’ve achieved the desired 18 generations per second, albeit only on a
486, and only at 96x96. (The #define that enables code limiting the speed to 18 Hz,
which seemed ridiculous in Listing 17.1, is actually useful for keeping the generations
from iterating too quickly when Listing 17.4 is running on a 486, especially with
a small cellmap like 48x48.) We’ve sped things up by about eight times so far; we
need to increase our speed another ten times to reach our goal of 200x200 at 18
generations per second on a 20 MHz 386.
It’s undoubtedly possible to improve the performance of Listing 17.4 further by
fine-tuning the code, but no tremendous improvement is possible that way.

Once you’ve reached the point of fine-tuning pointer usage and register variables
and the like in C or C++, you’ve become compiler-dependent; you therefore might
as well go to assembly and get the real McCoy.

We’re still not ready for assembly, though; what we need is a new perspective that
lends itself to vastly better performance in C++. The Life program in the next section
is three to seven times faster than Listing 17.4, and it’s still in C++.
How is this possible? Here are some hints:
After a few dozen generations, most of the cellmap consists of cells in the off state.
There are many possible cellmap representations other than one bit-per-pixel.
Cells change state relatively infrequently.

Bringing In the Right Brain


In the previous section, we saw how a C++ program could be sped up about eight
times simply by rearranging the data and code in straightforward ways. Now we’re
going to see how right-brain non-linear optimization can speed things up by another
four times, and make the code simpler.
Now that’s Zen code optimization.
I have two objectives to achieve in the remainder of this chapter. First, I want to show
that optimization consists of many levels, from assembly language up to conceptual
design, and that assembly language kicks in pretty late in the optimization process.
Second, I want to encourage you to saturate your brain with everything you know
about any particular optimization problem, then make space for your right brain to
solve the problem.

Re-Examining the Task


Earlier in this chapter, we looked at a straightforward Game of Life implementation,
then increased performance considerably by making the implementation a little less
abstract and a little less general. We made a small change to the cellmap format,
adding padding bytes off the edges so that pointer arithmetic would always work, but
the major optimizations were moving the critical code into a single loop and using
pointers rather than member functions whenever possible. In other words, we took
what we already knew and made it more efficient.
Now it’s time to re-examine the nature of this programming task from the ground
up, looking for things that we don’t yet know. Let’s take a moment to review what the
Game of Life consists of. The basic task is evolving a new generation, and that’s done
by looking at the number of “on” neighbors a cell has and the cell’s own state. If a
cell is on, and two or three neighbors are on, then the cell stays on; otherwise, an
on-cell is turned off. If a cell is off and exactly three neighbors are on, then the cell is
turned on; otherwise, an off-cell stays off. That’s all there is to it. As any fool can see,
the trick is to arrange things so that we can count neighbors and check the cell state
as quickly as possible. Large lookup tables, oddly encoded cellmaps, and lots of
bit-twiddling assembly code spring to mind as possible approaches. Can’t you just feel
your adrenaline start to pump?

Relax. Step back. Try to divine the true nature of the problem. The object is not to
count neighbors and check cell states as quickly as possible; that’s just one possible
implementation. The object is to determine when a cell’s state must be changed
and to change it appropriately, and that’s what we need to do as quickly as possible.

What difference does that new perspective make? Let’s approach it this way. What
does a typical cellmap look like? As it happens, after a few generations, the vast
majority of cells are off. In fact, the vast majority of cells are not only off but are entirely
surrounded by off-cells. Also, cells change state infrequently; in any given generation
after the first few, most cells remain in the same state as in the previous generation.
Do you see where I’m heading? Do you hear a whisper of inspiration from your right
brain? The original implementation stored cell states as 1-bits (on), or 0-bits (off).
For each generation and for each cell, it counted the states of the eight neighbors,
for an average of eight operations per cell per generation. Suppose, now, that on
average 10 percent of cells change state from one generation to the next. (The actual
percentage is even lower, but this will do for illustration.) Suppose also that we
change the cell map format to store a byte rather than a bit for each cell, with the
byte storing not only the cell state but also the count of neighboring on-cells for that
cell. Figure 17.3 shows this format. Then, rather than counting neighbors each time,
we could just look at the neighbor count in the cell and operate directly from that.
But what about the overhead needed to maintain the neighbor counts? Well, each
time a cell changes state, eight operations would be needed to update the counts in
the eight neighboring cells. But this happens only once every ten cells, on average,
so the cost of this approach is only one-tenth that of the original approach!
Know your data.
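
The byte-per-cell idea can be sketched in a few lines. This is a hedged illustration, not the book's code: the helper names are hypothetical, and the layout (bit 0 for the cell state, the neighbor count in the bits above it) is an assumption read off from the way Listing 17.5 tests *cell_ptr & 0x01 and shifts the byte right by one. The last two functions simply restate the cost argument as arithmetic, using the text's figure of eight operations per count or per update.

```cpp
#include <cstdint>

// Assumed cell byte layout (inferred from Listing 17.5): bit 0 is the cell
// state, the bits above it hold the number of on-neighbors (0 through 8).
// All names here are hypothetical.
inline bool cell_is_on(std::uint8_t cell) { return (cell & 0x01) != 0; }
inline unsigned on_neighbor_count(std::uint8_t cell)
{
    return static_cast<unsigned>(cell >> 1);
}

// Returns true when the Life rules require this cell to change state.
// No neighbor is examined; everything needed is already in the byte.
bool cell_must_change(std::uint8_t cell)
{
    const unsigned count = on_neighbor_count(cell);
    if (cell_is_on(cell)) {
        return count != 2 && count != 3;  // on-cell turns off
    }
    return count == 3;                    // off-cell turns on
}

// Average neighbor operations per cell per generation for each approach.
double ops_counting_every_cell() { return 8.0; }
double ops_updating_on_change(double fraction_changing)
{
    return 8.0 * fraction_changing;
}
```

With 10 percent of cells changing, ops_updating_on_change(0.10) comes to about 0.8 operations per cell against 8 for recounting every cell, which is the one-tenth figure.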

Acting on What We Know

Once we’ve changed the cellmap format to store neighbor counts as well as states,
with a byte for each cell, we can get another performance boost by again examining
what we know about our data. I said earlier that most cells are off during any given
generation. This means that most cells have no neighbors that are on. Since the cell
map representation for an off-cell that has no neighbors is a zero byte, we can skip
over scads of unchanged cells at a pop simply by scanning for non-zero bytes. This is
much faster than explicitly testing cell states and neighbor counts, and lends itself
beautifully to assembly language implementation as REPZ SCASB or (with a little
cleverness) REPZ SCASW. (Unfortunately, there’s no C library function that can
scan memory for the next byte that’s non-zero.)
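
In portable C++ the scan has to be written as a loop. The sketch below shows the operation in question; find_nonzero is a hypothetical helper, not part of Listing 17.5, and whether a given compiler turns it into anything as tight as REPZ SCASB is not something to count on.

```cpp
// Hypothetical helper (not from the listings). Returns a pointer to the
// first non-zero byte in [first, last), or `last` if every byte in the
// range is zero.
const unsigned char* find_nonzero(const unsigned char* first,
                                  const unsigned char* last)
{
    while (first != last && *first == 0) {
        ++first;
    }
    return first;
}
```

Every zero byte skipped this way is an off-cell with no on-neighbors, so its state cannot change in this generation and nothing else needs to be done for it.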
Listing 17.5 is a Game of Life implementation that uses the neighbor-count cell map
format and scans for non-zero bytes. On a 20 MHz 386, Listing 17.5 is about 4.5 times
faster at calculating generations (that is, the generation engine is 4.5 times faster;
I’m ignoring the time consumed by drawing and text display) than Listing 17.4,
which is no slouch. On a 33 MHz 486, Listing 17.5 is about 3.5 times faster than
Listing 17.4. This is true even though Listing 17.5 must be compiled using the large
model. Imagine that: getting a four times speed-up while switching from the small
model to the large model!

LISTING 17.5 [Link]
/* C++ Game of Life implementation for any mode for which mode set
   and draw pixel functions can be provided. The cellmap stores the
   neighbor count for each cell as well as the state of each cell;
   this allows very fast next-state determination. Edges always wrap
   in this implementation.
   Tested with Borland C++. To run, link with Listing 17.2
   in the large model. */
#include <stdlib.h>
#include <stdio.h>
#include <iostream.h>
#include <conio.h>
#include <time.h>
#include <dos.h>
#include <bios.h>
#include <mem.h>

#define ON_COLOR  15       // on-cell pixel color
#define OFF_COLOR 0        // off-cell pixel color
#define MSG_LINE  10       // row for text messages
#define GENERATION_LINE 12 // row for generation # display
#define LIMIT_18_HZ 0      // set to 1 for maximum frame rate = 18 Hz

class cellmap {
private:
   unsigned char *cells;
   unsigned char *temp_cells;
   unsigned int width;
   unsigned int height;
   unsigned int length_in_bytes;
public:
   cellmap(unsigned int h, unsigned int v);
   ~cellmap(void);
   void set_cell(unsigned int x, unsigned int y);
   void clear_cell(unsigned int x, unsigned int y);
   int cell_state(int x, int y);
   int count_neighbors(int x, int y);
   void next_generation(void);
   void init(void);
};

extern void enter_display_mode(void);
extern void exit_display_mode(void);
extern void draw_pixel(unsigned int X, unsigned int Y,
   unsigned int Color);
extern void show_text(int x, int y, char *text);

/* Controls the size of the cell map. Must be within the capabilities
   of the display mode, and must be limited to leave room for text
   display at right. */
unsigned int cellmap_width = 96;
unsigned int cellmap_height = 96;

/* Width & height in pixels of each cell. */
unsigned int magnifier = 2;

/* Randomizing seed */
unsigned int seed;

void main()
{
   unsigned long generation = 0;
   char gen_text[80];
   long bios_time, start_bios_time;

   cellmap current_map(cellmap_height, cellmap_width);

   current_map.init();  // randomly initialize cell map

   enter_display_mode();

   // Keep recalculating and redisplaying generations until any key
   // is pressed
   show_text(0, MSG_LINE, "Generation: ");
   start_bios_time = _bios_timeofday(_TIME_GETCLOCK, &bios_time);
   do {
      generation++;
      sprintf(gen_text, "%10lu", generation);
      show_text(1, GENERATION_LINE, gen_text);
      // Recalculate and draw the next generation
      current_map.next_generation();
#if LIMIT_18_HZ
      // Limit to a maximum of 18.2 frames per second, for visibility
      do {
         _bios_timeofday(_TIME_GETCLOCK, &bios_time);
      } while (start_bios_time == bios_time);
      start_bios_time = bios_time;
#endif
   } while (!kbhit());

   getch();    // clear keypress
   exit_display_mode();
   cout << "Total generations: " << generation << "\nSeed: " <<
         seed << "\n";
}

/* cellmap constructor. */
cellmap::cellmap(unsigned int h, unsigned int w)
{
   width = w;
   height = h;
   length_in_bytes = w * h;
   cells = new unsigned char[length_in_bytes];  // cell storage
   temp_cells = new unsigned char[length_in_bytes]; // temp cell storage
   if ( (cells == NULL) || (temp_cells == NULL) ) {
      printf("Out of memory\n");
      exit(1);
   }
   memset(cells, 0, length_in_bytes);  // clear all cells, to start
}

/* cellmap destructor. */
cellmap::~cellmap(void)
{
   delete[] cells;
   delete[] temp_cells;
}

/* Turns an off-cell on, incrementing the on-neighbor count for the
   eight neighboring cells. */
void cellmap::set_cell(unsigned int x, unsigned int y)
{
   unsigned int w = width, h = height;
   int xoleft, xoright, yoabove, yobelow;
   unsigned char *cell_ptr = cells + (y * w) + x;

   // Calculate the offsets to the eight neighboring cells,
   // accounting for wrapping around at the edges of the cell map
   if (x == 0)
      xoleft = w - 1;
   else
      xoleft = -1;
   if (y == 0)
      yoabove = length_in_bytes - w;
   else
      yoabove = -w;
   if (x == (w - 1))
      xoright = -(w - 1);
   else
      xoright = 1;
   if (y == (h - 1))
      yobelow = -(length_in_bytes - w);
   else
      yobelow = w;

   *(cell_ptr) |= 0x01;
   *(cell_ptr + yoabove + xoleft) += 2;
   *(cell_ptr + yoabove) += 2;
   *(cell_ptr + yoabove + xoright) += 2;
   *(cell_ptr + xoleft) += 2;
   *(cell_ptr + xoright) += 2;
   *(cell_ptr + yobelow + xoleft) += 2;
   *(cell_ptr + yobelow) += 2;
   *(cell_ptr + yobelow + xoright) += 2;
}

/* Turns an on-cell off, decrementing the on-neighbor count for the
   eight neighboring cells. */
void cellmap::clear_cell(unsigned int x, unsigned int y)
{
   unsigned int w = width, h = height;
   int xoleft, xoright, yoabove, yobelow;
   unsigned char *cell_ptr = cells + (y * w) + x;

   // Calculate the offsets to the eight neighboring cells,
   // accounting for wrapping around at the edges of the cell map
   if (x == 0)
      xoleft = w - 1;
   else
      xoleft = -1;
   if (y == 0)
      yoabove = length_in_bytes - w;
   else
      yoabove = -w;
   if (x == (w - 1))
      xoright = -(w - 1);
   else
      xoright = 1;
   if (y == (h - 1))
      yobelow = -(length_in_bytes - w);
   else
      yobelow = w;

   *(cell_ptr) &= ~0x01;
   *(cell_ptr + yoabove + xoleft) -= 2;
   *(cell_ptr + yoabove) -= 2;
   *(cell_ptr + yoabove + xoright) -= 2;
   *(cell_ptr + xoleft) -= 2;
   *(cell_ptr + xoright) -= 2;
   *(cell_ptr + yobelow + xoleft) -= 2;
   *(cell_ptr + yobelow) -= 2;
   *(cell_ptr + yobelow + xoright) -= 2;
}

/* Returns cell state (1=on or 0=off). */
int cellmap::cell_state(int x, int y)
{
   unsigned char *cell_ptr;

   cell_ptr = cells + (y * width) + x;
   return *cell_ptr & 0x01;
}

/* Calculates and displays the next generation of current_map */
void cellmap::next_generation()
{
   unsigned int x, y, count;
   unsigned int h = height, w = width;
   unsigned char *cell_ptr, *row_cell_ptr;

   // Copy to temp map, so we can have an unaltered version from
   // which to work
   memcpy(temp_cells, cells, length_in_bytes);

   // Process all cells in the current cell map
   cell_ptr = temp_cells;     // first cell in cell map
   for (y=0; y<h; y++) {      // repeat for each row of cells
      // Process all cells in the current row of the cell map
      x = 0;
      do {        // repeat for each cell in row
         // Zip quickly through as many off-cells with no
         // neighbors as possible
         while (*cell_ptr == 0) {
            cell_ptr++; // advance to the next cell
            if (++x >= w) goto RowDone;
         }
         // Found a cell that's either on or has on-neighbors,
         // so see if its state needs to be changed
         count = *cell_ptr >> 1; // # of neighboring on-cells
         if (*cell_ptr & 0x01) {
            // Cell is on; turn it off if it doesn't have
            // 2 or 3 neighbors
            if ((count != 2) && (count != 3)) {
               clear_cell(x, y);
               draw_pixel(x, y, OFF_COLOR);
            }
         } else {
            // Cell is off; turn it on if it has exactly 3 neighbors
            if (count == 3) {
               set_cell(x, y);
               draw_pixel(x, y, ON_COLOR);
            }
         }
         // Advance to the next cell
         cell_ptr++; // advance to the next cell byte
      } while (++x < w);
   RowDone: ;     // empty statement, so the label has something to mark
   }
}

/* Randomly initializes the cellmap to about 50% on-pixels. */
void cellmap::init()
{
   unsigned int x, y, init_length;

   // Get the seed; seed randomly if 0 entered
   cout << "Seed (0 for random seed): ";
   cin >> seed;
   if (seed == 0) seed = (unsigned) time(NULL);

   // Randomly initialize the initial cell map to 50% on-pixels
   // (actually generally fewer, because some coordinates will be
   // randomly selected more than once)
   cout << "Initializing...";
   srand(seed);
   init_length = (height * width) / 2;
   do {
      x = random(width);
      y = random(height);
      if (cell_state(x, y) == 0) {
         set_cell(x, y);
      }
   } while (--init_length);
}

The large model is actually not necessary for the 96x96 cellmap in Listing 17.5. However,
I was actually more interested in seeing a fast 200x200 cellmap, and two 200x200
cellmaps can’t fit in a single segment. (This can easily be worked around in assembly
language for cellmaps up to a segment in size; beyond that size, cellmap scanning
becomes pretty complex, although it can still be efficiently implemented with some
clever programming.)
Anyway, using the large model helps illustrate that it’s the data representation and
the data processing approach you choose that matter most. Optimization details like
memory models and segments and in-line functions and assembly language are
important but secondary. Let your mind roam creatively before you start coding.
Otherwise, you may find you’re writing well-tuned slow code, which is by no means
the same thing as fast code.
Take a close look at Listing 17.5. You will see that it’s quite a bit simpler than Listing
17.4. To some extent, that’s because I decided to hard-wire the program to wrap
around from one edge of the cellmap to the other (it’s much more interesting that
way), but the main reason is that it’s a lot easier to work with the neighbor-count
model. There’s no complex mask and pointer management, and the only thing that
really needs to be optimized is scanning for zero bytes. (And, in fact, I haven’t
optimized even that because it’s done in a C++ loop; it should really be REPZ SCASB.)
In truth, none of the code in Listing 17.5 is particularly well-optimized, and, as I
noted, the program must be compiled with the large model for large cellmaps. Also,
of course, the entire program is still in C++; note well that there’s not a whit of
assembly here.
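
The wrap-around bookkeeping is the only fiddly part of the neighbor-count model, so here is a small sketch of it. This is not Listing 17.5's code: WrapMap is a hypothetical type, it uses wrapped coordinates and array indexing instead of the listing's precomputed pointer offsets, and it assumes a map at least 3 cells wide and 3 cells high so that no two neighbors are the same cell.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrap-around cell map in the neighbor-count format: bit 0 of
// each byte is the cell state, and each on-neighbor adds 2 to the byte.
// Assumes width >= 3 and height >= 3.
struct WrapMap {
    std::size_t width, height;
    std::vector<unsigned char> cells;

    WrapMap(std::size_t w, std::size_t h)
        : width(w), height(h), cells(w * h, 0) {}

    // Turns an off-cell on and bumps the count in its eight neighbors,
    // wrapping at the edges. Must only be called for a cell that is off.
    void set_cell(std::size_t x, std::size_t y)
    {
        const std::size_t left  = (x == 0) ? width - 1 : x - 1;
        const std::size_t right = (x == width - 1) ? 0 : x + 1;
        const std::size_t above = (y == 0) ? height - 1 : y - 1;
        const std::size_t below = (y == height - 1) ? 0 : y + 1;

        cells[y * width + x] |= 0x01;

        const std::size_t xs[3] = {left, x, right};
        const std::size_t ys[3] = {above, y, below};
        for (std::size_t row : ys) {
            for (std::size_t col : xs) {
                if (row == y && col == x) continue;  // skip the cell itself
                cells[row * width + col] += 2;
            }
        }
    }
};
```

Turning on the corner cell (0, 0) of a 4x4 map, for instance, bumps the counts of cells on the right-hand column and the bottom row as well as its three ordinary neighbors, which is the same effect the xoleft and yoabove offsets produce in Listing 17.5.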

We’ve gotten more than a 30-times speedup simply by removing a little of the
abstraction that C++ encourages, and by storing and processing the data in a manner
appropriate for the typical nature of the data itself. In other words, we’ve done
some linear, left-brained optimization (using pointers and reducing calls) and some
non-linear, right-brained optimization (understanding the real problem and
listening for the creative whisper of non-obvious solutions).

No doubt we could get another two to five times improvement with good assembly
code, but that’s dwarfed by a 30-times improvement, so optimization at a
conceptual level must come first.

The Challenge That Ate My Life


The most recent optimization challenge I laid before my community of readers was to
write the fastest possible Game of Life generation engine. By “engine,” I meant that I
didn’t care about time spent in input or output, only time consumed by the call to
next_generation(). The time spent updating the cellmap was what I wanted people to
concentrate on.
Here are the rules I laid down for the challenge:
Readers could modify any code in Listing 17.5, except the main loop, as well as
   change the cell map representation any way they liked. However, the code had to
   produce exactly the same output as Listing 17.5 under all circumstances in order
   to be eligible to win.
Engine code had to be less than 400 lines long in total, excluding the
   video-related code shown in Listing 17.2.
Submissions had to compile/assemble with Borland C++ (in either C++ or C
   mode, as desired) and/or TASM.
All submissions had to handle cellmaps at least 200x200 in size.
Assembly language could of course be used to speed up any part of the program.
C rather than C++ was legal as well, so long as entered implementations
   produced the same results as Listing 17.5 and 17.2 together and were less than
   400 lines long.
All entries would be timed on the same 33 MHz 486 with a 256K external cache.
That was the challenge I put to the readers. Little did I realize the challenge it would
lay on me: Entries poured in from the four corners of the globe. Some were plain, some
were brilliant, some were, well, berserk. Many didn’t even work. But all had to be gone
through, examined for adherence to the rules, read, compiled, linked, run, and judged.
I learned a lot, about a lot of things, not the least of which was the process (or maybe
the wisdom) of laying down challenges to readers.
Who won? What did I learn? To find out, read on.
