C Programming

The notes on these pages are for the courses in C Programming I used to teach in the Experimental
College at the University of Washington in Seattle, WA. Normally these notes accompany fairly
traditional classroom lecture presentations, but they are intended to be reasonably complete (more so, for
that matter, than the lectures!) and should be usable as standalone tutorials.
I originally designed the first, Introductory course around The C Programming Language (2nd Edition)
by Kernighan and Ritchie, and the notes were designed to complement that text, highlighting important
points and explaining subtleties which might be lost on the general reader. Later, I rewrote the notes to
stand on their own (in part because, in spite of the first set of notes, too many of my students found K&R
a bit too technical for an informal, introductory course). Finally, I occasionally teach an Intermediate
course, which covers the topics which tend to be skipped or glossed over in introductory courses (bitwise
operators, structures, file I/O, etc.). The Intermediate course has its own set of notes.
All three sets of notes are available here. If you have a copy of K&R2 and would like a thorough
treatment of the language, read K&R and the ``Notes to Accompany K&R'' side by side. If you're just
getting your feet wet and would like a somewhat simpler introduction, read the ``Introductory Class
Notes.'' If you have had an introduction to C (either here or elsewhere) and are now looking to fill in
some of the missing pieces, read the ``Intermediate Class Notes.''
Of course, just reading a book or these notes won't really teach you C; you will also want to write and run
your own programs, for practice and so that the language concepts will make some kind of practical
sense. Most of my programming assignments (including review questions) are here as well, along with
their solution sets. (No peeking at the answers until you've given the problems your best shot!)
These notes are arranged for the web in the usual hierarchy by section and subsection. If you want to read
through all of them, without keeping track of your own stack to implement a depth-first tree traversal,
just follow the ``read sequentially'' links at the bottom of each page.
Depending on your background, you might want to read one or both of the two preliminary handouts:
one on programming in general, and one which reviews some math which is relevant to programming.
(And there are some other miscellaneous handouts, too.)
One note about the HTML: these pages were produced automatically from the base manuscripts for my
class notes, using a program of my own devising which is, all too typically, not (yet?) perfect. I apologize
in advance for any formatting glitches. In particular, when you see <sup>...</sup> or
<sub>...</sub> in the text, these do not represent bugs in your browser or accidental bugs in my
markup; instead, these are my interim compromise way of representing superscripts and subscripts to
you, since there's no way to do so in portable HTML.
http://www.eskimo.com/~scs/cclass/cclass.html (1 of 2) [22/07/2003 5:07:43 PM]
Finally, I realize that reading these notes on the net is not always as convenient as it might be,
particularly when the net is slow. Please realize, though, that the net is what it is, and that I have gone to
a certain amount of effort to place these notes here at all. Please do not ask me to send you a set of these
notes for browsing on your own machine, as I am currently unable to do so.
Handout: A Short Introduction to Programming

Handout: A Brief Refresher on Some Math Often Used in Computing
Readings: Notes to Accompany The C Programming Language, by Kernighan and Ritchie (``K&R'')
Readings: Introductory C Programming Class Notes (standalone)
Readings: Intermediate C Programming Class Notes
Assignments: (questions, exercises, and solutions)
introductory class
intermediate class
Other Handouts
This page by Steve Summit // Copyright 1996-9 // mail feedback
Experimental College
The Experimental College is (I think) Washington's oldest and largest alternative educational resource,
typically offering hundreds of classes serving thousands of students each quarter. Experimental College
classes are taught by local members of the community who love what they're doing and want to teach
you to love it, too. Most classes are conducted in the Seattle area. For much more information, visit the
official Experimental College home page.
C Programming Notes
Notes to Accompany The C Programming Language, by Kernighan and Ritchie (``K&R'')
Steve Summit
The C Programming Language, or K&R as it is affectionately known, is widely praised by experienced
C programmers as one of the best books on C there is. (It was also the first; it also happens to be a
bestseller.) The only real criticism K&R ever receives is that it may not be the best tutorial for beginners; it
seems to assume a certain amount of programming savvy and familiarity with computers. Actually, if
you read it carefully, you'll find that it is constantly dispensing wisdom about programming in general,
from basic concepts to deep insights to impeccable commentary on imponderable topics such as
programming style, at the same time it teaches the specifics of the C language. Therefore, the
fundamental criticism may simply be that K&R is not suitable for those who read carelessly.
The authors are not out to save the world or to convert it to their philosophy of programming. When they
say something, they say it once, without theatrics or undue emphasis. If you read the book too quickly, or
skim it, or look only for specific answers to what you think you're trying to learn today, you will miss
much of the excellent advice which the authors have to offer.
These notes were prepared (beginning in Spring, 1995) for the University of Washington Experimental
College course in Introductory C Programming. They are meant to supplement K&R for the reader who
is new to C and perhaps to programming, and who wants a slightly more detailed, less pithy presentation.
I'll add insights from my own experience, in particular by pointing out those areas where people
traditionally misunderstand something about C or K&R's presentation of it. I'll also call out a few of the
very deep sentences, which you might overlook at first even if you're not skimming (perhaps because
their significance only becomes apparent once you've begun writing bigger or more complicated
programs), but which contain advice which is absolutely vital to successful real-world programming and
which, if you can take it to heart early, will save you from a lot of misery out in the school of hard
knocks later on.
Note that most of these notes merely amplify on the things K&R is saying; there isn't much to say that it
doesn't already say, usually better. In particular, many of the things that I'll comment on in the early
chapters are discussed in more detail in the later chapters; by barging in with my know-it-all comments,
I'm partially destroying the authors' careful progression from an initial, slightly superficial overview to a
more detailed, complete presentation. If these notes present more detail than you want to see at first, don't
worry (but please do let me know); just come back to them later to see if they clear up anything you're
still uncertain on. (Also, if you find the description in K&R adequately clear, you don't have to read all of
these notes, but do take note of the highlighted ``deep sentences.'')
Preface
Preface to the First Edition
Introduction
Chapter 1: A Tutorial Introduction
Chapter 2: Types, Operators, and Expressions
Chapter 3: Control Flow
Chapter 4: Functions and Program Structure
Chapter 5: Pointers and Arrays
Chapter 6: Structures
Chapter 7: Input and Output
Read Sequentially
This page by Steve Summit // Copyright 1995, 1996 // mail feedback
Preface
page ix
You'll get some hint here that C has become a bit more formal as it has ``grown up.'' That formality is
appropriate, and for the second edition of K&R to acknowledge it is appropriate, and for any modern
course in C programming to teach it is appropriate. Personally, I learned C before it had become quite so
formalized, and occasionally my traditional biases will leak through. I'll try to admit it when they do.
As the authors note, C is a relatively small language, but one which (to its admirers, anyway) wears well.
C's small, unambitious feature set is a real advantage: there's less to learn; there isn't excess baggage in
the way when you don't need it. It can also be a disadvantage: since it doesn't do everything for you,
there's a lot you have to do yourself. (Actually, this is viewed by many as an additional advantage:
anything the language doesn't do for you, it doesn't dictate to you, either, so you're free to do that
something however you want.)

Preface to the First Edition
page xi
This Preface, in a few spare paragraphs, sums up much of the philosophy of C and the authors'
philosophy about programming in general. Their comments on C's size and scope are fundamental, and
though no one may have fully recognized it at the time (or yet), this unassuming approach to the design
of the language is surely a significant factor behind C's success. I didn't have the first paragraph of the
Preface to the First Edition in front of me when I wrote my notes (just above) on the Preface to the
Second Edition, but it's not surprising that they're similar.
As the authors say, they assume some familiarity with basic programming concepts; other notes in this
series will give you a bit of help with those concepts if you need it. The authors also anticipate another
theme of theirs, which is that they will stress learning by doing. (I'll have more to say about this as the
learning begins.)
Deep sentence:
Besides showing how to make effective use of the language, we have tried where possible
to illustrate useful algorithms and principles of good style and sound design.
The authors' advice on style is good, and their design is sound. Pay attention to the things they say which
go beyond the nuts-and-bolts details of C: there's a lot to learn here about programming in general.

Introduction
page 2
Deep sentence:
...C deals with the same sort of objects that most computers do, namely characters,
numbers, and addresses.
C is sometimes referred to as a ``high-level assembly language.'' Some people think that's an insult, but
it's actually a deliberate and significant aspect of the language. If you have programmed in assembly
language, you'll probably find C very natural and comfortable (although if you continue to focus too
heavily on machine-level details, you'll probably end up with unnecessarily nonportable programs). If
you haven't programmed in assembly language, you may be frustrated by C's lack of certain higher-level
features. In either case, you should understand why C was designed this way: so that seemingly-simple
constructions expressed in C would not expand to arbitrarily expensive (in time or space) machine
language constructions when compiled. If you write a C program simply and succinctly, it is likely to
result in a succinct, efficient machine language executable. If you find that the executable resulting from
a C program is not efficient, it's probably because of something silly you did, not because of something
the compiler did behind your back which you have no control over. In any case, there's no point in
complaining about C's low-level flavor: C is what it is.
Next we see a more detailed list of the things that are not ``part of C.'' It's good to understand exactly
what we mean by this. When we say that the C language proper does not do things like memory
allocation or I/O, or even string manipulation, we obviously do not mean that there is no way to do these
things in C. In fact, the usual functions for doing these things are specified by the ANSI C Standard with
as much rigor as is the core language itself.
The fact that things like memory allocation and I/O are done through function calls has three
implications:
1. the function calls to do memory allocation, I/O, etc. are no different from any other function calls;
2. the functions which do memory allocation, I/O, etc. do not know any more about the data they're
acting on than ordinary functions do (we'll have more to say about this later); and
3. if you have specialized needs, you can do nonstandard memory allocation or I/O whenever you wish,
by using your own functions and ignoring the standard ones provided.
The sentence that says ``Most C implementations have included a reasonably standard collection of such
functions'' is historical; today, all implementations conforming to the ANSI C Standard have a very
standard collection.
page 3
Deep sentence:
...C retains the basic philosophy that programmers know what they are doing; it only
requires that they state their intentions explicitly.
This aspect of C is very widely criticized; it is also used (justifiably) to argue that C is not a good
teaching language. C aficionados love this aspect of C because it means that C does not try to protect
them from themselves: when they know what they're doing, even if it's risky or obscure, they can do it.
Students of C hate this aspect of C because it often seems as if the language is some kind of a conspiracy
specifically designed to lead them into booby traps and ``gotcha!''s.
This is another aspect of the language which it's fairly pointless to complain about. If you take care and
pay attention, you can avoid many of the pitfalls. These notes will point out many of the obvious (and not
so obvious) trouble spots.
page 4
The last sentence of the Introduction is misleading: as we'll see, it's risky to defer to any particular
compiler as a ``final authority on the language.'' A compiler is only a final authority on the language it
accepts, and the language that a particular compiler accepts is not necessarily exactly C, no matter what
the name of the compiler suggests. Most compilers accept extensions which are not part of standard C
and which are not supported by other compilers; some compilers are deficient and fail to accept certain
constructs which are in standard C. From time to time, you may have questions about what is truly
standard and which neither you nor anyone you've talked to is able to answer. If you don't have a copy of
the standard (or if you do, but you discover that the standardese in which it's written is impenetrable),
you may have to temporarily accept the jurisdiction of your particular compiler, in order to get some
program working today and under that particular compiler, but you'd do well to mark the code in
question as suspect and the question in your head as ``don't know; still unanswered.''


page 5
I completely agree with the authors that writing real programs, and soon, is the best way to learn
programming. This way, concepts which would otherwise seem abstract make sense, and the positive
feedback you get from getting even a small program to work gives you a great incentive to improve it or
write the next one.
Diving in with ``real'' programs right away has another advantage, if only pragmatic: if you're using a
conventional compiler, you can't run a fragment of a program and see what it does; nothing will run until
you have a complete (if tiny or trivial) program. You can't learn everything you'd need to write a
complete program all at once, so you'll have to take some things ``on faith'' and parrot them in your first
programs before you begin to understand them. (You can't learn to program just one expression or
statement at a time any more than you can learn to speak a foreign language one word at a time. If all you
know is a handful of words, you can't actually say anything: you also need to know something about the
language's word order and grammar and sentence structure and declension of articles and verbs.)
The authors list a few drawbacks of this ``dive in and program'' approach, and I must add one more. It's a
small step from learning-by-doing to learning-by-trial-and-error, and when you learn programming by
trial-and-error, you can very easily learn many errors. When you're not sure whether something will
work, or you're not even sure what you could use that might work, and you try something, and it does
work, you do not have any guarantee that what you tried worked for the right reason. You might just
have ``learned'' something that works only by accident or only on your compiler, and it may be very hard
to un-learn it later, when it stops working. (Also, if what you tried didn't work, it may have been due to a
bug in the compiler, such that it should have worked.)
Therefore, whenever you're not sure of something, be very careful before you go off and try it ``just to
see if it will work.'' Of course, you can never be absolutely sure that something is going to work before
you try it, otherwise we'd never have to try things. But you should have an expectation that something is
going to work before you try it, and if you can't predict how to do something or whether something
would work and find yourself having to determine it experimentally, make a note in your mind that
whatever you've just learned (based on the outcome of the experiment) is suspect.
section 1.1: Getting Started
section 1.2: Variables and Arithmetic Expressions
section 1.3: The For Statement
section 1.4: Symbolic Constants

section 1.5: Character Input and Output
section 1.5.1: File Copying
section 1.5.2: Character Counting
section 1.5.3: Line Counting
section 1.5.4: Word Counting
section 1.6: Arrays
section 1.7: Functions
section 1.8: Arguments--Call by Value
section 1.9: Character Arrays
section 1.10: External Variables and Scope


page 6
Deep sentence:
With these mechanical details mastered, everything else is comparatively easy.
The claim that a program as simple as ``hello, world'' is a big hurdle may seem outrageous, but it's really
quite true. It is a hurdle: on an unfamiliar computer, it can be arbitrarily difficult to figure out how to
enter a text file containing program source, or how to compile and link it, or how to invoke it, or what
happened after (if?) it ran. The most experienced C programmers immediately go back to this one, simple
program whenever they're trying out a new system or a new way of entering or building programs or a
new way of printing output from within programs. As they say, everything else is comparatively easy.
One hurdle which the authors don't mention but which many of you may find yourself facing is the
choice of an appropriate compiler. On many Unix machines, the cc command which the authors describe
is an older compiler which does not recognize modern, ANSI Standard C syntax. An old compiler will
accept the simple program on page 6, but it will not accept many of the other programs in the book. If
you find yourself getting baffling compilation errors on programs which you've typed in exactly as
they're shown in the book, it probably indicates that you're using an older compiler. On many machines,
another compiler called acc or gcc is available, and you'll want to use it, instead.
Deep sentence:
main will usually call other functions to help perform its job, some that you wrote, and
others from libraries that are provided for you.
We heard about this already in the Introduction, but here it is again: as far as the compiler and the
language definition are concerned, there's no difference between a function that you write and a function
someone else wrote for you, including a function like printf which seems to be part of the language.
There's nothing magic about printf; there's nothing that it can do that one of your functions couldn't.
(Well, actually, there are a few magic, or at least surprising, things about printf, but they're magic in
ways that your functions can be, too.)
There is one slight problem with the simple ``hello, world'' program in the book. The problem will
usually be ignored (that is, the program will usually work correctly), but if you receive any warning or
error messages or have any problems having to do with the ``value returned from main,'' jump forward
to page 26 to learn why main ought to end with the line
return 0;


page 10
Deep sentence:
Although C compilers do not care about how a program looks, proper indentation and
spacing are critical in making programs easy for people to read. We recommend writing
only one statement per line, and using blanks around operators to clarify grouping. The
position of braces is less important, although people hold passionate beliefs. We have
chosen one of several popular styles. Pick a style that suits you, then use it consistently.
There are two things to note here. One is that (with one or two exceptions) the compiler really does not
care how a program looks; it doesn't matter how it's broken into lines. The fragments
while(i < j)
    i = 2 * i;
and
while(i < j) i = 2 * i;
and
while(i<j)i=2*i;
and
while(i < j)
i = 2 * i;
and
while   (
i       <
j       )
i       =
2       *
i       ;
are all treated exactly the same way by the compiler.

The second thing to note is that style issues (such as how a program is laid out) are important, but they're
not something to be too dogmatic about, and there are also other, deeper style issues besides mere layout
and typography.
There is some value in having a reasonably standard style (or a few standard styles) for code layout.
Please don't take the authors' advice to ``pick a style that suits you'' as an invitation to invent your own
brand-new style. If (perhaps after you've been programming in C for a while) you have specific
objections to specific facets of existing styles, you're welcome to modify them, but if you don't have any
particular leanings, you're probably best off copying an existing style at first. (If you want to place your
own stamp of originality on the programs that you write, there are better avenues for your creativity than
inventing a bizarre layout; you might instead try to make the logic easier to follow, or the user interface
easier to use, or the code freer of bugs.)
Deep sentence:
...in C, as in many other languages, integer division truncates: any fractional part is
discarded.
The authors say all there is to say here, but remember it: just when you've forgotten this sentence, you'll
wonder why something is coming out zero when you thought it was supposed to be the quotient of two
nonzero numbers.
page 12
Here is more discussion on the difference between integer and floating-point division. Nothing deep; just
something to remember.
page 13
Hidden here are descriptions of some more of printf's ``conversion specifiers.'' %o and %x print
integers, in octal (base 8) and hexadecimal (base 16), respectively. Since a percent sign normally tells
printf to expect an additional argument and insert its value, you might wonder how to get printf to
just print a %. The answer is to double it: %%.
Also, note (as was mentioned on page 11) that you must match up the arguments to printf with the
conversion specification; the compiler can't (or won't) generally check them for you or fix things up if
you get them wrong. If fahr is a float, the code
printf("%d\n", fahr);
will not work. You might ask, ``Can't the compiler see that %d needs an integer and fahr is
floating-point and do the conversion automatically, just like in the assignments and comparisons on page 12?''
And the answer is, no. As far as the compiler knows, you've just passed a character string and some other
arguments to printf; it doesn't know that there's a connection between the arguments and some special
characters inside the string. This is one of the implications of the fact, stated earlier, that functions like
printf are not special. (Actually, some compilers or other program checkers do know that a function
named printf is special, and will do some extra checking for you, but you can't count on it.)


pages 13-14
Deep sentence:
...in any context where it is permissible to use the value of a variable of some type, you can
use a more complicated expression of that type.
You may have used other languages which placed restrictions on where you could use expressions or
how complicated they could be. C has relatively few such restrictions. There's nothing magical about the
printf call above; this ability to perform a computation inside of an argument is not unique to
printf. In any function call, the arguments in the argument list are expressions, and it doesn't matter if
they are simple expressions which just fetch the value of one variable, like fahr, or more complicated
expressions, like 5.0/9.0 * (fahr - 32).


pages 14-15
Deep sentence:
Notice that there is no semicolon at the end of a #define line.
Actually, all lines that begin with # are special; we'll learn more about them later.


page 15
Note that you do not need to worry about whether your computer uses a carriage return (CR) or linefeed
(LF) or CRLF combination or something else to terminate lines in text files; in a C program, the line
terminator will always appear to be the newline, \n.

section 1.5.1: File Copying

page 16
Pay particular attention to the discussion of why the variable to hold getchar's return value is declared
as an int rather than a char. The distinction may not seem terribly significant now, but it is important.
If you use a char, it may seem to work, but it may break down mysteriously later. Always remember to
use an int for anything you assign getchar's return value to.
page 17
The line
while ((c = getchar()) != EOF)
epitomizes the cryptic brevity which C is notorious for. You may find this terseness infuriating (and
you're not alone!), and it can certainly be carried too far, but bear with me for a moment while I defend it.
The simple example on pages 16 and 17 illustrates the tradeoffs well. We have four things to do:
1. call getchar,
2. assign its return value to a variable,
3. test the return value against EOF, and
4. process the character (in this case, print it again).
We can't eliminate any of these steps. We have to assign getchar's value to a variable (we can't just use
it directly) because we have to do two different things with it (test, and print). Therefore, compressing the
assignment and test into the same line (as on page 17) is the only good way of avoiding two distinct calls
to getchar (as on page 16). You may not agree that the compressed idiom is better for being more
compact or easier to read, but the fact that there is now only one call to getchar is a real virtue.
In a tiny program like this, the repeated call to getchar isn't much of a problem. But in a real program,
if the thing being read is at all complicated (not just a single character read with getchar), and if the
processing is at all complicated (such that the input call before the loop and the input call at the end of the
loop become widely separated), and if the way that input is done is ever changed some day, it's just too
likely that one of the input calls will get changed but not the other.
(Also, note that when an assignment like c = getchar() appears within a larger expression, the
surrounding expression receives the same value that is assigned. Using an assignment as a subexpression
in this way is perfectly legal and quite common in C.)
When you run the character copying program, and it begins copying its input (your typing) to its output
(your screen), you may find yourself wondering how to stop it. It stops when it receives end-of-file
(EOF), but how do you send EOF? The answer depends on what kind of computer you're using. On Unix
and Unix-related systems, it's almost always control-D. On MS-DOS machines, it's control-Z followed by
the RETURN key. Under Think C on the Macintosh, it's control-D, just like Unix. On other systems, you
may have to do some research to learn how to send EOF.
(Note, too, that the character you type to generate an end-of-file condition from the keyboard has nothing
to do with the EOF value returned by getchar. The EOF value returned by getchar is a code
indicating that the input system has detected an end-of-file condition, whether it's reading the keyboard or
a file or a magnetic tape or a network connection or anything else.)
Another excellent thing to know when doing any kind of programming is how to terminate a runaway
program. If a program is running forever waiting for input, you can usually stop it by sending it an
end-of-file, as above, but if it's running forever not waiting for something (i.e. if it's in an infinite loop) you'll
have to take more drastic measures. Under Unix, control-C will terminate the current program, almost no
matter what. Under MS-DOS, control-C or control-BREAK will sometimes terminate the current
program, but by default MS-DOS only checks for control-C when it's looking for input, so an infinite loop
can be unkillable. There's a DOS command, I think it's
break on
which tells DOS to look for control-C more often, and I recommend using this command if you're doing
any programming. (If a program is in a really tight infinite loop under MS-DOS, there can be no way of
killing it short of rebooting.) On the Mac, try command-period or command-option-ESCAPE.
Finally, don't be disappointed (as I was) the first time you run the character copying program. You'll type
a character, and see it on the screen right away, and assume it's your program working, but it's only your
computer echoing every key you type, as it always does. When you hit RETURN, a full line of characters
is made available to your program, which it reads all at once, and then copies to the screen (again). In
other words, when you run this program, it will probably seem to echo the input a line at a time, rather
than a character at a time. You may wonder how a program can read a character right away, without
waiting for the user to hit RETURN. That's an excellent question, but unfortunately the answer is rather
complicated, and beyond the scope of this introduction. (Among other things, how to read a character
right away is one of the things that's not defined by the C language, and it's not defined by any of the
standard library functions, either. How to do it depends on which operating system you're using.)


page 18
Ignore the mention of efficiency with respect to nc = nc+1 vs. ++nc. Once you've gotten used to ++
meaning ``increment by 1,'' you'll probably find yourself preferring ++nc simply because it is more
concise, and incrementing things by 1 is so common. (Personally, once I got used to it, I found ++ more
natural, too, because after all, expressions like nc = nc+1, though they're common enough in
programming, are very unnatural from an algebraic perspective.)
pages 18-19
You may find it odd to have a loop with no body, but such loops do crop up. Just make sure that the
explicit null statement (or, if you prefer, empty {}) marking the empty loop body is plainly visible.
The whole first paragraph of page 19 counts as ``deep.'' A clean, well-designed loop will work properly
for all of its ``boundary conditions'': zero trips through the loop, one trip, many trips, maximum trips (if
there is any maximum, and if so, also maximum minus one). If a loop for some reason doesn't work at a
particular boundary condition, it's tempting to claim that that condition is rare or impossible and that the
loop is therefore okay. But if the loop can't handle the boundary condition, why can't it? It's probably
awkwardly constructed, and straightening it out so that it naturally handles all boundary conditions will
usually make it clearer and easier to understand (and may also remove other lurking bugs).
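As a small illustration of both points, here is a sketch of a loop whose body is an explicit null
statement and which handles its boundary conditions without any special cases. The function name
count_leading_spaces is made up for this example and does not appear in K&R.

```c
#include <stddef.h>

/* Count the spaces at the start of s.  The loop condition does all the
   work, so the body is an explicit, clearly visible null statement. */
size_t count_leading_spaces(const char *s)
{
    size_t i;

    for (i = 0; s[i] == ' '; i++)
        ;   /* empty body: nothing to do but advance i */

    return i;
}
```

Zero trips through the loop (an empty string, or a string that starts with something other than a
space) and many trips (a string consisting only of spaces) all fall out of the same condition.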

http://www.eskimo.com/~scs/cclass/krnotes/sx4g.html [22/07/2003 5:08:05 PM]

page 19
Note the word of caution about = vs. == carefully. Typing one when you mean the other is,
unfortunately, a very easy mistake to make.
Note that the character constants discussed on page 19 are very different from the string constants
introduced on page 7.


page 21
Deep sentence:
In a program as tiny as this, it makes little difference, but in larger programs, the increase
in clarity is well worth the modest extra effort to write it this way from the beginning.
I agree with this. Some people complain that symbolic constants make a program harder to read, because
you always have to look them up to see what they mean. As long as you choose appropriate names for
symbolic constants and use them consistently (i.e. even if APPLE and ORANGE happen to have the
same value, don't use one when you mean the other), no one will have this complaint about your
programs.
Note that there's no direct way to simplify the condition
if (c == ' ' || c == '\n' || c == '\t')
In particular, something like
if (c == (' ' || '\n' || '\t'))
would not work. (What would it do?)

section 1.6: Arrays

page 22
Note carefully that arrays in C are 0-based, not 1-based as they are in some languages. (As we'll see, 0-based arrays turn out to be more convenient than 1-based arrays more of the time, but they may take a bit
of getting used to at first.)
When they say ``as reflected in the for loops that initialize and print the array,'' they're referring to the
fact that the vast majority of for loops in C look like this:
for(i = 0; i < 10; ++i)
and count from 0 to 9. The loop
for(i = 1; i <= 10; ++i)
would count from 1 to 10, but loops like this are comparatively rare. (In fact, whenever you see either ``=
1'' or ``<='' in a for loop, it's an indication that something unusual is going on which you'll want to be
aware of, and it may even be a bug.)
page 23
They've started going a little fast here, so read up if they're losing you. What's this magic expression c-'0' that they're using as an array subscript? Remember, as we saw first on page 19, that characters in C
are represented by small integers corresponding to their values in the machine's character set. In ASCII,
which most machines use, 'A' is character code 65, '0' (zero) is code 48, '9' is code 57, and all the
other characters have their own values which I won't bother to list. If we've just read the character '9'
from the file, it has value 57, so c-'0' is 57 - 48 which is 9, and we'll increment cell number 9 in the
array, just like we want to. Furthermore, even if we're not using a machine which uses ASCII, by
subtracting '0', we'll always subtract whatever the right value is to map the characters from '0' to
'9' down to the array cell range 0 to 9.
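The same idea can be sketched outside of the book's program. The function below is a hypothetical
stand-in (it tallies the digits in a string rather than reading input with getchar), but the
subscript expression is the one under discussion.

```c
/* Tally how many times each decimal digit occurs in the string s.
   ndigit must have room for 10 counts; cell k counts the digit k. */
void count_digits(const char *s, int ndigit[10])
{
    int i;

    for (i = 0; i < 10; ++i)
        ndigit[i] = 0;

    for (i = 0; s[i] != '\0'; ++i)
        if (s[i] >= '0' && s[i] <= '9')
            ++ndigit[s[i] - '0'];   /* '7' - '0' is 7, in any character set */
}
```

Given the string "a9b09", this leaves 2 in cell 9, 1 in cell 0, and 0 everywhere else.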


page 24
Deep sentence:
...you will often see a short function defined and called only once, just because it clarifies
some piece of code.
Ideally, this is true in any language. Breaking a program up into functions (or subroutines or procedures
or whatever a language calls them) is one of the first and one of the most important ways to keep control
of the proliferating complexity in a software project.
page 25
Note that the for loop at the top of the page runs from 1 to n rather than 0 to n-1, and may therefore
seem suspect by the above note for page 22. In this case, since all that matters is that the loop is traversed
n times, it doesn't matter which values i takes on.
Not only the names of the parameters and local variables, but also their values (as we'll see in section
1.8), are all local to a function. Rather than remembering a list of things that are local, it's easier to
remember that everything is local: the whole point of a function as an abstraction mechanism is that it's a
black box; you don't have to know or care about any of its implementation details, such as what it
chooses to name its parameters and local variables. You pass it some arguments, and it returns you a
value according to its specification.
The distinction between the terms argument and parameter may seem overly picky, but it's a good way
of reinforcing the notion that the parameters and other details of a function's implementation are almost
completely separated from (that is, of no concern to) the caller.
page 26
Note the discussion about return values from main. The first few sample programs in this chapter,
including the very first ``hello, world'' example on page 6, have omitted a return value, which is, strictly
speaking, incorrect. Do get in the habit of returning a value from main, both to be correct, and because
``programs should return status to their environment.''
By ``Parameter names need not agree'' they mean that it's not a problem that the prototype declaration of
power says that the first parameter is named m, while the actual function definition says that it's named
base.
pages 26-7
It's probably a good idea if you're aware of this ``old style'' function syntax, so that you won't be taken
aback when you come across it, perhaps in code written by reactionary old fogies (such as the author of
these notes) who still tend to use it out of habit when they're not paying attention.

section 1.8: Arguments -- Call by Value

page 27
If, on the other hand, you are not used to other languages such as Fortran, these call-by-value semantics
may not be surprising (any more than anything else in C which is new to you).
Even though you can modify a parameter in a function (i.e. treat it as a ``conveniently initialized local
variable''), you certainly don't have to, especially if (as is often the case) you'll need an unmodified copy
of the parameter later in the function.
page 28
Don't worry too much about the exception mentioned for arrays--there are a number of exceptions for
arrays, and we'll have much more to say about them later. But be aware that we are deliberately glossing
over a few details here, and they are details which will become important later on. (In particular, the
statement on page 27 that ``the called function cannot directly alter a variable in the calling function''
may not seem to be true for arrays, and this is what the authors mean when they say that ``The story is
different''. We'll be seeing several functions which return things--usually strings--to their callers by
writing into caller-supplied arrays. In chapter 5 we'll learn how this is possible. If this discrepancy
wouldn't have bothered you now, pretend I didn't mention it.)
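A short sketch may make the two cases concrete. Both functions are hypothetical examples, not taken
from K&R: the first treats its parameter as a conveniently initialized local variable, and the
second shows the array exception in its simplest form.

```c
/* n is a private copy of the caller's value; changing it here
   has no effect on the caller's variable. */
int sum_to(int n)
{
    int sum = 0;

    while (n > 0)
        sum += n--;     /* uses the parameter as a local counter */

    return sum;
}

/* An array argument is the exception: the function works on the
   caller's own array, so the caller sees this change. */
void clear_first(int a[])
{
    a[0] = 0;
}
```

If the caller writes sum_to(x) with x equal to 4, the result is 10 and x is still 4 afterwards. If
it calls clear_first(arr), the first element of the caller's arr really is set to 0.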


Pay attention to the way this program is developed first in ``pseudocode,'' and then refined into real C
code. A clear pseudocode statement not only makes it easier to think about the structure of the eventual
real code, but if you make the eventual real code mimic the pseudocode, the real code will be equally
straightforward and easy to read.
The function getline, introduced here, is extremely useful, and we'll have as much use for it in our
own programs as the authors do in theirs. (In other words, they have succeeded in their goal of making it
``useful in other contexts.'' In fact, I've been using a getline function much like this one ever since I
learned C from K&R, and I generally find it preferable to the standard library's line-reading function.)
Pages 28 through 30 introduce quite a lot of material all at once; you'll probably want to read it several
times, especially if arrays or character strings are new to you.
Earlier we said that C provided no particular built-in support for composite objects such as character
strings, and here we begin to see the significance of that omission. A string is just an array of characters,
and you can access the characters within a string exactly as easily (because you use exactly the same
syntax) as you access the elements within any other array.
If you've used BASIC, you will probably wonder where C's SUBSTR function is. C doesn't have one, for
two reasons. First of all, there's less of a need for one, because it's so easy to get at the individual
characters within a string in C. More importantly, a SUBSTR function implies that you take a string and
extract a substring as a new string. However, creating a new string (i.e. the extracted substring) involves
allocating arbitrary amounts of memory to hold the string, and C rarely if ever allocates memory
implicitly for you.
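As a rough sketch of what a C programmer might do instead, a function can copy the wanted
characters into an array which the caller supplies, so that no memory is allocated behind anyone's
back. The name substr_copy and its parameter list are hypothetical, invented for this illustration;
there is no such function in the standard library.

```c
#include <stddef.h>

/* Copy up to len characters of src, starting at offset start, into dest.
   dest is supplied by the caller and holds destsize chars (including
   the terminating '\0').  Returns the number of characters copied. */
size_t substr_copy(char dest[], size_t destsize,
                   const char src[], size_t start, size_t len)
{
    size_t i = 0, n = 0;

    if (destsize == 0)
        return 0;               /* no room even for the '\0' */

    /* walk to the starting offset without running past the end of src */
    while (i < start && src[i] != '\0')
        i++;

    while (n < len && n < destsize - 1 && src[i] != '\0')
        dest[n++] = src[i++];

    dest[n] = '\0';
    return n;
}
```

With an 8-character buf, the call substr_copy(buf, sizeof buf, "hello, world", 7, 5) leaves the
string "world" in buf. Notice how much of the code is concerned with where the result goes and how
much room there is, which is exactly the bookkeeping a built-in SUBSTR would have to hide.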
If anything, it's too easy to access the individual characters within strings in C. String handling illustrates
one of the potentially frustrating aspects of C we mentioned earlier: the language doesn't define any high-level string handling features for you, so you're free to do whatever low-level string processing you wish.
The down side is that constantly manipulating strings down at the character level, and always having to
remember to allocate memory for new strings, can get tedious after a while.
The preceding paragraph is not meant to discourage you, but just to point out a reality: any C program
which manipulates strings (and this includes most C programs) will find itself doing a certain amount of
character-level fiddling and a certain amount of memory allocation. It will also find that it can do just
about anything it wants to do (and that its programmer has the patience to do) with the strings it
manipulates.
Since string processing, and at this relatively low level, is so common in C, you'll want to pay careful
attention to the discussion on page 30 of how strings are stored in character arrays, and particularly to the
fact that a '\0' character is always present to mark the end of a string. (It's easy to forget to count the
'\0' character when allocating space for a string, for instance.) Notice the nice picture on page 30; this
is a good way of thinking about data structures (and not just simple character arrays, either).
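To see the role of the '\0' in code, here is a minimal sketch. The two function names are
hypothetical (the standard library's own length function is strlen); they are written out only to
show where the terminator comes into the arithmetic.

```c
#include <stddef.h>

/* Count the characters in s up to, but not including, the '\0'. */
size_t string_length(const char s[])
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;

    return n;
}

/* Number of chars of storage needed to hold a copy of s:
   its length plus one for the terminating '\0'. */
size_t string_storage(const char s[])
{
    return string_length(s) + 1;
}
```

For the string "hello", string_length gives 5 but string_storage gives 6, which is also what
sizeof "hello" yields.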
page 29
Note that the program explicitly allocates space for the two strings it manipulates: the current line line,
and the longest line longest. (It only needs these two strings at any one time, even though the input
consists of arbitrarily many lines.) Note that it cannot simply assign one string to another (because C
provides no built-in support for composite objects such as character strings); the program calls the copy
function to do so. (The authors write their own copy function for explanatory purposes; the standard
library contains a string-copying function which would normally be used.) The only strings that aren't
explicitly allocated are the arrays in the getline and copy functions; as the discussion briefly
mentions, these do not need to be allocated because they're already allocated in the caller. (There are a
number of subtleties about array parameters to functions; we'll have more to say about them later.)
The code on page 29 contains a number of examples of compressed assignments and tests; evidently the
authors expect you to get used to this style in a hurry. The line
while ((len = getline(line, MAXLINE)) > 0)
is similar to the getchar loops earlier in this chapter; it calls getline, saves its return value in the
variable len, and tests it against 0.
The comparison
i<lim-1 && (c=getchar())!=EOF && c!='\n'
in the for loop in the getline function does several things: it makes sure there is room for another
character in the array; it calls, assigns, and tests getchar's return value against EOF, as before; and it
also tests the returned character against '\n', to detect end of line. The surrounding code is mildly
clumsy in that it has to check for \n a second time; later, when we learn more about loops, we may find
a way of writing it more cleanly. You may also notice that the code deals correctly with the possibility
that EOF is seen without a \n.
The line
while ((to[i] = from[i]) != '\0')
in the copy function does two things at once: it copies characters from the from array to the to array,
and at the same time it compares the copied character against '\0', so that it stops at the end of the
string. (If you think this is cryptic, wait 'til we get to page 106 in chapter 5!)
We've also just learned another printf conversion specifier: %s prints a string.
page 30
Deep sentence:
There is no way for a user of getline to know in advance how long an input line might
be, so getline checks for overflow.
Because dynamically allocating memory for arbitrary-length strings is mildly tedious in C, it's tempting
to use fixed-size arrays. (It's so tempting, in fact, that that's what most programs do, and since fixed-size
arrays are also considerably easier to discuss, all of our early example programs will use them.) Using
fixed-size arrays is fine, as long as some assurance is made that they don't overflow. Unfortunately, it's
also tempting (and easy) to forget to guard against array overflow, perhaps by deluding yourself into
thinking that too-long inputs ``can't happen.'' Murphy's law says that they do happen, and the various
corollaries to Murphy's law say that they happen in the most unpleasant way and at the least convenient
time. Don't be cavalier about arrays; do make sure that they're big enough and that you guard against
overflowing them. (In another mark of C's general insensitivity to beginning programmers, most
compilers do not check for array overflow; if you write more data to an array than it is declared to hold,
you quietly scribble on other parts of memory, usually with disastrous results.)


page 31
There's a bit of jargon in this section. An external variable is what is sometimes called a global variable.
The authors introduce the term automatic to refer to the local variables we've seen so far; this is a good
word to remember, even if you never use it, because people will spring it on you when they're being
precise, and if you don't know this usage you'll think they're talking about transmissions or something.
(To be precise, ``local'' is a broader category than ``automatic''; there are both automatic and static
local variables.)
Deep sentence:
If [automatic variables] are not set, they will contain garbage.
Actually, if automatic variables always contained garbage, the situation wouldn't be quite so bad. In
practice, they often (though not always) do contain zero or some other predictable value, and this
happens just often enough to lull you into the occasional false sense of security, by making a program
with an inadvertently uninitialized variable seem to work.
Deep sentence:
An external variable must be defined, exactly once, outside of any function; this sets aside
storage for it. The variable must also be declared in each function that wants to access it;
this states the type of the variable.
The basic rule is ``define once; declare many times.'' As we'll see just below, it is not necessary for a
declaration of an external variable to appear in every single function; it is possible for one external
declaration to apply to many functions. (In the clause ``the variable must also be declared in each
function'', the word ``declared'' is an adjective, not a verb.)
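A minimal sketch of ``define once; declare many times,'' all within a single source file. The
variable line_count and the two functions are hypothetical names chosen for this illustration.

```c
int line_count = 0;         /* the one definition: sets aside storage */

void count_line(void)
{
    extern int line_count;  /* a declaration: states the type only */

    line_count++;
}

int lines_seen(void)
{
    /* no extern needed: the definition above, earlier in the same
       source file, already serves as the declaration */
    return line_count;
}
```

Both functions refer to the same variable; the explicit extern declaration in count_line is allowed
but, given where the definition sits, not required.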
page 33
In fact, the ``common practice'' of placing ``definitions of all external variables at the beginning of the
source file'' is so common that it's rare to see external declarations within functions, as in the functions on
page 32. The authors are using the in-function extern declarations partly because it is an alternative
style, and partly because we haven't talked about separate compilation (that is, building a single program
from several separate source files) yet. Rather than jumping the gun and discussing those two topics now,
I'll just mention that the discussion in section 1.10 might be a bit misleading, and that you should
probably wait until we get to the complete description of the issue in section 4.4 before you commit any
of this to memory.
Deep sentence:
You should note that we are using the words definition and declaration carefully when we
refer to external variables in this section. ``Definition'' refers to the place where the
variable is created or assigned storage; ``declaration'' refers to places where the nature of
the variable is stated but no storage is allocated.
Do note the careful distinction; it's an important one and one which I'll be using, too.
page 34
The authors' criticism of the second (page 32) version of the longest-line program is accurate. The
revision of the longest-line program to use external variables was done only to demonstrate the use of
external variables, not to improve the program in any way (nor does it improve the program in any way).
As a general rule, external variables are acceptable for storing certain kinds of global state information
which never changes, which is needed in many functions, and which would be a nuisance to pass around.
I don't think of external variables as ``communicating between functions'' but rather as ``setting common
state for the entire program.'' When you start thinking of an external variable as being one of the ways
you communicate with a particular function, and in particular when you find yourself changing the value
of some external variable just before calling some function, to affect its operation in some way, you start
getting into the troublesome uses of external variables, which you should avoid.

Chapter 2: Types, Operators, and Expressions
page 35
Deep sentence:
The type of an object determines the set of values it can have and what operations can be
performed on it.
This is a fairly formal, mathematical definition of what a type is, but it is traditional (and meaningful).
There are several implications to remember:
1. The ``set of values'' is finite. C's int type can not represent all of the integers; its float type
can not represent all floating-point numbers.
2. When you're using an object (that is, a variable) of some type, you may have to remember what
values it can take on and what operations you can perform on it. For example, there are several
operators which play with the binary (bit-level) representation of integers, but these operators are
not meaningful for and may not be applied to floating-point operands.
3. When declaring a new variable and picking a type for it, you have to keep in mind the values and
operations you'll be needing.
In other words, picking a type for a variable is not some abstract academic exercise; it's closely
connected to the way(s) you'll be using that variable.
You don't need to worry about the list of ``small changes and additions'' made by the ANSI standard,
unless you started learning C long ago or have a keen interest in its history. We'll be using these new
features indiscriminately, usually without comment.
section 2.1: Variable Names
section 2.2: Data Types and Sizes
section 2.3: Constants
section 2.4: Declarations
section 2.5: Arithmetic Operators
section 2.6: Relational and Logical Operators

section 2.7: Type Conversions
section 2.8: Increment and Decrement Operators
section 2.9: Bitwise Operators
section 2.10: Assignment Operators and Expressions
section 2.11: Conditional Expressions
section 2.12: Precedence and Order of Evaluation


Deep sentence:
Don't begin variable names with underscore, however, since library routines often use such
names.
If you happen to pick a name which ``collides'' with (is the same as) a name already chosen by a library
routine, either your code or the library routine (or both) won't work. Naming issues become very
significant in large projects, and problems can be avoided by setting guidelines for who may use which
names. One of these guidelines is simply that user code should not use names beginning with an
underscore, because these names are (for the most part) ``reserved to the implementation'' (that is,
reserved for use by the compiler and the standard library).
Note that case is significant; assuming that case is ignored (as it is with some other programming
languages and operating systems) can lead to real frustration.
The convention that all-upper-case names are used for symbolic constants (i.e. as created with the
#define directive, which we learned about in section 1.4) is arbitrary, but useful. Like the various
conventions for code layout (page 10), this convention is a good one to accept (i.e. not get too creative
about), until you have some very good reason for altering it.
Deep sentence:
Keywords like if, else, int, float, etc., are reserved; you can't use them as variable
names.
You can find the complete list of keywords in appendix A2.4 on page 192.


page 36
If you can look at this list of ``a few basic types in C'' and say to yourself, ``Oh, how simple, there are
only a few types, I won't have to worry much about choosing among them,'' you'll have an easy time with
declarations. (Some masochists wish that the type system were more complicated so that you could
specify more things about each variable, but those of us who would rather not have to specify these extra
things each time are glad that we don't have to.)
Note that the basic types are defined as having at least a certain size. There is no specification that a
short int will be exactly 16 bits, or that a long int will be exactly 32 bits. Some programmers
become obsessed with knowing exactly what sizes things will be in various situations, and write
programs which depend on things having certain sizes. Exact sizes are occasionally important, but most
of the time we can sidestep size issues and let the compiler do most of the worrying.
Most of the simple variables in most programs are of types int, long int, or double. Typically,
we'll use int and double for most purposes, and long int any time we need to hold values greater
than 32,767. We'll rarely use individual variables of type char, although we'll use plenty of arrays of
char. Types short int and float are important primarily when efficiency (speed or memory
usage) is a concern, and for us it usually won't be.
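If you are curious what sizes your own compiler has chosen, the standard header <limits.h> and the
sizeof operator will tell you. The sketch below is only an illustration: the function names are
hypothetical, and the bit counts assume that the integer types have no padding bits, which is true
on common machines.

```c
#include <limits.h>

/* Width of each integer type, in bits, on this particular compiler.
   (Assumes no padding bits, which is true on common machines.) */
int short_bits(void) { return (int)(sizeof(short) * CHAR_BIT); }
int int_bits(void)   { return (int)(sizeof(int) * CHAR_BIT); }
int long_bits(void)  { return (int)(sizeof(long) * CHAR_BIT); }

/* Returns 1 if value is guaranteed to fit in an int on every conforming
   compiler, i.e. it lies within the minimum range -32767..32767. */
int fits_any_int(long value)
{
    return value >= -32767L && value <= 32767L;
}
```

The only things you can count on are the minimums: at least 16 bits for short int and int, and at
least 32 for long int. That is why a value which might exceed 32,767 belongs in a long int.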
Note that even when we're manipulating individual characters, we'll usually use an int variable, for the
reason discussed in section 1.5.1 on page 16.


page 37
We write constants in decimal, octal, or hexadecimal for our convenience, not the compiler's. The
compiler doesn't care; it always converts everything into binary internally, anyway. (There is, however,
no good way to specify constants in source code in binary.)
pages 37-38
Read the descriptions of character and string constants carefully; most C programs work with these data
types a lot, and their proper use must be kept in mind. Note particularly these facts:
1. The character constant 'x' is quite different from the string constant "x".
2. The value of a character is simply ``the numeric value of the character in the machine's character
set.''
3. Strings are terminated by the null character, \0. (This applies to both string constants and to all
other strings we'll build and manipulate.) This means that the size of a string (the number of
char's worth of memory it occupies) is always one more than its length (i.e. as reported by
strlen) appears to be.
As we saw in section 1.6 on page 23, it's possible to switch rather freely between thinking of a character
as a character and thinking of it as its value. For example, the character '0' (that is, the character that
can print on your screen and looks like the number zero) has in the ASCII character set the internal value
48. Another way of saying this is to notice that the following expressions are all true:
'0' == 48
'0' == '\060'
'0' == '\x30'
We'll have a bit more to say about characters and their small integer representations in section 2.7.
Note also that the string "48" consists of the three characters '4', '8', and '\0'. Also in section 2.7
we'll meet the atoi function which computes a numeric value from a string of digits like this.
page 39
We won't be using enumerations, so you don't have to worry too much about the description of
enumeration constants.


page 40
You may wonder why variables must be declared before use. There are two reasons:
1. It makes things somewhat easier on the compiler; it knows right away what kind of storage to
allocate and what code to emit to store and manipulate each variable; it doesn't have to try to intuit
the programmer's intentions.
2. It forces a bit of useful discipline on the programmer: you cannot introduce variables willy-nilly;
you must think about them enough to pick appropriate types for them. (The compiler's error
messages to you, telling you that you apparently forgot to declare a variable, are as often helpful
as they are a nuisance: they're helpful when they tell you that you misspelled a variable, or forgot
to think about exactly how you were going to use it.)
Although there are a few places where ``certain declarations can be made implicitly by context'', making
use of these removes the advantages of reason 2 above, so I recommend always declaring everything
explicitly.
Most of the time, I recommend writing one declaration per line (as in the ``latter form'' on page 40). For
the most part, the compiler doesn't care what order declarations are in. You can order the declarations
alphabetically, or in the order that they're used, or so that related declarations sit next to each other.
Collecting all variables of the same type together on one line essentially orders declarations by type,
which isn't a very useful order (it's only slightly more useful than random order).
If you'd rather not remember the rules for default initialization (namely that ``external or static variables
are initialized to zero by default'' and ``automatic variables for which there is no initializer have...
garbage values''), you can get in the habit of initializing everything. It never hurts to explicitly initialize
something when it would have been implicitly initialized anyway, but forgetting to initialize something
that needs it can be the source of frustrating bugs.
Don't worry about the distinction between ``external or static variables''; we haven't seen it yet.
One mild surprise is that const variables are not ``constant expressions'' as defined on page 38. You
can't say something like
const int maxline = 1000;
char line[maxline+1];        /* WRONG */
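The usual way around this is a symbolic constant created with #define, as introduced in section
1.4. A minimal sketch (line_capacity is a hypothetical helper, present only so that the resulting
size can be checked):

```c
#define MAXLINE 1000            /* a true constant expression */

char line[MAXLINE + 1];         /* OK: room for MAXLINE chars plus '\0' */

/* Report how many chars the array holds. */
unsigned long line_capacity(void)
{
    return (unsigned long)sizeof line;
}
```

Because the preprocessor replaces MAXLINE with 1000 before the compiler proper sees the
declaration, the array size is an ordinary constant expression.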


page 41
Keep in the back of your mind somewhere the fact that the behavior of the / and % operators is not
precisely defined for negative operands. This means that -7 / 4 might be -1 or -2, and -7 % 4 might
be -3 or +1. The difference won't matter for the simple programs we'll be writing at first, but eventually
you'll get bit by it if you don't remember it.
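One thing you can rely on is that the quotient and remainder agree with each other, and if you need
a remainder that is never negative you can arrange for one yourself. The sketch below uses
hypothetical function names. (Later versions of the C standard, from C99 on, settle the question by
requiring division to truncate toward zero, but code written as below works either way.)

```c
/* Quotient and remainder always fit together so that
   (a / b) * b + a % b == a, whichever way / rounds.
   Returns 1 if that holds for a and b (b must be nonzero). */
int div_mod_consistent(int a, int b)
{
    return (a / b) * b + a % b == a;
}

/* A remainder that is never negative, for b > 0, regardless of how
   the compiler treats negative operands. */
int nonneg_mod(int a, int b)
{
    int r = a % b;

    if (r < 0)
        r += b;

    return r;
}
```

Whether -7 % 4 comes out as -3 or +1 on your machine, nonneg_mod(-7, 4) is 1.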
An additional arithmetic operation you might be wondering about is exponentiation. Some languages
have an exponentiation operator (typically ^ or **), but C doesn't.
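For small integer powers it is easy enough to write the loop yourself, much as the power function
on page 24 does. The version below is a sketch with a hypothetical name, int_power; for
floating-point work the standard library supplies pow in <math.h>. (Beware that ^ does exist in C,
but it is the bitwise exclusive-OR operator, so 2 ^ 3 is 1, not 8.)

```c
/* Raise base to the n-th power by repeated multiplication; n >= 0.
   (No overflow checking: the result must fit in a long.) */
long int_power(long base, int n)
{
    long result = 1;

    while (n-- > 0)
        result *= base;

    return result;
}
```

Here int_power(2, 10) is 1024, and int_power(5, 0) is 1 because the loop makes zero trips.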
The term ``precedence'' refers to how ``tightly'' operators bind to their operands (that is, to the things they
operate on). In mathematics, multiplication has higher precedence than addition, so 1 + 2 * 3 is 7,
not 9. In other words, 1 + 2 * 3 is equivalent to 1 + (2 * 3). C is the same way.
The term ``associativity'' refers to the grouping when two or more operators of the same precedence
participate next to each other in an expression. When an operator (like subtraction) associates ``left to
right,'' it means that 1 - 2 - 3 is equivalent to (1 - 2) - 3 and gives -4, not +2.
By the way, the word ``arithmetic'' as used in the title of this section is an adjective, not a noun, and it's
pronounced differently than the noun: the accent is on the third syllable.


If it isn't obvious, >= is greater-than-or-equal-to, <= is less-than-or-equal-to, == is equal-to, and != is
not-equal-to. We use >=, <=, and != because the symbols ≥, ≤, and ≠ are not common on computer
keyboards, and we use == because equality testing and assignment are two completely different
operations, but = is already taken for assignment. (Obviously, typing = when you mean == is a very easy
mistake to make, so watch for it. Some compilers will warn you when you use one but seem to want the
other.)
The fact that evaluation of the logical operators && and || ``stops as soon as the truth or falsehood of the
result is known'' refers to the fact that
``false'' AND anything is false
or, in C,
(0 && anything) == 0
while, on the other hand,
``true'' OR anything is true
or, in C,
(1 || anything) == 1
Looking at these another way, if you want to do something if thing1 is true and thing2 is true, and you've
just noticed that thing1 is false, you don't even need to check thing2. Similarly, if you're supposed to do
something if thing3 is true or thing4 is true, and you notice that thing3 is true, you can go ahead and do
whatever it is you're supposed to do without checking thing4.
C works the same way, and if it's not true that ``most C programs rely on these properties,'' it's certainly
true that many do.
For another example of the usefulness of this ``short-circuiting'' behavior, suppose we're taking the
average of n numbers. If n is zero, that is, if we don't have any numbers to take the average of, we don't
want to divide by zero. Code like
if(n != 0 && sum / n > 1)
is common: it tests whether n is nonzero and the average is greater than 1, but it does not have to worry
about dividing by zero. (If, on the other hand, the compiler always evaluated both sides of the && before
checking to see whether they were both true, the code above could divide by zero.)
page 42
Note the extra parentheses in
(c = getchar()) != '\n'
Since this is a common idiom, you'll need to remember the parentheses. What would
c = getchar() != '\n'
do?
C's treatment of Boolean values (that is, those where we only care whether they're true or false) is
straightforward. We'll have more to say about it later, but for now, note that a value of zero is ``false,''
and any nonzero value is ``true.'' You might also note that there is no necessary connection between
statements like if() which expect a true/false value and operators like >= and && which generate
true/false values. You can use operators like >= and && in any expression, and you can use any
expression in an if() statement.
The authors make a good point about style: if valid is conceptually a Boolean variable (that is, it's an
integer, but we only care about whether it's zero or nonzero, in other words, ``false'' or ``true''), then
if(valid)
is a perfectly reasonable and readable condition. However, when values are not conceptually Boolean, I
encourage you to make explicit comparisons against 0. For example, we could have expressed our
average-taking code as
if(n && sum / n > 1)
but I think it's clearer to be explicit and say
if(n != 0 && sum / n > 1)
(However, many C programmers feel that expressions like
if(n && sum / n > 1)
are ``more concise,'' so you will see them all the time and you should be able to read them.)


The conversion rules described here and on page 44 are straightforward, but they're quite important, so
you'll need to learn them well. Usually, conversions happen automatically and when you want them to, but
not always, so it's important to keep the rules in mind. (Recall the discussion of 5/9 on page 12.)
Deep sentence:
A char is just a small integer, so chars may be freely used in arithmetic expressions.
Whether you treat a ``small integer'' as a character or an integer is pretty much up to you. As we saw
earlier, in the ASCII character set, the character '0' has the value 48. Therefore, saying
int i = '0';
is the same as saying
int i = 48;
If you print i out as a character, using
putchar(i);
or
printf("%c", i);
(the %c format prints characters; see page 13), you'll see the character '0'. If you print it out as a number:
printf("%d", i);
you'll see the value 48.
Most of the time, you'll use whatever notation matches what you're trying to do. If you want the character
'0', you'll use '0'. If you want the value 48 (as the number of months in four years, or something), you'll
use 48. If you want to print characters, you'll use putchar or printf %c, and if you want to print
integers, you'll use printf %d. Occasionally, you'll cross over between thinking of characters as
characters and as values, such as in the character-counting program in section 1.6 on page 22, or in the
atoi function we'll look at next. (You should never have to know that '0' has the value 48, and you
should never have to write code which depends on it.)
page 43
To illustrate the ``schizophrenic'' nature of characters (are they characters, or are they small integer
values?), it's useful to look at an implementation of the standard library function atoi. (If you're getting
overwhelmed, though, you may skip this example for now, and come back to it later.) The atoi routine
converts a string like "123" into an integer having the corresponding value.
As you study the atoi code at the top of page 43, figure out why it does not seem to explicitly check for
the terminating '\0' character.
The expression
s[i] - '0'
is an example of the ``crossing over'' between thinking about a character and its value. Since the value of
the character '0' is not zero (and, similarly, the other numeric characters don't have their ``obvious''
values, either), we have to do a little conversion to get the value 0 from the character '0', the value 1 from
the character '1', etc. Since the character set values for the digit characters '0' to '9' are contiguous
(48-57, if you must know), the conversion involves simply subtracting an offset, and the offset (if you think
about it) is simply the value of the character '0'. We could write
s[i] - 48
if we really wanted to, but that would require knowing what the value actually is. We shouldn't have to
know (and it might be different in some other character set), so we can let the compiler do the dirty work
by using '0' as the offset (since subtracting '0' is, by definition, the same as subtracting the value of the
character '0').
The functions from <ctype.h> are being introduced here without a lot of fanfare. Here is the main loop
of the atoi routine, rewritten to use isdigit:
for (i = 0; isdigit(s[i]); ++i)
n = 10 * n + (s[i] - '0');
Don't worry too much about the discussion of signed vs. unsigned characters for now. (Don't forget about it
completely, though; eventually, you'll find yourself working with a program where the issue is significant.)
For now, just remember:
1. Use int as the type of any variable which receives the return value from getchar, as discussed in
section 1.5.1 on page 16.
2. If you're ever dealing with arbitrary ``bytes'' of binary data, you'll usually want to use unsigned
char.
page 44
As we saw in section 2.6 on page 41, relational and logical operators always ``return'' 1 for ``true'' and 0 for
``false.'' However, when C wants to know whether something is true or false, it just looks at whether it's
nonzero or zero, so any nonzero value is considered ``true.'' Finally, some functions which return true/false
values (the text mentions isdigit) may return ``true'' values of other than 1.
You don't have to worry about these distinctions too much, and you also don't have to worry about the
fragment
d = c >= '0' && c <= '9'
as long as you write conditionals in a sensible way. If you wanted to see whether two variables a and b
were equal, you'd never write
if((a == b) == 1)
(although it would work: the == operator ``returns'' 1 if they're equal). Similarly, you don't want to write
if(isdigit(c) == 1)
because it's equally silly-looking, and in this case it might not work. Just write things like
if(a == b)
and
if(isdigit(c))
and you'll steer clear of most problems. (Make sure, though, that you never try something like if('0'
<= c <= '9'), since this wouldn't do at all what it looks like it's supposed to.)
The set of implicit conversions on page 44, though informally stated, is exactly the set to remember for
now. They're easy to remember if you notice that, as the authors say, ``the `lower' type is promoted to the
`higher' type,'' where the ``order'' of the types is
char < short int < int < long int < float < double < long double
(We won't be using long double, so you don't need to worry about it.) We'll have more to say about
these rules on the next page.
Don't worry too much for now about the additional rules for unsigned values, because we won't be using
them at first.
Do notice that implicit (automatic) conversions do happen across assignments. It's perfectly acceptable to
assign a char to an int or vice versa, or assign an int to a float or vice versa (or any other
combination). Obviously, when you assign a value from a larger type to a smaller one, there's a chance that
it might not fit. Therefore, compilers will often warn you about such assignments.
page 45
Casts can be a bit confusing at first. A cast is the syntax used to request an explicit type conversion;
coercion is just a more formal word for ``conversion.'' A cast consists of a type name in parentheses and is
used as a unary operator. You may have used languages which had conversion operators which looked
more like function calls:
integer i = 2;
floating f = floating(i);    /* not C */
integer i2 = integer(f);     /* not C */
In C, you accomplish the same thing with casts:

int i = 2;
float f = (float)i;
int i2 = (int)f;
(Actually, in C, we wouldn't need casts in those initializations at all, because conversions between int and
float are some of the ones that C performs automatically.)
To further understand both how implicit conversions and explicit casts work, let's study how the implicit
conversions would look if we wrote them out explicitly. First we'll declare a few variables of various types:
char c1, c2;
int i1, i2;
long int L1, L2;
float f1, f2;
double d1, d2;
Next we'll look at the kinds of conversions which C automatically performs when performing arithmetic on
two dissimilar types, or when assigning a value to a dissimilar type. The rules are straightforward: when
performing arithmetic on two dissimilar types, C converts one or both sides to a common type; and when
assigning a value, C converts it to the type of the variable being assigned to.
If we add a char to an int:
i2 = c1 + i1;
the fourth rule on page 44 tells us to convert the char to an int, as if we'd written
i2 = (int)c1 + i1;
If we multiply a long int and a double:
d2 = L1 * d1;
the second rule tells us to convert the long int to a double, as if we'd written
d2 = (double)L1 * d1;
An assignment of a char to an int
i1 = c1;
is as if we'd written
i1 = (int)c1;
and an assignment of a float to an int
i1 = f1;
is as if we'd written
i1 = (int)f1;
Some programmers worry that implicit conversions are somehow unreliable and prefer to insert lots of
explicit conversions. I recommend that you get comfortable with implicit conversions--they're quite useful--and don't clutter your code with extra casts.
There are a few places where you do need casts, however. Consider the code
i1 = 200;
i2 = 400;
L1 = i1 * i2;
The product 200 x 400 is 80000, which is not guaranteed to fit into an int. (Remember that an int is only
guaranteed to hold values up to 32767.) Since 80000 will fit into a long int, you might think that you're
okay, but you're not: the two sides of the multiplication are of the same type, so the compiler doesn't see the
need to perform any automatic conversions (none of the rules on page 44 apply). The multiplication is
carried out as an int, which overflows with unpredictable results, and only after the damage has been
done is the unpredictable value converted to a long int for assignment to L1. To get a multiplication
like this to work, you have to explicitly convert at least one of the int's to long int:
L1 = (long int)i1 * i2;
Now, the two sides of the * are of different types, so they're both converted to long int (by the fifth rule
on page 44), and the multiplication is carried out as a long int. If it makes you feel safer, you can use
two casts:
L1 = (long int)i1 * (long int)i2;
but only one is strictly required.
A similar problem arises when two integers are being divided. The code
i1 = 1;
f1 = i1 / 2;
does not set f1 to 0.5, it sets it to 0. Again, the two operands of the / operator are already of the same type
(the rules on page 44 still don't apply), so an integer division is performed, which discards any fractional
part. (We saw a similar problem in section 1.2 on page 12.) Again, an explicit conversion saves the day:
f1 = (float)i1 / 2;
Alternately, in a case like this, you can use a floating-point constant:
f1 = i1 / 2.0;
In either case, as soon as one of the operands is floating point, the division is carried out in floating point,
and you get the result you expect.
Implicit conversions always happen during arithmetic and assignment to variables. The situation is a bit
more complicated when functions are being called, however.
The authors use the example of the sqrt function, which is as good an example as any. sqrt accepts an
argument of type double and returns a value of type double. If the compiler didn't know that sqrt
took a double, and if you called
sqrt(4);
or
int n = 4;
sqrt(n);
the compiler would pass an int to sqrt. Since sqrt expects a double, it will not work correctly if it
receives an int. Therefore, it was once always necessary to use explicit conversions in cases like this, by
calling
sqrt((double)4)
or
sqrt((double)n)
or
sqrt(4.0)
However, it is now possible, with a function prototype, to tell the compiler what types of arguments a
function expects. The prototype for sqrt is
double sqrt(double);
and as long as a prototype is in effect (``in scope,'' as the cognoscenti would say), you can call sqrt
without worrying about conversions. When a prototype is in effect, the compiler performs implicit
conversions during function calls (specifically, while passing the arguments) exactly as it does during
simple assignments.
Obviously, using prototypes makes for much safer programming, and it is recommended that you use them
whenever possible. For the standard library functions (the ones already written for you), you get prototypes
automatically when you include the header files which describe sets of library functions. For example, you
get prototypes for all of C's built-in math functions by putting the line
#include <math.h>
at the top of your program. For functions that you write, you can supply your own prototypes, which we'll
be learning more about later.
However, there are a few situations (we'll talk about them later) where prototypes do not apply, so it's
important to remember that function calls are a bit different and that explicit conversions (i.e. casts) may
occasionally be required. Don't imagine that prototypes are a panacea.
page 46
Don't worry about the rand example.


The distinction between the prefix and postfix forms of ++ and -- will probably seem strained at first,
but it will make more sense once we begin using these operators in more realistic situations.
The authors point out that an expression like (i+j)++ is illegal, and it's worth thinking for a moment
about why. The ++ operator doesn't just mean ``add one''; it means ``add one to a variable'' or ``make a
variable's value one more than it was before.'' But (i+j) is not a variable, it's an expression; so there's
no place for ++ to store the incremented result. If you were bound and determined to use ++ here, you'd
have to introduce another variable:
int k = i + j;
k++;
But really, when you want to add one to an expression, just use
i + j + 1
Another unfortunate (and utterly meaningless) example is
i = i++;
If you want to increment i (that is, add one to it, and store the result back in i), either use
i = i + 1;
or
i++;
Don't try to combine the two.
page 47
Deep sentence:
In a context where no value is wanted, just the incrementing effect, as in
if(c == '\n')
nl++;
prefix and postfix are the same.

In other words, when you're just incrementing some variable, you can use either the nl++ or ++nl
form. But when you're immediately using the result, as in the examples we'll look at later, using one or
the other makes a big difference.
In that light, study one of the examples on this page--squeeze, the modified getline, or strcat--and convince yourself that it would not work if the wrong form of increment (++i or ++j) were used.
You may note that all three examples on pages 47-48 use the postfix form. Postfix increment is probably
more common, though prefix definitely has its uses, too.
You may notice the keyword void popping up in a few code examples. void is a type we haven't met
yet; it's a type with no values and no operations. When a function is declared as ``returning'' void, as in
the squeeze and strcat examples on pages 47 and 48, it means that the function does not return a
value. (This was briefly mentioned on page 30 in chapter 1.)


page 48
The bitwise operators are definitely a bit (pardon the pun) more esoteric than the parts of C we've
covered so far (and, indeed, than probably most of C). We won't concentrate on them, but they do come
up all the time, so you should eventually learn enough about them to recognize what they do, even if you
don't use them in any of your own programs for a while. You may skip this section for now, though.
To see what the bitwise operators are doing, it may help to convert to binary for a moment and look at
what's happening to the individual bits. In the example on page 48, suppose that n is 052525, which is
21845 decimal, or 101010101010101 binary. Then n & 0177, in base 2 and base 8 (binary and octal)
looks like
  101010101010101
& 000000001111111
  ---------------
          1010101

  052525
& 000177
  ------
     125
In the second example, if SET_ON is 012 and x is 0, then x | SET_ON looks like
  000000000
| 000001010
  ---------
       1010

  000000
| 000012
  ------
      12
and if x starts out as 402, it looks like

  100000010
| 000001010
  ---------
  100001010

  000402
| 000012
  ------
     412
Note that with &, anywhere we have a 0 we turn bits off, and anywhere we have a 1 we copy bits through
from the other side. With |, anywhere we have a 1 we turn bits on, and anywhere we have a 0 we leave
bits alone.
You'll frequently see the word mask used, both as a noun and a verb. You can imagine that we've cut a
mask or stencil out of cardboard, and are using spray paint to spray through the mask onto some other
piece of paper. For |, the holes in the mask are like 1's, and the spray paint is like 1's, and we paint more
1's onto the underlying paper. (If there was already paint under a hole, nothing really changes if we get
more paint on it; it's still a ``1''.)

The & operator is a bit harder to fit into this analogy: you can either imagine that the holes in the mask
are 1's and you're spraying some preservative which will fix some of the underlying bits after which the
others will get washed off, or you can imagine that the holes in the mask are 0's, and you're spraying
some erasing paint or some background color which obliterates anything (i.e. any 1's, any foreground
color) it reaches.
For a bit more information on ``bitwise'' operations, see the handout, ``A Brief Refresher on Some Math
Often Used in Computing.''
page 49
Work through the example at the top of the page, and convince yourself that 1 & 2 is 0 and that 1 &&
2 is 1.
The precedence of the bitwise operators is not what you might expect, and explicit parentheses are often
needed, as noted in this deep sentence from page 52:
Note that the precedence of the bitwise operators &, ^, and | falls below == and !=. This
implies that bit-testing expressions like
if ((x & MASK) == 0) ...
must be fully parenthesized to give proper results.


page 50
You may wonder what it means to say that ``expr<sub>1</sub> is computed only once'' since in an assignment
like
i = i + 2
we don't ``evaluate'' the i on the left hand side of the = at all, we assign to it. The distinction becomes important,
however, when the left hand side (expr<sub>1</sub>) is more complicated than a simple variable. For example,
we could add 2 to each of the n cells of an array a with code like
int i = 0;
while(i < n)
a[i++] += 2;
If we tried to use the expanded form, we'd get
int i = 0;
while(i < n)
a[i++] = a[i++] + 2;
and by trying to increment i twice within the same expression we'd get (as we'll see) undesired, unpredictable, and in
fact undefined results. (Of course, a more natural form of this loop would be
for(i = 0; i < n; i++)
a[i] += 2;
and with the increment of i moved out of the array subscript, it wouldn't matter so much whether we used a[i] +=
2 or a[i] = a[i] + 2.)
page 51
To make the point more clear, the ``complicated expression'' without using += would look like
yyval[yypv[p3+p4] + yypv[p1+p2]] = yyval[yypv[p3+p4] + yypv[p1+p2]] + 2
(What's going on here is that the subexpression yypv[p3+p4] + yypv[p1+p2] is being used as a subscript to
determine which cell of the yyval array to increment by 2.)
The sentence on p. 51 that includes the words ``the assignment statement has a value'' is a bit misleading: an
assignment is really an expression in C. Like any expression, it has a value, and it can therefore participate as a
subexpression in a larger expression. (If the distinction between the terms ``statement'' and ``expression'' seems
vague, don't worry; we'll start talking about statements in the next chapter.)


``Ternary'' is a ten-dollar word meaning ``having three operands.'' (It's analogous to the terms unary and
binary, which refer to operators having one and two operands, respectively.) The conditional operator is a
bit of a frill, and it's a bit obscure, so you may skip section 2.11 in the book on first reading, but please
read the comments in these notes just below (under the mention of ``annoying compulsion'').
page 52
To see what the ?: operator has bought us, here is what the array-printing loop might look like without
it:
for(i = 0; i < n; i++) {
    printf("%6d", a[i]);
    if(i%10==9 || i==n-1)
        printf("\n");
    else
        printf(" ");
}
You may be finding this compulsion to write ``compact'' or ``concise'' code using operators like ++ and
+= and ?: a bit annoying. There are three things to know:
1. In complicated code, these operators allow an economy of expression which is beneficial.
Mathematicians are constantly inventing new notations, in which one letter or symbol stands for a
complicated expression or operation, in order to solve complicated problems without drowning in
so much verbiage that it would be impossible to follow an argument or check for errors. Computer
programs are large and complex, so well-chosen abbreviations can make them easier to work
with, too.
2. Some C programmers, it's true, do take the urge to write succinct or concise code to excess, and
end up with cryptic, bewildering, obfuscated, impenetrable messes. (I'm not apologizing for them:
I hate overly abbreviated, impossible-to-read code, too!)
3. Since there is overly concise C code out there, it's occasionally necessary to dissect a piece of it
and figure out what it does, so you need to have enough familiarity with these operators, and with
some standard, idiomatic ways in which they're commonly combined, so that you won't be utterly
stymied.
However, there is nothing that says that you have to write concise code yourself. Don't be lured into
thinking that you're not a ``real C programmer'' until you routinely and easily write code which no one
else can read. Write in a style that's comfortable to you; don't be embarrassed if your code seems
``simple.'' (Actually, the very best code seems simple, too.) With time, you'll probably come to
appreciate at least some of the idioms, and to be comfortable enough with them that you may want to use
a few of them yourself, after all.


Note that precedence is not the same thing as order of evaluation. Precedence determines how an
expression is parsed, and it has an influence on the order in which parts of it are evaluated, but the
influence isn't as strong as you'd think. Precedence says that in the expression
1 + 2 * 3
the multiplication happens before the addition. But if we have several function calls, such as
f() + g() * h()
we have no idea which function will be called first; the compiler might arrange to call f() first even
though its value won't be needed until last. If we were to write an abomination like
i = 1;
a[i++] + a[i++] * a[i++]
we would have no way of knowing which order the three increments would happen in, and in fact the
compiler wouldn't have any idea either. We could not argue that since multiplication has higher
precedence than addition, and since multiplication associates from left to right, the second i++ would
have to happen first, then the third, then the first. (Actually, associativity never says anything about
which side of a single binary operator gets evaluated first; associativity says which of several adjacent
same-precedence operators happens first.)
In general, you should be wary of ever trying to second-guess the relative order in which the various
parts of an expression will be evaluated, with two exceptions:
1. You can obviously assume that precedence will dictate the order in which binary operators are
applied. This typically determines not just what order things happen in, but also what the
expression actually means. (In other words, the precedence of * over + says more than that the
multiplication ``happens first'' in 1 + 2 * 3; it says that the answer is 7, not 9.)
2. You can assume that the && and || operators are evaluated left-to-right, and that the right-hand
side is not evaluated at all if the left-hand side determines the outcome.
To look at one more example, it might seem that the code
int i = 7;
printf("%d\n", i++ * i++);
would have to print 56, because no matter which order the increments happen in, 7x8 is 8x7 is 56. But
++ just says that the increment happens later, not that it happens immediately, so this code could print 49
(if it chose to perform the multiplication first, and both increments later). And, it turns out that
ambiguous expressions like this are such a bad idea that the ANSI C Standard does not require compilers
to do anything reasonable with them at all, such that the above code might end up printing 42, or
8923409342, or 0, or crashing your computer.
Finally, note that parentheses don't dictate overall evaluation order any more than precedence does.
Parentheses override precedence and say which operands go with which operators, and they therefore
affect the overall meaning of an expression, but they don't say anything about the order of subexpressions
or side effects. We could not ``fix'' the evaluation order of any of the expressions we've been discussing
by adding parentheses. If we wrote
f() + (g() * h())
we still wouldn't know whether f(), g(), or h() would be called first. (The parentheses would force
the multiplication to happen before the addition, but precedence already would have forced that,
anyway.) If we wrote
(i++) * (i++)
the parentheses wouldn't force the increments to happen before the multiplication or in any well-defined
order; this parenthesized version would be just as undefined as i++ * i++ was.
page 53
Deep sentence:
Function calls, nested assignment statements, and increment and decrement operators
cause ``side effects''--some variable is changed as a by-product of the evaluation of an
expression.
(There's a slight inaccuracy in this sentence: any assignment expression counts as a side effect.)
It's these ``side effects'' that you want to keep in mind when you're making sure that your programs are
well-defined and don't suffer any of the undefined behavior we've been discussing. (When we informally
said that complex expressions had several things going on ``at once,'' we were actually referring to
expressions with multiple side effects.) As a general rule, you should make sure that each expression
only has one side effect, or if it has several, that different variables are changed by the several side
effects.
page 54
Deep sentence:
The moral is that writing code that depends on order of evaluation is a bad programming
practice in any language. Naturally, it is necessary to know what things to avoid, but if you
don't know how they are done on various machines, you won't be tempted to take
advantage of a particular implementation.
The first edition of K&R said
...if you don't know how they are done on various machines, that innocence may help to
protect you.
I actually prefer the first edition wording. Many textbooks encourage you to write small programs to find
out how your compiler implements some of these ambiguous expressions, but it's just one step from
writing a small program to find out, to writing a real program which makes use of what you've just
learned. And you don't want to write programs that work only under one particular compiler, that take
advantage of the way that compiler (but perhaps no other) happens to implement the undefined
expressions. It's fine to be curious about what goes on ``under the hood,'' and many of you will be curious
enough about what's going on with these ``forbidden'' expressions that you'll want to investigate them,
but please keep very firmly in mind that, for real programs, the very easiest way of dealing with
ambiguous, undefined expressions (which one compiler interprets one way and another interprets another
way and a third crashes on) is not to write them in the first place.


section 3.1: Statements and Blocks
section 3.2: If-Else
section 3.3: Else-If
section 3.4: Switch
section 3.5: Loops--While and For
section 3.6: Loops--Do-while
section 3.7: Break and Continue
section 3.8: Goto and Labels


page 55
Deep sentence:
There is no semicolon after the right brace that ends a block.
Nothing more to say here, but it's a frequent point of confusion.


The syntax description here may seem to suggest that statement<sub>1</sub> and
statement<sub>2</sub> must be single, simple statements, but, as mentioned in section 3.1, a block
of statements enclosed in braces {} is equivalent to a single statement.
page 56
``Coding shortcuts'' like
if(expression)
can indeed be cryptic, but they're also quite common, so you'll need to be able to recognize them even if
you don't choose to write them in your own code. Whenever you see code like
if (x)
or
if (f())
where x or f() do not have obvious ``Boolean'' names, just mentally add in != 0.
Don't worry too much if the multiple if/else ambiguity described on page 56 doesn't make perfect
sense; just note the deep sentence:
...it's a good idea to use braces when there are nested ifs.


pages 57-58
Binary search is an extremely important algorithm, but it turns out that it is subtle to get the
implementation just right. (It has been observed that although the first binary search was published in
1946, the first published binary search without bugs did not appear until 1962.) The basic idea is the
same as the algorithm we all tend to use when we're asked to guess a number between 1 and 100: ``Is it
between 1 and 50? Yes? Okay, is it between 25 and 50? No? Okay, is it between 1 and 12? ... '' (Don't
worry if you can't follow all of the details of the algorithm or the code on page 58, but do remember to be
extremely careful if you're ever asked to write a binary search routine.)

section 3.4: Switch

pages 58-59
We won't be concentrating on switch statements much (they're a bit of a luxury; there's nothing you
can do with a switch that you can't do with an if/else chain, as in section 3.3 on page 57). But
they're quite handy, and good to know about.
The example on page 59 is about as contrived as the example in section 1.6 (page 22) which it replaces,
but studying both examples will give you an excellent feel for how a switch statement works, what the
if/else statements are that a switch is equivalent to and how to map between the two, and why a
switch statement can be convenient.
In the example in the text, note especially the way that ten case labels are attached to one set of
statements (ndigit[c-'0']++;). As the authors point out, this works because of the way switch
cases ``fall through,'' which is a mixed blessing.
The danger of fall-through is illustrated by:
switch(food) {
    case APPLE:
        printf("apple\n");
    case ORANGE:
        printf("orange\n");
        break;
    default:
        printf("other\n");
}
When food is APPLE, this code erroneously prints
apple
orange
because the break statement after the APPLE case was omitted.
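As a contrast, here is a small sketch in which fall-through is used only where it helps (several case labels sharing one set of statements) and every group ends with an explicit break. The function name classify and the constants DIGIT, SPACE, and OTHER are made up for this illustration.

```c
enum { DIGIT, SPACE, OTHER };   /* hypothetical result codes for this sketch */

/* classify: report whether c is a digit, white space, or something else */
int classify(int c)
{
    int kind;

    switch (c) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        kind = DIGIT;       /* ten labels, one set of statements */
        break;              /* without this, control would fall into SPACE */
    case ' ': case '\n': case '\t':
        kind = SPACE;
        break;
    default:
        kind = OTHER;
        break;              /* not required on the last case, but harmless */
    }
    return kind;
}
```

Putting a break after the last case as well is a defensive habit: if another case is added at the end later, the old last case will not silently fall into it.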

section 3.5: Loops -- While and For

page 60
Remember that, as always, the statement can be a brace-enclosed block.
Make sure you understand how the for loop
for (expr1; expr2; expr3)
statement
is equivalent to the while loop
expr1;
while (expr2) {
statement
expr3;
}
There is nothing magical about the three expressions at the top of a for loop; they can be arbitrary
expressions, and they're evaluated just as the expansion into the equivalent while loop would suggest.
(Actually, there are two tiny differences: the behavior of continue, which we'll get to in a bit, and the
fact that the test expression, expr2, is optional and defaults to ``true'' for a for loop, but
is required for a while loop.)
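To make the equivalence concrete, here is the same computation written both ways. The function names sum_for and sum_while are invented for this sketch; each adds up the integers from 1 to n.

```c
/* sum_for: add up 1..n using a for loop */
int sum_for(int n)
{
    int i, sum = 0;

    for (i = 1; i <= n; i++)    /* initialize; test; increment */
        sum += i;
    return sum;
}

/* sum_while: the same loop, expanded into its while form */
int sum_while(int n)
{
    int i, sum = 0;

    i = 1;                      /* the initialization moves above the loop */
    while (i <= n) {            /* the test stays at the top */
        sum += i;
        i++;                    /* the increment moves to the end of the body */
    }
    return sum;
}
```

Both functions return the same value for any n; which one reads better is the matter of preference discussed next.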
for(;;) is one way of writing an infinite loop in C; the other common one is while(1). Don't worry
about what a break would mean in a loop, we'll be seeing it in a few more pages.
pages 60-61
Deep sentences:
Whether to use while or for is largely a matter of personal preference...
Nonetheless, it is bad style to force unrelated computations into the initialization and
increment of a for, which are better reserved for loop control operations.
In general, the three expressions in a for loop should all manipulate (initialize, test, and increment) the
same variable or data structure. If they don't, they are ``unrelated computations,'' and a while loop
would probably be clearer. (The reason that one loop or the other can be clearer is simply that, when you
see a for loop, you expect to see an idiomatic initialize/test/increment of a single variable or data
structure, and if the for loop you're looking at doesn't end up matching that pattern, you've been
momentarily misled.)
page 61
When the authors say that ``the index and limit of a C for loop can be altered from within the loop,''
they mean that a loop like
int i, n = 10;
for(i = 0; i < n; i++) {
if(i == 5)
i++;
printf("%d\n", i);
if(i == 8)
n++;
}
where i and n are modified within the loop, is legal. (Obviously, such a loop can be very confusing, so
you'll probably be better off not making use of this freedom too much.)
When they say that ``the index variable... retains its value when the loop terminates for any reason,'' you
may not find this too surprising, unless you've used other languages where it's not the case. The fact that
loop control variables retain their values after a loop can make some code much easier to write; for
example, the atoi function at the bottom of this page depends on having its i counter manipulated by
several loops as it steps over three different parts of the string (whitespace, sign, digits) with i's value
preserved between each step.
Deep sentence:
Each step does its part, and leaves things in a clean state for the next.
This is an extremely important observation on how to write clean code. As you study the atoi code,
notice that it falls into three parts, each implementing one step of the pseudocode description: skip white
space, get sign, get integer part and convert it. At each step, i points at the next character which that
step is to inspect. (If a step is skipped, because there is no leading whitespace or no sign, the later steps
don't care.)
You may hear the term invariant used: this refers to some condition which exists at all stages of a
program or function. In this case, the invariant is that i always points to the next character to be
inspected. Having some well-chosen invariants can make code much easier to write and maintain. If there
aren't enough invariants--if i is sometimes the next character to look at and sometimes the character that
was just looked at--debugging and maintaining the code can be a nightmare.
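Here is a sketch of a three-step conversion routine in the spirit of the one being discussed, with the steps and the invariant marked in comments. The name atoi_sketch is hypothetical (chosen so as not to collide with the library's atoi), and the casts to unsigned char are a precaution added for this sketch.

```c
#include <ctype.h>

/* atoi_sketch: convert the leading part of s to an int.
 * Invariant: after every step, i is the index of the next
 * character still to be inspected. */
int atoi_sketch(const char s[])
{
    int i, n, sign;

    for (i = 0; isspace((unsigned char)s[i]); i++)   /* step 1: skip white space */
        ;
    sign = (s[i] == '-') ? -1 : 1;                   /* step 2: get sign, if any */
    if (s[i] == '+' || s[i] == '-')
        i++;
    for (n = 0; isdigit((unsigned char)s[i]); i++)   /* step 3: convert digits */
        n = 10 * n + (s[i] - '0');
    return sign * n;
}
```

Notice that step 2 and step 3 never need to know whether step 1 actually skipped anything; they just start at s[i].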
In the atoi example, the lines

for (i = 0; isspace(s[i]); i++) /* skip white space */
;
are about at the brink of ``forcing unrelated computations into the initialization and increment,''
especially since so much has been forced into the loop header that there's nothing left in the body. It
would be equally clear to write this part as
i = 0;
while (isspace(s[i]))	/* skip white space */
i++;
The line
sign = (s[i] == '-') ? -1 : 1;
may seem a bit cryptic at first, though it's a textbook example of the use of ?: . The line is equivalent to
sign = 1;
if(s[i] == '-')
sign = -1;
pages 61-62
It's instructive to study this Shell or ``gap'' sort, but don't worry if you find it a bit bewildering.
Deep sentence:
Notice how the generality of for makes the outer loop fit the same form as the others,
even though it is not an arithmetic progression.
The point is that loops don't have to count 0, 1, 2... or 1, 2, 3... . (This one counts n/2, n/4, n/8... .
Later we'll see loops which don't step over numbers at all.)
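For readers who want to trace it by hand, here is a sketch of a Shell sort with the same three-loop structure. It is reproduced from memory of the general algorithm rather than copied from the page, so treat it as an illustration; the point to notice is the outer loop, whose ``increment'' step is gap /= 2.

```c
/* shellsort: sort v[0]..v[n-1] into increasing order */
void shellsort(int v[], int n)
{
    int gap, i, j, temp;

    for (gap = n / 2; gap > 0; gap /= 2)        /* gaps: n/2, n/4, ..., 1 */
        for (i = gap; i < n; i++)               /* step along the elements */
            for (j = i - gap; j >= 0 && v[j] > v[j + gap]; j -= gap) {
                temp = v[j];                    /* swap elements that are */
                v[j] = v[j + gap];              /* gap apart and out of order */
                v[j + gap] = temp;
            }
}
```

The innermost loop also steps in an unusual way (j -= gap), which is one more example of a for loop that is not a simple count.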
page 63
Deep sentence:
The commas that separate function arguments, variables in declarations, etc. are not
comma operators...
This looks strange, but it's true. If you say

for (i = 0, j = strlen(s)-1; i < j; i++, j--)
the first comma says to do i = 0 then do j = strlen(s)-1, and the second comma says to do i++
then do j--. However, when you say
getline(line, MAXLINE);
the comma just separates the two arguments line and MAXLINE; they both have to be evaluated, but it
doesn't matter in which order, and they're both passed to getline. (If the comma in a function call
were interpreted as a comma operator, the function would only receive one argument, since the value of
the first operand of the comma operator is discarded.) Since the comma operator discards the value of its
first operand, its first operand had better have a side effect. The expression
++a,++b
increments a and increments b and (if anyone cares) returns b's value, but the expression
a+1,b+1
adds 1 to a, discards it, and returns b+1.
If the comma operator isn't making perfect sense, don't worry about it for now. You're most likely to see
it in the first or third expression of a for statement, where it has the obvious meaning of separating two
(or more) things to do during the initialization or increment step. Just be careful that you don't
accidentally write things like
for(i = 0; j = 0; i < n && j < m; i++; j++)	/* WRONG */
or
for(i = 0, j = 0, i < n && j < m, i++, j++)	/* WRONG */
The correct form of a multi-index loop, where n and m are the limits for i and j, is something like

for(i = 0, j = 0; i < n && j < m; i++, j++)
Semicolons always separate the initialization, test, and increment parts; commas may appear within the
initialization and increment parts.
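A complete function built around the two-index loop quoted earlier might look like the following sketch, which reverses a string in place. Using int for both indices is a deliberate choice here: with an empty string, strlen(s) - 1 must come out as -1 so that the test i < j fails immediately.

```c
#include <string.h>

/* reverse: reverse the string s in place */
void reverse(char s[])
{
    int c, i, j;

    /* commas separate the two initializations and the two increments;
     * semicolons separate the initialization, test, and increment parts */
    for (i = 0, j = (int)strlen(s) - 1; i < j; i++, j--) {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}
```

Here i walks up from the front while j walks down from the back, and the loop stops when they meet in the middle.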

section 3.6: Loops -- Do-while

page 63
Note the semicolon following the parenthesized expression in the do-while loop; it's a required part of
the syntax.
Make sure you understand the difference between a while loop and a do-while loop. A while loop
executes strictly according to its conditional expression: if the expression is never true, the loop executes
zero times. The do-while loop, on the other hand, makes an initial ``no peek'' foray through the loop
body no matter what.
To see the difference, let's imagine three different ways of writing the loop in the itoa function on page
64. Suppose we somehow forgot to use a termination condition at all, and wrote something like
for(;;) {
s[i++] = n % 10 + '0';
n /= 10;
}
Eventually, n becomes zero, but we keep going around the loop, and we convert a number like 123 into a
string like "0000000000123", except with an infinite number of leading zeroes. (Mathematically, this
is correct, but it's not what we want here, especially if we want our program to use a finite amount of
time and space.)
Our next attempt might be
while(n > 0) {
s[i++] = n % 10 + '0';
n /= 10;
}
so that we stop creating digits when n reaches 0. This works fine for positive numbers, but for 0, it stops
too soon: it would convert the number 0 to the empty string "". That's why the do-while loop is
appropriate here; the fact that it always makes at least one pass through the loop makes sure that we
always generate at least one digit, even if it's 0.
(It's also useful to look at the invariants in this loop: during each trip through the loop, n contains the rest
of the number we have to convert, s[] contains the digits we've already converted, and i points at the
next cell in s[] which is to receive a digit. Each trip through the loop converts one digit, increments i,
and divides n by 10.)
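Putting the pieces together, a complete conversion routine built around the do-while loop could look like this sketch. The names itoa_sketch and reverse_in_place are hypothetical, and working on an unsigned copy of the magnitude is a choice made here so that the most negative int is handled too; the version in the text is organized a little differently.

```c
#include <string.h>

/* reverse_in_place: helper for this sketch; reverses s where it stands */
static void reverse_in_place(char s[])
{
    int c, i, j;

    for (i = 0, j = (int)strlen(s) - 1; i < j; i++, j--) {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}

/* itoa_sketch: convert n to decimal characters in s */
void itoa_sketch(int n, char s[])
{
    int i = 0;
    unsigned int u = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;

    do {                                /* always makes at least one pass */
        s[i++] = (char)(u % 10 + '0');  /* digits come out in reverse order */
        u /= 10;
    } while (u > 0);
    if (n < 0)
        s[i++] = '-';
    s[i] = '\0';
    reverse_in_place(s);                /* put the digits the right way round */
}
```

Because the body runs once before the test, converting 0 yields the string "0" rather than an empty string.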


section 3.7: Break and Continue

pages 64-65
Note that a break inside a switch inside a loop causes a break out of the switch, while a break
inside a loop inside a switch causes a break out of the loop.
Neither break nor continue has any effect on a brace-enclosed block of statements following an if.
break causes a break out of the innermost switch or loop, and continue forces the next iteration of
the innermost loop.
There is no way of forcing a break or continue to act on an outer loop.
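The first of these points trips people up often enough that a small sketch may help. In the made-up function below, each break leaves only the switch; to stop the surrounding for loop when a period is seen, the code has to set a flag (done, a name invented here) which the loop condition tests.

```c
/* count_vowels_until_period: count lower-case vowels in s,
 * stopping at the first '.' or at the end of the string */
int count_vowels_until_period(const char s[])
{
    int i, count = 0, done = 0;

    for (i = 0; s[i] != '\0' && !done; i++) {
        switch (s[i]) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            count++;
            break;      /* leaves the switch only; the loop carries on */
        case '.':
            done = 1;   /* the break below cannot leave the for loop, */
            break;      /* so the flag in the loop test does that job */
        default:
            break;
        }
    }
    return count;
}
```

Called with "a cat. a dog", the function counts the two vowels before the period and ignores the rest.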
Another example of where continue is useful is when processing data files. It's often useful to allow
comments in data files; one convention is that a line beginning with a # character is a comment, and
should be ignored by any program reading the file. This can be coded with something like
while(getline(line, MAXLINE) > 0) {
if(line[0] == '#')
continue;
/* process data file line */
}
The alternative, without a continue, would be
while(getline(line, MAXLINE) > 0) {
if(line[0] != '#') {
/* process data file line */
}
}
but now the processing of normal data file lines has been made subordinate to comment lines. (Also, as
the authors note, it pushes most of the body of the loop over by another tab stop.) Since comments are
exceptional, it's nice to test for them, get them out of the way, and go on about our business, which the
code using continue nicely expresses.
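Here is a self-contained variation on the same idea which can be compiled without getline: the ``file'' is just an array of strings, and ``processing'' a line means counting it. The function name count_data_lines and the array-of-strings parameter are assumptions made for this sketch (the parameter uses pointers, which come up properly in chapter 5).

```c
/* count_data_lines: count the lines that are not comments.
 * A comment is any line whose first character is '#'. */
int count_data_lines(const char *lines[], int n)
{
    int i, count = 0;

    for (i = 0; i < n; i++) {
        if (lines[i][0] == '#')
            continue;       /* skip the comment; i++ still happens next */
        count++;            /* stand-in for real processing of the line */
    }
    return count;
}
```

Note that in a for loop, continue jumps to the increment step, so i is still advanced; in the equivalent while loop, a continue placed before a hand-written i++ would skip it.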

section 3.8: Goto and Labels

pages 65-66
A tremendous amount of impassioned debate surrounds the lowly goto statement, which exists in many
languages. Some people say that gotos are fine; others say that they must never be used. You should
definitely shy away from gotos, but don't be dogmatic about it; some day, you'll find yourself writing
some screwy piece of code where trying to avoid a goto (by introducing extra tests or Boolean control
variables) would only make things worse.
page 66
When you find yourself writing several nested loops in order to search for something, such that you
would need to use a goto to break out of all of them when you do find what you're looking for, it's often
an excellent idea to move the search code out into a separate function. Doing so can make both the
``found'' and ``not found'' cases easier to handle. Here's a slight variation on the example in the middle of
page 66, written as a function:
/* return i such that a[i] == b[j] for some j, or -1 if none */
int findequal(int a[], int na, int b[], int nb)
{
int i, j;
for(i = 0; i < na; i++) {
for(j = 0; j < nb; j++) {
if(a[i] == b[j])
return i;
}
}
return -1;
}
This function can then be called as
i = findequal(a, na, b, nb);
if(i == -1)
/* didn't find any common element */
else
/* got one */
(The only disadvantage here is that it's trickier to return i and j if we need them both.)
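For comparison, this is roughly what the same search looks like when the exit from both loops is done with a goto instead of a return from inside the loops. The name findequal_goto is invented for this sketch; it returns the same values as findequal.

```c
/* findequal_goto: return i such that a[i] == b[j] for some j, or -1 if none */
int findequal_goto(const int a[], int na, const int b[], int nb)
{
    int i, j;

    for (i = 0; i < na; i++)
        for (j = 0; j < nb; j++)
            if (a[i] == b[j])
                goto found;     /* jump out of both loops at once */
    return -1;                  /* fell out of the loops: no common element */

found:
    return i;                   /* i (and j) still hold the matching indices */
}
```

The goto version is no shorter and no clearer, which is the argument for preferring the plain return when the search lives in its own function.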

Chapter 4: Functions and Program Structure
page 67
Deep paragraph:
Functions break large computing tasks into smaller ones, and enable people to build on
what others have done instead of starting over from scratch. Appropriate functions hide
details of operation from parts of the program that don't need to know about them, thus
clarifying the whole, and easing the pain of making changes.
Functions are probably the most important weapon in our battle against software complexity. You'll want to
learn when it's appropriate to break processing out into functions (and also when it's not), and how to set
up function interfaces to best achieve the qualities mentioned above: reusability, information hiding,
clarity, and maintainability.
The quoted sentences above show that a function does more than just save typing: a well-defined
function can be re-used later, and eases the mental burden of thinking about a complex program by
freeing us from having to worry about all of it at once. For a well-designed function, at any one time, we
should have to think about either:
1. that function's internal implementation (when we're writing or maintaining it); or
2. a particular call to the function (when we're working with code which uses it).
But we should not have to think about the internals when we're calling it, or about the callers when we're
implementing the internals. (We should perhaps think about the callers just enough to ensure that the
function we're designing will be easy to call, and that we aren't accidentally setting things up so that callers will
have to think about any internal details.)
Sometimes, we'll write a function which we only call once, just because breaking it out into a function
makes things clearer and easier.
Deep sentence:
C has been designed to make functions efficient and easy to use; C programs generally
consist of many small functions rather than a few big ones.
Some people worry about ``function call overhead,'' that is, the work that a computer has to do to set up
and return from a function call, as opposed to simply doing the function's statements in-line. It's a risky
thing to worry about, though, because as soon as you start worrying about it, you have a bit of a
disincentive to use functions. If you're reluctant to use functions, your programs will probably be bigger
and more complicated and harder to maintain (and perhaps, for various reasons, actually less efficient).
The authors choose not to get involved with the system-specific aspects of separate compilation, but we'll
take a stab at it here. We'll cover two possibilities, depending on whether you're using a traditional
command-line compiler or a newer integrated development environment (IDE) or other graphical user
interface (GUI) compiler.
When using a command-line compiler, there are usually two main steps involved in building an
executable program from one or more source files. First, each source file is compiled, resulting in an
object file containing the machine instructions (generated by the compiler) corresponding to the code in
that source file. Second, the various object files are linked together, with each other and with libraries
containing code for functions which you did not write (such as printf), to produce a final, executable
program.
Under Unix, the cc command can perform one or both steps. So far, we've been using extremely simple
invocations of cc such as
cc hello.c
(section 1.1, page 6). This invocation compiles a single source file, links it, and places the executable
(somewhat inconveniently) in a file named a.out.
Suppose we have a program which we're trying to build from three separate source files, x.c, y.c, and
z.c. We could compile all three of them, and link them together, all at once, with the command
cc x.c y.c z.c
(see also page 70). Alternatively, we could compile them separately: the -c option to cc tells it to
compile only, but not to link. Instead of building an executable, it merely creates an object file, with a
name ending in .o, for each source file compiled. So the three commands
cc -c x.c
cc -c y.c
cc -c z.c
would compile x.c, y.c, and z.c and create object files x.o, y.o, and z.o. Then, the three object
files could be linked together using
cc x.o y.o z.o
When the cc command is given an .o file, it knows that it does not have to compile it (it's an object file,
already compiled); it just sends it through to the link process.
Here we begin to see one of the advantages of separate compilation: if we later make a change to y.c,
only it will need recompiling. (At some point you may want to learn about a program called make,
which keeps track of which parts need recompiling and issues the appropriate commands for you.)
Above we mentioned that the second, linking step also involves pulling in library functions. Normally,
the functions from the Standard C library are linked in automatically. Occasionally, you must request a
library manually; one common situation under Unix is that certain math routines are in a separate math
library, which is requested by using -lm on the command line. Since the libraries must typically be
searched after your program's own object files are linked (so that the linker knows which library
functions your program uses), any -l option must appear after the names of your files on the command
line. For example, to link the object file mymath.o (previously compiled with cc -c mymath.c)
together with the math library, you might use
cc mymath.o -lm
Two final notes on the Unix cc command: if you're tired of using the nonsense name a.out for all of
your programs, you can use -o to give another name to the output (executable) file:
cc -o hello hello.c
would create an executable file named hello, not a.out. Finally, everything we've said about cc also
applies to most other Unix C compilers. Many of you will be using acc (a semistandard name for a
version of cc which does accept ANSI Standard C) or gcc (the FSF's GNU C Compiler, which also
accepts ANSI C and is free).
There are command-line compilers for MS-DOS systems which work similarly. For example, the
Microsoft C compiler comes with a CL (``compile and link'') command, which works almost the same as
Unix cc. You can compile and link in one step:
cl hello.c
or you can compile only:
cl /c hello.c
creating an object file named hello.obj which you can link later.
The preceding has all been about command-line compilers. If you're using some kind of integrated
development environment, such as Turbo C or the Microsoft Programmer's Workbench or Think C, most
of the mechanical details are taken care of for you. (There's also less I can say here about these
environments, because they're all different.) Typically there's a way to specify the list of files (modules)
which make up your project, and a single ``build'' button which does whatever's required to build (and
perhaps even execute) your program.
section 4.1: Basics of Functions
section 4.2: Functions Returning Non-Integers
section 4.3: External Variables
section 4.4: Scope Rules
section 4.5: Header Files
section 4.6: Static Variables
section 4.7: Register Variables
section 4.8: Block Structure
section 4.9: Initialization
section 4.10: Recursion
section 4.11: The C Preprocessor
section 4.11.1: File Inclusion
section 4.11.2: Macro Substitution
section 4.11.3: Conditional Inclusion

section 4.1: Basics of Functions

page 68
Once again, notice how a clear, simple description of the problem we're trying to solve leads to an (almost)
equally clear program implementing it.
Here are some more nice statements about the virtues of a clean, modular design:
Although it's certainly possible to put the code for all of this in main, a better way is to use
the structure to advantage by making each part a separate function. Three small pieces are
easier to deal with than one big one, because irrelevant details can be buried in the functions,
and the chance of unwanted interactions is minimized. And the pieces may even be useful in
other programs.
Let's say a bit more about how and why functions can be useful. First, we can see that, having chosen to use
a separate function for each part of the print-matching-lines program, the top-level main routine on page
69 is particularly simple and straightforward; it's little more than a transcription into C of the pseudocode
on page 68. The authors don't tend to use too many comments in their code, anyway, but this code hardly
needs any: the names of the functions called speak for themselves. (The only thing that might not be
obvious at first is that strindex is being used not so much to find the index of a substring but just to
determine whether a substring is present at all.) Second, we may be pleased to notice that we're already
having a chance to re-use the getline function we first wrote in Chapter 1. Third, we note that the two
functions which we've chosen to use (getline and strindex) are themselves reasonably simple and
straightforward to write. Finally, note that sometimes what you re-use is not so much a function as a
function interface. The code on page 69 uses a new implementation of getline, but the interface (the
argument list, return value, and functionality) is the same as for the versions of getline in section 1.9 on
page 29. We could have used that version here, or this new version there. Later, if we think of some even
better way of reading lines, we can write yet another version of getline, and as long as it has the same
interface, these programs can call it without their having to be rewritten.
The ease with which a program like this comes together may be mildly deceptive, because nowhere have
we discussed the motivations which led to the particular pseudocode description on page 68 or the
particular choice of functions into which the problem was broken down. Choosing a
design for a program, and defining subfunctions (their interfaces and their behavior) are both arts, and of
course the tasks are not unrelated. A good design leads to the invention of functions which might well be
useful later, and an existing body of good, general-purpose functions (all crying out to be re-used) can help
to guide the design of the next program.
What makes a good building block, either an abstract one that we use in a pseudocode description, or a
concrete one in the form of a general-purpose function? The most important aspect of a good building
block is that it have a single, well-defined task to perform. Two of the three functions used in the
line-matching program fill this role very well: getline's job is to read one line, and strindex's job is to
find one string in another string. printf's specification is considerably broader: its job is to print stuff.
(It's not surprising that printf can therefore be the harder routine to call, and is certainly much harder to
implement. Its saving virtue is that it is nonetheless broadly applicable and infinitely reusable.)
When you find that a program is hard to manage, it's often because it has not been designed and broken up
into functions cleanly. Two obvious reasons for moving code down into a function are because:
1. It appeared in the main program several times, such that by making it a function, it can be written just
once, and the several places where it used to appear can be replaced with calls to the new function.
2. The main program was getting too big, so it could be made (presumably) smaller and more manageable
by lopping part of it off and making it a function.
These two reasons are important, and they represent significant benefits of well-chosen functions, but they
are not sufficient to automatically identify a good function. A good function has at least these two
additional attributes:
3. It does just one well-defined task, and does it well.
4. Its interface to the rest of the program is clean and narrow.
Attribute 3 is just a restatement of something we said above. Attribute 4 says that you shouldn't have to
keep track of too many things when calling a function. If you know what a function is supposed to do, and
if its task is simple and well-defined, there should be just a few pieces of information you have to give it to
act upon, and one or just a few pieces of information which it returns to you when it's done. If you find
yourself having to pass lots and lots of information to a function, or remember details of its internal
implementation to make sure that it will work properly this time, it's often a sign that the function is not
sufficiently well-defined. (It may be an arbitrary chunk of code that was ripped out of a main program that
was getting too big, such that it essentially has to have access to all of that main function's variables.)
The whole point of breaking a program up into functions is so that you don't have to think about the entire
program at once; ideally, you can think about just one function at a time. A good function is a ``black box'':
when you call it, you only have to know what it does (not how it does it); and when you're writing it, you
only have to know what it's supposed to do (and you don't have to know why or under what circumstances
its caller will be calling it). Some functions may be hard to write (if they have a hard job to do, or if it's
hard to make them do it truly well), but that difficulty should be compartmentalized along with the function
itself. Once you've written a ``hard'' function, you should be able to sit back and relax and watch it do that
hard work on call from the rest of your program. If you find that difficulties pervade a program, that the
hard parts can't be buried inside black-box functions and then forgotten about, if you find that there are hard
parts which involve complicated interactions among multiple functions, then the program probably needs
redesigning.
For the purposes of explanation, we've been seeming to talk so far only about ``main programs'' and the
functions they call and the rationale behind moving some piece of code down out of a ``main program'' into
a function. But in reality, there's obviously no need to restrict ourselves to a two-tier scheme. The ``main
program,'' main(), is itself just a function, and any function we find ourself writing will often be
appropriately written in terms of sub-functions, sub-sub-functions, etc.
That's probably enough for now about functions in general. Here are a few more notes about the line-matching program.
The authors mention that ``The standard library provides a function strstr that is similar to strindex,
except that it returns a pointer instead of an index.'' We haven't met pointers yet (they're in chapter 5), so
we aren't quite in a position to appreciate the difference between an index and a pointer. Generally, an
index is a small number referring to some element of an array. A pointer is more general: it can point to any
data object of a particular type, whether it's one element of an array, or some other object anywhere in
memory. (Don't worry too much about the distinction yet, but bear in mind that there is a distinction. Note,
too, that the distinction is not absolute; in fact, the word ``index'' seems to derive from the concept of
pointing, as you can see if you think about what you use your index finger for, or if you notice that the
entries in a book's index point at the referenced parts of the book. We frequently speak casually of an index
variable ``pointing at'' some cell of an array, even though it's not a true pointer variable.)
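As a preview of how the two notions relate, here is a sketch which turns the pointer returned by the library's strstr back into an index. The wrapper name index_from_strstr is invented for this illustration, and it leans on pointer subtraction, which is not covered until chapter 5, so don't worry if the details look unfamiliar.

```c
#include <string.h>

/* index_from_strstr: return the index in s where t first occurs,
 * or -1 if t does not occur in s */
int index_from_strstr(const char s[], const char t[])
{
    const char *p = strstr(s, t);   /* pointer to the match, or NULL */

    if (p == NULL)
        return -1;                  /* no match: report it the index way */
    return (int)(p - s);            /* distance from the start of s */
}
```

The pointer says where the match is in memory; subtracting the start of the array gives how far along the array that is, which is exactly what an index means.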
One facet of the getline function's interface might bear mentioning: its first argument, the character
array s, is being used to return the line that it reads. This may seem to contradict the rule that a function
can never modify the value of a variable in its caller. As was briefly mentioned on page 28, there's an
exception for arrays, which we'll be learning about in chapter 5; for now, we'll gloss over the point.
(Actually, we're glossing over two points: not only is getline able to return a value via an argument, but
the argument isn't really an array, although it's declared as and looks like one. Please forgive these gentle
fictions; explaining them completely would really be premature at this point. Perhaps they weren't worth
mentioning yet, after all.)
For comparison, here is yet another version of getline:
int getline(char s[], int lim)
{
int c, i = 0;
while(--lim > 0 && (c=getchar()) != EOF) {
s[i++] = c;
if(c == '\n')
break;
}
s[i] = '\0';
return i;
}
Note that by using break, we avoid having to test for '\n' in two different places.
If you're having trouble seeing how the strindex function works, its algorithm is
for (each position i in s)
if (t occurs at position i in s)
return i;
(else) return -1;
Filling in the details of ``if (t occurs at position i in s)'', we have:
for (each position i in s)
for (each character in t)
if (it matches the corresponding character in s)
if (it's '\0')
return i;
else
keep going
else
no match at position i
(else) return -1;
A slightly less compressed implementation than the one on page 69 would be:
int strindex(char s[], char t[])
{
int i, j, k;
for (i = 0; s[i] != '\0'; i++) {
for(j = i, k = 0; t[k] != '\0'; j++, k++)
if(s[j] != t[k])
break;
if(t[k] == '\0')
return i;
}
return -1;
}
Note that we have to check for the end of the string t twice: once to see if we're at the end of it in the
innermost loop, and again to see why we terminated the innermost loop. (If we terminated the innermost
loop because we reached the end of t, we found a match; otherwise, we didn't.) We could rearrange things
to remove the duplicated test:
int strindex(char s[], char t[])
{
int i, j, k;
for (i = 0; s[i] != '\0'; i++) {
j = i;
k = 0;
do {
if(t[k] == '\0')
return i;
} while(s[j++] == t[k++]);
}
return -1;
}
It's a matter of style which implementation of strindex is preferable; it's impossible to say which is
``best.'' (Can you see a slight difference in the behavior of the version on page 69 versus the two here?
Under what circumstance(s) would this difference be significant? How would the version on page 69
behave under those circumstances, and how would the two routines here behave?)
page 70
Deep sentence:
A program is just a set of definitions of variables and functions.
This sentence may or may not seem deep, and it may or may not be deep, but it's a fundamental definition
of what a C program is.
Note that a function's return value is automatically converted to the return type of the function, if necessary,
just as in assignments like
f = i;
where f is float and i is int.
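A pair of tiny functions shows the conversion at work in both directions. The names whole_part and to_double are made up for this sketch, and some compilers may warn about the first one, since converting a double to an int throws away the fractional part.

```c
/* whole_part: the double x is converted to int on return,
 * which discards any fractional part */
int whole_part(double x)
{
    return x;       /* converted as if by  int tmp = x;  */
}

/* to_double: the int i is converted to double on return */
double to_double(int i)
{
    return i;       /* converted as if by  double tmp = i;  */
}
```

Calling whole_part(3.9) gives 3, not 4: the conversion truncates toward zero rather than rounding.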
Most programmers do use parentheses around the expression in a return statement, because that way it
looks more like while(), for(), etc. The reason the parentheses are optional is that the formal syntax is
return expression ;
and, as we know, any expression surrounded by parentheses is another expression.
It's debatable whether it's ``not illegal'' for a function to have return statements with and without values.
It's a ``sign of trouble'' at best, and undefined at worst. Another clear sign of trouble (which is equally
undefined) is when a function returns no value, or is declared as void, but a caller attempts to use the
return value.
The main program on page 69 returns the number of matching lines found. This is probably better than
returning nothing, but the convention is usually that a C program returns 0 when it succeeds and a nonzero
value when it fails.


section 4.2: Functions Returning Non-Integers

page 71
Actually, we may have seen at least one function returning a non-integer, in the Fahrenheit-Celsius
conversion program in exercise 1-15 on page 27 in section 1.7.
The type name which precedes the name of a function (and which sets its return type) looks just like (i.e.
is syntactically the same as) the void keyword we've been using to identify functions which don't return
a value.
Note that the version of atof on page 71 does not handle exponential notation like 1.23e45; handling
exponents is left for exercise 4-2 on page 73.
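For readers without the book open, here is a sketch of a string-to-double conversion of this kind, without exponent handling. It is written from the description rather than copied from the page; the name atof_sketch is hypothetical (so as not to collide with the library's atof), and the casts to unsigned char are a precaution added here.

```c
#include <ctype.h>

/* atof_sketch: convert the leading part of s to a double.
 * Handles optional white space, an optional sign, digits, and an
 * optional decimal point followed by more digits. No exponents. */
double atof_sketch(const char s[])
{
    double val, power;
    int i, sign;

    for (i = 0; isspace((unsigned char)s[i]); i++)   /* skip white space */
        ;
    sign = (s[i] == '-') ? -1 : 1;
    if (s[i] == '+' || s[i] == '-')
        i++;
    for (val = 0.0; isdigit((unsigned char)s[i]); i++)
        val = 10.0 * val + (s[i] - '0');             /* integer part */
    if (s[i] == '.')
        i++;
    for (power = 1.0; isdigit((unsigned char)s[i]); i++) {
        val = 10.0 * val + (s[i] - '0');             /* fraction digits */
        power *= 10.0;                               /* remember the scale */
    }
    return sign * val / power;
}
```

Given a string like "1.23e45", this sketch simply stops at the e and returns the value of the digits before it.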
``The standard library includes an atof'' means that we're reimplementing something which would
otherwise be provided for us anyway (i.e. just like printf). In general, it's a bad idea to rewrite
standard library routines, because by doing so you negate the advantage of having someone else write
them for you, and also because the compiler or linker is allowed to complain if you redefine a standard
routine. (On the other hand, seeing how the standard library routines are implemented can be a good
learning experience.)
page 72
In the ``primitive calculator'' code at the top of page 72, note that the call to atof is buried in the
argument list of the call to printf.
Deep sentences:
The function atof must be declared and defined consistently. If atof itself and the call
to it in main have inconsistent types in the same source file, the error will be detected by
the compiler. But if (as is more likely) atof were compiled separately, the mismatch
would not be detected, atof would return a double that main would treat as an int,
and meaningless answers would result.
The problems of mismatched function declarations are somewhat reduced today by the widespread use of
ANSI function prototypes, but they're still important to be aware of.
The implicit function declarations mentioned at the bottom of page 72 are an older feature of the
language. They were handy back in the days when most functions returned int and function prototypes
hadn't been invented yet, but today, if you want to use prototypes, you won't want to rely on implicit
declarations. If you don't like depending on defaults and implicit declarations, or if you do want to use
function prototypes religiously, you're under no compulsion to make use of (or even learn about)
implicit function declarations, and you'll want to configure your compiler so that it will warn you if you
call a function which does not have an explicit, prototyped declaration in scope.
You may wonder why the compiler is able to get some things right (such as implicit conversions between
integers and floating-point within expressions) whether or not you're explicit about your intentions, while
in other circumstances (such as when calling functions returning non-integers) you must be explicit. The
question of when to be explicit and when to rely on the compiler hinges on several questions:
1. How much information does the compiler have available to it?
2. How likely is it that the compiler will infer the right action?
3. How likely is it that a mistake which you the programmer might make will be caught by the
compiler, or silently compiled into incorrect code?
It's fine to depend on things like implicit conversions as long as the compiler has all the information it
needs to get them right, unambiguously. (Relying on implicit conversions can make code cleaner, clearer,
and easier to maintain.) Relying on implicit declarations, however, is discouraged, for several reasons.
First, there are generally fewer declarations than expressions in a program, so the impact (i.e. work) of
making them all explicit is less. Second, thinking about declarations is good discipline, and requiring that
everything normally be declared explicitly can let the compiler catch a number of errors for you (such as
misspelled functions or variables). Finally, since the compiler only compiles one source file at a time, it
is never able to detect inconsistencies between files (such as a function or variable declared one way in
one source file and some other way in another), so it's important that cross-file declarations be explicit
and consistent. (Various strategies, such as placing common declarations in header files so that they can
be #included wherever they're needed, and requesting that the compiler warn about function calls without
prototypes in scope, can help to reduce the number of errors having to do with improper declarations.)
For the most part, you can also ignore the ``old style'' function syntax, which hardly anyone is using any
more. The only thing to watch out for is that an empty set of parentheses in a function declaration is an
old-style declaration and means ``unspecified arguments,'' not ``no arguments.'' To declare a new-style
function taking no arguments, you must include the keyword void between the parentheses, which
makes the lack of arguments explicit. (A declaration like
int f(void);
does not declare a function accepting one argument of type void, which would be meaningless, since
the definition of type void is that it is a type with no values. Instead, as a special case, a single,
unnamed parameter of type void indicates that a function takes no arguments.) For example, the
definition of the getchar function might look like
int getchar(void)
{
    int c;
    read next character into c somehow
    if (no next character)
        return EOF;
    return c;
}
page 73
Note that this version of atoi, written in terms of atof, has very slightly different behavior: it reads
past a '.' (and, assuming a fully-functional version of atof, an 'e').
The use of an explicit cast when returning a floating-point expression from a routine declared as
returning int represents another point on the spectrum of what you should worry about explicitly versus
what you should feel comfortable making use of implicitly. This is a case where the compiler can do the
``right thing'' safely and unambiguously, as long as what you said (in this case, to return a floating-point
expression from a routine declared as returning int) is in fact what you meant. But since the real
possibility exists that discarding the fractional part is not what you meant, some compilers will warn you
about it. Typically, compilers which warn about such things can be quieted by using an explicit cast; the
explicit cast (even though it appears to ask for the same conversion that would have happened implicitly)
serves to silence the warning. (In general, it's best to silence spurious warnings rather than just ignoring
them. If you get in the habit of ignoring them, sooner or later you'll overlook a significant one that you
would have cared about.)


The word ``external'' is, roughly speaking, equivalent to ``global.''
page 74
A program with ``too many data connections between functions'' hasn't managed to achieve the desirable
attributes we were talking about earlier, in particular that a function's ``interface to the rest of the
program is clean and narrow.'' Another bit of jargon you may hear is the word ``coupling,'' which refers
to how much one piece of a program has to know about another.
As we have mentioned, the connections between functions should generally be few and well-defined, in which case they will be amenable to regular old function arguments, and you won't be
tempted to pass lots of data around in global variables. (On the other hand, global variables are fine for
some things, such as configuration information which the whole program cares about and which is set
just once at program startup and then doesn't change.)
The word ``lifetime'' refers to how long a variable and its value stick around. (The jargon term is
``duration.'') So far, we've seen that global variables persist for the life of the program, while local
variables last only as long as the functions defining them are active. However, lifetime (duration) is a
separate and orthogonal concept from scope; we'll soon be meeting local variables which persist for the
life of the program.
Deep sentence:
Thus if two functions must share some data, yet neither calls the other, it is often most
convenient if the shared data is kept in external variables rather than passed in and out via
arguments.
(Later, though, we'll learn about data structures which can make it more convenient to pass certain data
around via function arguments, so we'll have less reason for using external variables for these sorts of
purposes.)
``Reverse Polish'' is used by some (earlier, all) Hewlett-Packard calculators. (The name is based on the
nationality of the mathematician who studied and formalized this notation.) It may seem strange at first,
but it's natural if you observe that you need both numbers (operands) before you can carry out an
operation on them. (This fact is one of the reasons that reverse Polish notation is ``easier to implement.'')
The calculator example is a bit long and a bit involved, but I urge you to work through and understand it.
A calculator is something that everyone's likely to be familiar with; it's interesting to see how one might
work inside; and the techniques used here are generally useful in all sorts of programs.
A ``stack'' is simply a last-in, first-out list. You ``push'' data items onto a stack, and whenever you ``pop''
an item from the stack, you get the one most recently pushed.
pages 76-79
The code for the calculator may seem daunting at first, but it's much easier to follow if you look at each
part in isolation (as good functions are meant to be looked at), and notice that the routines fall into three
levels. At the top level is the calculator itself, which resides in the function main. The main function
calls three lower-level functions: push, pop, and getop. getop, in turn, is written in terms of the still
lower-level functions getch and ungetch.
A few details of the communication among these functions deserve mention. The getop routine actually
returns two values. Its formal return value is a character representing the next operation to be performed.
Usually, that character is just the character the user typed, that is, +, -, *, or /. In the case of a number
typed by the user, the special code NUMBER is returned (which happens to be #defined to be the
character '0', but that's arbitrary). A return value of NUMBER indicates that an entire string of digits has
been typed, and the string itself is copied into the array s passed to getop. In this case, therefore, the
array s is the second return value.
In some printings, the second line on page 76 reads
#include <math.h>
/* for atof() */
which is incorrect; it should be

#include <stdlib.h>
/* for atof() */
page 77
Make sure you understand why the code
push(pop() - pop());
/* WRONG */
might not work correctly.

``The representation can be hidden'' means that the declarations of these variables can follow main in
the file, such that main can't ``see'' them (that is, can't attempt to refer to them). Furthermore, as we'll see,
the declarations might be moved to a separate source file, and main won't care.
pages 77-78
Note that getop does not incorporate the functionality of atoi or atof--it collects and returns the
digits as a string, and main calls atof to convert the string to a floating-point number (prior to pushing
it on the stack). (There's nothing profound about this arrangement; there's no particular reason why
getop couldn't have been set up to do the conversion itself.)
The reasons for using a routine like ungetch are good and sufficient, but they may not be obvious at
first. The essential motivation, as the authors explain, is that when we're reading a string of digits, we
don't know when we've reached the end of the string of digits until we've read a non-digit, and that non-digit is not part of the string of digits, so we really shouldn't have read it yet, after all. The rest of the
program is set up based on the assumption that one call to getop will return the string of digits, and the
next call will return whatever operator followed the string of digits.
To understand why the surprising and perhaps kludgey-sounding getch/ungetch approach is in fact a
good one, let's consider the alternatives. getop could keep track of the one-too-far character somehow,
and remember to use it next time instead of reading a new character. (Exercise 4-11 asks you to
implement exactly this.) But this arrangement of getop is considerably less clean from the standpoint of
the ``invariants'' we were discussing earlier. getop can be written relatively cleanly if one of its
invariants is that the operator it's getting is always formed by reading the next character(s) from the input
stream. getop would be considerably messier if it always had to remember to use an old character if it
had one, or read a new character otherwise. If getop were modified later to read new kinds of operators,
and if reading them involved reading more characters, it would be easy to forget to take into account the
possibility of an old character each time a new character was needed. In other words, everywhere that
getop wanted to do the operation
read the next character
it would instead have to do
if (there's an old character)
    use it
else
    read the next character
It's much cleaner to push the checking for an old character down into the getch routine.
Devising a pair of routines like getch and ungetch is an excellent example of the process of
abstraction. We had a problem: while reading a string of digits, we always read one character too far.
The obvious solution--remembering the one-too-far character and using it later--would have been clumsy
if we'd implemented it directly within getop. So we invented some new functions to centralize and
encapsulate the functionality of remembering accidentally-read characters, so that getop could be
written cleanly in terms of a simple ``get next character'' operation. By centralizing the functionality, we
make it easy for getop to use it consistently, and by encapsulating it, we hide the (potentially ugly)
details from the rest of the program. getch and ungetch may be tricky to write, but once we've
written them, we can seal up the little black boxes they're in and not worry about them any more, and the
rest of the program (especially getop) is cleaner.
page 79
If you're not used to the conditional operator ?: yet, here's how getch would look without it:
int getch(void)
{
    if (bufp > 0)
        return buf[--bufp];
    else
        return getchar();
}
Also, the extra generality of these two routines (namely, that they can push back and remember several
characters, a feature which the calculator program doesn't even use) makes them a bit harder to follow.
Exercise 4-8 asks you to write simpler versions which allow only one character of pushback. (Also, as
the text notes, we don't really have to be writing ungetch at all, because the standard library already
provides an ungetc which can provide one character of pushback for getchar.)
When we defined a stack, we said that it was ``last-in, first-out.'' Are the versions of getch and
ungetch on page 79 last-in, first-out or first-in, first-out? Do you agree with this choice?
One last note: the name of the variable bufp suggests that it is a pointer, but it's actually an index into
the buf array.


page 80
With respect to the ``practical matter'' of splitting the calculator program up into multiple source files,
though it's certainly small enough to fit comfortably into a single source file, it's not so small that there's
anything wrong with splitting it up into multiple source files, especially if we start adding functionality to
it.
The scope of a name is what we have been calling its ``visibility.'' When we say things like ``calling a
function with a prototype in scope'' we mean that a prototype is visible, that a declaration is in effect.
The variables sp and val can be used by the push and pop routines because they're defined in the
same file (and the definitions appear before push and pop). They can't be used in main because no
declaration for them appears in main.c (nor in calc.h, which main.c #includes). If main
attempted to refer to sp or val, they'd be flagged as undefined. (Don't worry about the visibility of
``push and pop themselves.'')
The paragraph beginning ``On the other hand'' is explaining how global (``external'') variables like sp
and val could be accessed in a file other than the file where they are defined. In the examples we've
been looking at, as we've said, sp and val can be used in push and pop because the variables are
defined above the functions. If the variables were defined elsewhere (i.e. in some other file), we'd need a
declaration above--and that's exactly what extern is for. (See page 81 for an example.)
page 81
A definition creates a variable, and for any given global variable, you only want to do that once.
Anywhere else, you want to refer to an existing variable, created elsewhere, without creating a new,
conflicting one. Referring to an existing variable or function is exactly what a declaration is for.
Note also that the definition may optionally initialize the variable. (Don't worry about why a declaration
may optionally include an array dimension.)
``This same organization would also be needed if the definitions of sp and val followed their use in one
file'' means that we could conceivably have, in one file,
extern int sp;
extern double val[];
void push(double f) { ... }
double pop(void) { ... }

int sp = 0;
double val[MAXVAL];
So ``extern'' just means ``somewhere else''; it doesn't have to mean ``in a different file,'' though usually
it does.


page 82
By the way, the ``.h'' traditionally used in header file names simply stands for ``header.''
We can imagine several strategies for using header files. At one extreme would be to use zero header
files, and to repeat declarations in each file which needed them. This would clearly be a poor strategy,
because whenever a declaration changed, we would have to remember to change it in several places, and
it would be easy to miss one of them, leading to stubborn bugs. At the other extreme would be to use one
header file for each source file (declaring just the things defined in that source file, to be #included by
files using those things), but such a proliferation of header files would usually be unwieldy. For small
projects (such as the calculator example), it's a reasonable strategy to use one header file for the entire
project. For larger projects, you'll usually have several header files for sets of related declarations.


page 83
Deep sentence:
The static declaration, applied to an external variable or function, limits the scope of
that object to the rest of the source file being compiled. External static thus provides a
way to hide names like buf and bufp in the getch-ungetch combination, which must
be external so they can be shared, yet which should not be visible to users of getch and
ungetch.
So we can have three kinds of declarations: local to one function, restricted to one source file, or global
across potentially many source files. We can imagine other possibilities, but these three cover most
needs.
Notice that the static keyword does two completely different things. Applied to a local variable (one
inside of a function), it modifies the lifetime (``duration'') of the variable so that it persists for as long as
the program does, and does not disappear between invocations of the function. Applied to a variable
outside of a function (or to a function) static limits the scope to the current file.
To summarize the scope of external and static functions and variables: when a function or global
variable is defined without static, its scope is potentially the entire program, although any file which
wishes to use it will generally need an extern declaration. A definition with static limits the scope
by prohibiting other files from accessing a variable or function; even if they try to use an extern
declaration, they'll get errors about ``undefined externals.''
The rules for declaring and defining functions and global variables, and using the extern and static
keywords, are admittedly complicated and somewhat confusing. You don't need to memorize all of the
rules right away: just use simple declarations and definitions at first, and as you find yourself needing
some of the more complicated possibilities such as static variables, the rules will begin to make more
sense.


page 83
The register keyword is only a hint. The compiler might not put something in a register even though
you ask it to, and it might put something in a register even though you don't ask it to. Most modern
compilers do a good job of deciding when to put things in registers, so most of the time, you don't need
to worry about it, and you don't have to use the register keyword at all.
(A note to assembly language programmers: there's no way to specify which register a register
variable gets assigned to. Also, when you specify a function parameter as register, it just means that
the local copy of the parameter should be copied to a register if possible; it does not necessarily indicate
that the parameter is going to be passed in a register.)


pages 84-85
You've probably heard that global variables are ``bad'' because they exist everywhere and it can be hard
to keep track of who's using them. In the same way, it can be useful to limit the scope of a local variable
to just the bit of the function that uses it, which is exactly what happens if we declare a variable in an
inner block.


page 85
These are some of the rules on initialization; we'll learn a few more later as we learn about a few more
data types.
If you don't feel like memorizing the rules for default initialization, just go ahead and explicitly initialize
everything you care about.
Earlier we said that C is quite general in its treatment of expressions: anywhere you can use an
expression, you can use any expression. Here's an exception to that rule: in an initialization of an external
or static variable (strictly speaking, any variable of static duration; generally speaking, any global
variable or local static variable), the initializer must be a constant expression, with value
determinable at compile time, without calling any functions. (This rule is easy to understand: since these
initializations happen conceptually at compile time, before the program starts running, there's no way for
a function call--that is, some run-time action--to be involved.)
page 86
It probably won't concern you right away, but it turns out that there's another exception about the
allowable expressions in initializers: in the brace-enclosed list of initializers for an array, all of the
expressions must be constant expressions (even for local arrays).
There is an error in some printings: if there are fewer explicit initializers than required for an array, the
others will be initialized to zero, for external, static, and automatic (local) arrays. (When an automatic
array has no initializers at all, then it contains garbage, just as simple automatic variables do.)
If the initialization
char pattern[] = "ould";
makes sense to you, you're fine. But if the statement that
char pattern[] = "ould";
is equivalent to
char pattern[] = { 'o', 'u', 'l', 'd', '\0' };
bothers you at all, study it until it makes sense. Also, note that a character array which seems to contain
(for example) four characters actually contains five, because of the terminating '\0'.


page 86
Recursion is a simple but deep concept which is occasionally presented somewhat bewilderingly. Please
don't be put off by it. If this section stops making sense, don't worry about it; we'll revisit recursion in
chapter 6.
Earlier we said that a function is (or ought to be) a ``black box'' which does some job and does it well.
Whenever you need to get that job done, you're supposed to be able to call that function. You're not
supposed to have to worry about any reasons why the function might not be able to do that job for you
just now.
It turns out that some functions are naturally written in such a way that they can do their job by calling
themselves to do part of their job. This seems like a crazy idea at first, but based on a strict interpretation
of our observation about functions--that we ought to be able to call them whenever we need their job
done--calling a function from within itself ought not to be illegal, and in fact in C it is legal. Such a call is
called a recursive call, and it works because it's possible to have several instances of a function active
simultaneously. They don't interfere with each other, because each instance has its own copies of its
parameters and local variables. (However, if a function accesses any static or global data, it must be
written carefully if it is to be called recursively, because then different instances of it could interfere with
each other.)
Let's consider the printd example rather carefully. First, remind yourself about the reverse-order
problem from the itoa example on page 64 in section 3.6. The ``obvious'' algorithm for determining the
digits in a number, which involves successively dividing it by 10 and looking at the remainders,
generates digits in right-to-left order, but we'd usually like them in left-to-right order, especially if we're
printing them out as we go. Let's see if we can figure out another way to do it.
It's easy to find the lowest (rightmost) digit; that's n % 10. It's easy to compute all but the lowest digit;
that's n / 10. So we could print a number left-to-right, directly, without any explicit reversal step, if we
had a routine to print all but the last digit. We could call that routine, then print the last digit ourselves.
But--here's the surprise--the routine to ``print all but the last digit'' is printd, the routine we're writing,
if we call it with an argument of n / 10.
Recursion seems like cheating: if you're writing a routine to do something (in this case, to print digits)
and, instead of writing code to print digits, you just punt and call a routine for printing digits, which is
in fact the very routine you're supposed to write, it seems like you haven't done the job you came to do.
A recursive function seems like circular reasoning; it seems to beg the question of how it does its job.
But if you're writing a recursive function, as long as you do a little bit of work yourself, and only pass on
a portion of the job to another instance of yourself, you haven't completely reneged on your
responsibilities. Furthermore, if you're ever called with such a small job to do that the little bit you're
willing to do encompasses the whole job, you don't have to call yourself again (there's no remaining
portion that you can't do). Finally, since each recursive call does some work, passing on smaller and
smaller portions to succeeding recursive calls, and since the last call (where the remaining portion is
empty) doesn't generate any more recursive calls, the recursion is broken and doesn't constitute an
infinite loop.
Don't worry about the quicksort example if it seems impenetrable--quicksort is an important algorithm,
but it is not easy to understand completely at first.
Note that the qsort routine described here is very different from the standard library qsort (in fact, it
probably shouldn't even have the same name).


page 88
We've been using #include and #define already, but now we'll describe them more completely.

The two syntaxes for #include lines can be used in various ways, but very simply speaking, "" is for
header files you've written, and <> is for headers which are provided for you (which someone else has
written).
page 89
Deep sentences:
#include is the preferred way to tie the declarations together for a large program. It
guarantees that all the source files will be supplied with the same definitions and variable
declarations, and thus eliminates a particularly nasty kind of bug. Naturally, when an
included file is changed, all files that depend on it must be recompiled.
That's the story on #include, in a nutshell.


#defines last for the whole file; you can't have local ones like you can for local variables.
``Substitutions are made only for tokens'' means that a substitutable macro name is only recognized when
it stands alone. Also, substitution never happens in quoted strings, because it turns out that you usually
don't want it to. Strings are generally used for communication with the user, while you want substitutions
to happen where you're talking to the compiler.
The point of the ``forever'' example is to demonstrate that the replacement text doesn't have to be a
simple number or string constant. You'd use the forever macro like this:
forever {
...
}
which the preprocessor would expand to
for (;;) {
...
}
which, as we learned in section 3.5 on page 60, is an infinite loop. (Presumably there's a break; see
section 3.7 p. 64.)
Another popular trick is
#define ever ;;
so that you can say
for(ever) {
...
}
But ``preprocessor tricks'' like these tend to get out of hand very quickly; if you use too many of them
you're not writing in C any more but rather in your own peculiar dialect, and no one will be able to read
your code without understanding all of your ``silly little macros.'' It is best if simple macros expand to
simple constants (or expressions).
Macros with arguments are also called ``function-like macros'' because they act almost like miniature
functions. There are some important differences, however:
no call-by-value copying semantics
no space saving
hard to have local variables or block structure
have to parenthesize carefully (see below)
page 90
The correct way to write the square() macro is
#define square(x) ((x) * (x))
There are three rules to remember when defining function-like macros:
1. The macro expansion must always be parenthesized so that any low-precedence operators it
contains will still be evaluated first. If we didn't write the square() macro carefully, the
invocation
1 / square(n)
might expand to
1 / n * n
while it should expand to
1 / (n * n)
2. Within the macro definition, all occurrences of the parameters must be parenthesized so that any
low-precedence operators the actual arguments contain will be evaluated first. If we didn't write
the square() macro carefully, the invocation
square(n + 1)
might expand to
n + 1 * n + 1
while it should expand to
(n + 1) * (n + 1)
3. If a parameter appears several times in the expansion, the macro may not work properly if the
actual argument is an expression with side effects. No matter how we parenthesize the
square() macro, the invocation
square(i++)
would result in
i++ * i++
(perhaps with some parentheses), but this expression is undefined, because we don't know when
the two increments will happen with respect to each other or the multiplication.
Since the square() macro can't be written perfectly safely (arguments with side effects will always be
troublesome), its callers will always have to be careful (i.e. not to call it with arguments with side
effects). One convention is to capitalize the names of macros which can't be treated exactly as if they
were functions:
#define Square(x) ((x) * (x))
page 90 continued
#undef can be used when you want to give a macro restricted scope, if you can remember to undefine it
when you want it to go out of scope. Don't worry about ``[ensuring] that a routine is really a function, not
a macro'' or the getchar example.
Also, don't worry about the # and ## operators. These are new ANSI features which aren't needed except
in relatively special circumstances.


page 91
The #if !defined(HDR) trick is a bit esoteric to start out with. Let's look at a simpler example: in
ANSI C, the remove function deletes a file. On some older Unix systems, however, the function to
delete a file is instead named unlink. Therefore, when deleting a file, we might use code like this:
#if defined(unix)
unlink(filename);
#else
remove(filename);
#endif
We would arrange to have the macro unix defined when we were compiling our program on a Unix
machine, and not otherwise.
You may wonder what the difference is between the if() statement we've been using all along, and this
new #if preprocessing directive. if() acts at run time; it selects whether or not a statement or group of
statements is executed, based on a run-time condition. #if, on the other hand, acts at compile time; it
determines whether certain parts of your program are even seen by the compiler or not. If for some
reason you want to have two slightly different versions of your program, you can use #if to separate the
different parts, leaving the bulk of the code common, such that you don't have to maintain two totally
separate versions.
#if can be used to conditionally compile anything: not just statements and expressions, but also
declarations and entire functions.
Back to the HDR example (though this is somewhat of a tangent, and it's not vital for you to follow it): it's
possible for the same header file to be #included twice during one compilation, either because the
same #include line appears twice within the same source file, or because a source file contains
something like
#include "a.h"
#include "b.h"
but b.h also #includes a.h. Since some declarations which you might put in header files would
cause errors if they were acted on twice, the #if !defined(HDR) trick arranges that the contents of
a header file are only processed once.
Note that two different macros, both named HDR, are being used on page 91, for two entirely different
purposes. At the top of the page, HDR is a simple on-off switch; it is #defined (with no replacement
text) when hdr.h is #included for the first time, and any subsequent #inclusion merely tests whether
HDR is #defined. (Note that it is in fact quite possible to define a macro with no replacement text; a
macro so defined is distinguishable from a macro which has not been #defined at all. One common
use of a macro with no replacement text is precisely as a simple #if switch like this.)
At the bottom of the page, HDR ends up containing the name of a header file to be #included; the
name depends on the #if and #elif directives. The line
#include HDR
#includes one of them, depending on the final value of HDR.


page 93
Pointers are often thought to be the most difficult aspect of C. It's true that many people have various
problems with pointers, and that many programs founder on pointer-related bugs. Actually, though, many
of the problems are not so much with the pointers per se but rather with the memory they point to, and
more specifically, when there isn't any valid memory which they point to. As long as you're careful to
ensure that the pointers in your programs always point to valid memory, pointers can be useful, powerful,
and relatively trouble-free tools. (In these notes, we'll be emphasizing techniques for ensuring that
pointers always point where they should.)
If you haven't worked with pointers before, they're bound to be a bit baffling at first. Rather than
attempting a complete definition (which probably wouldn't mean anything, either) up front, I'll ask you to
read along for a few pages, withholding judgment, and after we've seen a few of the things that pointers
can do, we'll be in a better position to appreciate what they are.
section 5.1: Pointers and Addresses
section 5.2: Pointers and Function Arguments
section 5.3: Pointers and Arrays
section 5.4: Address Arithmetic
section 5.5: Character Pointers and Functions
section 5.6: Pointer Arrays; Pointers to Pointers
section 5.7: Multi-dimensional Arrays
section 5.8: Initialization of Pointer Arrays
section 5.9: Pointers vs. Multi-dimensional Arrays
section 5.10: Command-line Arguments


If you like to use concrete examples and to think about exactly what's going on at the machine level,
you'll want to know how many bytes are occupied by shorts, longs, pointers, etc. It's equally possible,
though, to understand pointers at a more abstract level, thinking about them only in terms of boxes and
arrows, as in the figures on pages 96, 98, 104, 107, and 114-5. (Not worrying about the exact size in
bytes basically means not worrying about how big the boxes are.) The figure at the bottom of page 93 is
probably the least pretty pointer picture in the whole book; don't worry if it doesn't mean much to you.
When we say that a pointer holds an ``address,'' and that unary & is the ``address of'' operator, our
language is of course influenced by the fact that the underlying hardware assigns addresses to memory
locations, but again, it is not necessary (nor necessarily desirable) to think about actual machine
addresses when working with pointers. Thinking about the machine addresses can make certain aspects
of pointers easier to understand, but doing so can also make certain mistakes and misunderstandings
easier. In particular, a pointer in C is more than just an address; as we'll see on the next page, a pointer
also carries the notion of what type of data it points to.
page 94
The presentation on this page is going to seem very artificial at first. At best, you're going to say, ``This
makes sense, but what's it for?'' In fact, it is artificial, and no real program would ever do meaningless
little pointer operations such as are embodied in the example on this page. However, this is the traditional
way to introduce pointers from scratch, and once we've moved past it, we'll be able to talk about some
more meaningful uses of pointers, and to forget about these artificial ones. (Once we're done talking
about the traditional, artificial introduction on page 94, we'll also attempt a slightly more elaborate,
slightly less traditional, slightly more meaningful parallel introduction, so stay tuned.)
Deep sentence:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an int.
We'll have more to say about this sentence in a bit.
As an even more traditional, even less meaningful, even simpler example, we could say
int i = 1;              /* an integer */
int *ip;                /* a pointer-to-int */
ip = &i;                /* ip points to i */
printf("%d\n", *ip);    /* prints i, which is 1 */
*ip = 5;                /* sets i to 5 */
(The obvious questions are, ``if you want to print i, or set it to 5, why not just do it? Why mess around
with this `pointer' thing?'' More on that in a minute.)
The unary & and * operators are complementary. Given an object (i.e. a variable), & generates a pointer
to it; given a pointer, * ``returns'' the value of the pointed-to object. ``Returns'' is in quotes because, as
you may have noticed in the examples, you're not restricted to fetching values via pointers: you can also
store values via pointers. In an assignment like
*ip = 0;
the subexpression *ip is conceptually ``replaced'' by the object which ip points to, and since *ip
appears on the left-hand side of the assignment operator, what happens to the pointed-to object is that it
gets assigned to.
One of the things that's hard about pointers is simply talking about what's going on. We've been using the
words ``return'' and ``replace'' in quotes, because they don't quite reflect what's actually going on, and
we've been using clumsy locutions like ``fetch via pointers'' and ``store via pointers.'' There is some
jargon for referring to pointer use; one word you'll often see is dereference, a term which, though its
derivation is suspect, is used to mean ``follow a pointer to get at, and use, the object it points to.'' Thus,
we sometimes call unary * the ``pointer dereferencing operator,'' and we may say that the expressions
printf("%d\n", *ip);
and
*ip = 5;
both ``dereference the pointer ip.'' We may also talk about indirecting on a pointer: to indirect on a
pointer is again to follow it to see what it points to; and * may also be called the ``pointer indirection
operator.''
Our examples of pointers so far have been, admittedly, artificial and rather meaningless. Let's try a
slightly more realistic example. In the previous chapter, we used the routines atoi and atof to convert
strings representing numbers to the actual numbers represented. Often the strings were typed by the user,
and read with getline. As you may have noticed, neither atoi nor atof does any validity or error
checking: both simply stop reading when they reach a character that can't be part of the number they're
converting, and if there aren't any numeric characters in the string, they simply return 0. (For example,
atoi("49er") is 49, and atoi("three") is 0, and atof("1.2.3") is 1.2 .) These attributes
make atoi and atof easy to write and easy (for the programmer) to use, but they are not the most
user-friendly routines possible. A good user interface would warn the user and prompt again in case of
invalid, non-numeric input.
Suppose we were writing a simple inventory-control system. For each part stored in our warehouse, we
might record the part number, location, and number of parts on hand. For simplicity, we'll assume that
the location is always a simple bin number.
Somewhere in the inventory-control program, we might find the variables
int part_number;
int location;
int number_on_hand;
and there might be a routine that lets the user enter any of these numbers. Suppose that there is another
variable,
int which_entry;
which indicates which of the three numbers is being entered (1 for part_number, 2 for location, or
3 for number_on_hand). We might have code like this:
char instring[30];
switch (which_entry) {
case 1:
    printf("enter part number:\n");
    getline(instring, 30);
    part_number = atoi(instring);
    break;
case 2:
    printf("enter location:\n");
    getline(instring, 30);
    location = atoi(instring);
    break;
case 3:
    printf("enter number on hand:\n");
    getline(instring, 30);
    number_on_hand = atoi(instring);
    break;
}
Suppose that we now begin to add a bit of rudimentary verification to the input routines. The first case
might look like
case 1:
    do {
        printf("enter part number:\n");
        getline(instring, 30);
        if(!isdigit(instring[0]))
            continue;
        part_number = atoi(instring);
    } while (part_number == 0);
    break;
If the first character is not a digit, or if atoi returns 0, the code goes around the loop another time, and
prompts the user again, in hopes that the user will type some proper numeric input this time. (The tests
for numeric input are not sufficient, nor even wise if 0 is a possible input value, as it presumably is for
number on hand. In fact, the two tests really do the same thing! But please overlook these faults. If you're
curious, you can learn about a new ANSI function, strtol, which is like atoi but gives you a bit
more control, and would be a better routine to use here.)
The code fragment above is for just one of the three input cases. The obvious way to perform the same
checking for the other two cases would be to repeat the same code two more times, changing the prompt
string and the name of the variable assigned to (location or number_on_hand instead of
part_number). Duplicating the code is a nuisance, though, especially if we later come up with a better
way to do input verification (perhaps one not suffering from the imperfections mentioned above). Is there
a better way?
One way would be to use a temporary variable in the input loop, and then set one of the three real
variables to the value of the temporary variable, depending on which_entry:
int temp = 0;
do {
    printf("enter the number:\n");
    getline(instring, 30);
    if(!isdigit(instring[0]))
        continue;
    temp = atoi(instring);
} while (temp == 0);

switch (which_entry) {
case 1:
    part_number = temp;
    break;
case 2:
    location = temp;
    break;
case 3:
    number_on_hand = temp;
    break;
}
Another way, however, would be to use a pointer to keep track of which variable we're setting. (In this
example, we'll also get the prompt right.)
char instring[30];
int *numpointer;
char *prompt;
switch (which_entry) {
case 1:
    numpointer = &part_number;
    prompt = "part number";
    break;
case 2:
    numpointer = &location;
    prompt = "location";
    break;
case 3:
    numpointer = &number_on_hand;
    prompt = "number on hand";
    break;
}
do {
    printf("enter %s:\n", prompt);
    getline(instring, 30);
    if(!isdigit(instring[0]))
        continue;
    *numpointer = atoi(instring);
} while (*numpointer == 0);
The idea here is that prompt is the prompt string and numpointer points to the particular numeric
value we're entering. That way, a single input verification loop can print any of the three prompts and set
any of the three numeric variables, depending on where numpointer points. (We won't officially see
character pointers and strings until section 5.5, so don't worry if the use of the prompt pointer seems
new or inexplicable.)
This example is, in its own ways, quite artificial. (In a real inventory-control program, we'd obviously
need to keep track of many parts; we couldn't use single variables for the part number, location, and
quantity. We probably wouldn't really have a which_entry variable telling us which number to
prompt for, and we'd do the numeric validation quite differently. We might well do numeric entry and
validation in a separate function, removing this need for the pointers.) However, the pointer aspect of this
example--using a pointer to refer to one of several different things, so that one generic piece of code can
access any of the things--is a very typical (i.e. realistic) use of pointers.
There's one nuance of pointer declarations which deserves mention. We've seen that
int *ip;
declares the variable ip as a pointer to an int. We might look at that declaration and imagine that int
* is the type and ip is the name of the variable being declared. (Actually, so far, these assumptions are
both true.) We might therefore imagine that a more ``obvious'' way of writing the declaration would be
int* ip;
This would work, but it is misleading, as we'll see if we try to declare two int pointers at once. How
shall we do it? If we try
int* ip1, ip2;          /* WRONG */
we don't succeed; this would declare ip1 as a pointer-to-int, but ip2 as an int (not a pointer). The
correct declaration for two pointers is
int *ip1, *ip2;

As the authors said in the middle of page 94, the intent of pointer (and in fact all) declarations is that they
give little miniature expressions indicating what type a certain use of the variables will have. The
declaration
int *ip1;
doesn't so much say that ip1 is a pointer-to-int; it says that *ip1 is an int. (To be sure, ip1 is a
pointer-to-int.) In the declaration
int *ip1, *ip2;
both *ip1 and *ip2 are ints; so ip1 and ip2 are both pointers-to-int. You'll hear this aspect of C
declarations referred to as ``declaration mimics use.'' If it bothers you, or if you think you might
accidentally write things like
int *ip1, ip2;
then to stay on the safe side you might want to get in the habit of writing declarations on separate lines:
int *ip1;
int *ip2;
I promised to point out the safe techniques for ensuring that pointers always point where they should.
The examples in this section, which have all involved pointers pointing to single variables, are relatively
safe; a single variable is not a very risky thing to point to, so code like the examples in this section is
relatively unlikely to go awry and result in invalid pointers. (One potential problem, though, which we'll
talk more about later, is that since local, ``automatic'' variables are automatically deallocated when the
function containing them returns, any pointer to a local variable also becomes invalid. Therefore, a
function which returns a pointer must never return a pointer to one of its own local variables, and it
would also be invalid to take a pointer to a local variable and assign it to a global pointer variable.)


page 95
This section discusses a very common use of pointers: setting things up so that a function can modify
values in its caller, or return values, via its arguments. Remember that, normally, C passes arguments by
value, and that if a function modifies one of its arguments, it modifies only its local copy, not the value in
the caller. (Normally, this is a good thing; having a function which inadvertently assigns to its arguments
and hence inadvertently modifies a value in its caller can be a source of obscure bugs in languages which
don't use call-by-value.) However, what happens if a function wants to modify a value in its caller, and
its caller wants to let it? How can a function return two values? (A function's formal return value is
always a single value.)
The answer to both questions is that a function can declare a parameter which is a pointer. The caller then
passes a pointer to (that is, the ``address of'') a variable in the caller which is to be modified or which is
to receive the second return value. In fact, we've seen examples of this already: getline returns the
length of the line it reads as well as the line itself, and the getop routine in the calculator example in
section 4.3 returned both a code for an operator and a string representing the full text of the operator.
(We needed that string when the operator was '0' indicating numeric input, so that the string could
return the full numeric input.) Though we didn't say so at the time, we were actually using pointers in
these examples. (We'll explore the relationship between arrays and pointers, which made this possible, in
section 5.3.)
With all of this in mind, make sure that you understand why the swap example on page 95 would not
work, and how and why the swap example on page 96 does work, and what the figure on page 96 shows.
The swap example demonstrated a function which modified some variables (a and b) in its caller. The
getint example demonstrates how to return two values from a function by returning one value as the
normal function return value and the other one by writing to a pointer. (There is no fundamental
difference, though, between ``modifying a variable in the caller'' and ``returning a value by writing to a
pointer''; these are just two applications of pointer parameters.)
The version of getint on page 97 is somewhat complicated because it allows free-form input, that is,
the values need only be separated by whitespace or punctuation; they are not restricted to being one per
line or anything. (C source code is also free-form in this regard; see page 4 of chapter 1 of these notes.)
To see more clearly the essence of what getint is supposed to do, imagine for a moment that the input
is restricted to one value per line, as in the ``primitive calculator'' example on page 72 of section 4.2. In
that case, we might use the following simpler (i.e. more primitive) code:
int getint(int *pn)
{
char line[20];
if (getline(line, 20) <= 0)
return EOF;
*pn = atoi(line);
return 1;
}
The getint function on page 97 is documented as returning nonzero for a valid number and 0 for
something other than a number. Our stripped-down version does not, and as it happens, the example code
at the bottom of page 96 does not make use of the valid/invalid distinction. Can you see a way to rewrite
the code at the bottom of page 96 to fill in the cells of the array with only valid numbers?
You might also notice, again from the code at the bottom of page 96, that & need not be restricted to
single, simple variables; it can take the address of any data object, in this case, one cell of the array.
Just as for all of C's other operators, & can be applied to arbitrary expressions, although it is restricted to
expressions which represent addressable objects. Expressions like &1 or &(2+3) are meaningless and
illegal.
You may remember a discussion from section 1.5.1 on page 16 of how C's getchar routine is able to
return all possible characters, plus an end-of-file indication, in its single return value. Why does getint
need two return values? Why can't it use the same trick that getchar does?
The examples in this section are again relatively safe. The pointers have all been parameters, and the
callers have passed pointers to (that is, the ``addresses'' of) their own, properly-allocated variables. That is,
code like
int a = 1, b = 2;
swap(&a, &b);
and
int a;
getint(&a);
is correct and quite safe.
Something to beware of, though, is the temptation to inadvertently pass an uninitialized pointer variable
(rather than the ``address'' of some other variable) to a routine which expects a pointer. We know that the
getint routine expects as its argument a pointer to an int in which it is to store the integer it gets.
Suppose we took that description literally, and wrote
int *ip;                /* a pointer to an int */
getint(ip);
Here we have in fact passed a pointer-to-int to getint, but the pointer we passed doesn't point
anywhere! When getint writes to (``dereferences'') the pointer, in an expression like *pn = 0, it will
scribble on some random part of memory, and the program may crash. When people get caught in this
trap, they often think that to fix it they need to use the & operator:
getint(&ip);            /* WRONG */
or maybe the * operator:
getint(*ip);            /* WRONG */
but these go from bad to worse. (If you think about them carefully, &ip is a pointer-to-pointer-to-int,
and *ip is an int, and neither of these types matches the pointer-to-int which getint expects.) The
correct usage for now, as we showed already, is something like
int a;
getint(&a);
In this case, a is an honest-to-goodness, allocated int, so when we generate a pointer to it (with &a) and
call getint, getint receives a pointer that does point somewhere.


page 97
For some people, section 5.3 is evidently the hardest section in this book, or even if they haven't read this
book, the most confusing aspect of the language. C introduces a novel and, it can be said, elegant
integration of pointers and arrays, but there are a distressing number of ways of misunderstanding arrays,
or pointers, or both. Take this section very slowly, learn the things it does say, and don't learn anything it
doesn't say (i.e. don't make any false assumptions).
It's not necessarily true that ``the pointer version will in general be faster''; efficiency is (or ought to be) a
secondary concern when considering the use of pointers.
page 98
On the top half of this page, we aren't seeing anything we haven't seen before. We already knew (or
should have known) that the declaration int a[10]; declares an array of ten contiguous int's
numbered from 0 to 9. We saw on page 94 and again on page 96 that & can be used to take the address of
one cell of an array.
What's new on this page are first the nice pictures (and they are nice pictures; I think they're the right
way of thinking about arrays and pointers in C) and the definition of pointer arithmetic. If the phrase
``then by definition pa+1 points to the next element'' alarms you; if you hadn't known that pa+1 points
to the next element; don't worry. You hadn't known this, and you aren't expected even to have suspected
it: the reason that pa+1 points to the next element is simply that it's defined that way, as the sentence
says. Furthermore, subtraction works in an exactly analogous way: If we were to say
pa = &a[5];
then *(pa-1) would refer to the contents of a[4], and *(pa-i) would refer to the contents of the
location i elements before cell 5 (as long as i <= 5).
Note furthermore that we do not have to worry about the size of the objects pointed to. Adding 1 to a
pointer (or subtracting 1) always means to move over one object of the type pointed to, to get to the next
element. (If you're too worried about machine addresses, or the actual address values stored in pointers,
or the actual sizes of things, it's easy to mistakenly assume that adding or subtracting 1 adds or subtracts
1 from the machine address, but as we mentioned, you don't have to think at this low level. We'll see in
section 5.4 how pointer arithmetic is actually scaled, automatically, by the size of the object pointed to,
but we don't have to worry about it if we don't want to.)
Deep sentence:
The meaning of ``adding 1 to a pointer,'' and by extension, all pointer arithmetic, is that
pa+1 points to the next object, and pa+i points to the i-th object beyond pa.
This aspect of pointers--that arithmetic works on them, and in this way--is one of several vital facts about
pointers in C. On the next page, we'll see the others.
page 99
Deep sentences:
The correspondence between indexing and pointer arithmetic is very close. By definition,
the value of a variable or expression of type array is the address of element zero of the
array.
This is a fundamental definition, which we'll now spend several pages discussing.
Don't worry too much yet about the assertion that ``pa and a have identical values.'' We're not surprised
about the value of pa after the assignment pa = &a[0]; we've been taking the address of array
elements for several pages now. What we don't know--we're not yet in a position to be surprised about it
or not--is what the ``value'' of the array a is. What is the value of the array a?
In some languages, the value of an array is the entire array. If an array appears on the right-hand side of
an assignment, the entire array is assigned, and the left-hand side had better be an array, too. C does not
work this way; C never lets you manipulate entire arrays.
In C, by definition, the value of an array, when it appears in an expression, is a pointer to its first
element. In other words, the value of the array a simply is &a[0]. If this statement makes any kind of
intuitive sense to you at this point, that's great, but if it doesn't, please just take it on faith for a while.
This statement is a fundamental (in fact the fundamental) definition about arrays and pointers in C, and if
you don't remember it, or don't believe it, then pointers and arrays will never make proper sense. (You
will also need to know another bit of jargon: we often say that, when an array appears in an expression, it
decays into a pointer to its first element.)
Given the above definition, let's explore some of the consequences. First of all, though we've been saying
pa = &a[0];
we could also say
pa = a;
because by definition the value of a in an expression (i.e. as it sits there all alone on the right-hand side)
is &a[0]. Secondly, anywhere we've been using square brackets [] to subscript an array, we could also
have used the pointer dereferencing operator *. That is, instead of writing
i = a[5];
we could, if we wanted to, instead write
i = *(a+5);
Why would this possibly work? How could this possibly work? Let's look at the expression *(a+5) step
by step. It contains a reference to the array a, which is by definition a pointer to its first element. So
*(a+5) is equivalent to *(&a[0]+5). To make things clear, let's pretend that we'd assigned the
pointer to the first element to an actual pointer variable:
int *pa = &a[0];
Now we have *(a+5) is equivalent to *(&a[0]+5) is equivalent to *(pa+5). But we learned on
page 98 that *(pa+5) is simply the contents of the location 5 cells past where pa points to. Since pa
points to a[0], *(pa+5) is a[5]. Thus, for whatever it's worth, any time you have an array subscript
a[i], you could write it as *(a+i).
The idea of the previous paragraph isn't worth much, because if you've got an array a, indexing it using
the notation a[i] is considerably more natural and convenient than the alternate *(a+i). The
significant fact is that this little correspondence between the expressions a[i] and *(a+i) holds for
more than just arrays. If pa is a pointer, we can get at locations near it by using *(pa+i), as we learned
on page 98, but we can also use pa[i]. This time, using the ``other'' notation (array instead of pointer,
when we thought we had a pointer) can be more convenient.
At this point, you may be asking why you can write pa[i] instead of *(pa+i). You may be
wondering how you're going to remember that you can do this, or remember what it means if you see it
in someone else's code, when it's such a surprising fact in the first place. There are several ways to
remember it; pick whichever one suits you:
1. It's an arbitrary fact, true by definition; just memorize it.
2. If, for an array a, instead of writing a[i], you can also write *(a+i) (as we proved a few
paragraphs back); then it's only fair that for a pointer pa, instead of writing *(pa+i), you can
also write pa[i].
3. Deep sentence: ``In evaluating a[i], C converts it to *(a+i) immediately; the two forms are
equivalent.''
4. An array is a contiguous block of elements of a particular type. A pointer often points to a
contiguous block of elements of a particular type. Therefore, it's very handy to treat a pointer to a
contiguous block of elements as if it were an array, by saying things like pa[i].

5. [This is the most radical explanation, though it's also the most true; but if it offends your
sensibilities or only seems to make things more confusing, please ignore it.] When you said
a[i], you weren't really subscripting an array at all, because an array like a in an expression
always turns into a pointer to its first element. So the array subscripting operator [] always finds
itself working on pointers, and it's a simple identity (another definition) that pa[i] is *(pa+i).
(But do pick at least one reason to remember this fact, as it's a fact you'll need to remember; expressions
like pa[i] are quite common.)
The authors point out that ``There is one difference between an array name and a pointer that must be
kept in mind,'' and this is quite true, but note very carefully that there is in fact every difference between
an array and a pointer. When an array name appears in most expressions, it turns into a pointer (to the
array's first element), but that does not mean that the array is a pointer. You may hear it stated that ``an
array is just a constant pointer,'' and this is a convenient explanation, but it is a simplified and potentially
misleading explanation.
With that said, do make sure you understand why a=pa and a++ (where a is an array) cannot mean
anything.
Deep sentence:
When an array name is passed to a function, what is passed is the location of the initial
element.
Though perhaps surprising, this sentence doesn't say anything new. A function call, and more
importantly, each of its arguments, is an expression, and in an expression, a reference to an array is
always replaced by a pointer to its first element. So given
int a[10];
f(a);
it is not the entire array a that is passed to f but rather just a pointer to its first element. For an example
closer to the text on page 99, given
char string[] = "Hello, world!";
int len = strlen(string);
it is not the entire array string that is passed to strlen (recall that C never lets you do anything with
a string or an array all at once), but rather just a pointer to its first element.
We now realize that we've been operating under a gentle fiction during the first four chapters of the book.
Whenever we wrote a function like getline or getop which seemed to accept an array of characters,
and whenever we thought we were passing arrays of characters to these routines, we were actually
passing pointers. This explains, among other things, how getline and getop were able to modify the
arrays in the caller, even though we said that call-by-value meant that functions can't modify variables in
their callers since they receive copies of the parameters. When a function receives a pointer, it cannot
modify the original pointer in the caller, but it can definitely modify what the pointer points to.
If that doesn't make sense, make sure you appreciate the full difference between a pointer and what it
points to! It is entirely possible to modify one without modifying the other. Let's illustrate this with an
example. If we say
char a[] = "hello";
char b[] = "world";
we've declared two character arrays, a and b, each containing a string. If we say
char *p = a;
we've declared p as a pointer-to-char, and initialized it to point to the first character of the array a. If
we then say
*p = 'H';
we've modified what p points to. We have not modified p itself. After saying *p = 'H'; the string in
the array a has been modified to contain "Hello".
If we say
p = b;
on the other hand, we have modified the pointer p itself. We have not really modified what p points to.
In a sense, ``what p points to'' has changed--it used to be the string in the array a, and now it's the string
in the array b. But saying p = b didn't modify either of the strings.
page 100
Since, as we've just seen, functions never receive arrays as parameters, but instead always receive
pointers, how have we been able to get away with defining functions (like getline and getop) which
seemed to accept arrays? The answer is that whenever you declare an array parameter to a function, the
compiler pretends that you actually declared a pointer. (It does this mostly so that we can get away with
the ``gentle fiction'' of pretending that we can pass arrays to functions.)
When you see a statement like ``char s[]; and char *s; are equivalent'' (as in fact you see at the
top of page 100), you can be sure that (and you must remember that) it is only function formal parameters
that are being talked about. Anywhere else, arrays and pointers are quite different, as we've discussed.
Expressions like p[-1] (at the end of section 5.3) may be easier to understand if we convert them back
to the pointer form *(p + -1) and thence to *(p-1) which, as we've seen, is the object one before
what p points to.
With the examples in this section, we begin to see how pointer manipulations can go awry. In sections
5.1 and 5.2, most of our pointers were to simple variables. When we use pointers into arrays, and when
we begin using pointer arithmetic to access nearby cells of the array, we must be careful never to go off
the end of the array, in either direction. A pointer is only valid if it points to one of the allocated cells of
an array. (There is also an exception for a pointer just past the end of an array, which we'll talk about
later.) Given the declarations
int a[10];
int *pa;
the statements
pa = a;
*pa = 0;
*(pa+1) = 1;
pa[2] = 2;
pa = &a[5];
*pa = 5;
*(pa-1) = 4;
pa[1] = 6;
pa = &a[9];
*pa = 9;
pa[-1] = 8;
are all valid. These statements set the pointer pa pointing to various cells of the array a, and modify
some of those cells by indirecting on the pointer pa. (As an exercise, verify that each cell of a that
receives a value receives the value of its own index. For example, a[6] is set to 6.)
However, the statements
pa = a;
*(pa+10) = 0;            /* WRONG */
*(pa-1) = 0;             /* WRONG */
pa = &a[5];
*(pa+10) = 0;            /* WRONG */
pa = &a[10];
*pa = 0;                 /* WRONG */
and
int *pa2;
pa = &a[5];
pa2 = pa + 10;           /* WRONG */
pa2 = pa - 10;           /* WRONG */
are all invalid. The first examples set pa to point into the array a but then use overly-large offsets (+10, -1) which end up trying to store a value outside of the array a. The statements in the last set of examples
set pa2 to point outside of the array a. Even though no attempt is made to access the nonexistent cells,
these statements are illegal, too. Finally, the code
int a[10];
int *pa, *pa2;
pa = &a[5];
pa2 = pa + 10;           /* WRONG */
*pa2 = 0;                /* WRONG */
would be very wrong, because it not only computes a pointer to the nonexistent cell a[15]
of a 10-element array, but it also tries to store something there.


This section is going to get pretty hairy. Some of it talks about things we've already seen (adding integers
to pointers); some of it talks about things we need to learn (comparing and subtracting pointers); and
some of it talks about a rather sophisticated example (a storage allocator). Don't worry if you can't follow
all the details of the storage allocator, but do read along so that you can pick up the other new points. (In
other words, make sure you read from ``Zero is the sole exception'' in the middle of page 102 to ``that is,
the string length'' on page 103, and also the last paragraph on page 103.)
What is a storage allocator for? So far, we've used pointers to point to existing variables and arrays,
which the compiler allocated for us. But eventually, we may want to allocate data structures (arrays, and
others we haven't seen yet) of a size which we don't know at compile time. Earlier, we spoke briefly
about a hypothetical inventory-management system, which recorded information about each part stored
in a warehouse. How many different parts could there be? If we used fixed-size arrays, there would be a
fixed upper limit on the number of parts we could enter into the system, and we'd be annoyed if that limit
were reached. A better solution is not to allocate a fixed array at compile time, but rather to use a runtime storage allocator to allocate memory for the data structures used to describe each part. That way, the
number of parts which the system can hold is limited only by available memory, not by any static limit
built into the program. Using a storage allocator to allocate memory at run time in this way is called
dynamic allocation.
However, dynamic memory allocation is where C programming can really get tricky, because you the
programmer are responsible for most aspects of it, and there are plenty of things you can do wrong (e.g.
not allocate quite enough memory, accidentally keep using it after you deallocate it, have random invalid
pointers pointing everywhere, etc.). Therefore, we won't be talking about dynamic allocation for a while,
which is why you can skim over the storage allocator in this section for now.
page 102
The first new piece of information in this section (which you'll need to remember even if you're not
following the details of the storage allocator example) is the introduction of the ``null pointer.''
So far, all of our pointers have pointed somewhere, and we've cautioned about pointers which don't. To
help us distinguish between pointers which point somewhere and pointers which don't, there is a single,
special pointer value we can use, which is guaranteed not to point anywhere. When a pointer doesn't
point anywhere, we can set it to this value, to make explicit the fact that it doesn't point anywhere.
This special pointer value is called the null pointer. The way to set a pointer to this value is to use a
constant 0:
int *ip = 0;
The 0 is just a shorthand; it does not necessarily mean machine address 0. To make it clear that we're
talking about the null pointer and not the integer 0, we often use a macro definition like
#define NULL 0
so that we can say things like
int *ip = NULL;
(If you've used Pascal or LISP, the nil pointer in those languages is analogous.)
In fact, the above #definition of NULL has been placed in the standard header file <stdio.h> for us
(and in several other standard header files as well), so we don't even need to #define it. I agree
completely with the authors that using NULL instead of 0 makes it more clear that we're talking about a
null pointer, so I'll always be using NULL, too.
Just as we can set a pointer to NULL, we can also test a pointer to see if it's NULL. The code
if(p != NULL)
*p = 0;
else
printf("p doesn't point anywhere\n");
tests p to see if it's non-NULL. If it's not NULL, it assumes that it points somewhere valid, and writes a 0
there. Otherwise (i.e. if p is the null pointer) the code complains.
Though we can use null pointers as markers to remind ourselves of which of our pointers don't point
anywhere, it's up to us to do so. It is not guaranteed that all uninitialized pointer variables (which
obviously don't point anywhere) are initialized to NULL, so if we want to use the null pointer convention
to remind ourselves, we'd best explicitly initialize all unused pointers to NULL. Furthermore, there is no
general mechanism that automatically checks whether a pointer is non-null before we use it. If we think
that a pointer might not point anywhere, and if we're using the convention that pointers that don't point
anywhere are set to NULL, it's up to us to compare the pointer to NULL to decide whether it's safe to use
it.
The next new piece of information in this section (which we've already alluded to) is pointer comparison.
You can compare two pointers for equality or inequality (== or !=): they're equal if they point to the
same place or are both null pointers; they're unequal if they point to different places, or if one points
somewhere and one is a null pointer. If two pointers point into the same array, the relational comparisons
<, <=, >, and >= can also be used.
page 103
The sentences
...n is scaled according to the size of the objects p points to, which is determined by the
declaration of p. If an int is four bytes, for example, the int will be scaled by four.
say something we've seen already, but may only confuse the issue. We've said informally that in the code
int a[10];
int *pa = &a[0];
*(pa+1) = 1;
pa contains the ``address'' of the int object a[0], but we've discouraged thinking about this address as
an actual machine memory address. We've said that the expression pa+1 moves to the next int in the
array (in this case, a[1]). Thinking at this abstract level, we don't even need to worry about any
``scaling by the size of the objects pointed to.''
If we do look at a lower, machine level of addressing, we may learn that an int occupies some number
of bytes (usually two or four), such that when we add 1 to a pointer-to-int, the machine address is
actually increased by 2 or 4. If you like to consider the situation from this angle, you're welcome to, but
if you don't, you certainly don't have to. If you do start thinking about machine addresses and sizes, make
extra sure that you remember that C does do the necessary scaling for you. Don't write something like
int a[10];
int *pa = &a[0];
*(pa+sizeof(int)) = 1;
where sizeof(int) is the size of an int in bytes, and expect it to access a[1].
Since adding an int to a pointer gives us another pointer:
int a[10];
int *pa1 = &a[0];
int *pa2 = pa1 + 5;
we might wonder if we can rearrange the expression
pa2 = pa1 + 5
to get
pa2 - pa1
(where this is no longer a C assignment, we're just wondering if we can subtract pa1 from pa2, and
what the result might be). The answer is yes: just as you can compare two pointers which point into the
same array, you can subtract them, and the result is, naturally enough, the distance between them, in cells
or elements.
(In the large parenthetical statement in the middle of the page, don't worry too much about ptrdiff_t,
size_t, and sizeof.)


page 104
Since text strings are represented in C by arrays of characters, and since arrays are very often
manipulated via pointers, character pointers are probably the most common pointers in C.
Deep sentence:
C does not provide any operators for processing an entire string of characters as a unit.
We've said this sort of thing before, and it's a general statement which is true of all arrays. Make sure you
understand that in the lines
char *pmessage;
pmessage = "now is the time";
pmessage = "hello, world";
all we're doing is assigning two pointers, not copying two entire strings.
At the bottom of the page is a very important picture. We've said that pointers and arrays are different,
and here's another illustration. Make sure you appreciate the significance of this picture: it's probably the
most basic illustration of how arrays and pointers are implemented in C.
We also need to understand the two different ways that string literals like "now is the time" are
used in C. In the definition
char amessage[] = "now is the time";
the string literal is used as the initializer for the array amessage. amessage is here an array of 16
characters, which we may later overwrite with other characters if we wish. The string literal merely sets
the initial contents of the array. In the definition
char *pmessage = "now is the time";
on the other hand, the string literal is used to create a little block of characters somewhere in memory
which the pointer pmessage is initialized to point to. We may reassign pmessage to point somewhere
else, but as long as it points to the string literal, we can't modify the characters it points to.
As an example of what we can and can't do, given the lines
char amessage[] = "now is the time";
char *pmessage = "now is the time";
we could say
amessage[0] = 'N';
to make amessage say "Now is the time". But if we tried to do
pmessage[0] = 'N';
(which, as you may recall, is equivalent to *pmessage = 'N'), it would not necessarily work; we're
not allowed to modify that string. (One reason is that the compiler might have placed the ``little block of
characters'' in read-only memory. Another reason is that if we had written
char *qmessage = "now is the time";
the compiler might have used the same little block of memory to initialize both pointers, and we wouldn't
want a change to one to alter the other.)
Deep sentence:
The first function is strcpy(s,t), which copies the string t to the string s. It would be
nice just to say s=t but this copies the pointer, not the characters.
This is a restatement of what we said above, and a reminder of why we'll need a function, strcpy, to
copy whole strings.
page 105
Once again, these code fragments are being written in a rather compressed way. To make it easier to see
what's going on, here are alternate versions of strcpy, which don't bury the assignment in the loop test.
First we'll use array notation:
void strcpy(char s[], char t[])
{
int i;
for(i = 0; t[i] != '\0'; i++)
s[i] = t[i];
s[i] = '\0';
}
Note that we have to manually append the '\0' to s after the loop. Note that in doing so we depend
upon i retaining its final value after the loop, but this is guaranteed in C, as we learned in Chapter 3.
Here is a similar function, using pointer notation:
void strcpy(char *s, char *t)
{
while(*t != '\0')
*s++ = *t++;
*s = '\0';
}
Again, we have to manually append the '\0'. Yet another option might be to use a do/while loop.
All of these versions of strcpy are quite similar to the copy function we saw on page 29 in section
1.9.
page 106
The version of strcpy at the top of this page is my least favorite example in the whole book. Yes,
many experienced C programmers would write strcpy this way, and yes, you'll eventually need to be
able to read and decipher code like this, but my own recommendation against this kind of cryptic code is
strong enough that I'd rather not show this example yet, if at all.
We need strcmp for about the same reason we need strcpy. Just as we cannot assign one string to
another using =, we cannot compare two strings using ==. (If we try to use ==, all we'll compare is the
two pointers. If the pointers are equal, they point to the same place, so they certainly point to the same
string, but if we have two strings in two different parts of memory, pointers to them will always compare
different even if the strings pointed to contain identical sequences of characters.)
Note that strcmp returns a positive number if s is greater than t, a negative number if s is less than t,
and zero if s compares equal to t. ``Greater than'' and ``less than'' are interpreted based on the relative
values of the characters in the machine's character set. This means that 'a' < 'b', but (in the ASCII
character set, at least) it also means that 'B' < 'a'. (In other words, capital letters will sort before
lower-case letters.) The positive or negative number which strcmp returns is, in this implementation at
least, actually the difference between the values of the first two characters that differ.
Note that strcmp returns 0 when the strings are equal. Therefore, the condition
if(strcmp(a, b))
do something...
doesn't do what you probably think it does. Remember that C considers zero to be ``false'' and nonzero to
be ``true,'' so this code does something if the strings a and b are unequal. If you want to do something if
two strings are equal, use code like
if(strcmp(a, b) == 0)
do something...
(There's nothing fancy going on here: strcmp returns 0 when the two strings are equal, so that's what
we explicitly test for.)
To continue our ongoing discussion of which pointer manipulations are safe and which are risky or must
be done with care, let's consider character pointers. As we've mentioned, one thing to beware of is that a
pointer derived from a string literal, as in
char *pmessage = "now is the time";
is usable but not writable (that is, the characters pointed to are not writable). Another thing to be careful
of is that any time you copy strings, using strcpy or some other method, you must be sure that the
destination string is a writable array with enough space for the string you're writing. Remember, too, that
the space you need is the number of characters in the string you're copying, plus one for the terminating
'\0'.
For the above reasons, all three of these examples are incorrect:
char *p1 = "Hello, world!";
char *p2;
strcpy(p2, p1);          /* WRONG */

char *p = "Hello, world!";
char a[13];
strcpy(a, p);            /* WRONG */

char *p3 = "Hello, world!";
char *p4 = "A string to overwrite";
strcpy(p4, p3);          /* WRONG */
In the first example, p2 doesn't point anywhere. In the second example, a is a writable array, but it
doesn't have room for the terminating '\0'. In the third example, p4 points to memory which we're not
allowed to overwrite. A correct example would be
char *p = "Hello, world!";
char a[14];
strcpy(a, p);
(Another option would be to obtain some memory for the string copy, i.e. the destination for strcpy,
using dynamic memory allocation, but we're not talking about that yet.)
page 106 continued (bottom)
Expressions like *p++ and *--p may seem cryptic at first sight, but they're actually analogous to array
subscript expressions like a[i++] and a[--i], some of which we were using back on page 47 in
section 2.8.


page 107
Deep sentence:
Since pointers are variables themselves, they can be stored in arrays just as other variables
can.
This is just one aspect of the generality of C's data types, which we'll be seeing in the next few sections.
We've used a recursive definition of ``expression'': a constant or variable is an expression, an expression
in parentheses is an expression, an expression plus an expression is an expression, etc. There are
obviously an infinite number of expressions, of arbitrary complexity. In exactly the same way, there are
an infinite number of data types in C. We've already seen the basic data types: int, char, double, etc.
But then we have the derived data types such as array-of-char and pointer-to-int and function-returning-double. So we can say that for any type, array-of-type is another type, and pointer-to-type is
another type, and function-returning-type is another type. Once we've said that, we can see that there is
also the possibility of arrays of pointers, and arrays of arrays, and functions returning pointers, and even
(in section 5.11, though this is a deeper topic) pointers to functions. (The only possibilities that C doesn't
support are functions returning arrays, and arrays of functions, and functions returning functions.)
Make sure you understand why an integer is something that can be ``compared or moved in a single
operation,'' but that a string (that is, an array of char) is not. Then, realize that a pointer is also
something that can be ``compared or moved in a single operation.'' (Actually, though, the string
comparisons we'll be doing are not single operations.) From time to time you'll hear me caution you not
to worry too much about certain aspects of efficiency. Here, it's true that the overhead of copying entire
strings from one place to another, a character at a time (which is the overhead we'll be getting around by
manipulating pointers instead) can be significant, but that's not the only concern: once we're comfortable
with the idea, manipulating pointers will be somewhat easier on us, too. (Copying lots of characters
around is a nuisance, and it can also be dangerous, if the destination isn't big enough or isn't in the right
place.)
Don't worry about the ``one long character array'' that the ``lines to be sorted are stored end-to-end in.''
Instead, look at the picture at the bottom of page 107, which shows the pointers that might be set up after
reading the lines
defghi
jklmnopqrst
abc
On the left are the pointers before sorting, and on the right are the pointers after sorting. The three strings
have not been moved, but by reshuffling the pointers, the three pointers in order now point to the lines
abc
defghi
jklmnopqrst
page 108
Once again, we see a nice simple decomposition of the problem, which might seem deceptively simple
except that when problems are decomposed in simple ways like this, and then implemented faithfully,
they really can be this simple. Deferring the sorting step is an excellent idea, especially if we didn't quite
follow the details of the sorting functions in the previous chapter. (Actually, in practice, we can usually
defer the sorting step forever, since there's often a general-purpose sort routine provided for us
somewhere. C is no exception: a qsort function is a required part of its standard library. For the most
part, the only people who have to write sort routines are programming students and the few people who
get stuck implementing system functions.)
The main program at the bottom of page 108 looks a bit more elaborate than the pseudocode at the top
of the page, but the essence of the program is the three calls to readlines, qsort, and
writelines. Everything else is declarations, plus an error message which is printed if readlines is
for some reason not able to read the input. Eventually, you should be able to understand why all of the
various declarations are required, but you can skim over them at first.
page 109
The readlines function first calls our old friend getline to read each line into a local array, line.
On page 29 in section 1.9, we saw a program for finding the longest line in the input: it read each line
into a local array line, and kept a copy of the longest line in a second array longest. In that program,
it didn't matter that the input array line was continually overwritten with each new input line, and that
most lines (except the longest one) were lost and forgotten. Here, however, we do need to save all of the
input lines somewhere, so that we can sort them and print them later.
The lines are saved by calling alloc, a function which we wrote in section 5.4 but may have skimmed
over. alloc allocates n bytes of new memory for something which we need to save. Each time we read
another line, we call alloc to allocate some new memory to store it, then call strcpy to copy the line
from the line array to the newly allocated memory. This way, it's okay that the next line is read into the
same line array; we save each line, as it's read, into its own little alloc'ed piece of memory.
Note that memory allocated with a routine such as alloc persists, just as global and static variables
do; it does not disappear when the function that allocated it returns.
Hopefully you're getting used to reading compressed condition statements by now, because here's
another doozy:
if (nlines >= maxlines || (p = alloc(len)) == NULL)
This line checks to make sure we have enough room to store the new line we just read. We need two
things: (1) a slot in the lineptr array to store the pointer, and (2) space allocated by alloc to store
the line itself. If we don't have either of these things, we return -1, indicating that we ran out of memory.
We don't have a slot in the lineptr array if we've already read maxlines lines, and we don't have
room to store the line itself if alloc returns NULL. The subexpression (p = alloc(len)) ==
NULL is equivalent in form to other assign-and-test combinations we've been using involving
getchar and getline: it assigns alloc's return value to p, then compares it to NULL.
Normally, we might be suspicious of the call alloc(len). Why? Remember that strings are always
terminated by '\0', so the space required to store a string is always one more than the number of
characters in it. Normally, we'll call things like alloc(len + 1), and accidentally calling
alloc(len) is usually a bug. Here, it happens to be okay, because before we copy the line to the
newly-allocated memory, we strip the newline '\n' from the end of it, by overwriting it with '\0',
hence making the string one shorter than len. (Why is the last character in line, namely the '\n', at
line[len-1], and not line[len]?)
The fragments
if (nlines >= maxlines ...
and
lineptr[nlines++] = p;
deserve some attention. These represent a common way of filling in an array in C. nlines always holds
the number of lines we've read so far (it's another invariant). It starts out as 0 (we haven't read any lines
yet) and it ends up as the total number of lines we've read. Each time we read a new line, we store the
line (more precisely, a pointer to it) in lineptr[nlines++]. By using postfix ++, we store the
pointer in the slot indexed by the previous value of nlines, which is what we want, because arrays are
0-based in C. The first time through the loop, nlines is 0, so we store a pointer to the first line in
lineptr[0], and then increment nlines to 1. If nlines ever becomes equal to maxlines, we've
filled in all the slots of the array, and we can't use any more (even though, at that point, the highest-filled
cell in the array is lineptr[maxlines-1], which is the last cell in the array, again because arrays
are 0-based). We test for this condition by checking nlines >= maxlines, as a little measure of
paranoia. The test nlines == maxlines would also work, but if we ever accidentally introduce a
bug into the program such that we fill past the last slot without noticing it, we wouldn't want to keep on
filling farther and farther past the end.
Deep sentences:
...lineptr is an array of MAXLINES elements, each element of which is a pointer to a
char. That is, lineptr[i] is a character pointer...
We can see that lineptr[i] has to be a character pointer, by looking at two things: in the function
readlines, the line
lineptr[nlines++] = p;
has a character pointer on the right-hand side, and the only thing we can assign a character pointer to is
another character pointer. Also, in the function writelines, in the line
printf("%s\n", lineptr[i]);
printf's %s format expects a pointer to a character, so that's what lineptr[i] had better be.
Note that writelines prints a newline after each line, since newlines were stripped out of the input
lines by readlines.
Don't worry too much about the discussion at the bottom of page 109. We saw in section 5.3 that due to
the ``strong relationship'' between pointers and arrays, it is always possible to manipulate an array using
pointer-like notation, and to manipulate a pointer using array-like notation. Since lineptr is an array,
it is possible to manipulate it using pointer-like notation, but since what it's an array of is other pointers,
it can start to get a bit confusing. Though many programmers do write things like
printf("%s\n", *lineptr++);
and though this is correct code, and though one should probably understand it to have a 100% complete
understanding of C, I've decided that code like that is just a bit too hard to follow, and I'd always write
(perhaps more pedestrian and mundane) things like
printf("%s\n", lineptr[i]);
or
printf("%s\n", lineptr[i++]);
page 110
Since I didn't ask you to follow the qsort example in section 4.10 in complete detail, I won't ask you to
work through this one completely, either. But if you compare the code here to the code on pages 87-88,
you will see that the only significant differences are that the variables and arrays containing the things
being sorted have been changed from int to char * (pointer-to-char), and the comparison
if (v[i] < v[left])

has been changed to
if (strcmp(v[i], v[left]) < 0)


page 111
The month_day function is another example of a function which simulates having multiple return values by
using pointer parameters. month_day is declared as void, so it has no formal return value, but two of its
parameters, pmonth and pday, are pointers, and it fills in the locations pointed to by these two pointers with
the two values it wants to ``return.'' One line of the definition of month_day on page 111 is cut off in all
printings I have seen: it should read
void month_day(int year, int yearday, int *pmonth, int *pday)
As we've said, although any nonzero value is considered ``true'' in C, the built-in relational and Boolean
operators always ``return'' 0 or 1. Therefore, the line
int leap = year%4 == 0 && year%100 != 0 || year%400 == 0;
sets leap to 1 or 0 (``true'' or ``false'') depending on the condition
year%4 == 0 && year%100 != 0 || year%400 == 0
which is the condition for leap years in the Gregorian calendar. (It's a little-known fact that century years are not
leap years unless they are also divisible by 400. Thus, 2000 will be a leap year.) The 1/0 value that leap
receives is what the authors are referring to when they say that ``the arithmetic value of a logical expression...
can be used as a subscript of the array daytab.'' This line could also have been written
int leap;
if (year%4 == 0 && year%100 != 0 || year%400 == 0)
leap = 1;
else
leap = 0;
or
int leap = (year%4 == 0 && year%100 != 0 || year%400 == 0) ? 1 : 0;
page 112
The daytab array holds small integers (in the range 0-31), so it can legally be made an array of char, though
whether this is a legitimate use is a question of style.
Deep sentence:
In C, a two-dimensional array is really a one-dimensional array, each of whose elements is an
array.
Earlier we said that ``array-of-type is another type,'' and here we must believe it: since array-of-type is a type,
array-of-(array-of-type) is yet another type.
The statement that ``Elements are stored by rows, so the rightmost subscript, or column, varies fastest as
elements are accessed in storage order'' probably won't make much sense unless you've done a lot of work with
other languages, such as FORTRAN, which do have true multi-dimensional arrays. It's pretty arbitrary what you
call a ``row'' and what you call a ``column''; the most important thing to know is which subscript goes with
which dimension. If you have
int a[10][20];
then in the reference a[i][j], i can range from 0 to 9 and j can range from 0 to 19. In other words, you
might write
for (i = 0; i < 10; i++)
for (j = 0; j < 20; j++)
do something with a[i][j]
We also want to know what a actually is. Is it an array of 10 arrays, each of size 20, or is it an array of 20
arrays, each of size 10? There are other ways of convincing ourselves of the answer, but for now let's just say
that the ``closer'' dimensions are closer to what a is. Therefore, a is first an array of size 10, and what it's an
array of is arrays of 20 ints. This also tells us that if we ever refer to a[i] (without a second subscript), then
we're referring to just one of those 10 arrays (of size 20) in its entirety.
When we look back at the initialization of the daytab array on page 111, everything lines up. daytab is
defined as
char daytab[2][13]
and we can see from the initializer that there are two (sub)arrays, each of size 13. (We can also see that there is
some justification for saying that the first subscript refers to ``rows'' and the second to ``columns.'')
The authors illustrate one way of dealing with C's 0-based arrays when you have an algorithm that really wants
to treat an array as if it were 1-based. Here, rather than remembering to subtract one from the 1-based month
number each time, they chose to waste a ``column'' of the array, and declare it one larger than necessary, so that
they could refer to subscripts from [1] to [12].
One last note about the initialization of daytab: you may have seen code in other programming books that
kept an array of the cumulative days of all the months:
{0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365}
Precomputing an array like that might make things a tiny bit easier on the computer (it wouldn't have to loop
through the entire array each time, as it does in the day_of_year function), but it makes it considerably
harder to see what the numbers mean, and to verify that they are correct. The simple table of individual month
lengths is much clearer, and if the computer has to do a bit more grunge work, well, that's what computers are
for. As explained in another book co-authored by Brian Kernighan:
A cumulative table of days must be calculated by someone and checked by someone else. Since
few people are familiar with the number of days up to the end of a particular month, neither
writing nor checking is easy. But if instead we use a table of days per month, we can let the
computer count them for us. (``Let the machine do the dirty work.'')
The bottom of page 112 begins to get confusing. The ``number of rows'' of an array like daytab ``is
irrelevant'' when passed to a function such as the hypothetical f because the compiler doesn't need to know the
number of rows when calculating subscripts. It does need to know the number of columns or ``width,'' because
that's how it knows that the second element on the second row of a 10-column array is actually 11 cells past the
beginning of the array, which is essentially what it needs to know when it goes off and actually accesses the
array in memory. But it doesn't need to know how long the overall array is, as long as we promise not to run off
the end of it, and that's always up to us. (This is why we haven't specified the array sizes in the definitions of
functions such as getline on pages 29 and 69, or atoi on pages 43, 61, and 73, or readlines on page
109, although we did carry the array size as a separate argument to getline and readlines, to assist us in
our promise not to run off the end.)
The third version of f on page 112 comes about because of the ``gentle fiction'' involving array parameters. We
learned on page 99 that functions don't really receive arrays as parameters; they receive pointers (since any array
passed by the caller decayed immediately to a pointer). On page 39 we wrote a strlen function as
int strlen(char s[])
but on page 99 we rewrote it as
int strlen(char *s)
which is closer to the way the compiler sees the situation. (In fact, when we write int strlen(char
s[]), the compiler essentially rewrites it as int strlen(char *s) for us.) In the same way, a function
declared as
f(int daytab[][13])
can be rewritten by us (or if not, is rewritten by the compiler) to
f(int (*daytab)[13])
which declares the daytab parameter as a pointer-to-array-of-13-ints. Here we see two things: (1) the rewrite
which changes an array parameter to a pointer parameter happens only once (we end up with a pointer to an
array, not a pointer to a pointer), and (2) the syntax for pointers to arrays is a bit messy, because of some
required extra parentheses, as explained in the text.
If this seems obscure, don't worry about it too much; just declare functions with array parameters matching the
arrays you call them with, like
f(int daytab[2][13])
and let the compiler worry about the rewriting.
Deep sentence:
More generally, only the first dimension (subscript) of an array is free; all the others have to be
specified.
This just says what we said already: when declaring an array as a function parameter, you can leave off the first
dimension because it is the overall length and not knowing it causes no immediate problems (unless you
accidentally go off the end). But the compiler always needs to know the other dimensions, so that it knows how
the rows and columns line up.


page 113
This section is short and sweet, and there are only two things I feel the need to comment on. The
sentence ``The characters of the i-th string are placed somewhere'' simply refers to the fact that string
literals always work that way (except when they're used as array initializers, as explained on page 104).
We don't really care where the characters are, as long as we can keep hold of a pointer to them.
The other thing to notice is that the month_name function does verify that its argument is valid. If it
didn't check n against the boundary values 1 and 12, what would happen if we called
month_name(123)?


Actually, some people (and not just newcomers) are sometimes confused about the difference between a
one-dimensional array and a single pointer, too; moving to two-dimensional arrays, arrays of pointers,
and pointers to pointers only makes things worse. (But don't lose heart: if you pay attention and keep
your head screwed on straight, you should be able to keep the differences clearly in mind.)
The adjective ``syntactically'' in the paragraph at the bottom of the page is significant: after saying
int *b[10];
an immediate reference to b[3][4] would not be completely legal. It wouldn't be a syntax error or
anything, but when the program tried to fetch the pointer b[3] and then the integer at index 4 that it points to, it
would go off into deep space, because b[3] hasn't been set yet and doesn't point anywhere.
You might want to draw a picture of the data structures that would result ``[a]ssuming that each element
of b does point to a twenty-element array,'' and verify that there are ``200 ints set aside, plus ten cells
for the pointers.'' (The picture will be similar to the one on the next page.)
Actually, I'm not sure if having rows of different lengths is the only important advantage of using a
pointer array. Another is that the size of the arrays (as we'll see later) can be decided at run-time; another
is that the pointers make certain manipulations easier (such as the sorting example we worked through in
section 5.6).
page 114
Do study the pictures on this page carefully, and make sure you understand the representations of the
name and aname arrays and how they differ. (You might want to refer back to the similar discussion of
pmessage and amessage on page 104 in section 5.5.)


page 115
The picture at the top of page 115 doesn't quite match the declaration
char *argv[]
it's actually a picture of the situation declared by
char **argv
which is what main actually receives. (The array parameter declaration char *argv[] is rewritten by
the compiler to char **argv, in accordance with the discussion in sections 5.3 and 5.8.) Also, the ``0''
at the bottom of the array is just a representation of the null pointer which conventionally terminates the
argv array. (Normally, you'll never encounter the terminating null pointer, because if you think of
argv as an array of size argc, you'll never access beyond argv[argc-1].)
The loop
for (i = 1; i < argc; i++)
looks different from most loops we see in C (which either start at 0 and use <, or start at 1 and use <=).
The reason is that we're skipping argv[0], which contains the name of the program.
The expression
printf("%s%s", argv[i], (i < argc-1) ? " " : "");
is a little nicety to print a space after each word (to separate it from the next word) but not after the last
word. (The nicety is just that the code doesn't print an extra space at the end of the line.) It would also be
possible to fold in the following printf of the newline:
printf("%s%s", argv[i], (i < argc-1) ? " " : "\n");
As I mentioned in my comment on the bottom of page 109, it's not necessary to write pointer-incrementing
code like
while(--argc > 0)
    printf("%s%s", *++argv, (argc > 1) ? " " : "");
if you don't feel comfortable with it. I used to try to write code like this, because it seemed to be what
everybody else did, but it never sat well, and it was always just a bit too hard to write and to prove
correct. I've reverted to simple, obvious loops like
int argi;
char *sep = "";
for (argi = 1; argi < argc; argi++) {
    printf("%s%s", sep, argv[argi]);
    sep = " ";
}
printf("\n");
Often, it's handy to have the original argc and argv around later, anyway. (This loop also shows
another way of handling space separators.)
page 116
Page 116 shows a simple improvement on the matching-lines program first presented on page 69; page
117 adds a few more improvements. The differences between page 69 and page 116 are that the pattern is
read from the command line, and strstr is used instead of strindex. The difference between page
116 and page 117 is the handling of the -n and -x options. (The next obvious improvement, which we're
not quite in a position to make yet, is to allow a file name to be specified on the command line, rather
than always reading from the standard input.)
page 117
Several aspects of this code deserve note.
The line
while (c = *++argv[0])
is not in error. (In isolation, it might look like an example of the classic error of accidentally writing =
instead of == in a comparison.) What it's actually doing is another version of a combined set-and-test: it
assigns the next character pointed to by argv[0] to c, and compares it against '\0'. You can't see the
comparison against '\0', because it's implicit in the usual interpretation of a nonzero expression as
``true.'' An explicit test would look like this:
while ((c = *++argv[0]) != '\0')
argv[0] is a pointer to a character in a string; ++argv[0] increments that pointer to point to the next
character in the string; and *++argv[0] increments the pointer while returning the next character
pointed to. argv[0] is not the first string on the command line, but rather whichever one we're looking
at now, since elsewhere in the loop we increment argv itself.
Some of the extra complexity in this loop is to make sure that it can handle both
-x -n
and
-xn
In pseudocode, the option-parsing loop is
for ( each word on the command line )
    if ( it begins with '-' )
        for ( each character c in that word )
            switch ( c )
                ...
For comparison, here is another way of writing effectively the same loop:
int argi;
char *p;
for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
    for (p = &argv[argi][1]; *p != '\0'; p++)
        switch (*p) {
        case 'x':
            ...
This uses array notation to access the words on the command line, but pointer notation to access the
characters within a word (more specifically, a word that begins with '-'). We could also use array
notation for both:
int argi, chari;
for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
    for (chari = 1; argv[argi][chari] != '\0'; chari++)
        switch (argv[argi][chari]) {
        case 'x':
            ...
In either case, the inner, character loop starts at the second character (index [1]), not the first, because
the first character (index [0]) is the '-'.
It's easy to see how the -n option is implemented. If -n is seen, the number flag is set to 1 (a.k.a.
``true''), and later, in the line-matching loop, each time a line is printed, if the number flag is true, the
line number is printed first. It's harder to see how -x works. An except flag is set to 1 if -x is present,
but how is except used? It's buried down there in the line
if ((strstr(line, *argv) != NULL) != except)
What does that mean? The subexpression
(strstr(line, *argv) != NULL)
is 1 if the line contains the pattern, and 0 if it does not. except is 0 if we should print matching lines,
and 1 if we should print non-matching lines. What we've actually implemented here is an ``exclusive
OR,'' which is ``if A or B but not both.'' Other ways of writing this would be
int matched = (strstr(line, *argv) != NULL);
if ((matched && !except) || (!matched && except)) {
    if (number)
        printf("%ld:", lineno);
    printf("%s", line);
    found++;
}
or
if (except ? !matched : matched) {
    if (number)
        printf("%ld:", lineno);
    printf("%s", line);
    found++;
}
or
if (!except) {
    if (matched) {
        if (number)
            printf("%ld:", lineno);
        printf("%s", line);
        found++;
    }
}
else {
    if (!matched) {
        if (number)
            printf("%ld:", lineno);
        printf("%s", line);
        found++;
    }
}
There's clearly a tradeoff: the last version is in some sense the most clear (and the most verbose), but it
ends up repeating the line-number printing and any other processing which must be done for found lines.
Therefore, the compressed, perhaps slightly more cryptic forms are better: some day, it's a virtual
certainty that more processing will be added for printed lines (for example, if we're searching multiple
files, we'll want to print the filename for matching lines, too), and if the printing is duplicated in two
places, it's far too likely that we'll overlook that fact and add the new code in only one place.
One last point on the pattern-matching program: it's probably clearer to declare a pointer variable
char *pat;
and set it to the word from argv to be used as the search pattern (argv[1] or *argv, depending on
whether we're looking at page 116 or 117), and then use that in the call to strstr:
if (strstr(line, pat) != NULL ...

page 127
There's one other piece of motivation behind structures that it's useful to discuss. Suppose we didn't have
structures (or didn't know what they were or how to use them). Suppose we wanted to implement payroll
records. We might set up a bunch of parallel arrays, holding the names, mailing addresses, social security
numbers, and salaries of all of our employees:
char *name[100];
char *address[100];
long ssn[100];
float salary[100];
The idea here is that name[0], address[0], ssn[0], and salary[0] would describe one
employee, array slots with subscript [1] would describe the second employee, etc. There are at least two
problems with this scheme: first, if we someday want to handle more than 100 employees, we have to
remember to change the size of several arrays. (Using a symbolic constant like
#define MAXEMPLOYEES 100
would certainly help.)
More importantly, there would be no easy way to pass around all the information associated with a single
employee. Suppose we wanted to write the function print_employee, which will print all the
information associated with a particular employee. What arguments would this function take? We could
pass it the index to use to retrieve the information from the arrays, but that would mean that all of the
arrays would have to be global. We could pass the function an individual name, address, SSN, and salary,
but that would mean that whenever we added a new piece of information to the database (perhaps next
week we'll want to keep track of employees' shoe sizes), we would have to add another argument to the
print_employee function, and change all of the calls. (Pretty soon, the number of arguments to the
print_employee function would become unwieldy.) What we'd really like is a way to encapsulate all
of the data about a single employee into a single data structure, so we could just pass that data structure
around.
The right solution to this problem, in languages such as C which support the idea, is to define a structure
describing an employee. We can make one array of these structures to describe all the employees, and we
can pass around single instances of the structure where they're needed.
section 6.1: Basics of Structures
section 6.2: Structures and Functions

section 6.3: Arrays of Structures
The sizeof operator
section 6.4: Pointers to Structures
section 6.5: Self-referential Structures


Don't get too excited about the prospect of doing graphics in C--there's no one standard or portable way
of doing it, so the points and rectangles we're going to be discussing must remain abstract for now (we
won't be able to plot them out).
page 128
To summarize the syntax of structure declarations: A structure declaration has about four parts, most of
them optional: the keyword struct, a structure tag (optional), a brace-enclosed list of declarations for
the members (also called ``fields'' or ``components'') of the structure (optional), and a list of variables of
the new structure type (optional). The arrangement looks like this:
struct tag {
member declarations
} declared variables ;
Normally, a structure declaration defines either a tag and the members, or some variables based on an
existing tag, or sometimes all three at once. That is, we might first declare a structure:
struct point {                  /* 1 */
    int x;
    int y;
};
and then some variables of that type:
struct point here, there;       /* 2 */
Or, we could combine the two:
struct point {                  /* 3 */
    int x;
    int y;
} here, there;
The list of members (if present) describes what the new structure ``looks like inside.'' The list of
variables (if present) is (obviously) the list of variables of this new type which we're defining and which
the rest of the program will use. The tag (if present) is just an arbitrary name for the structure type itself
(not for any variable we're defining). The tag is used to associate a structure definition (as in fragment 1)
with a later declaration of variables of that same type (as in fragment 2).
One thing to beware of: when you declare the members of a structure without defining any variables,
always remember the trailing semicolon, as shown in fragment 1 above. (If you forget it, the compiler
will wait until the next thing it finds in your source file, and try to define that as a variable or function of
the structure type.)


In this section, we'll begin playing with structures more or less as if they were ordinary variables such as
we've been using all along (which they more or less are). As we'll see, we can declare variables of
structure type, declare functions which accept structures as parameters and return them, declare pointers
to structures, take the address of a structure (creating a pointer-to-structure) with &, and assign structures.
Notice that when we declare something as ``a structure type,'' we always have to say which structure
type, usually by using the struct tag. If we've set up a ``point'' structure as above, then to declare a
variable of this type, we say
struct point thepoint;
Both
struct thepoint;                /* WRONG */
and
point thepoint;                 /* WRONG */
would be errors.
The above list of things the language lets us do with structures lets us keep them and move them around,
but there isn't really anything defined by the language that we can do with structures. It's up to us to
define any operations on structures, usually by writing functions. (The addpoint function on page 130
is a good example. It will make a bit more sense if you think of it as adding not isolated points, but rather
vectors. [We can't add Seattle plus Los Angeles, but we could add (two miles south, one mile east) plus
(one mile east, two miles north).])
page 131
As an aside, how safe are the min() and max() macros defined at the top of page 131, with respect to
the criteria discussed on pages 15 and 16 of the notes on section 4.11.2 (page 90 in the text)?
The precise meaning of the ``shorthand'' -> operator is that sp->m is, by definition, equivalent to
(*sp).m, for any structure pointer sp and member m of the pointed-to structure.


page 132
In the previous section we introduced pointers to structures and functions returning structures without
fanfare. But now let's pay attention to the fact that structures fit the pattern of the other types: a structure
is a type, so we can have pointer-to-struct, array-of-struct, and function-returning-struct. (We can also
say, following our ongoing pattern of recursive definitions, that for any list of types t1, t2, t3, ..., we can
make a new type
struct tag {
    t1 m1;
    t2 m2;
    t3 m3;
    ...
};
which is a structure composed of members of those types.)
page 134
We glossed over the binary search routine on page 58 in section 3.3, so we can skip the details of this
one, too. This illustrates another benefit of breaking functionality out into functions, though: as long as
you know what a function does, you can understand a program that it's in without necessarily
understanding all of it. In this case, binsearch searches an array tab, containing n cells of type
struct key, looking for one whose word field matches the parameter word. If it finds a matching
cell, it returns its index in the array; otherwise, it returns -1.

The sizeof operator

page 135
This may seem like an excessively roundabout or low-level way of finding the number of elements in an
array, but it is the way it's done in C, and it's perfectly safe and straightforward once you get used to it. (I
would, however, be hard-pressed to defend against the accusation that it's a bit too low-level.)
Note that sizeof works on both type names (things like int, char *, struct key, etc.) and
variables (strictly speaking, any expression). Parentheses are required when you're using sizeof with a
type name and optional when you're using it with a variable or expression (just like return), but it's
safe to just always use parentheses.
sizeof returns the size counted in bytes, where the C definition of ``byte'' is ``the size of a char.'' In
other words, sizeof(char) is always 1. (It turns out that it's not necessarily the case, though, that a
byte or a char is 8 bits.) When we start doing our own dynamic memory allocation (which will be pretty
soon), we'll always be needing to know the size of things so that we can allocate space for them, so it's
just as well that we're meeting and getting used to the sizeof operator now.
The sentence ``But the expression in the #define is not evaluated by the preprocessor'' means that, as
far as the preprocessor is concerned, the ``value'' of the macro NKEYS (like the value of any macro) is
just a string of characters like
(sizeof(keytab) / sizeof keytab[0])
which it replaces wherever NKEYS is used, and which will then be evaluated by the compiler as usual, so
it doesn't matter that the preprocessor wouldn't have known how to deal with the sizeof operator, or
how big the keytab array or a struct key were.
A third way of defining NKEYS would be
#define NKEYS (sizeof(keytab) / sizeof *keytab)
Note that the definition of NKEYS depends on the definition of the keytab array (which appears on
page 133), and both of them will have to precede the use of NKEYS in main on page 134. (Also, all three
will have to be in the same source file, unless other steps are taken.)
page 136
Notice that getword has a lot in common with the getop function of the calculator example (section
4.3, page 80).


The bulk of this section illustrates how to rewrite the binsearch function (which we've already been
glossing over) in terms of pointers instead of arrays (an exercise which we've been downplaying). There
are a few important points towards the end of the section, however.
page 138
When we began talking about pointers and arrays, we said that it was important never to access outside
of the defined and allocated bounds of an array, either with an out-of-range index or an out-of-bounds
pointer. There is one exception: you may compute (but not access, or ``dereference'') a pointer to the
imaginary element which sits one past the end of an array. Therefore, a common idiom for accessing an
array using a pointer looks like
int a[10];
int *ip;
for (ip = &a[0]; ip < &a[10]; ip++)
    ...
or
int a[10];
int *endp = &a[10];
int *ip;
for (ip = a; ip < endp; ip++)
    ...
The element a[10] does not exist (the allocated elements run from [0] to [9]), but we may compute
the pointer &a[10], and use it in expressions like ip < &a[10] and endp = &a[10].
Deep sentence:
Don't assume, however, that the size of a structure is the sum of the sizes of its members.
If this isn't the sort of thing you'd be likely to assume, you don't have to remember the reason, which is
mildly esoteric (having to do with memory alignment requirements).


page 139
In section 4.10, we met recursive functions. Now, we're going to meet recursively-defined data
structures. Don't throw up your hands: the two should be easier to understand in combination.
The mention of ``quadratic running time'' is tangential, but it's a useful-enough concept that it might be
worth a bit of explanation. If we were keeping a simple list (``linear array'') in order, each time we had a
new word to install, we'd have to scan over the old list. On average, we'd have to scan over half the old
list. (Even if we used binary search to find the position, we'd still have to move some part of the list to
insert it.) Therefore, the more words that were in the list, the longer it would take to install each new
word. It turns out that the running time of this linear insertion algorithm would grow as the square of the
number of items in the list (that's what ``quadratically'' means). If you doubled the size of the list, the
running time would be four times longer. An algorithm like this may seem to work fine when you run it
on small test inputs, but then when you run it on a real problem consisting of a thousand or ten thousand
or a million words, it bogs down hopelessly.
A binary tree is a great way to keep a set of words (or other values) in sorted order. The definition of a
binary tree is simply that, at each node, all items in the left subtree are less than the item at that node, and
all items in the right subtree are greater. (Note that the top item in the left subtree is not necessarily
immediately less than the item at that node or anything; the immediately-preceding item is merely down
in the left subtree somewhere, along with all the rest of the preceding items. In the ``now is the time''
example, the word ``now'' is neither the first, last, nor middle word in the sorted list; it's merely the word
that happened to be installed first. The word preceding it is ``men''; the word following it is ``of.'' The
first word in the sorted list is ``aid,'' and the last word is ``to.'')
The binary tree may not immediately seem like much of an improvement over the linear array--we still
have to scan over part of the existing tree in order to insert each new word, and the time to add each new
word will get longer as there are more words in the tree. But, if you do the math, it turns out that on
average you have to scan over a much smaller part of the tree, and it's not a simple fraction like half or
one quarter, but rather the log (base two) of the number of items already in the tree. Furthermore,
inserting a new node doesn't involve reshuffling any old data. For these reasons, the running time of
binary tree insertion doesn't slow down nearly as badly as linear insertion does.
By the way, the reason that the word ``binary'' comes up so often is because it simply means ``two.'' The
binary number system has two digits (0 and 1); a binary operator has two operands; binary search
eliminates half (one over two) of the possibilities at each step; a binary tree has two subtrees at each
node.
One other bit of nomenclature: the word ``node'' simply refers to one of the structures in a set of
structures that is linked together in some way, and as we're about to see, we're going to use a set of linked
structures to implement a binary tree. Just as we talk about a ``cell'' or ``element'' of an array, we talk
about a ``node'' in a tree or linked list.
When we look at the description of the algorithm for finding out whether a word is already in the tree, we
may begin to see why the binary tree is more efficient than the linear list. When searching through a
linear list, each time we discard a value that's not the one we're looking for, we've only discarded that one
value; we still have the entire rest of the list to search. In a binary tree, however, whenever we move
down the tree, we've just eliminated half of the tree. (We might say that a binary tree is a data structure
which makes binary search automatic.) Consider guessing a number between one and a hundred by
asking ``Is it 1? Is it 2? Is it 3?'' etc., versus asking ``Is it less than 50? Is it greater than 25? Is it less than
12?''
page 140
Make sure you're comfortable with the idea of a structure which contains pointers to other instances of
itself. If you draw some little box-and-arrow diagrams for a binary tree, the idea should fall into place
easily. (As the authors point out, what would be impossible would be for a structure to contain not a
pointer but rather another entire instance of itself, because that instance would contain another, and
another, and the structure would be infinitely big.)
page 141
Note that addtree accepts as an argument the tree to be added to, and returns a pointer to a tree,
because it may have to modify the tree in the process of adding a new node to it. If it doesn't have to
modify the tree (more precisely, if it doesn't have to modify the top or root of the tree) it returns the same
pointer it was handed.
Another thing to note is the technique used to mark the edges or ``leaves'' of the tree. We said that a null
pointer was a special pointer value guaranteed not to point anywhere, and it is therefore an excellent
marker to use when a left or right subtree does not exist. Whenever a new node is built, addtree
initializes both subtree pointers (``children'') to null pointers. Later, another chain of calls to addtree
may replace one or the other of these with a new subtree. (Eventually, both might be replaced.)
If you don't completely see how addtree works, leave it for a moment and look at treeprint on the
next page first.
The bottom of page 141 discusses a tremendously important issue: memory allocation. Although we only
have one copy of the addtree function (which may call itself recursively many times), by the time
we're done, we'll have many instances of the tnode structure (one for each unique word in the input).
Therefore, we have to arrange somehow that memory for these multiple instances is properly allocated.
We can't use a local variable of type struct tnode in addtree, because local variables disappear
when their containing function returns. We can't use a static variable of type struct tnode in
addtree, or a global variable of type struct tnode, because then we'd have only one node in the
whole program, and we need many.
What we need is some brand-new memory. Furthermore, we have to arrange it so that each time
addtree builds a brand-new node, it does so in another new piece of brand-new memory. Since each
node contains a pointer (char *) to a string, the memory for that string has to be dynamically allocated,
too. (If we didn't allocate memory for each new string, all the strings would end up being stored in the
word array in main on page 140, and they'd step all over each other, and we'd only be able to see the
last word we read.)
For the moment, we defer the questions of exactly where this brand-new memory is to come from by
defining two functions to do it. talloc is going to return a (pointer to a) brand-new piece of memory
suitable for holding a struct tnode, and strdup is going to return a (pointer to a) brand-new piece
of memory containing a copy of a string.
page 142
treeprint is probably the cleanest, simplest recursive function there is. If you've been putting off
getting comfortable with recursive functions, now is the time.
Suppose it's our job to print a binary tree: we've just been handed a pointer to the base (root) of the tree.
What do we do? The only node we've got easy access to is the root node, but as we saw, that's not the
first or the last element to print or anything; it's generally a random node somewhere in the middle of the
eventual sorted list (distinguished only by the fact that it happened to be inserted first). The node that
needs to be printed first is buried somewhere down in the left subtree, and the node to print just before
the node we've got easy access to is buried somewhere else down in the left subtree, and the node to print
next (after the one we've got) is buried somewhere down in the right subtree. In fact, everything down in
the left subtree is to be printed before the node we've got, and everything down in the right subtree is to
be printed after. A pseudocode description of our task, therefore, might be
print the left subtree (in order)
print the node we're at
print the right subtree (in order)
How can we print the left subtree, in order? The left subtree is, in general, another tree, so printing it out
sounds about as hard as printing an entire tree, which is what we were supposed to do. In fact, it's exactly
as hard: it's the same problem. Are we going in circles? Are we getting anywhere? Yes, we are: the left
subtree, even though it is still a tree, is at least smaller than the full tree we started with. The same is true
of the right subtree. Therefore, we can use a recursive call to do the hard work of printing the subtrees,
and all we have to do is the easy part: print the node we're at. The fact that the subtrees are smaller gives
us the leverage we need to make a recursive algorithm work.
In any recursive function, it is (obviously) important to terminate the recursion, that is, to make sure that
the function doesn't recursively call itself forever. In the case of binary trees, when you reach a ``leaf'' of
the tree (more precisely, when the left or right subtree is a null pointer), there's nothing more to visit, so
the recursion can stop. We can test for this in two different ways, either before or after we make the
``last'' recursive call:
void treeprint(struct tnode *p)
{
    if(p->left != NULL)
        treeprint(p->left);
    printf("%4d %s\n", p->count, p->word);
    if(p->right != NULL)
        treeprint(p->right);
}
or
void treeprint(struct tnode *p)
{
    if(p == NULL)
        return;
    treeprint(p->left);
    printf("%4d %s\n", p->count, p->word);
    treeprint(p->right);
}
Sometimes, there's little difference between one approach and the other. Here, though, the second
approach (which is equivalent to the code on page 142) has a distinct advantage: it will work even if the
very first call is on an empty tree (in this case, if there were no words in the input). As we mentioned
earlier, it's extremely nice if programs work well at their boundary conditions, even if we don't think
those conditions are likely to occur.
(One more thing to notice is that it's quite possible for a node to have a left subtree but not a right, or vice
versa; one example is the node labeled ``of'' in the tree on page 139.)
Another impressive thing about a recursive treeprint function is that it's not just a way of writing it,
or a nifty way of writing it; it's really the only way of writing it. You might try to figure out how to write
a nonrecursive version. Once you've printed something down in the left subtree, how do you know where
to go back up to? Our struct tnode only has pointers down the tree; there aren't any pointers back to
the ``parent'' of each node. If you write a nonrecursive version, you have to keep track of how you got to
where you are, and it's not enough to keep track of the parent of the node you're at; you have to keep a
stack of all the nodes you've passed down through. When you write a recursive version, on the other
hand, the normal function-call stack essentially keeps track of all this for you.
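To make that bookkeeping concrete, here is a minimal sketch of a nonrecursive in-order traversal that keeps its own stack of the nodes it has passed down through. The name inorder, the MAXDEPTH limit, and the choice to collect node pointers in an array instead of printing them directly are all assumptions made for this illustration; none of them come from the text.

```c
#include <stddef.h>

struct tnode {                  /* same shape as the tree node in section 6.5 */
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

#define MAXDEPTH 64             /* assumed limit on tree depth for this sketch */

/* Visit the tree in order without recursion.  Store up to max node
   pointers in out[] and return how many were stored.  Stops early if
   out[] fills up or the tree is deeper than MAXDEPTH. */
size_t inorder(struct tnode *root, struct tnode *out[], size_t max)
{
    struct tnode *stack[MAXDEPTH];  /* nodes we still have to come back to */
    size_t sp = 0, n = 0;
    struct tnode *p = root;

    while (p != NULL || sp > 0) {
        while (p != NULL) {         /* walk left, remembering the path */
            if (sp == MAXDEPTH)
                return n;
            stack[sp++] = p;
            p = p->left;
        }
        p = stack[--sp];            /* most recently remembered node */
        if (n == max)
            return n;
        out[n++] = p;               /* "print" the node we're at */
        p = p->right;               /* then do its right subtree */
    }
    return n;
}
```

Printing is then a simple loop over out[]. Notice how much more machinery this takes than the recursive version, and that an empty tree (a null root) still works: the outer loop simply never runs.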
We now return to the problem of dynamic memory allocation. The basic approach builds on something
we've been seeing glimpses of for a few chapters now: we use a general-purpose function which returns a
pointer to a block of n bytes of memory. (The authors presented a primitive version of such a function in
section 5.4, and we used it in the sorting program in section 5.6.) Our problem is then reduced to (1)
remembering to call this allocation function when we need to, and (2) figuring out how many bytes we
need. Problem 1 is stubborn, but problem 2 is solved by the sizeof operator we met in section 6.3.
You don't need to worry about all the details of the ``digression on a problem related to storage
allocators.'' The vast majority of the time, this problem is taken care of for you, because you use the
system library function malloc.
The problem of malloc's return type is not quite as bad as the authors make it out to be. In ANSI C, the
void * type is a ``generic'' pointer type, specifically intended to be used where you need a pointer
which can be a pointer to any data type. Since void * is never a pointer to anything by itself, but is
always intended to be converted (``coerced'') into some other type, it turns out that a cast is not strictly
required: in code like
struct tnode *tp = malloc(sizeof(struct tnode));
or
return malloc(sizeof(struct tnode));
the compiler is willing to convert the pointer types implicitly, without warning you and without requiring
you to insert explicit casts. (If you feel more comfortable with the casts, though, you're welcome to leave
them in.)
page 143
strdup is a handy little function that does two things: it allocates enough memory for one of your
strings, and it copies your string to the new memory, returning a pointer to it. (It encapsulates a pattern
which we first saw in the readlines function on page 109 in section 5.6.) Note the +1 in the call to
malloc! Accidentally calling malloc(strlen(s)) is an easy but serious mistake.
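Here is a minimal sketch of the pattern, under the hypothetical name my_strdup (chosen so that it can't collide with a strdup which your system's library may already supply). It is meant only to show where the +1 goes, not to replace the version in the text.

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of s, or NULL if malloc fails. */
char *my_strdup(const char *s)
{
    char *p = malloc(strlen(s) + 1);    /* +1 for the terminating '\0' */

    if (p != NULL)
        strcpy(p, s);
    return p;
}
```

The caller owns the returned memory and should eventually pass it to free.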
As we mentioned at the beginning of chapter 5, memory allocation can be hard to get right, and is at the
root of many difficulties and bugs in many C programs. Here are some rules and other things to
remember:
1. Make sure you know where things are allocated, either by the compiler or by you. Watch out for
things like the local line array we've been tending to use with getline, and the local word
array on page 140. When a function writes to an array or a pointer supplied by the caller, it
depends on the caller to have allocated storage correctly. When you're the caller, make sure you
pass a valid pointer! Make sure you understand why
char *ptr;
getline(ptr, 100);
is wrong and can't work. (For one thing: what does that 100 mean? If getline is only allowed
to read at most 100 characters, where have we allocated those 100 characters that getline is not
allowed to write to more of than?)
2. Be aware of any situations where a single array or data structure is used to store multiple different
things, in succession. Think again about the local line array we've been tending to use with
getline, and the local word array on page 140. These arrays are overwritten with each new
line, word, etc., so if you need to keep all of the lines or words around, you must copy them
immediately to allocated memory (as the line-sorting program on pages 108-9 in section 5.6 did,
but as the longest line program on page 29 in section 1.9 and the pattern-matching programs on
page 69 in section 4.1 and pages 116-7 in section 5.10 did not have to do).
3. Make sure you allocate enough memory! If you allocate memory for an array of 10 things, don't
accidentally store 11 things in it. If you have a string that's 10 characters long, make sure you
always allocate 11 characters for it (including one for the terminating '\0').
4. When you free (deallocate) memory, make sure that you don't have any pointers lying around
which still point to it (or if you do, make sure not to use them any more).
5. Always check the return value from memory-allocation functions. Memory is never infinite:
sooner or later, you will run out of memory, and allocation functions generally return a null
pointer when this happens.
6. When you're not using dynamically-allocated memory any more, do try to free it, if it's convenient
to do so and the program's not just about to exit. Otherwise, you may eventually have so much
memory allocated to stuff you're not using any more that there's no more memory left for new
stuff you need to allocate. (However, on all but a few broken systems, all memory is
automatically and definitively returned to the operating system when your program exits, so if one
of your programs doesn't free some memory, you shouldn't have to worry that it's wasted forever.)
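For the word-counting tree, the last point might look something like the following sketch. The function treefree is hypothetical (it does not appear in the text), and having it return the number of nodes freed is just a convenience for checking it. Note that each node's word was allocated separately (by strdup), so it has to be freed separately, and that a node's children must be freed before the node itself, since we can't look at p->left after p is gone.

```c
#include <stdlib.h>

struct tnode {                  /* same shape as the tree node in section 6.5 */
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* Free every node in the tree, children first; return the number freed. */
int treefree(struct tnode *p)
{
    int n;

    if (p == NULL)              /* empty tree: nothing to do */
        return 0;
    n = treefree(p->left) + treefree(p->right);
    free(p->word);              /* the string was allocated separately */
    free(p);
    return n + 1;
}
```

After calling treefree(root), the caller should set root to NULL (or simply not use it again), in keeping with point 4.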
Unfortunately, checking the return values from memory allocation functions (point 5 above) requires a
few more lines of code, so it is often left out of sample code in textbooks, including this one. Here are
versions of main and addtree for the word-counting program (pages 140-1 in the text) which do
check for out-of-memory conditions:
/* word frequency count */
main()
{
    struct tnode *root;
    char word[MAXWORD];
    root = NULL;
    while (getword(word, MAXWORD) != EOF) {
        if (isalpha(word[0])) {
            root = addtree(root, word);
            if(root == NULL) {
                printf("out of memory\n");
                return 1;
            }
        }
    }
    treeprint(root);
    return 0;
}
struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;
    if (p == NULL) {                /* a new word has arrived */
        p = talloc();               /* make a new node */
        if (p == NULL)
            return NULL;
        p->word = strdup(w);
        if (p->word == NULL) {
            free(p);
            return NULL;
        }
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0)
        p->count++;                 /* repeated word */
    else if (cond < 0) {            /* less than: into left subtree */
        p->left = addtree(p->left, w);
        if(p->left == NULL)
            return NULL;
    }
    else {                          /* greater than: into right subtree */
        p->right = addtree(p->right, w);
        if(p->right == NULL)
            return NULL;
    }
    return p;
}
In practice, many programmers would collapse the calls and tests:
struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;
    if (p == NULL) {                /* a new word has arrived */
        if ((p = talloc()) == NULL)
            return NULL;
        if ((p->word = strdup(w)) == NULL) {
            free(p);
            return NULL;
        }
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0)
        p->count++;                 /* repeated word */
    else if (cond < 0) {            /* less than: into left subtree */
        if ((p->left = addtree(p->left, w)) == NULL)
            return NULL;
    }
    else {                          /* greater than: into right subtree */
        if ((p->right = addtree(p->right, w)) == NULL)
            return NULL;
    }
    return p;
}

page 151
By ``Input and output facilities are not part of the C language itself,'' we mean that things like printf
are just function calls like any other. C has no built-in input or output statements. For our purposes, the
main implication of this fact--that I/O is not built in--is that the compiler may not do as much
checking as we might like it to. If we accidentally write
double d = 1.23;
printf("%d\n", d);
the compiler says, ``Hmm, a function named printf is being called with a string and a double. Okay
by me.'' The compiler does not (and, in general, could not even if it wanted to) notice that the %d format
requires an int.
Although the title of this chapter is ``Input and Output,'' it appears that we'll also be meeting a few other
routines from the standard library.
If you start to do any serious programming on a particular system, you'll undoubtedly discover that it has
a number of more specialized input/output (and other system-related) routines available, which promise
better performance or nicer functionality than the pedestrian routines of C's standard library. You should
resist the temptation to use these nonstandard routines. Because the standard library routines are defined
precisely and ``exist in compatible form on any system where C exists,'' there are some real advantages
to using them. (On the other hand, when you need to do something which C's standard library routines
don't provide, you'll generally turn to your machine's system-specific routines right away, as they may be
your only choice. One common example is when you'd like to read one character immediately, without
waiting for the RETURN key. How you do that depends on what system you're using; it is not defined by
C.)
section 7.1: Standard Input and Output
section 7.2: Formatted Output--Printf
section 7.3: Variable-length Argument Lists
section 7.4: Formatted Input--Scanf
section 7.5: File Access
section 7.6: Error Handling--Stderr and Exit

section 7.7: Line Input and Output
section 7.8.1: String Operations
section 7.8.2: Character Class Testing and Conversion
section 7.8.3: Ungetc
section 7.8.4: Command Execution
section 7.8.5: Storage Management
section 7.8.6: Mathematical Functions
section 7.8.7: Random Number Generation

section 7.1: Standard Input and Output


Note that ``a text stream'' might refer to input (to the program) from the keyboard or output to the screen,
or input and output from files on disk. (For that matter, it can also refer to input and output from other
peripheral devices, or the network.)
Note that the stdio library generally does newline translation for you. If you know that lines are
terminated by a linefeed on Unix and a carriage return on the Macintosh and a carriage-return/linefeed
combination on MS-DOS, you don't have to worry about these things in C, because the line termination
will always appear to a C program to be a single '\n'. (That is, when reading, a single '\n' represents
the end of the line being read, and when writing, writing a '\n' causes the underlying system's actual
end-of-line representation to be written.)
pages 152-153
The ``lower'' program is an example of a filter: it reads its standard input, ``filters'' (that is, processes) it
in some way, and writes the result to its standard output. Filters are designed for (and are only really
useful under) a command-line interface such as the Unix shells or the MS-DOS command.com interface.
Obviously, you would rarely invoke a program like lower by itself, because you would have to type the
input text at it and you could only see the output ephemerally on your screen. To do any real work, you
would always redirect the input:
lower < inputfile
and perhaps the output:
lower < inputfile > outputfile
(notice that spaces may precede and follow the < and > characters). Or, a filter program like lower
might appear in a longer pipeline:
oneprogram | lower | anotherprogram
or
anotherprogram < inputfile | lower | thirdprogram > outputfile
Filters like these are not terribly useful, though, under a Graphical User Interface such as the Macintosh
or Microsoft Windows.

section 7.2: Formatted Output -- Printf

pages 153-155
To summarize the important points of this section:
printf's output goes to the standard output, just like putchar.

Everything in printf's format string is either a plain character to be printed as-is, or a %-specifier
which generally causes one argument to be consumed, formatted, and printed.
(Occasionally, a single %-specifier consumes two or three arguments if the width or precision is *,
or zero arguments if the specifier is %%.)
There's a fairly long list of conversion specifiers; see the table on page 154.
Always be careful that the conversions you request (in the format string) match the arguments you
supply.
You can ``print'' to a string (instead of the standard output) with sprintf. (This is the usual way
of converting numbers to strings in C; the itoa function we were playing with in section 3.6 on
page 64 is nonstandard, and unnecessary.)
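A couple of these points can be sketched in a few lines. The function names int_to_string and format_percent are hypothetical, made up for this illustration. The first uses sprintf in place of itoa; the second shows a %-specifier that consumes two arguments (because its width is *) followed by one that consumes none (%%).

```c
#include <stdio.h>

/* Convert n to decimal text in buf.  The caller must supply a buf with
   room for the digits, a possible minus sign, and the terminating '\0'. */
void int_to_string(int n, char *buf)
{
    sprintf(buf, "%d", n);
}

/* Format value right-justified in a field whose width is taken from the
   argument list, followed by a literal percent sign. */
void format_percent(char *buf, int width, int value)
{
    sprintf(buf, "%*d%%", width, value);
}
```

As with any function that writes through a caller-supplied pointer, sprintf depends on the caller to have allocated enough space; it has no way of checking.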


section 7.3: Variable-length Argument Lists

This is an advanced section which you don't need to read.

section 7.4: Formatted Input -- Scanf

page 157
Somehow we've managed to make it through six chapters without meeting scanf, which it turns out is
just as well.
In the examples in this book so far, all input (from the user, or otherwise) has been done with getchar
or getline. If we needed to input a number, we did things like
char line[MAXLINE];
int number;
getline(line, MAXLINE);
number = atoi(line);
Using scanf, we could ``simplify'' this to
int number;
scanf("%d", &number);
This simplification is convenient and superficially attractive, and it works, as far as it goes. The problem
is that scanf does not work well in more complicated situations. In section 7.1, we said that calls to
putchar and printf could be interleaved. The same is not always true of scanf: you can have
baffling problems if you try to intermix calls to scanf with calls to getchar or getline. Worse, it
turns out that scanf's error handling is inadequate for many purposes. It tells you whether a conversion
succeeded or not (more precisely, it tells you how many conversions succeeded), but it doesn't tell you
anything more than that (unless you ask very carefully). Like atoi and atof, scanf stops reading
characters when it's processing a %d or %f input and it finds a non-numeric character. Suppose you've
prompted the user to enter a number, and the user accidentally types the letter `x'. scanf might return 0,
indicating that it couldn't convert a number, but the unconvertible text (the `x') remains on the input
stream unless you figure out some other way to remove it.
For these reasons (and several others, which I won't bother to mention) it's generally recommended that
scanf not be used for unstructured input such as user prompts. It's much better to read entire lines with
something like getline (as we've been doing all along) and then process the line somehow. If the line
is supposed to be a single number, you can use atoi or atof to convert it. If the line has more
complicated structure, you can use sscanf (which we'll meet in a minute) to parse it. (It's better to use
sscanf than scanf because when sscanf fails, you have complete control over what you do next.
When scanf fails, on the other hand, you're at the mercy of where in the input stream it has left you.)
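Here is a minimal sketch of the read-a-line-then-parse approach. The names parse_int and read_int and the MAXLINE value are assumptions made for this example, and it uses the standard fgets in place of the book's getline so that it stands on its own.

```c
#include <stdio.h>

#define MAXLINE 100     /* assumed line length limit for this sketch */

/* Try to convert the text in line to an int.  Return 1 on success,
   0 if the line does not begin with a number. */
int parse_int(const char *line, int *out)
{
    return sscanf(line, "%d", out) == 1;
}

/* Read one line from fp and try to convert it.  Returns 1 on success,
   0 if the line was not a number, or EOF if no line could be read.
   A bad line (up to MAXLINE-1 characters of it) has already been
   consumed, so it is not left lying on the input stream. */
int read_int(FILE *fp, int *out)
{
    char line[MAXLINE];

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;
    return parse_int(line, out);
}
```

If the user types `x', read_int returns 0 and the caller can simply prompt again; the offending line is gone.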
With that little diatribe against scanf out of the way, here are a few comments on individual points
made in section 7.4.

We've met a few functions (e.g. getline, month_day in section 5.7 on page 111) which return more
than one value; the way they do so is to accept a pointer argument that tells them where (in the caller) to
write the returned value. scanf is the epitome of such functions: it returns potentially many values (one
for each %-specifier in its format string), and for each value converted and returned, it needs a pointer
argument.
The statement on page 157 that ``blanks or tabs'' in the format string ``are ignored'' (which is repeated on
page 159) is a simplification: in actuality, a blank or tab (or newline; actually any whitespace) in the
format string causes scanf to skip whitespace (blanks, tabs, etc.) in the input stream.
A * character in a scanf conversion specifier means something completely different than it does for
printf: for scanf, it means to suppress assignment (i.e. for that conversion specifier, there isn't a
pointer in the argument list to receive the converted value, so the converted value is discarded). With
scanf, there is no direct way of taking a field width from the argument list, as * does for printf.
Conversion specifiers like %d and %f automatically skip leading whitespace while looking for something
to convert. This means that the format strings "%d %d" and "%d%d" act exactly the same--the
whitespace in the first format string causes whitespace to be skipped before the second %d, but the second
%d would have skipped that whitespace anyway. (Yet another scanf foible is that the innocuous-looking
format string "%d\n" converts a number and then skips whitespace, which means that it will
gobble up not only a newline following the number it converts, but any number of newlines or
whitespace, and in fact it will keep reading until it finds a non-whitespace character, which it then won't
read. This sounds confusing, but so is scanf's behavior when given a format string like "%d\n". The
moral is simple: don't use trailing \n's in scanf format strings.)
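The last two points are easy to try out with sscanf, which takes the same format strings as scanf. In this sketch the function names are hypothetical; each one just returns sscanf's own return value, that is, the number of values converted and assigned.

```c
#include <stdio.h>

/* "%d %d" and "%d%d" accept the same input, because %d skips leading
   whitespace on its own. */
int two_ints_spaced(const char *s, int *a, int *b)
{
    return sscanf(s, "%d %d", a, b);
}

int two_ints_tight(const char *s, int *a, int *b)
{
    return sscanf(s, "%d%d", a, b);
}

/* %*d converts a number but discards it: there is no pointer argument
   for it, and it is not included in the count that sscanf returns. */
int second_int(const char *s, int *out)
{
    return sscanf(s, "%*d %d", out);
}
```

Given the string "12   34", both of the first two functions return 2; given "10 20", second_int returns 1 and stores 20.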
page 158
Notice that, for scanf, the %e, %f, and %g formats are all the same, and signify conversion of a float
value (they accept a pointer argument of type float *). To convert a double, you need to use %le,
%lf, or %lg. (This is quite different from the printf family, which uses %e, %f, and %g for floats
and doubles, though all three request different formats. Furthermore, %le, %lf, and %lg are
technically incorrect for printf in ANSI C, though most compilers probably accept them; the later C99
standard defines the l as having no effect there.)
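The asymmetry is easy to get wrong, so here is a small sketch with both directions side by side. The names parse_double and format_double are hypothetical, invented for this illustration.

```c
#include <stdio.h>

/* Input: a double needs %lf, because sscanf receives a double * and
   has to know how big the object it is storing into is. */
int parse_double(const char *s, double *out)
{
    return sscanf(s, "%lf", out) == 1;
}

/* Output: plain %f is right for a double (and for a float, which is
   promoted to double when passed to sprintf or printf). */
void format_double(char *buf, double d)
{
    sprintf(buf, "%f", d);
}
```

Using "%f" in parse_double would be a bug of exactly the kind the compiler is not obliged to notice: sscanf would store a float-sized value into a double-sized variable.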
page 159
More precisely, the reason that you don't need to use a & with monthname is that an array, when it
appears in an expression like this, is automatically converted to a pointer.
The dual-format date conversion example in the middle of page 159 is a nice example of the advantages
of calling getline and then sscanf. At the beginning of this section, I said that ``when sscanf fails,
you have complete control over what you do next.'' Here, ``what you do next'' is try calling sscanf
again, on the very same input string (thus effectively backing up to the very beginning of it), using a
different format string, to try parsing the input a different way.
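The same idea can be packaged as a function. This is only a sketch along the lines of the example in the text, not a copy of it: the name date_form, its return codes, and the %19s width limit (added so that a long word can't overflow monthname) are assumptions of this illustration.

```c
#include <stdio.h>

/* Classify a date line: return 1 for the "25 Dec 1988" form, 2 for the
   mm/dd/yy form, or 0 if the line matches neither. */
int date_form(const char *line)
{
    int day, month, year;
    char monthname[20];

    if (sscanf(line, "%d %19s %d", &day, monthname, &year) == 3)
        return 1;
    /* The first attempt failed, but line is untouched, so we can
       start over from its beginning with a different format. */
    if (sscanf(line, "%d/%d/%d", &month, &day, &year) == 3)
        return 2;
    return 0;
}
```

Nothing comparable is possible with scanf itself: once its first attempt had consumed part of the input, the second attempt would begin wherever the first one left off.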


section 7.5: File Access

page 160
We've come an amazingly long way without ever having to open a file (we've been relying exclusively
on those predefined standard input and output streams) but now it's time to take the plunge.
The concept of a file pointer is an important one. It would theoretically be possible to mention the name
of a file each time it was desired to read from or write to it. But such an approach would have a number
of drawbacks. Instead, the usual approach (and the one taken in C's stdio library) is that you mention the
name of the file once, at the time you open it. Thereafter, you use some little token--in this case, the file
pointer--which keeps track (both for your sake and the library's) of which file you're talking about.
Whenever you want to read from or write to one of the files you're working with, you identify that file by
using the file pointer you obtained from fopen when you opened the file. (It is possible to have several
files open, as long as you use distinct variables to store the file pointers.)
Not only do you not need to know the details of a FILE structure, you don't even need to know what the
``buffer'' is that the structure contains the location of.
In general, the only declaration you need for a file pointer is the declaration of the file pointer:
FILE *fp;
You should never need to type the line
FILE *fopen(char *name, char *mode);
because it's provided for you in <stdio.h>.
If you skipped section 6.7, you don't know about typedef, but don't worry. Just assume that FILE is a
type, like int, except one that is defined by <stdio.h> instead of being built into the language.
Furthermore, note that you will never be using variables of type FILE; you will always be using pointers
to this type, or FILE *.
A ``binary file'' is one which is treated as an arbitrary series of byte values, as opposed to a text file. We
won't be working with binary files, but if you ever do, remember to use fopen modes like "rb" and
"wb" when opening them.
page 161
We won't worry too much about error handling for now, but if you start writing production programs, it's
something you'll want to learn about. It's extremely annoying for a program to say ``can't open file''
without saying why. (Some particularly unhelpful programs don't even tell you which file they couldn't
open.)
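As a small taste of what a more helpful message involves, here is a sketch of a wrapper around fopen. The name open_or_complain is hypothetical. It relies on errno and strerror (from <errno.h> and <string.h>) to say why the open failed; be aware that the C standard does not promise that fopen sets errno, although many systems (Unix systems among them) do, which is why the code falls back to a plainer message when errno is still 0.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Open a file; on failure, say which file and (if the system tells us)
   why.  Returns the file pointer, or NULL on failure. */
FILE *open_or_complain(const char *name, const char *mode)
{
    FILE *fp;

    errno = 0;
    fp = fopen(name, mode);
    if (fp == NULL) {
        if (errno != 0)
            fprintf(stderr, "can't open %s: %s\n", name, strerror(errno));
        else
            fprintf(stderr, "can't open %s\n", name);
    }
    return fp;
}
```

The caller still has to check for a null return and decide what to do next; the wrapper only takes care of the complaining.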
On this page we learn about four new functions, getc, putc, fprintf, and fscanf, which are just
like functions that we've already been using except that they let you specify a file pointer to tell them
which file (or other I/O stream) to read from or write to. (Note that for putc, the extra FILE *
argument comes last, while for fprintf and fscanf, it comes first.)
page 162
cat is about the most basic and important file-handling program there is (even if its name is a bit
obscure). The cat program on page 162 is a bit like the ``hello, world'' program on page 6--it may seem
trivial, but if you can get it to work, you're over the biggest first hurdle when it comes to handling files at
all.
Compare the cat program (and especially its filecopy function) to the file copying program on page
16 of section 1.5.1--cat is essentially the same program, except that it accepts filenames on the
command line.
Since the authors advise calling fclose in part to ``flush the buffer in which putc is collecting
output,'' you may wonder why the program at the top of the page does not call fclose on its output
stream. The reason can be found in the next sentence: an implicit fclose happens automatically for any
streams which remain open when the program exits normally.
In general, it's a good idea to close any streams you open, but not to close the preopened streams such as
stdin and stdout. (Since ``the system'' opened them for you as your program was starting up, it's
appropriate to let it close them for you as your program exits.)

section 7.6: Error Handling -- Stderr and Exit

page 163
stdout and stderr are both predefined output streams; for our purposes, the only difference between
them is that stderr is not likely to be redirected by the user, so the error messages printed to stderr
will always appear on the screen, where they can be seen.
page 164
The cryptic note about ``a pattern-matching program'' simply means that if you want to search the source
code of a program for all the exit status values it can return, ``exit'' might be an easier string to search for
than ``return.'' (Every call to exit represents an exit from the program, but not every return statement
does.)
The feof and ferror functions can be used to check for error conditions more carefully. In general,
input routines (such as getchar and getline) return some special value to tell you that they couldn't
read any more. Often, this value is EOF, reinforcing the notion that the only possible reason they couldn't
read any more was because end-of-file had been reached. However, it's also possible that there was a
read error, and you can call feof or ferror to determine whether this was the case. On the output
side, though the output routines generally do return an error indication, few programs bother to check the
return values from every call to functions such as putchar and printf. One way to check for output
errors, without having to check the return value of every function, is to call ferror on the output
stream (which might be stdout) at key points.

http://www.eskimo.com/~scs/cclass/krnotes/sx10f.html [22/07/2003 5:10:01 PM]

section 7.7: Line Input and Output

pages 164-165
To summarize, puts is like fputs except that the stream is assumed to be the standard output
(stdout), and a newline ('\n') is automatically appended. gets is like fgets except that the stream
is assumed to be stdin, and the newline ('\n') is deleted, and there's no way to specify the maximum
line length. This last fact means that you almost never want to use gets at all: since you can't tell it how
big the array it's to read into is, there's no way to guarantee that some unexpectedly-long input line won't
overflow the array, with dire results. (When discussing the drawbacks of gets, it's customary to point
out that the ``Internet worm,'' a program that wreaked havoc in 1988 by breaking into computers all over
the net, was able to do so in part because a key network utility on many Unix systems used gets, and
the worm was able to overflow the buffer in a particularly low, cunning way, with the dire result that the
worm achieved superuser access to the attacked machine.)
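If what you like about gets is that it deletes the newline, you can get the same effect safely with fgets and a few lines of your own. The names chomp and read_line in this sketch are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Remove one trailing newline, if present, as gets would have done. */
void chomp(char *s)
{
    size_t n = strlen(s);

    if (n > 0 && s[n - 1] == '\n')
        s[n - 1] = '\0';
}

/* Read one line of at most size-1 characters into buf, without its
   newline.  Returns 1 if a line was read, 0 at end of file or on error. */
int read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return 0;
    chomp(buf);
    return 1;
}
```

Because fgets is told the size of the array, an unexpectedly long input line is simply split across several calls; it cannot overflow buf.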


section 7.8.1: String Operations

page 166
One thing to beware of is that strcpy's arguments--more precisely, the strings pointed to by its
arguments--must not overlap.
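When the source and destination do overlap (shifting text around within a single array is the usual case), the standard function to reach for is memmove, declared in <string.h>, which is specified to work correctly for overlapping regions. A minimal sketch, in which the name delete_prefix is hypothetical:

```c
#include <string.h>

/* Delete the first n characters of s, in place.  The source (s + n)
   and the destination (s) overlap, so memmove is used, not strcpy. */
void delete_prefix(char *s, size_t n)
{
    size_t len = strlen(s);

    if (n > len)                        /* don't run off the end */
        n = len;
    memmove(s, s + n, len - n + 1);     /* +1 moves the '\0' too */
}
```

Writing strcpy(s, s + n) here might appear to work on some systems, but its behavior is undefined.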
Another string function we've seen is strstr:
strstr(s,t) return pointer to first t in s, or NULL if not present

section 7.8.2: Character Class Testing and Conversion
One quirk of these functions, which the authors mention briefly, is that although they accept arguments
of type int, it is not legal to pass just any int value to them. If you were to attempt to call
isupper(12345), it might do something bizarre. You should only call these functions with
arguments which represent valid character values. (Also, they are guaranteed to accept the value EOF
gracefully.)
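One practical consequence: on systems where plain char is signed, a character with its high bit set has a negative value, which is not one of the values these functions are required to accept. The usual precaution is to convert to unsigned char first. A minimal sketch, with the hypothetical name upcase:

```c
#include <ctype.h>

/* Convert a string to upper case, in place. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);  /* never pass a negative char */
}
```

Values returned by getchar and getc need no such conversion; they are already either EOF or a character value in the acceptable range.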


section 7.8.3: Ungetc

There's not much more to say about ungetc, but two more stdio functions which might deserve mention
are fread and fwrite.
getc and putc (and getchar and putchar) allow you to read and write a character at a time, while
fgets and fputs read and write a line at a time. The printf family of routines does formatted
output, and the scanf family does formatted input. But what if you want to read or write a big block of
unformatted characters, not necessarily one line long? You could use getc or putc in a loop, but
another solution is to use the fread and fwrite functions, which are (briefly) described in appendix
B1.5 on page 247.


section 7.8.4: Command Execution

page 167
The only thing to add to this brief description of system concerns the disposition of the executed
command's output. (Similar arguments apply to its input.) The output generally goes wherever the calling
program's output goes, though if the calling program has done anything with stdout (such as closing it,
or redirecting it within the program with freopen), those changes will probably not affect the output of
system. One way to achieve redirection of the command executed by system, if the operating system
permits it, is to use redirection notation within the command line passed to system:
system("date > outfile");
Note also that the exit status returned by the program (and hence perhaps by system) does not
necessarily have anything to do with anything printed by the program. One way to capture the output
printed by the program is to use redirection, as above, then open and read the output file.


section 7.8.5: Storage Management

The important thing to know about malloc and free and friends is to be careful when calling them. It
is easy to abuse them, either by using more space than you ask for (that is, writing beyond the ends of an
allocated block) or by continuing to use a pointer after the memory it points to has been freed (perhaps
because you had several pointers to the same block of memory, and you forgot that when you freed one
pointer they all became invalid). malloc-related bugs can be difficult and frustrating to track down, so
it's good to use programming practices which help to assure that the bugs don't happen in the first place.
(One such practice is to make sure that pointer variables are set to NULL when they don't point anywhere,
and to occasionally check pointer values--for instance at entry to an important pointer-using function--to
make sure that they're not NULL.)
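One way to make the set-it-to-NULL habit automatic is to do the freeing and the clearing in one place. The function free_and_clear below is a hypothetical helper, not something from the text or the standard library, and it is written for char * pointers only to keep the sketch short.

```c
#include <stdlib.h>

/* Free the block that *pp points to, then set *pp to NULL so that a
   later accidental use is more likely to be caught. */
void free_and_clear(char **pp)
{
    free(*pp);          /* free(NULL) is harmless, so no test is needed */
    *pp = NULL;
}
```

You would call it as free_and_clear(&p). Notice that it clears only the one pointer you hand it; if you had several pointers to the same block, the others are now invalid and it is still up to you not to use them.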
As we mentioned on page 142 in section 6.5, it is no longer necessary (that is, in ANSI C) to cast
malloc's value to the appropriate type, though it doesn't hurt to do so.


section 7.8.6: Mathematical Functions

page 168
Note that the pow function is how you do exponentiation in C--C does not have a built-in exponentiation
operator (such as ** or ^ in some other languages).
Before calling these functions, remember to #include <math.h>. (It's always a good idea to
#include the appropriate header(s) before using any library functions, but the math functions are
particularly unlikely to work correctly if you forget.) Also, under Unix, you may have to explicitly
request the math library by adding the -lm option at the end of the command line when
compiling/linking.


section 7.8.7: Random Number Generation

There is a typo in some printings; the code for returning a floating-point random number in the interval
[0,1) should be
#define frand() ((double) rand() / (RAND_MAX+1.0))
If you want to get random integers from M to N, you can use something like
M + (int)(frand() * (N-M+1))
``[Setting] the seed for rand'' refers to the fact that, by default, the sequence of pseudo-random numbers
returned by rand is the same each time your program runs. To randomize it, you can call srand at the
beginning of the program, handing it a number that varies from run to run, such as a value having to do with the
time of day. (One way is with code like
#include <stdlib.h>
#include <time.h>
srand((unsigned int)time((time_t *)NULL));
which uses the time function mentioned on page 256 in appendix B10.)
One other caveat about rand: don't try to generate random 0/1 values (to simulate a coin flip, perhaps)
with code like
rand() % 2
This looks like it ought to work, but it turns out that on some systems rand isn't always perfectly
random, and returns values which consistently alternate even, odd, even, odd, etc. (In fact, for similar
reasons, you shouldn't usually use rand() % N for any value of N.) A good way to get random 0/1
values would be
(int)(frand() * 2)
based on the other frand() examples above.
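Putting the pieces of this section together gives something like the following sketch. The frand macro and the expressions are the ones shown above; the function names rand_range, coin_flip, and seed_from_clock are hypothetical wrappers invented for this illustration.

```c
#include <stdlib.h>
#include <time.h>

#define frand() ((double) rand() / (RAND_MAX+1.0))

/* Random integer from m to n inclusive; assumes m <= n. */
int rand_range(int m, int n)
{
    return m + (int)(frand() * (n - m + 1));
}

/* Random 0 or 1, without depending on the low-order bit of rand(). */
int coin_flip(void)
{
    return (int)(frand() * 2);
}

/* Call once, at the beginning of the program, so that the sequence
   is not the same on every run. */
void seed_from_clock(void)
{
    srand((unsigned int)time((time_t *)NULL));
}
```

Since frand() is always less than 1, rand_range can never return n+1, and rand_range(5, 5) always returns 5.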

Steve Summit's home page

I maintain the Usenet comp.lang.c FAQ list.
It's also available as a book.
I used to teach a pair of C programming classes, and their notes are all on line.
(More information about C.)
Other stuff I've written (mostly C-related Usenet posts).
Here are my collections of links, home pages, and other assorted references.
The ISP hosting this page and where I receive my e-mail, eskimo.com, is a rare and special one, perfect
for my needs. If you're looking for an ISP that gives you the access you need (including Unix shells)
without getting in your way or charging too much, check it out.
Steve Summit.
http://www.eskimo.com/~scs/ [22/07/2003 5:10:14 PM]
comp.lang.c Frequently Asked Questions

This collection of hypertext pages is Copyright 1995 by Steve Summit. Content from the book ``C
Programming FAQs: Frequently Asked Questions'' (Addison-Wesley, 1995, ISBN 0-201-84519-9) is
made available here by permission of the author and the publisher as a service to the community. It is
intended to complement the use of the published text and is protected by international copyright laws.
The content is made available here and may be accessed freely for personal use but may not be published
or retransmitted without written permission.
This page is the top of an HTML version of the Usenet comp.lang.c Frequently Asked Questions (FAQ)
list. An FAQ list is a collection of questions commonly asked on Usenet, together with presumably
definitive answers, provided in an attempt to keep repeated questions on the newsgroup down to a low
background drone so that discussion can move on to more interesting matters. Since they distill
knowledge gleaned from many sources and answer questions which are demonstrably Frequent, FAQ
lists serve as useful references outside of their originating Usenet newsgroups. This list is, I dare to
claim, no exception, and the HTML version you're looking at now, as well as other versions referenced
just below, are intended to be useful to C programmers everywhere.
Several other versions of this FAQ list are available, including a book-length version published by
Addison-Wesley. (The book, though longer, also has a few more errors; I've prepared an errata list.) See
also question 20.40.
Like so many web pages, this is very much a ``work in progress.'' I would, of course, like it if it were
perfect, but it's been two years or so since I first started talking about putting this thing on the web, and if
I were to wait until all the glitches were worked out, you might never see it. Each page includes a ``mail
feedback'' button, so you can help me debug it. (At first, you don't have to worry about reporting minor
formatting hiccups; many of these result from lingering imperfections in the programs that generate these
pages, or from the fact that I have not exhaustively researched how various browsers implement the
HTML tags I'm using, or from the fact that I haven't gone the last yard in trying to rig up HTML that
looks good in spite of the fact that HTML doesn't have everything you need to make things look good.)
These pages are synchronized with the posted Usenet version and the Addison-Wesley book version.
Since not all questions appear in all versions, the question numbers are not always contiguous.
[Note to web authors, catalogers, and bookmarkers: the URL <http://www.eskimo.com/~scs/C-faq/top.html>
is the right way to link to these pages. All other URL's implementing this collection are
subject to change.]
You can browse these pages in at least three ways. The table of contents below is of the list's major
sections; these links lead to sub-lists of the questions for those sections. The ``all questions'' link leads to
a list of all the questions; each question is (obviously) linked to its answer. Finally, the ``read
sequentially'' link leads to the first question; you can then follow the ``next'' link at the bottom of each
question's page to read through all of the questions and answers sequentially.
Steve Summit
scs@eskimo.com
1. Declarations and Initializations

2. Structures, Unions, and Enumerations
3. Expressions
4. Pointers
5. Null Pointers
6. Arrays and Pointers
7. Memory Allocation
8. Characters and Strings
9. Boolean Expressions and Variables
10. C Preprocessor
11. ANSI/ISO Standard C
12. Stdio
13. Library Functions
14. Floating Point
15. Variable-Length Argument Lists
16. Strange Problems
17. Style
18. Tools and Resources
19. System Dependencies
20. Miscellaneous
Bibliography
Acknowledgements
All Questions
Read Sequentially
versions of comp.lang.c FAQ list
comp.lang.c FAQ list(s)

You probably just came from there, but there is a browsable, web-based HTML version. (Beware: as of
1999, the web-based version is somewhat out-of-date with respect to the plain-text versions below.)
(Please don't ask me for a downloadable archive of the HTML version, as I'm currently unable to provide
one. Just browse it here, or download one of the versions below.)
An expanded, book-length version, with even longer answers to even more questions, has been published
by Addison-Wesley (ISBN 0-201-84519-9). Printed books, alas, tend to have a few errors; I've prepared
an errata list for this one.
Here is a recent, compressed copy of the ASCII FAQ list, as posted to Usenet (~100k compressed, ~260k
when uncompressed). This is currently the most up-to-date version. [This and the other compressed files
ending in .Z referenced from this page are compressed with the Unix "compress" utility and can be
uncompressed with "uncompress" or "gunzip", versions of which are, I believe, available for all popular
operating systems.]
Here is the abridged version (~26k compressed, ~55k when uncompressed).
Here are the differences from the previous version (compressed, sometimes quite large; or maybe
uncompressed, if they were minimal). Here is a collection of incremental differences with respect to even
older versions. NOTE: All of these diff lists pertain to the versions posted to Usenet, which are not
always synchronized with the web/html version.
Here is a (considerably older) compressed, PostScript rendition (152k compressed). BEWARE: the
question numbers don't match current versions. (Rather than printing it out, you could -- hint, hint -- get
the book.)
There are several translations into other languages:
to German, by Jochen Schoof et al. (If that link doesn't work, try this one.)
to Japanese, by Kinichi Kitano. (I don't know of a URL, but it is or was posted regularly to
fj.comp.lang.c, and has been published by Toppan, ISBN 4-8101-8097-2.)
Seong-Kook Cin has completed a Korean translation, which is at
http://pcrc.hongik.ac.kr/~cinsk/cfaqs/.
A French C FAQ list (not a direct translation of this one) is at http://www.istyinfo.uvsq.fr/~rumeau/fclc/.
Here is an, um, er, ``alternate version'' by Peter Seebach.

If you're interested in C++, Marshall Cline maintains a C++ FAQ list.

For web access to other Usenet FAQ lists, visit faqs.org.
scs
C Programming FAQs Errata
Errata list for "C Programming FAQs: Frequently Asked Questions",

by Steve Summit, Addison-Wesley, 1996, ISBN 0-201-84519-9
(first printing).
A possibly more up-to-date copy of this errata list may be
obtained at any time by anonymous ftp from ftp.eskimo.com
in the file ~scs/C-faq/book/Errata, or on the web at
http://www.eskimo.com/~scs/C-faq/book/Errata.html .
(If you read this years from now and those addresses don't
work, try ftp://ftp.aw.com/cseng/authors/summit/cfaq/ or
http://www.awl.com/cseng/titles/0-201-84519-9 .)
scs 2002-Oct-26
page
----
question
--------
front cover
The ladder has no rungs.
xxix
"woundn't" should be "wouldn't"
1.1
The fourth bulleted guarantee (about the sizes
following the "obvious progression") is
improperly stated. What the C Standard actually
talks about, as in the rest of this answer, is
just the ranges of the standard types, not their
sizes in bits. So the real guarantees (as
summarized below) are that
sizeof(char)      is at least  8 bits
sizeof(short)     is at least 16 bits
sizeof(int)       is at least 16 bits
sizeof(long)      is at least 32 bits
and, in C99,
sizeof(long long) is at least 64 bits
3-4
1.3
In C99, the new <inttypes.h> header provides

Standard names for exact-size types: int16_t,
uint32_t, etc.
1.4
In C99, long long is defined as an integer type

with, in effect, at least 64 bits.
1.7
There may be zero definitions of an external
function or variable that is not referenced

in any expression.
[Thanks and $1 to James Stern]
7
1.7
"use include to bring" should be

"use #include to bring"
11
1.14
In the second fix, at the bottom of the page,

it could conceivably be necessary to precede
the line
typedef struct node *NODEPTR;
with the line
struct node;
for the reason mentioned on page 13, although
in that case one of the two other fixes would
clearly be preferable.
[Thanks to James Stern]
13
1.15
In the alternate fix, at the bottom of the page,

it could conceivably be necessary to precede
the typedef declarations with the lines
struct a;
struct b;
although again, putting those typedefs after the
complete structure definitions would clearly be
preferable in that case.
[Thanks to James Stern]
18
1.22
The odd "return 0;" line is not really necessary.
20
1.24
Another possible arrangement is

/* file1.h */
#define ARRAYSZ 3
extern int array[ARRAYSZ];
/* file1.c */
#include "file1.h"
int array[ARRAYSZ];
/* file2.c */
#include "file1.h"
[Thanks to Jon Jagger]
23
1.29
[2nd bullet] "everything else termed" should be

"everything else, termed"
24
1.29
[Rule 3] "if the header" should be "if any header".

24
1.29
[Rule 4] "(i.e., function names)" should be

"(e.g., function names)".
24
1.29
The text at the bottom of the page suggests that

"future directions" name patterns such as str[a-z]*
are reserved only if their corresponding headers
(e.g. <stdlib.h>) are included. The reserved
function names are unconditionally reserved;
it is only the macro names that are reserved only
if the header is included.
[Thanks and $1 to Mark Brader]
25
1.29
"if you don't include the header files" should be

"if you don't include any header files".
32
2.4
Besides -> and sizeof, the . operator, as well as

declarations of actual structures, also require
the compiler to know more about the structure and
so preclude incomplete or hidden definitions.
33-36
2.6
In C99, a structure can contain a "flexible array
member" (an array of unspecified size) as its last
member, providing a well-defined, Standard-compliant
alternative.
38
2.10
C99 *does* have a way of generating anonymous

structure values: "compound literals".
40
2.12
When trying to minimize wasted space in structures,

array members should be ordered based on the size
of their primitive types, not their overall size.
43
2.20
"ANSI/SIO" should be "ANSI/ISO"

In C99, the "designated initializer" mechanism
allows any member of a union to be initialized.

50
3.3
Of course, another way to increment i is i += 1.

51
3.4
"higher precedence than *):" should be

"higher precedence than *:"
52
3.6
Delete the close parenthesis at the end of the answer.
57
3.12
In C++, the prefix form ++i is preferred.

68
4.5
The reference to ANSI Sec. 3.3.4 should say

"esp. footnote 44".
[Thanks to Willis Gooch]
72-3
4.10
In C99, it is possible to use a "compound

literal" to generate a pointer to an (unnamed)
constant value.
73
4.11
The reference to K&R2 sec. 5.2 should be pp. 95-7.

[Thanks and $1 to Nikos Triantafillis]
75
4.13
"can interconverted" should be "can be interconverted".

[Thanks and $1 to Howard Ham]
84
5.8
Either the comma or the parentheses in the answer

should be changed.
95
6.2
The typography in the following line is inconsistent

for the "x" of "x[3]".
104-5
6.15
C99 introduces variable-length arrays (VLA's) which,

among other things, *do* allow declaration of a
local array of size matching a passed-in array.
105-7
6.16
In C99, another solution is to use a

variable-length array.
110
6.19
C99's variable-length arrays are also a nice

solution to this problem.
115
7.1
The close parenthesis and period ")." at the bottom

of the page are not part of the #define line.
121
7.9
There is an extra semicolon at the end of the first
line of mymalloc's definition.

[Thanks and $1 to Todd Burruss]
126
7.10
Missing "it"; should be "even if it is not

dereferenced".
[Thanks and $1 to Clinton Sheppard]
132
7.30
It would be even safer to add a second test on

nchmax:
if(nchread >= nchmax) {
nchmax += 20;
if(nchread >= nchmax) {
free(retbuf);
return NULL;
}
newbuf = realloc(retbuf, nchmax + 1);
The concern is that, while reading a *very* long line,
nchmax might overflow, wrapping back around to 0.
[Thanks to Mark Brader]
134
7.32
C99's variable-length arrays (VLA's) can be used

to more cleanly accomplish most of the tasks
which alloca used to be put to.
136
8.1
"Although string literal" should be

"Although a string literal"
136
8.2
C can be tricked into seeming to assign an array

as a whole if you hide the array inside a
structure or union.
143
9.2
The example variable isvegetable should perhaps

be named is_vegetable to avoid naming conflicts
(see question 1.29).
[Thanks and $1 to Jon Jagger]
151
10.4
Extra space in "/* (no trailing ; ) */".
152
10.6
[paragraph below bullets] "bring the header wherever"

should be "bring the header in wherever"
158
10.15
If you have to, you can obviously #define a companion

macro name for each typedef, and use #ifdef with that.
161
10.21
The suggested replacement macro should

parenthesize c:
#define CTRL(c) ((c) & 037)
163-4
10.29
C99 introduces formal support for macros with

variable numbers of arguments.
164-5
10.27
The file parameter of the dbginfo() function and

the fmt parameter of the debug() function could
be of type const char *.
168
11.1
The story has gotten longer: A new revision of

the C Standard, "C99", has been ratified,
superseding the original ANSI C Standard.
This Errata list has been updated to note those
answers in the book which have become dated due
to C99 changes.
169-70
11.2
C99 *is* available in electronic form, for $18

from www.ansi.org .
174
11.10
As written, the "complicated series of assignments"

of course includes some declarations and initializations.
175
11.10
"e.g., (const char) ** in this case" should be

"e.g., (const char **) in this case"
"when the pointers which" should either be
"when the pointers" or "with pointers which"
180
11.19
"questions 20.20" should be "question 20.20"
182
11.25
"The function offers" should be

"The memmove function offers".
[Thanks and $1 to Gordon Burditt]
183-4
11.27
In C99, external identifiers are required
to be unique in the first 31 characters;
C90's extremely Spartan limitation to six
characters has been relaxed.
186
11.29
You may also need to rework calls to realloc

that use NULL or 0 as first or second arguments
186
11.29
You may also need to rework conditional compilation

involving #elif.
See also the Rationale's list of "Quiet Changes"
189
11.33
A fourth class of behavior is locale-specific.

198
12.11
A semicolon is missing after "int i = 0".

The } just before the line "*p = '\0'" is
indented one tab too few.
Two instances of "*--p" have the minus signs merged
so as to appear as one.
201
12.16
[case 2] The variable line is not declared;

it should probably be a char [], suitably
initialized, e.g.:
char line[] = "1 2.3 4.5e6 789e10";
205
12.19
There's an extraneous double quote in what

should be "intervening whitespace:".
207-8
12.21
The technique of writing to a file may give the

wrong answer if the disk fills up.
[Thanks and $1 to Mark Brader]
The "hope that a future revision of the ANSI/ISO
C Standard will include" the snprintf function
has been fulfilled: C99 does specify it.
As a bonus, the C99 snprintf can be used to predict
the size required for an arbitrary sprintf call,
too -- it can be called with a null pointer
instead of a destination buffer (and 0 as the
size of that nonexistent buffer) and it returns
the number of characters it would have written.
212
12.28
The answer is in the wrong font.
213
12.30
Updating (overwriting) a text file in-place is

not fully portable; the C Standard leaves it
implementation-defined whether a write to a
text file truncates it at that point.
[Thanks and $1 to Tanmoy Bhattacharya]
224
13.4
"upper- or lowercase" should probably be

"upper or lower case".
225
13.6
Since the fragment calls printf, it must

#include <stdio.h>.
226
13.6
[last code fragment] A declaration and initialization

char string[] = "this\thas\t\tmissing\tfield";
similar to the one on p. 225 should appear.
[Thanks and $1 to Doug Liu]
227
13.6
Also, since the input string is modified,

it must be writable; see question 1.32.
234
13.14
"time_ts" should perhaps be "time_t's"
240
13.17
The code
srand((unsigned int)time((time_t *)NULL));
though popular and generally effective is, alas,
not strictly conforming. It's theoretically
possible that time_t could be defined as a
floating-point type under some implementation,
and that the time_t values returned by time()
could therefore exceed the range of an unsigned
int in such a way that a well-defined cast to
(unsigned int) is not possible.
242-3
13.20
The attributions listed for methods 2 and 3 are

scrambled. Method 2 is the one described in
the 1958 Box and Muller paper (as well as by
Abramowitz and Stegun, apparently). Method 3
is originally due to Marsaglia.
244
13.21
If you're not familiar with the notation [0, 1),

it means that drand48() returns a number x
such that 0 <= x and x < 1.

250
14.5
The suggested expression should read

fabs(a - b) <= epsilon * fabs(a)
It performs poorly if a == 0.0 (which is another
argument in favor of "mak[ing] the threshold
a function of b, or of both a and b").
253
14.8
Of course, you can always compute pi using

4*atan(1.0) or acos(-1.0).
[Thanks to James Stern and Clinton Sheppard]
253
14.9
C99 specifies isnan() and several other

classification routines.
254-5
14.11
C99 supports complex as a standard type.
260-1
15.4
The first argument to vstrcat() could be const char *,

as could the fmt argument to miniprintf().
264
15.5
The fmt argument to error() could be const char *.
269-71
15.12
The fmt arguments to faterror(), verror(), and

error() could all be const char *.
274
16.4
[point 2] The problem could be caused by a setbuf

or setvbuf buffer local to any function.
276
16.7
Variable "s" isn't declared. It's pretty obvious

what it should be, but to make it explicit, change
the struct declaration to
struct mystruct { ... } s;
[Thanks to Peter Hryczanek]
287
18.1
The URL in the list of metrics tools is really
"http://www.qucis.queensu.ca:1999/SoftwareEngineering/Cmetrics.html".
294
18.13
The conventional spelling is "NetBSD".

[Thanks and $1 to Peter Seebach]
294
18.14
Extra space in site which should be "sunsite.unc.edu".
296
18.16
Extra space in address which should be

"archie@archie.cs.mcgill.ca".
308
19.11
Note that a test using fopen() *is* approximate;

failure does not necessarily indicate nonexistence.
310
19.14
Updating (overwriting) a text file in-place is

not fully portable; the C Standard leaves it
implementation-defined whether a write to a
text file truncates it at that point.
314
19.23
In C99, the guarantee on the possible size of a

single object has been raised to 64K.
315
19.25
Use of the `volatile' qualifier is often

appropriate when performing memory-mapped I/O.
[Thanks to Lee Crawford]
317
19.27
The return value of system() is not guaranteed

to be the command's exit status.
[Thanks and $1 to Peter Seebach]
318
19.30
If you forget to call pclose, it's probably at

least as likely that you'll run out of file
descriptors as processes.
[Thanks and $1 to Jens Schweikhardt]
319
19.31
argv[0] may also be a null pointer.

324
19.42
"control characters, such as" should be

"control characters such as"
339-40
The page break makes the code very hard to follow.
342-44
20.13
The tone of this question's answer can be read as
suggesting that efficiency isn't important at all.
That's not the case, of course -- efficiency can be
very important, and poorly-written programs can
run abysmally inefficiently.
The point is that there are good ways and bad
ways of achieving an appropriate level of
performance for a given program, and that (for
example) picking a good algorithm tends to make a
much bigger difference than does microoptimizing
the coding details of a lesser algorithm.
346
20.17
Missing tab in line which should be

#define CODE_NONE
350
20.21
The overbars are misaligned.
355
20.29
"and computes that number" should either be

"computed" or "and is computed".
363
[aggregate] Unions are not aggregates.

[Thanks and $1 to Kinichi Kitano]
368
[parameter] Extraneous semicolon at end of

line which should be
f(int i)
370-1
The glossary entry for "undefined" is misplaced.

376
The two minus signs in the index entry for

"-- operator" overlap and appear to be one.
379
The pairs of underscores in the index entry for

"__FILE__ macro" overlap and might appear to be one.
382
The pairs of underscores in the index entry for

"__LINE__ macro" overlap and might appear to be one.
back cover
"on the Usenet/Internet on the C FAQ" is muddled

and should say something else.
"com.lang.c" should be "comp.lang.c".
The ftp address for source code should be
ftp://ftp.aw.com/cseng/authors/summit/cfaq .

Question 20.40
Where can I get extra copies of this list? What about back issues?
An up-to-date copy may be obtained from aw.com in directory xxx or ftp.eskimo.com in directory
u/s/scs/C-faq/. You can also just pull it off the net; it is normally posted to comp.lang.c on the first of
each month, with an Expires: line which should keep it around all month. A parallel, abridged version is
available (and posted), as is a list of changes accompanying each significantly updated version.
The various versions of this list are also posted to the newsgroups comp.answers and news.answers .
Several sites archive news.answers postings and other FAQ lists, including this one; two sites are
rtfm.mit.edu (directories pub/usenet/news.answers/C-faq/ and pub/usenet/comp.lang.c/) and ftp.uu.net
(directory usenet/news.answers/C-faq/). An archie server (see question 18.16) should help you find
others; ask it to ``find C-faq''. If you don't have ftp access, a mailserver at rtfm.mit.edu can mail you FAQ
lists: send a message containing the single word help to mail-server@rtfm.mit.edu . See the meta-FAQ
list in news.answers for more information.
An extended version of this FAQ list is being published by Addison-Wesley as C Programming FAQs:
Frequently Asked Questions (ISBN 0-201-84519-9). It should be available in November 1995.
This list is an evolving document of questions which have been Frequent since before the Great
Renaming, not just a collection of this month's interesting questions. Older copies are obsolete and don't
contain much, except the occasional typo, that the current list doesn't.
This page by Steve Summit // Copyright 1995 // mail feedback
Question 18.16
Where and how can I get copies of all these freely distributable programs?
As the number of available programs, the number of publicly accessible archive sites, and the number of
people trying to access them all grow, this question becomes both easier and more difficult to answer.
There are a number of large, public-spirited archive sites out there, such as ftp.uu.net, archive.umich.edu,
oak.oakland.edu, sumex-aim.stanford.edu, and wuarchive.wustl.edu, which have huge amounts of
software and other information all freely available. For the FSF's GNU project, the central distribution
site is prep.ai.mit.edu . These well-known sites tend to be extremely busy and hard to reach, but there are
also numerous ``mirror'' sites which try to spread the load around.
On the connected Internet, the traditional way to retrieve files from an archive site is with anonymous
ftp. For those without ftp access, there are also several ftp-by-mail servers in operation. More and more,
the world-wide web (WWW) is being used to announce, index, and even transfer large data files. There
are probably yet newer access methods, too.
Those are some of the easy parts of the question to answer. The hard part is in the details--this article
cannot begin to track or list all of the available archive sites or all of the various ways of accessing them.
If you have access to the net at all, you probably have access to more up-to-date information about active
sites and useful access methods than this FAQ list does.
The other easy-and-hard aspect of the question, of course, is simply finding which site has what you're
looking for. There is a tremendous amount of work going on in this area, and there are probably new
indexing services springing up every day. One of the first was ``archie'': for any program or resource
available on the net, if you know its name, an archie server can usually tell you which anonymous ftp
sites have it. Your system may have an archie command, or you can send the mail message ``help'' to
archie@archie.cs.mcgill.ca for information.
If you have access to Usenet, see the regular postings in the comp.sources.unix and comp.sources.misc
newsgroups, which describe the archiving policies for those groups and how to access their archives. The
comp.archives newsgroup contains numerous announcements of anonymous ftp availability of various
items. Finally, the newsgroup comp.sources.wanted is generally a more appropriate place to post queries
for source availability, but check its FAQ list, ``How to find sources,'' before posting there.
See also question 14.12.
Question 14.12
I'm looking for some code to do:
Fast Fourier Transforms (FFT's)
matrix arithmetic (multiplication, inversion, etc.)
complex arithmetic
Ajay Shah maintains an index of free numerical software; it is posted periodically, and available where
this FAQ list is archived (see question 20.40). See also question 18.16.
Question 14.11
What's a good way to implement complex numbers in C?
It is straightforward to define a simple structure and some arithmetic functions to manipulate them. See
also questions 2.7, 2.10, and 14.12.
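As a rough sketch of what such a structure and its functions might look like (the names complex_t, cpx_make, cpx_add, and cpx_mul are invented here for illustration and do not come from any standard library):

```c
/* A hypothetical complex type; all names here are illustrative only. */
typedef struct {
    double re;   /* real part */
    double im;   /* imaginary part */
} complex_t;

/* Build a complex value from its two parts. */
complex_t cpx_make(double re, double im)
{
    complex_t z;
    z.re = re;
    z.im = im;
    return z;
}

/* (a + bi) + (c + di) = (a + c) + (b + d)i */
complex_t cpx_add(complex_t a, complex_t b)
{
    return cpx_make(a.re + b.re, a.im + b.im);
}

/* (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
complex_t cpx_mul(complex_t a, complex_t b)
{
    return cpx_make(a.re * b.re - a.im * b.im,
                    a.re * b.im + a.im * b.re);
}
```

Because the structures are passed and returned by value (see question 2.7), calls nest naturally, as in cpx_add(a, cpx_mul(b, c)). Under C99, which supports complex as a standard type, a hand-rolled structure like this may not be needed at all.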
Question 2.7
I heard that structures could be assigned to variables and passed to and from functions, but K&R1 says
not.
What K&R1 said was that the restrictions on structure operations would be lifted in a forthcoming
version of the compiler, and in fact structure assignment and passing were fully functional in Ritchie's
compiler even as K&R1 was being published. Although a few early C compilers lacked these operations,
all modern compilers support them, and they are part of the ANSI C standard, so there should be no
reluctance to use them. [footnote]
(Note that when a structure is assigned, passed, or returned, the copying is done monolithically; anything
pointed to by any pointer fields is not copied.)
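Here is a small sketch of structure assignment and passing, which also shows the shallow copying just mentioned. The struct person type and the rename_person function are hypothetical, made up for this example:

```c
#include <string.h>

struct person {
    char name[16];    /* array member: copied along with the structure */
    char *nickname;   /* pointer member: only the pointer is copied */
};

/* The structure is passed by value and returned by value, so the
   caller's original is left untouched. */
struct person rename_person(struct person p, const char *newname)
{
    strncpy(p.name, newname, sizeof p.name - 1);
    p.name[sizeof p.name - 1] = '\0';   /* make sure it's terminated */
    return p;
}
```

After an assignment such as b = a, b.name is a separate copy of the array, but b.nickname and a.nickname point at the same characters.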
References: K&R1 Sec. 6.2 p. 121
K&R2 Sec. 6.2 p. 129
ANSI Sec. 3.1.2.5, Sec. 3.2.2.1, Sec. 3.3.16
ISO Sec. 6.1.2.5, Sec. 6.2.2.1, Sec. 6.3.16
H&S Sec. 5.6.2 p. 133
Footnote 1
However, passing large structures to and from functions can be expensive (see question 2.9), so you may
want to consider using pointers, instead (as long as you don't need pass-by-value semantics, of course).
Question 2.9
How are structure passing and returning implemented?
When structures are passed as arguments to functions, the entire structure is typically pushed on the
stack, using as many words as are required. (Programmers often choose to use pointers to structures
instead, precisely to avoid this overhead.) Some compilers merely pass a pointer to the structure, though
they may have to make a local copy to preserve pass-by-value semantics.
Structures are often returned from functions in a location pointed to by an extra, compiler-supplied
``hidden'' argument to the function. Some older compilers used a special, static location for structure
returns, although this made structure-valued functions non-reentrant, which ANSI C disallows.
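From the programmer's side, the trade-off looks something like the following sketch. The struct big type and both functions are hypothetical, and how much copying actually happens is up to your compiler:

```c
struct big {
    double samples[256];
};

/* By value: the callee works on its own copy of the whole structure. */
double first_by_value(struct big b)
{
    b.samples[0] += 1.0;   /* changes only the local copy */
    return b.samples[0];
}

/* By pointer: only an address is passed; const documents that the
   function merely reads through it. */
double first_by_pointer(const struct big *bp)
{
    return bp->samples[0];
}
```

The pointer version avoids pushing 256 doubles on the stack, at the price of giving up pass-by-value semantics: without the const, the callee could modify the caller's structure.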
References: ANSI Sec. 2.2.3
ISO Sec. 5.2.3
Question 2.8
Why can't you compare structures?
There is no single, good way for a compiler to implement structure comparison which is consistent with
C's low-level flavor. A simple byte-by-byte comparison could founder on random bits present in unused
``holes'' in the structure (such padding is used to keep the alignment of later fields correct; see question
2.12). A field-by-field comparison might require unacceptable amounts of repetitive code for large
structures.
If you need to compare two structures, you'll have to write your own function to do so, field by field.
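Such a function might look like the following sketch, which assumes a made-up struct record; note that it never looks at any padding bytes:

```c
#include <string.h>

struct record {
    char tag;        /* likely followed by padding on many machines */
    int id;
    char name[12];   /* assumed to hold a null-terminated string */
};

/* Returns nonzero if the two records are equal, field by field. */
int record_equal(const struct record *a, const struct record *b)
{
    return a->tag == b->tag
        && a->id == b->id
        && strcmp(a->name, b->name) == 0;
}
```

Comparing name with strcmp rather than memcmp also ignores whatever garbage follows the terminating '\0' in the array.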
References: ANSI Sec. 4.11.4.1 footnote 136
Rationale Sec. 3.3.9
H&S Sec. 5.6.2 p. 133
Question 2.12
My compiler is leaving holes in structures, which is wasting space and preventing ``binary'' I/O to
external data files. Can I turn off the padding, or otherwise control the alignment of structure fields?
Your compiler may provide an extension to give you this control (perhaps a #pragma; see question
11.20), but there is no standard method.
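Although the padding can't be controlled portably, it can at least be observed with sizeof and offsetof. In this sketch, struct sample and the two helper functions are invented for illustration, and the values they return will vary from one compiler to another:

```c
#include <stddef.h>   /* offsetof, size_t */

struct sample {
    char tag;
    long value;
};

/* Bytes of struct sample not occupied by its members (the padding, if any). */
size_t sample_padding(void)
{
    return sizeof(struct sample) - (sizeof(char) + sizeof(long));
}

/* Where the compiler actually placed the value member. */
size_t value_offset(void)
{
    return offsetof(struct sample, value);
}
```

On many machines value_offset() will come out larger than 1, revealing a hole after tag; that hole is exactly what makes ``binary'' I/O of whole structures nonportable.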
References: H&S Sec. 5.6.4 p. 135
Question 11.20
What are #pragmas and what are they good for?
The #pragma directive provides a single, well-defined ``escape hatch'' which can be used for all sorts of
implementation-specific controls and extensions: source listing control, structure packing, warning
suppression (like lint's old /* NOTREACHED */ comments), etc.
ISO Sec. 6.8.6
H&S Sec. 3.7 p. 61
Question 11.19
I'm getting strange syntax errors inside lines I've #ifdeffed out.
Under ANSI C, the text inside a ``turned off'' #if, #ifdef, or #ifndef must still consist of ``valid
preprocessing tokens.'' This means that there must be no newlines inside quotes, and no unterminated
comments or quotes (note particularly that an apostrophe within a contracted word looks like the
beginning of a character constant). Therefore, natural-language comments and pseudocode should always
be written between the ``official'' comment delimiters /* and */. (But see question 20.20, and also
10.25.)
References: ANSI Sec. 2.1.1.2, Sec. 3.1
ISO Sec. 5.1.1.2, Sec. 6.1
H&S Sec. 3.2 p. 40
Question 20.20
Why don't C comments nest? How am I supposed to comment out code containing comments? Are
comments legal inside quoted strings?
C comments don't nest mostly because PL/I's comments, which C's are borrowed from, don't either.
Therefore, it is usually better to ``comment out'' large sections of code, which might contain comments,
with #ifdef or #if 0 (but see question 11.19).
The character sequences /* and */ are not special within double-quoted strings, and do not therefore
introduce comments, because a program (particularly one which is generating C code as output) might
want to print them.
Note also that // comments, as in C++, are not legal in C90 (they were added to the language only in C99),
so it's not a good idea to use them in C programs which must compile under older compilers (even if your
compiler supports them as an extension).
References: K&R1 Sec. A2.1 p. 179
K&R2 Sec. A2.2 p. 192
ANSI Sec. 3.1.9 (esp. footnote 26), Appendix E
ISO Sec. 6.1.9, Annex F
H&S Sec. 2.2 pp. 18-9
PCS Sec. 10 p. 130
Question 20.19
Are the outer parentheses in return statements really optional?
Yes.
Long ago, in the early days of C, they were required, and just enough people learned C then, and wrote
code which is still in circulation, that the notion that they might still be required is widespread.
(As it happens, parentheses are optional with the sizeof operator, too, as long as its operand is a
variable or a unary expression.)
ANSI Sec. 3.3.3, Sec. 3.6.6
ISO Sec. 6.3.3, Sec. 6.6.6
H&S Sec. 8.9 p. 254
Question 20.18
Is there a way to have non-constant case labels (i.e. ranges or arbitrary expressions)?
No. The switch statement was originally designed to be quite simple for the compiler to translate;
therefore case labels are limited to single, constant, integral expressions. You can attach several case
labels to the same statement, which will let you cover a small range if you don't mind listing all cases
explicitly.
If you want to select on arbitrary ranges or non-constant expressions, you'll have to use an if/else chain.
See also question 20.17.
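Both techniques can be seen in this small sketch (the function name classify and its thresholds are invented for the example): three case labels share one statement to cover a small range, and an if test handles a range too large to list.

```c
/* Classify a score; the categories and limits are illustrative only. */
const char *classify(int score)
{
    switch(score) {
    case 0:
    case 1:
    case 2:
        return "low";           /* three labels, one statement */
    }

    if(score >= 3 && score <= 100)
        return "normal";        /* too many values to list as cases */

    return "out of range";
}
```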
K&R2 Sec. 3.4 p. 58
ANSI Sec. 3.6.4.2
ISO Sec. 6.6.4.2
Rationale Sec. 3.6.4.2
H&S Sec. 8.7 p. 248
Question 20.17
Is there a way to switch on strings?
Not directly. Sometimes, it's appropriate to use a separate function to map strings to integer codes, and
then switch on those. Otherwise, of course, you can fall back on strcmp and a conventional if/else
chain. See also questions 10.12, 20.18, and 20.29.
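The mapping-function approach might be sketched as follows. The command names, the enum, and both function names are hypothetical; the table is searched linearly with strcmp, which is fine for a handful of strings.

```c
#include <string.h>

enum command { CMD_UNKNOWN, CMD_START, CMD_STOP };

static const struct {
    const char *name;
    enum command code;
} commands[] = {
    { "start", CMD_START },
    { "stop",  CMD_STOP  },
};

/* Map a string to an integer code, or CMD_UNKNOWN if not found. */
enum command lookup_command(const char *s)
{
    size_t i;
    for(i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if(strcmp(s, commands[i].name) == 0)
            return commands[i].code;
    return CMD_UNKNOWN;
}

/* Now an ordinary switch can be used on the code. */
int run_command(const char *s)
{
    switch(lookup_command(s)) {
    case CMD_START: return 1;
    case CMD_STOP:  return 0;
    default:        return -1;
    }
}
```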
K&R2 Sec. 3.4 p. 58
ANSI Sec. 3.6.4.2
ISO Sec. 6.6.4.2
H&S Sec. 8.7 p. 248
Question 10.12
How can I construct preprocessor #if expressions which compare strings?
You can't do it directly; preprocessor #if arithmetic uses only integers. You can #define several
manifest constants, however, and implement conditionals on those.
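For instance, instead of trying to test a string such as "unix", you might assign each choice a number, as in this sketch (all of the macro names, and the function path_separator, are made up for illustration):

```c
#define OS_UNIX 1
#define OS_DOS  2

#define TARGET_OS OS_UNIX       /* normally set in a config header or with -D */

#if TARGET_OS == OS_UNIX
#define PATH_SEPARATOR '/'
#elif TARGET_OS == OS_DOS
#define PATH_SEPARATOR '\\'
#else
#error "unknown TARGET_OS"
#endif

/* Returns the separator selected at compile time. */
char path_separator(void)
{
    return PATH_SEPARATOR;
}
```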
References: K&R2 Sec. 4.11.3 p. 91
ANSI Sec. 3.8.1
ISO Sec. 6.8.1
H&S Sec. 7.11.1 p. 225
Question 10.11
I seem to be missing the system header file <sgtty.h>. Can someone send me a copy?
Standard headers exist in part so that definitions appropriate to your compiler, operating system, and
processor can be supplied. You cannot just pick up a copy of someone else's header file and expect it to
work, unless that person is using exactly the same environment. Ask your compiler vendor why the file
was not provided (or to send a replacement copy).
Question 10.9
I'm getting strange syntax errors on the very first declaration in a file, but it looks fine.
Perhaps there's a missing semicolon at the end of the last declaration in the last header file you're
#including. See also questions 2.18 and 11.29.
Question 2.18
This program works correctly, but it dumps core after it finishes. Why?
struct list {
char *item;
struct list *next;
}
/* Here is the main program. */
main(argc, argv)
{ ... }
A missing semicolon causes main to be declared as returning a structure. (The connection is hard to see
because of the intervening comment.) Since structure-valued functions are usually implemented by
adding a hidden return pointer (see question 2.9), the generated code for main() tries to accept three
arguments, although only two are passed (in this case, by the C start-up code). See also questions 10.9
and 16.4.
References: CT&P Sec. 2.3 pp. 21-2
Question 11.21
What does ``#pragma once'' mean? I found it in some header files.
It is an extension implemented by some preprocessors to help make header files idempotent; it is
essentially equivalent to the #ifndef trick mentioned in question 10.7.
Question 10.7
Is it acceptable for one header file to #include another?
It's a question of style, and thus receives considerable debate. Many people believe that ``nested
#include files'' are to be avoided: the prestigious Indian Hill Style Guide (see question 17.9)
disparages them; they can make it harder to find relevant definitions; they can lead to multiple-definition
errors if a file is #included twice; and they make manual Makefile maintenance very difficult. On the
other hand, they make it possible to use header files in a modular way (a header file can #include
what it needs itself, rather than requiring each #includer to do so); a tool like grep (or a tags file)
makes it easy to find definitions no matter where they are; a popular trick along the lines of:
#ifndef HFILENAME_USED
#define HFILENAME_USED
...header file contents...
#endif
(where a different bracketing macro name is used for each header file) makes a header file ``idempotent''
so that it can safely be #included multiple times; and automated Makefile maintenance tools (which
are a virtual necessity in large projects anyway; see question 18.1) handle dependency generation in the
face of nested #include files easily. See also question 17.10.
References: Rationale Sec. 4.1.2
Question 17.9
Where can I get the ``Indian Hill Style Guide'' and other coding standards?
Various documents are available for anonymous ftp from:

Site:                   File or directory:

cs.washington.edu       pub/cstyle.tar.Z
                        (the updated Indian Hill guide)

ftp.cs.toronto.edu      doc/programming
                        (including Henry Spencer's
                        ``10 Commandments for C Programmers'')

ftp.cs.umd.edu          pub/style-guide
You may also be interested in the books The Elements of Programming Style, Plum Hall Programming
Guidelines, and C Style: Standards and Guidelines; see the Bibliography. (The Standards and Guidelines
book is not in fact a style guide, but a set of guidelines on selecting and creating style guides.)
Question 18.9
Are there any C tutorials or other resources on the net?
There are several of them:

``Notes for C programmers,'' by Christopher Sawtell, are available from ftp.funet.fi in
pub/languages/C/tutorials/sawtell_C.tar.gz.
Tim Love's ``C for Programmers'' is at
http://www.eng.cam.ac.uk/help/tpl/languages/C/teaching_C/teaching_C.html .
The Coronado Enterprises C tutorials are available on Simtel mirrors in pub/msdos/c/ or on the web at
http://www.swcp.com/~dodrill/controlled/cdoc/cmain.html.
Rick Rowe has a tutorial which is available from ftp.netcom.com as pub/rowe/tutorde.zip or
ftp.wustl.edu as pub/MSDOS_UPLOADS/programming/c_language/ctutorde.zip .
There is evidently a web-based course at http://www.strath.ac.uk/IT/Docs/Ccourse/ccourse.html .
Finally, on some Unix machines you can try typing learn c at the shell prompt.
[Disclaimer: I have not reviewed these tutorials; I have heard that at least one of them contains a number
of errors. Also, this sort of information rapidly becomes out-of-date; these addresses may not work by the
time you read this and try them.]
Several of these tutorials, plus a great deal of other information about C, are accessible via the web at
http://www.lysator.liu.se/c/index.html .
Vinit Carpenter maintains a list of resources for learning C and C++; it is posted to comp.lang.c and
comp.lang.c++, and archived where this FAQ list is (see question 20.40), or on the web at
http://www.cyberdiem.com/vin/learn.html .

Question 18.10
What's a good book for learning C?
There are far too many books on C to list here; it's impossible to rate them all. Many people believe that
the best one was also the first: The C Programming Language, by Kernighan and Ritchie (``K&R,'' now
in its second edition). Opinions vary on K&R's suitability as an initial programming text: many of us did
learn C from it, and learned it well; some, however, feel that it is a bit too clinical as a first tutorial for
those without much programming background.
An excellent reference manual is C: A Reference Manual, by Samuel P. Harbison and Guy L. Steele,
now in its fourth edition.
Though not suitable for learning C from scratch, this FAQ list has been published in book form; see the
Bibliography.
Mitch Wright maintains an annotated bibliography of C and Unix books; it is available for anonymous
ftp from ftp.rahul.net in directory pub/mitch/YABL/.
This FAQ list's editor maintains a collection of previous answers to this question, which is available upon
request. See also question 18.9.
Question 18.13
Where can I find the sources of the standard C libraries?
One source (though not public domain) is The Standard C Library, by P.J. Plauger (see the
Bibliography). Implementations of all or part of the C library have been written and are readily available
as part of the netBSD and GNU (also Linux) projects. See also question 18.16.
Question 18.14
I need code to parse and evaluate expressions.
Two available packages are ``defunc,'' posted to comp.sources.misc in December, 1993 (V41 i32,33), to
alt.sources in January, 1994, and available from sunsite.unc.edu in
pub/packages/development/libraries/defunc-1.3.tar.Z, and ``parse,'' at lamont.ldgo.columbia.edu. Other
options include the S-Lang interpreter, available via anonymous ftp from amy.tch.harvard.edu in
pub/slang, and the shareware Cmm (``C-minus-minus'' or ``C minus the hard stuff''). See also question
18.16.
There is also some parsing/evaluation code in Software Solutions in C (chapter 12, pp. 235-55).
Question 18.15
Where can I get a BNF or YACC grammar for C?
The definitive grammar is of course the one in the ANSI standard; see question 11.2. Another grammar
(along with one for C++) by Jim Roskind is in pub/c++grammar1.1.tar.Z at ics.uci.edu . A fleshed-out,
working instance of the ANSI grammar (due to Jeff Lee) is on ftp.uu.net (see question 18.16) in
usenet/net.sources/ansi.c.grammar.Z (including a companion lexer). The FSF's GNU C compiler contains
a grammar, as does the appendix to K&R2.
The comp.compilers archives contain more information about grammars; see question 18.3.
References: K&R1 Sec. A18 pp. 214-219
K&R2 Sec. A13 pp. 234-239
ANSI Sec. A.2
ISO Sec. B.2
H&S pp. 423-435 Appendix B
Question 11.2
How can I get a copy of the Standard?
[Late-breaking news: I've been told that copies of the new C99 can be obtained directly from
www.ansi.org; the price for an electronic document is only US $18.00.]
Copies are available in the United States from
American National Standards Institute
11 W. 42nd St., 13th floor
New York, NY 10036 USA
(+1) 212 642 4900
and
Global Engineering Documents
15 Inverness Way E
Englewood, CO 80112 USA
(+1) 303 397 2715
(800) 854 7179 (U.S. & Canada)
In other countries, contact the appropriate national standards body, or ISO in Geneva at:
ISO Sales
Case Postale 56
CH-1211 Geneve 20
Switzerland
(or see URL http://www.iso.ch or check the comp.std.internat FAQ list, Standards.Faq).
At the time of this writing, the cost is $130.00 from ANSI or $410.00 from Global. Copies of the original
X3.159 (including the Rationale) may still be available at $205.00 from ANSI or $162.50 from Global.
Note that ANSI derives revenues to support its operations from the sale of printed standards, so
electronic copies are not available.
In the U.S., it may be possible to get a copy of the original ANSI X3.159 (including the Rationale) as
``FIPS PUB 160'' from
National Technical Information Service (NTIS)

U.S. Department of Commerce
Springfield, VA 22161
703 487 4650
The mistitled Annotated ANSI C Standard, with annotations by Herbert Schildt, contains most of the text
of ISO 9899; it is published by Osborne/McGraw-Hill, ISBN 0-07-881952-0, and sells in the U.S. for
approximately $40. It has been suggested that the price differential between this work and the official
standard reflects the value of the annotations: they are plagued by numerous errors and omissions, and a
few pages of the Standard itself are missing. Many people on the net recommend ignoring the
annotations entirely. A review of the annotations (``annotated annotations'') by Clive Feather can be
found on the web at http://www.lysator.liu.se/c/schildt.html .
The text of the Rationale (not the full Standard) can be obtained by anonymous ftp from ftp.uu.net (see
question 18.16) in directory doc/standards/ansi/X3.159-1989, and is also available on the web at
http://www.lysator.liu.se/c/rat/title.html . The Rationale has also been printed by Silicon Press, ISBN 0-929306-07-4.
Question 11.1
What is the ``ANSI C Standard?''
In 1983, the American National Standards Institute (ANSI) commissioned a committee, X3J11, to
standardize the C language. After a long, arduous process, including several widespread public reviews,
the committee's work was finally ratified as ANS X3.159-1989 on December 14, 1989, and published in
the spring of 1990. For the most part, ANSI C standardizes existing practice, with a few additions from
C++ (most notably function prototypes) and support for multinational character sets (including the
controversial trigraph sequences). The ANSI C standard also formalizes the C run-time library support
routines.
More recently, the Standard has been adopted as an international standard, ISO/IEC 9899:1990, and this
ISO Standard replaces the earlier X3.159 even within the United States. Its sections are numbered
differently (briefly, ISO sections 5 through 7 correspond roughly to the old ANSI sections 2 through 4).
As an ISO Standard, it is subject to ongoing revision through the release of Technical Corrigenda and
Normative Addenda.
In 1994, Technical Corrigendum 1 amended the Standard in about 40 places, most of them minor
corrections or clarifications. More recently, Normative Addendum 1 added about 50 pages of new
material, mostly specifying new library functions for internationalization. The production of Technical
Corrigenda is an ongoing process, and a second one is expected in late 1995. In addition, both ANSI and
ISO require periodic review of their standards. This process is beginning in 1995, and will likely result in
a completely revised standard (nicknamed ``C9X'' on the assumption of completion by 1999).
The original ANSI Standard included a ``Rationale,'' explaining many of its decisions, and discussing a
number of subtle points, including several of those covered here. (The Rationale was ``not part of ANSI
Standard X3.159-1989, but... included for information only,'' and is not included with the ISO Standard.)
Question 10.26
How can I write a macro which takes a variable number of arguments?
One popular trick is to define and invoke the macro with a single, parenthesized ``argument'' which in the
macro expansion becomes the entire argument list, parentheses and all, for a function such as printf:
#define DEBUG(args) (printf("DEBUG: "), printf args)
if(n != 0) DEBUG(("n is %d\n", n));
The obvious disadvantage is that the caller must always remember to use the extra parentheses.
gcc has an extension which allows a function-like macro to accept a variable number of arguments, but
it's not standard. Other possible solutions are to use different macros (DEBUG1, DEBUG2, etc.)
depending on the number of arguments, or to play games with commas:
#define DEBUG(args) (printf("DEBUG: "), printf(args))
#define _ ,
DEBUG("i = %d" _ i)
It is often better to use a bona-fide function, which can take a variable number of arguments in a well-defined way. See questions 15.4 and 15.5.
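The C99 revision of the Standard (see question 11.2) added variadic macros, which make the extra parentheses unnecessary. The following is only a sketch, and it assumes a C99-or-later compiler; the function report is hypothetical and is there just to show an invocation.

```c
#include <stdio.h>

/* C99: ... in the parameter list, __VA_ARGS__ in the expansion.
   The format string is simply the first of the variable arguments. */
#define DEBUG(...) (printf("DEBUG: "), printf(__VA_ARGS__))

/* Prints "DEBUG: n is <n>" and a newline; returns what the second
   printf returned, i.e. the number of characters after the prefix. */
int report(int n)
{
    return DEBUG("n is %d\n", n);
}
```

Under a strictly pre-C99 compiler this will not compile, so the older tricks above still matter for portable code.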
Question 15.4
How can I write a function that takes a variable number of arguments?
Use the facilities of the <stdarg.h> header.

Here is a function which concatenates an arbitrary number of strings into malloc'ed memory:
#include <stdlib.h>             /* for malloc, NULL, size_t */
#include <stdarg.h>             /* for va_ stuff */
#include <string.h>             /* for strcat et al. */

char *vstrcat(char *first, ...)
{
        size_t len;
        char *retbuf;
        va_list argp;
        char *p;

        if(first == NULL)
                return NULL;

        len = strlen(first);

        va_start(argp, first);
        while((p = va_arg(argp, char *)) != NULL)
                len += strlen(p);
        va_end(argp);

        retbuf = malloc(len + 1);       /* +1 for trailing \0 */
        if(retbuf == NULL)
                return NULL;            /* error */

        (void)strcpy(retbuf, first);

        va_start(argp, first);          /* restart for second scan */
        while((p = va_arg(argp, char *)) != NULL)
                (void)strcat(retbuf, p);
        va_end(argp);

        return retbuf;
}
Usage is something like
char *str = vstrcat("Hello, ", "world!", (char *)NULL);
Note the cast on the last argument; see questions 5.2 and 15.3. (Also note that the caller must free the
returned, malloc'ed storage.)
References: K&R2 Sec. 7.3 p. 155, Sec. B7 p. 254
ANSI Sec. 4.8
ISO Sec. 7.8
Rationale Sec. 4.8
H&S Sec. 11.4 pp. 296-9
CT&P Sec. A.3 pp. 139-141
PCS Sec. 11 pp. 184-5, Sec. 13 p. 242
Question 5.2
How do I get a null pointer in my programs?
According to the language definition, a constant 0 in a pointer context is converted into a null pointer at
compile time. That is, in an initialization, assignment, or comparison when one side is a variable or
expression of pointer type, the compiler can tell that a constant 0 on the other side requests a null pointer,
and generate the correctly-typed null pointer value. Therefore, the following fragments are perfectly
legal:
char *p = 0;
if(p != 0)
(See also question 5.3.)
However, an argument being passed to a function is not necessarily recognizable as a pointer context,
and the compiler may not be able to tell that an unadorned 0 ``means'' a null pointer. To generate a null
pointer in a function call context, an explicit cast may be required, to force the 0 to be recognized as a
pointer. For example, the Unix system call execl takes a variable-length, null-pointer-terminated list of
character pointer arguments, and is correctly called like this:
execl("/bin/sh", "sh", "-c", "date", (char *)0);
If the (char *) cast on the last argument were omitted, the compiler would not know to pass a null
pointer, and would pass an integer 0 instead. (Note that many Unix manuals get this example wrong.)
When function prototypes are in scope, argument passing becomes an ``assignment context,'' and most
casts may safely be omitted, since the prototype tells the compiler that a pointer is required, and of which
type, enabling it to correctly convert an unadorned 0. Function prototypes cannot provide the types for
variable arguments in variable-length argument lists however, so explicit casts are still required for those
arguments. (See also question 15.3.) It is safest to properly cast all null pointer constants in function
calls: to guard against varargs functions or those without prototypes, to allow interim use of non-ANSI
compilers, and to demonstrate that you know what you are doing. (Incidentally, it's also a simpler rule to
remember.)
Summary:

Unadorned 0 okay:               Explicit cast required:

initialization                  function call,
                                no prototype in scope
assignment
                                variable argument in
comparison                      varargs function call

function call,
prototype in scope,
fixed argument
References: K&R1 Sec. A7.7 p. 190, Sec. A7.14 p. 192
K&R2 Sec. A7.10 p. 207, Sec. A7.17 p. 209
ANSI Sec. 3.2.2.3
ISO Sec. 6.2.2.3
H&S Sec. 4.6.3 p. 95, Sec. 6.2.7 p. 171
Question 5.3
Is the abbreviated pointer comparison ``if(p)'' to test for non-null pointers valid? What if the internal
representation for null pointers is nonzero?
When C requires the Boolean value of an expression (in the if, while, for, and do statements, and
with the &&, ||, !, and ?: operators), a false value is inferred when the expression compares equal to
zero, and a true value otherwise. That is, whenever one writes
if(expr)
where ``expr'' is any expression at all, the compiler essentially acts as if it had been written as
if((expr) != 0)
Substituting the trivial pointer expression ``p'' for ``expr,'' we have
if(p)
is equivalent to
if(p != 0)
and this is a comparison context, so the compiler can tell that the (implicit) 0 is actually a null pointer
constant, and use the correct null pointer value. There is no trickery involved here; compilers do work
this way, and generate identical code for both constructs. The internal representation of a null pointer
does not matter.
The boolean negation operator, !, can be described as follows:
!expr
is essentially equivalent to
(expr)?0:1
or to
((expr) == 0)
which leads to the conclusion that

if(!p)
is equivalent to
if(p == 0)
``Abbreviations'' such as if(p), though perfectly legal, are considered by some to be bad style (and by
others to be good style; see question 17.10).
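The equivalences described above can be written out as pairs of functions. The names below are invented for this sketch; each ``short'' form gives the same result as its ``long'' counterpart for every pointer value, whatever the machine's internal null pointer looks like.

```c
#include <stddef.h>     /* for NULL */

/* if(p) form versus explicit comparison: both yield 1 for non-null. */
int is_set_short(const char *p) { return p ? 1 : 0; }
int is_set_long(const char *p)  { return (p != NULL) ? 1 : 0; }

/* !p form versus explicit comparison: both yield 1 for null. */
int is_null_short(const char *p) { return !p; }
int is_null_long(const char *p)  { return p == NULL; }
```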
References: K&R2 Sec. A7.4.7 p. 204
ANSI Sec. 3.3.3.3, Sec. 3.3.9, Sec. 3.3.13, Sec. 3.3.14, Sec. 3.3.15, Sec. 3.6.4.1, Sec. 3.6.5
ISO Sec. 6.3.3.3, Sec. 6.3.9, Sec. 6.3.13, Sec. 6.3.14, Sec. 6.3.15, Sec. 6.6.4.1, Sec. 6.6.5
H&S Sec. 5.3.2 p. 122
Question 17.10
Some people say that goto's are evil and that I should never use them. Isn't that a bit extreme?
Programming style, like writing style, is somewhat of an art and cannot be codified by inflexible rules,
although discussions about style often seem to center exclusively around such rules.
In the case of the goto statement, it has long been observed that unfettered use of goto's quickly leads
to unmaintainable spaghetti code. However, a simple, unthinking ban on the goto statement does not
necessarily lead immediately to beautiful programming: an unstructured programmer is just as capable of
constructing a Byzantine tangle without using any goto's (perhaps substituting oddly-nested loops and
Boolean control variables, instead).
Most observations or ``rules'' about programming style usually work better as guidelines than rules, and
work much better if programmers understand what the guidelines are trying to accomplish. Blindly
avoiding certain constructs or following rules without understanding them can lead to just as many
problems as the rules were supposed to avert.
Furthermore, many opinions on programming style are just that: opinions. It's usually futile to get
dragged into ``style wars,'' because on certain issues (such as those referred to in questions 9.2, 5.3, 5.9,
and 10.7), opponents can never seem to agree, or agree to disagree, or stop arguing.
Question 9.2
Isn't #defining TRUE to be 1 dangerous, since any nonzero value is considered ``true'' in C? What if a
built-in logical or relational operator ``returns'' something other than 1?
It is true (sic) that any nonzero value is considered true in C, but this applies only ``on input'', i.e. where a
Boolean value is expected. When a Boolean value is generated by a built-in operator, it is guaranteed to
be 1 or 0. Therefore, the test
if((a == b) == TRUE)
would work as expected (as long as TRUE is 1), but it is obviously silly. In general, explicit tests against
TRUE and FALSE are inappropriate, because some library functions (notably isupper, isalpha, etc.)
return, on success, a nonzero value which is not necessarily 1. (Besides, if you believe that ``if((a ==
b) == TRUE)'' is an improvement over ``if(a == b)'', why stop there? Why not use ``if(((a ==
b) == TRUE) == TRUE)''?) A good rule of thumb is to use TRUE and FALSE (or the like) only for
assignment to a Boolean variable or function parameter, or as the return value from a Boolean function,
but never in a comparison.
The preprocessor macros TRUE and FALSE (and, of course, NULL) are used for code readability, not
because the underlying values might ever change. (See also questions 5.3 and 5.10.)
On the other hand, Boolean values and definitions can evidently be confusing, and some programmers
feel that TRUE and FALSE macros only compound the confusion. (See also question 5.9.)
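Here is a sketch of the rule of thumb in action, using a hypothetical function starts_with_upper: TRUE and FALSE appear only as return values, and the result of isupper is tested directly rather than compared against TRUE.

```c
#include <ctype.h>

#define TRUE  1
#define FALSE 0

/* Returns TRUE or FALSE, never some other nonzero value. */
int starts_with_upper(const char *s)
{
    /* Not "isupper(...) == TRUE": isupper may return any nonzero value. */
    if(isupper((unsigned char)s[0]))
        return TRUE;
    return FALSE;
}
```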
References: K&R1 Sec. 2.6 p. 39, Sec. 2.7 p. 41
K&R2 Sec. 2.6 p. 42, Sec. 2.7 p. 44, Sec. A7.4.7 p. 204, Sec. A7.9 p. 206
ANSI Sec. 3.3.3.3, Sec. 3.3.8, Sec. 3.3.9, Sec. 3.3.13, Sec. 3.3.14, Sec. 3.3.15, Sec. 3.6.4.1, Sec. 3.6.5
ISO Sec. 6.3.3.3, Sec. 6.3.8, Sec. 6.3.9, Sec. 6.3.13, Sec. 6.3.14, Sec. 6.3.15, Sec. 6.6.4.1, Sec. 6.6.5
H&S Sec. 7.5.4 pp. 196-7, Sec. 7.6.4 pp. 207-8, Sec. 7.6.5 pp. 208-9, Sec. 7.7 pp. 217-8, Sec. 7.8 pp. 218-9, Sec. 8.5 pp. 238-9, Sec. 8.6 pp. 241-4
``What the Tortoise Said to Achilles''
Question 5.10
But wouldn't it be better to use NULL (rather than 0), in case the value of NULL changes, perhaps on a
machine with nonzero internal null pointers?
No. (Using NULL may be preferable, but not for this reason.) Although symbolic constants are often used
in place of numbers because the numbers might change, this is not the reason that NULL is used in place
of 0. Once again, the language guarantees that source-code 0's (in pointer contexts) generate null
pointers. NULL is used only as a stylistic convention. See questions 5.5 and 9.2.
Question 5.5
How should NULL be defined on a machine which uses a nonzero bit pattern as the internal
representation of a null pointer?
The same as on any other machine: as 0 (or ((void *)0)).

Whenever a programmer requests a null pointer, either by writing ``0'' or ``NULL,'' it is the compiler's
responsibility to generate whatever bit pattern the machine uses for that null pointer. Therefore, #defining
NULL as 0 on a machine for which internal null pointers are nonzero is as valid as on any other: the
compiler must always be able to generate the machine's correct null pointers in response to unadorned 0's
seen in pointer contexts. See also questions 5.2, 5.10, and 5.17.
ISO Sec. 7.1.6
Question 15.3
I had a frustrating problem which turned out to be caused by the line
printf("%d", n);
where n was actually a long int. I thought that ANSI function prototypes were supposed to guard
against argument type mismatches like this.
When a function accepts a variable number of arguments, its prototype does not (and cannot) provide any
information about the number and types of those variable arguments. Therefore, the usual protections do
not apply in the variable-length part of variable-length argument lists: the compiler cannot perform
implicit conversions or (in general) warn about mismatches.
See also questions 5.2, 11.3, 12.9, and 15.2.
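Since the compiler won't convert the argument for you, the format and the argument have to be made to agree by hand. A minimal sketch (the two function names are invented, and sprintf is used in place of printf only so that the result can be inspected):

```c
#include <stdio.h>

/* Formats n into buf, which must be large enough; returns the length. */
int format_long(char *buf, long n)
{
    return sprintf(buf, "%ld", n);      /* %ld matches a long argument */
}

/* If the value is known to fit in an int, an explicit cast also works. */
int format_as_int(char *buf, long n)
{
    return sprintf(buf, "%d", (int)n);  /* the cast makes %d correct */
}
```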
Question 11.3
My ANSI compiler complains about a mismatch when it sees
extern int func(float);
int func(x)
float x;
{ ...
You have mixed the new-style prototype declaration ``extern int func(float);'' with the old-style
definition ``int func(x) float x;''. It is usually safe to mix the two styles (see question
11.4), but not in this case.
Old C (and ANSI C, in the absence of prototypes, and in variable-length argument lists; see question
15.2) ``widens'' certain arguments when they are passed to functions. floats are promoted to double,
and characters and short integers are promoted to int. (For old-style function definitions, the values are
automatically converted back to the corresponding narrower types within the body of the called function,
if they are declared that way there.)
This problem can be fixed either by using new-style syntax consistently in the definition:
int func(float x) { ... }
or by changing the new-style prototype declaration to match the old-style definition:
extern int func(double);
(In this case, it would be clearest to change the old-style definition to use double as well, as long as the
address of that parameter is not taken.)
It may also be safer to avoid ``narrow'' (char, short int, and float) function arguments and return
types altogether.
K&R2 Sec. A7.3.2 p. 202
ANSI Sec. 3.3.2.2, Sec. 3.5.4.3
ISO Sec. 6.3.2.2, Sec. 6.5.4.3
Rationale Sec. 3.3.2.2, Sec. 3.5.4.3
H&S Sec. 9.2 pp. 265-7, Sec. 9.4 pp. 272-3
Question 11.4
Can you mix old-style and new-style function syntax?
Doing so is perfectly legal, as long as you're careful (see especially question 11.3). Note however that old-style
syntax is marked as obsolescent, so official support for it may be removed some day.
References: ANSI Sec. 3.7.1, Sec. 3.9.5
ISO Sec. 6.7.1, Sec. 6.9.5
H&S Sec. 9.2.2 pp. 265-7, Sec. 9.2.5 pp. 269-70
Question 11.5
Why does the declaration
extern f(struct x *p);
give me an obscure warning message about ``struct x introduced in prototype scope''?
In a quirk of C's normal block scoping rules, a structure declared (or even mentioned) for the first time
within a prototype cannot be compatible with other structures declared in the same source file (it goes out
of scope at the end of the prototype).
To resolve the problem, precede the prototype with the vacuous-looking declaration
struct x;
which places an (incomplete) declaration of struct x at file scope, so that all following declarations
involving struct x can at least be sure they're referring to the same struct x.
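Put together, the arrangement might look like this sketch. The member value and the body of f are assumptions made for the example, and an explicit int return type has been added to the prototype.

```c
struct x;                       /* incomplete declaration at file scope */

extern int f(struct x *p);      /* refers to the file-scope struct x */

struct x {                      /* the full definition can come later */
    int value;
};

int f(struct x *p)
{
    return p->value;            /* same struct x as in the prototype */
}
```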
References: ANSI Sec. 3.1.2.1, Sec. 3.1.2.6, Sec. 3.5.2.3
ISO Sec. 6.1.2.1, Sec. 6.1.2.6, Sec. 6.5.2.3
Question 11.8
I don't understand why I can't use const values in initializers and array dimensions, as in
const int n = 5;
int a[n];
The const qualifier really means ``read-only;'' an object so qualified is a run-time object which cannot
(normally) be assigned to. The value of a const-qualified object is therefore not a constant expression
in the full sense of the term. (C is unlike C++ in this regard.) When you need a true compile-time
constant, use a preprocessor #define.
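In code, the #define approach looks like the sketch below. The enum alternative is an addition not mentioned above: enumeration constants are also true constant expressions in C. The names N, M, a, b, and total_elements are all made up for illustration.

```c
#define N 5             /* preprocessor constant */
enum { M = 5 };         /* enumeration constant: also a constant expression */

int a[N];               /* okay at file scope */
int b[M];               /* okay at file scope */

/* Total number of elements in the two arrays. */
int total_elements(void)
{
    return (int)(sizeof a / sizeof a[0] + sizeof b / sizeof b[0]);
}
```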
References: ANSI Sec. 3.4
ISO Sec. 6.4
H&S Secs. 7.11.2,7.11.3 pp. 226-7
Question 11.9
What's the difference between const char *p and char * const p?
const char *p declares a pointer to a constant character (you can't change the character); char *
const p declares a constant pointer to a (variable) character (i.e. you can't change the pointer).
Read these ``inside out'' to understand them; see also question 1.21.
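A short sketch of what each declaration does and does not permit (the function and buffer names are hypothetical; the commented-out lines are the ones a compiler would reject):

```c
char first_of_either(void)
{
    char buf1[] = "abc";
    char buf2[] = "xyz";

    const char *p = buf1;       /* pointer to constant char */
    char * const q = buf2;      /* constant pointer to char */

    p = buf2;                   /* okay: the pointer itself may change */
    /* *p = 'X'; */             /* error: can't modify the char via p */

    *q = 'X';                   /* okay: the character may change */
    /* q = buf1; */             /* error: q itself can't be changed */

    return *p;                  /* p and q both point at buf2: 'X' */
}
```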
References: ANSI Sec. 3.5.4.1 examples
ISO Sec. 6.5.4.1
H&S Sec. 4.4.4 p. 81
Question 1.21
How do I declare an array of N pointers to functions returning pointers to functions returning pointers to
characters?
The first part of this question can be answered in at least three ways:
1. char *(*(*a[N])())();
2. Build the declaration up incrementally, using typedefs:
typedef char *pc;         /* pointer to char */
typedef pc fpc();         /* function returning pointer to char */
typedef fpc *pfpc;        /* pointer to above */
typedef pfpc fpfpc();     /* function returning... */
typedef fpfpc *pfpfpc;    /* pointer to... */
pfpfpc a[N];              /* array of... */
3. Use the cdecl program, which turns English into C and vice versa:
cdecl> declare a as array of pointer to function returning
pointer to function returning pointer to char
char *(*(*a[])())()
cdecl can also explain complicated declarations, help with casts, and indicate which set of parentheses
the arguments go in (for complicated function definitions, like the one above). Versions of cdecl are in
volume 14 of comp.sources.unix (see question 18.16) and K&R2.
Any good book on C should explain how to read these complicated C declarations ``inside out'' to understand them
(``declaration mimics use'').
The pointer-to-function declarations in the examples above have not included parameter type information. When
the parameters have complicated types, declarations can really get messy. (Modern versions of cdecl can help
here, too.)
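To see the declaration in use, here is a sketch that fills such an array and calls through it. It assumes every function involved takes no arguments, so (void) parameter lists are added; the functions hello, world, get_hello, and get_world and the array names are hypothetical.

```c
#include <assert.h>

#define N 2

typedef char *pc;               /* pointer to char */
typedef pc fpc(void);           /* function returning pointer to char */
typedef fpc *pfpc;              /* pointer to above */
typedef pfpc fpfpc(void);       /* function returning pointer to function */
typedef fpfpc *pfpfpc;          /* pointer to above */

static char *hello(void) { static char s[] = "hello"; return s; }
static char *world(void) { static char s[] = "world"; return s; }

static pfpc get_hello(void) { return hello; }
static pfpc get_world(void) { return world; }

/* The typedef form and the all-in-one form declare the same type. */
pfpfpc table[N] = { get_hello, get_world };
char *(*(*table2[N])(void))(void) = { get_hello, get_world };

/* Call the i'th function, then call the function it returns. */
char *call_through(int i)
{
    assert(i >= 0 && i < N);
    return table[i]()();
}
```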
References: ANSI Sec. 3.5ff (esp. Sec. 3.5.4)
ISO Sec. 6.5ff (esp. Sec. 6.5.4)
H&S Sec. 4.5 pp. 85-92, Sec. 5.10.1 pp. 149-50
Question 1.14
I can't seem to define a linked list successfully. I tried
typedef struct {
char *item;
NODEPTR next;
} *NODEPTR;
but the compiler gave me error messages. Can't a structure in C contain a pointer to itself?
Structures in C can certainly contain pointers to themselves; the discussion and example in section 6.5 of
K&R make this clear. The problem with the NODEPTR example is that the typedef has not been defined
at the point where the next field is declared. To fix this code, first give the structure a tag (``struct
node''). Then, declare the next field as a simple struct node *, or disentangle the typedef
declaration from the structure definition, or both. One corrected version would be
struct node {
char *item;
struct node *next;
};
typedef struct node *NODEPTR;
and there are at least three other equivalently correct ways of arranging it.
A similar problem, with a similar solution, can arise when attempting to declare a pair of typedef'ed
mutually referential structures.
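Building on the corrected declaration, a minimal sketch of list handling might look like this. The helpers push, list_length, and free_list are hypothetical names, not part of the original answer.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    char *item;
    struct node *next;
};
typedef struct node *NODEPTR;

/* Prepend item; returns the new head, or NULL if malloc fails
   (in which case the old list is left untouched). */
NODEPTR push(NODEPTR head, char *item)
{
    NODEPTR n;

    assert(item != NULL);
    n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->item = item;
    n->next = head;
    return n;
}

int list_length(NODEPTR head)
{
    int count = 0;

    for (; head != NULL; head = head->next)
        count++;
    return count;
}

void free_list(NODEPTR head)
{
    while (head != NULL) {
        NODEPTR next = head->next;   /* save the link before freeing */
        free(head);
        head = next;
    }
}
```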
References: K&R2 Sec. 6.5 p. 139
ANSI Sec. 3.5.2, Sec. 3.5.2.3, esp. examples
ISO Sec. 6.5.2, Sec. 6.5.2.3
H&S Sec. 5.6.1 pp. 132-3
Question 2.1
What's the difference between these two declarations?
struct x1 { ... };
typedef struct { ... } x2;
The first form declares a structure tag; the second declares a typedef. The main difference is that the
second declaration is of a slightly more abstract type--its users don't necessarily know that it is a
structure, and the keyword struct is not used when declaring instances of it.
Question 1.34
I finally figured out the syntax for declaring pointers to functions, but now how do I initialize one?
Use something like

extern int func();
int (*fp)() = func;
When the name of a function appears in an expression like this, it ``decays'' into a pointer (that is, it has
its address implicitly taken), much as an array name does.
An explicit declaration for the function is normally needed, since implicit external function declaration
does not happen in this case (because the function name in the initialization is not part of a function call).
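A slightly fuller sketch, using a (void) prototype rather than the old-style empty parentheses, shows that the explicit & form produces the same pointer. The body of func and the helper pointers_agree are hypothetical.

```c
#include <assert.h>

extern int func(void);      /* explicit declaration comes first */

int (*fp)(void) = func;     /* the function name decays to a pointer */
int (*fp2)(void) = &func;   /* an explicit & yields the same pointer */

int func(void) { return 42; }

int pointers_agree(void)
{
    assert(fp != 0);
    return fp == fp2;
}
```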
Question 4.12
I've seen different methods used for calling functions via pointers. What's the story?
Originally, a pointer to a function had to be ``turned into'' a ``real'' function, with the * operator (and an
extra pair of parentheses, to keep the precedence straight), before calling:
int r, func(), (*fp)() = func;
r = (*fp)();
It can also be argued that functions are always called via pointers, and that ``real'' function names always
decay implicitly into pointers (in expressions, as they do in initializations; see question 1.34). This
reasoning, made widespread through pcc and adopted in the ANSI standard, means that
r = fp();
is legal and works correctly, whether fp is the name of a function or a pointer to one. (The usage has
always been unambiguous; there is nothing you ever could have done with a function pointer followed by
an argument list except call the function pointed to.) An explicit * is still allowed (and recommended, if
portability to older compilers is important).
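The calling styles can be compared side by side in a small sketch; twice and call_both_ways are hypothetical names.

```c
#include <assert.h>

static int twice(int n) { return 2 * n; }

int call_both_ways(int n)
{
    int (*fp)(int) = twice;
    int r1 = (*fp)(n);       /* original style: explicit * */
    int r2 = fp(n);          /* ANSI style: call via the pointer directly */
    int r3 = (*twice)(n);    /* a function name decays too, so this also works */

    assert(r1 == r2 && r2 == r3);
    return r1;
}
```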
References: K&R2 Sec. 5.11 p. 120
ANSI Sec. 3.3.2.2
ISO Sec. 6.3.2.2
H&S Sec. 5.8 p. 147, Sec. 7.4.3 p. 190
Question 4.11
Does C even have ``pass by reference''?
Not really. Strictly speaking, C always uses pass by value. You can simulate pass by reference yourself,
by defining functions which accept pointers and then using the & operator when calling, and the compiler
will essentially simulate it for you when you pass an array to a function (by passing a pointer instead, see
question 6.4 et al.), but C has nothing truly equivalent to formal pass by reference or C++ reference
parameters. (However, function-like preprocessor macros do provide a form of ``call by name''.)
See also questions 4.8 and 20.1.
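The simulation described above, with pointer parameters in the function and & at the call, can be sketched with the familiar swap function:

```c
#include <assert.h>

/* The pointers themselves are passed by value; the function
   reaches the caller's variables by dereferencing them. */
void swap(int *a, int *b)
{
    int tmp;

    assert(a != 0 && b != 0);
    tmp = *a;
    *a = *b;
    *b = tmp;
}
```

A caller with two int variables x and y would write swap(&x, &y).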
References: K&R1 Sec. 1.8 pp. 24-5, Sec. 5.2 pp. 91-3
K&R2 Sec. 1.8 pp. 27-8, Sec. 5.2 pp. 91-3
ANSI Sec. 3.3.2.2, esp. footnote 39
ISO Sec. 6.3.2.2
H&S Sec. 9.5 pp. 273-4
Question 6.4
Then why are array and pointer declarations interchangeable as function formal parameters?
It's supposed to be a convenience.

Since arrays decay immediately into pointers, an array is never actually passed to a function. Allowing
pointer parameters to be declared as arrays is simply a way of making it look as though the array was
being passed--a programmer may wish to emphasize that a parameter is traditionally treated as if it were
an array, or that an array (strictly speaking, the address) is traditionally passed. As a convenience,
therefore, any parameter declarations which ``look like'' arrays, e.g.
f(a)
char a[];
{ ... }
are treated by the compiler as if they were pointers, since that is what the function will receive if an array
is passed:
f(a)
char *a;
{ ... }
This conversion holds only within function formal parameter declarations, nowhere else. If the
conversion bothers you, avoid it; many people have concluded that the confusion it causes outweighs the
small advantage of having the declaration ``look like'' the call or the uses within the function.
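One way to convince yourself that the two parameter forms are the same type is to declare a function one way and define it the other, as in this prototype-style sketch; a compiler should accept the pair without complaint. The function count_chars is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

size_t count_chars(char s[]);    /* declared with array syntax */

size_t count_chars(char *s)      /* defined with pointer syntax: same type */
{
    size_t n = 0;

    assert(s != NULL);
    while (*s++ != '\0')
        n++;
    return n;
}
```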
References: K&R1 Sec. 5.3 p. 95, Sec. A10.1 p. 205
K&R2 Sec. 5.3 p. 100, Sec. A8.6.3 p. 218, Sec. A10.1 p. 226
ANSI Sec. 3.5.4.3, Sec. 3.7.1, Sec. 3.9.6
ISO Sec. 6.5.4.3, Sec. 6.7.1, Sec. 6.9.6
H&S Sec. 9.3 p. 271
CT&P Sec. 3.3 pp. 33-4

Question 6.21
Why doesn't sizeof properly report the size of an array when the array is a parameter to a function?
The compiler pretends that the array parameter was declared as a pointer (see question 6.4), and sizeof
reports the size of the pointer.
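The effect, and the usual remedy of passing the element count as a separate argument, can be sketched as follows. Both function names are hypothetical, and some compilers issue a warning for the sizeof in the first function, which is exactly the situation this question describes.

```c
#include <assert.h>
#include <stddef.h>

/* Inside the function a is really a char *, so this is sizeof(char *). */
size_t size_seen_by_callee(char a[])
{
    return sizeof a;
}

/* The usual remedy: the caller passes the count explicitly. */
size_t fill(char a[], size_t n, char c)
{
    size_t i;

    assert(a != NULL);
    for (i = 0; i < n; i++)
        a[i] = c;
    return n;
}
```

In the caller, where the array is still an array, sizeof buf gives the full size, so a call such as fill(buf, sizeof buf, 'x') passes the right count.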
References: H&S Sec. 7.5.2 p. 195
Question 6.20
How can I use statically- and dynamically-allocated multidimensional arrays interchangeably when
passing them to functions?
There is no single perfect method. Given the declarations

int array[NROWS][NCOLUMNS];
int **array1;               /* ragged */
int **array2;               /* contiguous */
int *array3;                /* "flattened" */
int (*array4)[NCOLUMNS];
with the pointers initialized as in the code fragments in question 6.16, and functions declared as
f1(int a[][NCOLUMNS], int nrows, int ncolumns);
f2(int *aryp, int nrows, int ncolumns);
f3(int **pp, int nrows, int ncolumns);
where f1 accepts a conventional two-dimensional array, f2 accepts a ``flattened'' two-dimensional
array, and f3 accepts a pointer-to-pointer, simulated array (see also questions 6.18 and 6.19), the
following calls should work as expected:
f1(array, NROWS, NCOLUMNS);
f1(array4, nrows, NCOLUMNS);
f2(&array[0][0], NROWS, NCOLUMNS);
f2(*array, NROWS, NCOLUMNS);
f2(*array2, nrows, ncolumns);
f2(array3, nrows, ncolumns);
f2(*array4, nrows, NCOLUMNS);
The following two calls would probably work on most systems, but involve questionable casts, and work
only if the dynamic ncolumns matches the static NCOLUMNS:
f1((int (*)[NCOLUMNS])(*array2), nrows, ncolumns);
f1((int (*)[NCOLUMNS])array3, nrows, ncolumns);
It must again be noted that passing &array[0][0] (or, equivalently, *array) to f2 is not strictly
conforming; see question 6.19.
If you can understand why all of the above calls work and are written as they are, and if you understand
why the combinations that are not listed would not work, then you have a very good understanding of
arrays and pointers in C.
Rather than worrying about all of this, one approach to using multidimensional arrays of various sizes is
to make them all dynamic, as in question 6.16. If there are no static multidimensional arrays--if all arrays
are allocated like array1 or array2 in question 6.16--then all functions can be written like f3.
Question 6.16
How can I dynamically allocate a multidimensional array?
It is usually best to allocate an array of pointers, and then initialize each pointer to a
dynamically-allocated ``row.'' Here is a two-dimensional example:
#include <stdlib.h>
int **array1 = (int **)malloc(nrows * sizeof(int *));
for(i = 0; i < nrows; i++)
array1[i] = (int *)malloc(ncolumns * sizeof(int));
(In real code, of course, all of malloc's return values would be checked.)
You can keep the array's contents contiguous, while making later reallocation of individual rows
difficult, with a bit of explicit pointer arithmetic:
int **array2 = (int **)malloc(nrows * sizeof(int *));
array2[0] = (int *)malloc(nrows * ncolumns * sizeof(int));
for(i = 1; i < nrows; i++)
    array2[i] = array2[0] + i * ncolumns;
In either case, the elements of the dynamic array can be accessed with normal-looking array subscripts:
arrayx[i][j] (for 0 <= i < nrows and 0 <= j < ncolumns).
If the double indirection implied by the above schemes is for some reason unacceptable, you can
simulate a two-dimensional array with a single, dynamically-allocated one-dimensional array:
int *array3 = (int *)malloc(nrows * ncolumns * sizeof(int));
However, you must now perform subscript calculations manually, accessing the i,jth element with
array3[i * ncolumns + j]. (A macro could hide the explicit calculation, but invoking it would
require parentheses and commas which wouldn't look exactly like multidimensional array syntax, and the
macro would need access to at least one of the dimensions, as well. See also question 6.19.)
Finally, you could use pointers to arrays:
int (*array4)[NCOLUMNS] =
    (int (*)[NCOLUMNS])malloc(nrows * sizeof(*array4));

but the syntax starts getting horrific and at most one dimension may be specified at run time.
With all of these techniques, you may of course need to remember to free the arrays (which may take
several steps; see question 7.23) when they are no longer needed, and you cannot necessarily intermix
dynamically-allocated arrays with conventional, statically-allocated ones (see question 6.20, and also
question 6.18).
All of these techniques can also be extended to three or more dimensions.
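As a sketch of what the contiguous (array2) scheme looks like once malloc's return values are checked and a matching deallocation routine is supplied, consider the following. The function names alloc2d and free2d are hypothetical, and overflow in nrows * ncolumns is not checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an nrows x ncolumns array of int in the contiguous layout.
   Returns NULL if either allocation fails. */
int **alloc2d(size_t nrows, size_t ncolumns)
{
    int **array2;
    size_t i;

    assert(nrows > 0 && ncolumns > 0);
    array2 = malloc(nrows * sizeof *array2);
    if (array2 == NULL)
        return NULL;
    array2[0] = malloc(nrows * ncolumns * sizeof **array2);
    if (array2[0] == NULL) {
        free(array2);
        return NULL;
    }
    for (i = 1; i < nrows; i++)
        array2[i] = array2[0] + i * ncolumns;
    return array2;
}

/* Two calls to malloc were made, so two calls to free are needed. */
void free2d(int **array2)
{
    if (array2 != NULL) {
        free(array2[0]);    /* the block holding all the elements */
        free(array2);       /* the array of row pointers */
    }
}
```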
Question 6.19
How do I write functions which accept two-dimensional arrays when the ``width'' is not known at
compile time?
It's not easy. One way is to pass in a pointer to the [0][0] element, along with the two dimensions, and
simulate array subscripting ``by hand:''
f2(aryp, nrows, ncolumns)
int *aryp;
int nrows, ncolumns;
{ ... array[i][j] is accessed as aryp[i * ncolumns + j] ... }
This function could be called with the array from question 6.18 as
f2(&array[0][0], NROWS, NCOLUMNS);
It must be noted, however, that a program which performs multidimensional array subscripting ``by
hand'' in this way is not in strict conformance with the ANSI C Standard; according to an official
interpretation, the behavior of accessing (&array[0][0])[x] is not defined for x >= NCOLUMNS.
gcc allows local arrays to be declared having sizes which are specified by a function's arguments, but
this is a nonstandard extension.
When you want to be able to use a function on multidimensional arrays of various sizes, one solution is
to simulate all the arrays dynamically, as in question 6.16.
See also questions 6.18, 6.20, and 6.15.
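A prototype-style sketch of such a function follows; the name sum2d is hypothetical. When it is handed a genuinely one-dimensional block of nrows * ncolumns ints (such as array3 in question 6.16), the conformance concern described above does not arise.

```c
#include <assert.h>

/* Sum an nrows x ncolumns array given a pointer to its first element,
   performing the subscript calculation by hand. */
long sum2d(const int *aryp, int nrows, int ncolumns)
{
    long total = 0;
    int i, j;

    assert(aryp != 0 && nrows >= 0 && ncolumns >= 0);
    for (i = 0; i < nrows; i++)
        for (j = 0; j < ncolumns; j++)
            total += aryp[i * ncolumns + j];    /* "array[i][j]" */
    return total;
}
```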
References: ISO Sec. 6.3.6
Question 6.18
My compiler complained when I passed a two-dimensional array to a function expecting a pointer to a
pointer.
The rule (see question 6.3) by which arrays decay into pointers is not applied recursively. An array of
arrays (i.e. a two-dimensional array in C) decays into a pointer to an array, not a pointer to a pointer.
Pointers to arrays can be confusing, and must be treated carefully; see also question 6.13. (The confusion
is heightened by the existence of incorrect compilers, including some old versions of pcc and
pcc-derived lints, which improperly accept assignments of multi-dimensional arrays to multi-level
pointers.)
If you are passing a two-dimensional array to a function:
int array[NROWS][NCOLUMNS];
f(array);
the function's declaration must match:
f(int a[][NCOLUMNS])
{ ... }
or
f(int (*ap)[NCOLUMNS])      /* ap is a pointer to an array */
{ ... }
In the first declaration, the compiler performs the usual implicit parameter rewriting of ``array of array''
to ``pointer to array'' (see questions 6.3 and 6.4); in the second form the pointer declaration is explicit.
Since the called function does not allocate space for the array, it does not need to know the overall size,
so the number of rows, NROWS, can be omitted. The ``shape'' of the array is still important, so the column
dimension NCOLUMNS (and, for three- or more dimensional arrays, the intervening ones) must be
retained.
If a function is already declared as accepting a pointer to a pointer, it is probably meaningless to pass a
two-dimensional array directly to it.
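A complete, compilable version of the pointer-to-array form might look like the sketch below. The values chosen for NROWS and NCOLUMNS, the function names, and the array contents are hypothetical.

```c
#include <assert.h>

#define NROWS    2
#define NCOLUMNS 3

/* ap points to the first row; each row is an array of NCOLUMNS ints. */
long sum_rows(int (*ap)[NCOLUMNS], int nrows)
{
    long total = 0;
    int i, j;

    assert(ap != 0);
    for (i = 0; i < nrows; i++)
        for (j = 0; j < NCOLUMNS; j++)
            total += ap[i][j];
    return total;
}

long sum_table(void)
{
    static int array[NROWS][NCOLUMNS] = { { 1, 2, 3 }, { 4, 5, 6 } };

    /* array decays to int (*)[NCOLUMNS], matching the parameter. */
    return sum_rows(array, NROWS);
}
```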
References: K&R2 Sec. 5.9 p. 113
H&S Sec. 5.4.3 p. 126
Question 6.3
So what is meant by the ``equivalence of pointers and arrays'' in C?
Much of the confusion surrounding arrays and pointers in C can be traced to a misunderstanding of this
statement. Saying that arrays and pointers are ``equivalent'' means neither that they are identical nor even
interchangeable.
``Equivalence'' refers to the following key definition:
    An lvalue of type array-of-T which appears in an expression decays (with three
    exceptions) into a pointer to its first element; the type of the resultant pointer is
    pointer-to-T.
(The exceptions are when the array is the operand of a sizeof or & operator, or is a string literal
initializer for a character array.)
As a consequence of this definition, the compiler doesn't apply the array subscripting operator [] that
differently to arrays and pointers, after all. In an expression of the form a[i], the array decays into a
pointer, following the rule above, and is then subscripted just as would be a pointer variable in the
expression p[i] (although the eventual memory accesses will be different, as explained in question 6.2).
If you were to assign the array's address to the pointer:
p = a;
then p[3] and a[3] would access the same element.
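A small sketch makes the point concrete; the function name and the array contents are hypothetical.

```c
#include <assert.h>

int same_element(void)
{
    int a[5] = { 10, 20, 30, 40, 50 };
    int *p;

    p = a;                       /* a decays to &a[0] */
    assert(p == &a[0]);
    assert(&p[3] == &a[3]);      /* both name the same element */
    assert(*(a + 3) == a[3]);    /* subscripting is pointer arithmetic */

    p[3] = 99;                   /* a store through p ... */
    return a[3];                 /* ... is visible through a */
}
```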
References: K&R1 Sec. 5.3 pp. 93-6
K&R2 Sec. 5.3 p. 99
ANSI Sec. 3.2.2.1, Sec. 3.3.2.1, Sec. 3.3.6
ISO Sec. 6.2.2.1, Sec. 6.3.2.1, Sec. 6.3.6
H&S Sec. 5.4.1 p. 124
Question 6.2
But I heard that char a[] was identical to char *a.
Not at all. (What you heard has to do with formal parameters to functions; see question 6.4.) Arrays are
not pointers. The array declaration char a[6] requests that space for six characters be set aside, to be
known by the name ``a.'' That is, there is a location named ``a'' at which six characters can sit. The
pointer declaration char *p, on the other hand, requests a place which holds a pointer, to be known by
the name ``p.'' This pointer can point almost anywhere: to any char, or to any contiguous array of
chars, or nowhere (see also questions 5.1 and 1.30).
As usual, a picture is worth a thousand words. The declarations
char a[] = "hello";
char *p = "world";
would initialize data structures which could be represented like this:
   +---+---+---+---+---+---+
a: | h | e | l | l | o |\0 |
   +---+---+---+---+---+---+
   +-----+     +---+---+---+---+---+---+
p: |  *======> | w | o | r | l | d |\0 |
   +-----+     +---+---+---+---+---+---+
It is important to realize that a reference like x[3] generates different code depending on whether x is an
array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits
code to start at the location ``a,'' move three past it, and fetch the character there. When it sees the
expression p[3], it emits code to start at the location ``p,'' fetch the pointer value there, add three to the
pointer, and finally fetch the character pointed to. In other words, a[3] is three places past (the start of)
the object named a, while p[3] is three places past the object pointed to by p. In the example above,
both a[3] and p[3] happen to be the character 'l', but the compiler gets there differently.
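The sizeof operator offers a quick way to observe that the two objects really are different, as in this sketch (the function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

size_t array_size(void)
{
    char a[] = "hello";

    assert(a[3] == 'l');
    return sizeof a;        /* the six characters, including the \0 */
}

size_t pointer_size(void)
{
    char *p = "world";

    assert(p[3] == 'l');
    return sizeof p;        /* just the pointer, whatever it points to */
}
```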
References: CT&P Sec. 4.5 pp. 64-5

Question 5.1
What is this infamous null pointer, anyway?
The language definition states that for each pointer type, there is a special value--the ``null pointer''--
which is distinguishable from all other pointer values and which is ``guaranteed to compare unequal to a
pointer to any object or function.'' That is, the address-of operator & will never yield a null pointer, nor
will a successful call to malloc. (malloc does return a null pointer when it fails, and this is a typical
use of null pointers: as a ``special'' pointer value with some other meaning, usually ``not allocated'' or
``not pointing anywhere yet.'')
A null pointer is conceptually different from an uninitialized pointer. A null pointer is known not to point
to any object or function; an uninitialized pointer might point anywhere. See also questions 1.30, 7.1, and
7.31.
As mentioned above, there is a null pointer for each pointer type, and the internal values of null pointers
for different types may be different. Although programmers need not know the internal values, the
compiler must always be informed which type of null pointer is required, so that it can make the
distinction if necessary (see questions 5.2, 5.5, and 5.6).
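As an illustration of a null pointer used as a ``special'' value, here is a sketch of a search function which returns a null pointer to mean ``not found.'' The function find_int is hypothetical; NULL is the macro discussed in question 5.4.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the first element equal to key,
   or a null pointer if there is none. */
int *find_int(int *base, size_t n, int key)
{
    size_t i;

    assert(base != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (base[i] == key)
            return &base[i];    /* & never yields a null pointer */
    return NULL;
}
```

Because no valid element address can compare equal to a null pointer, the caller can tell the two outcomes apart with a simple test against NULL.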
References: K&R2 Sec. 5.4 p. 102
ANSI Sec. 3.2.2.3
ISO Sec. 6.2.2.3
H&S Sec. 5.3.2 pp. 121-3
Question 1.30
What can I safely assume about the initial values of variables which are not explicitly initialized? If
global variables start out as ``zero,'' is that good enough for null pointers and floating-point zeroes?
Variables with static duration (that is, those declared outside of functions, and those declared with the
storage class static), are guaranteed initialized (just once, at program startup) to zero, as if the
programmer had typed ``= 0''. Therefore, such variables are initialized to the null pointer (of the correct
type; see also section 5) if they are pointers, and to 0.0 if they are floating-point.
Variables with automatic duration (i.e. local variables without the static storage class) start out
containing garbage, unless they are explicitly initialized. (Nothing useful can be predicted about the
garbage.)
Dynamically-allocated memory obtained with malloc and realloc is also likely to contain garbage,
and must be initialized by the calling program, as appropriate. Memory obtained with calloc is
all-bits-0, but this is not necessarily useful for pointer or floating-point values (see question 7.31, and section 5).
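The guarantee for static-duration variables can be checked directly, as in this sketch; the variable names and the function check_statics are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int   *sp;        /* static duration: starts out as a null pointer */
static double sd;        /* starts out as 0.0 */
int           counter;   /* declared outside any function: starts out as 0 */

void check_statics(void)
{
    int local = 0;       /* automatic: must be initialized explicitly */

    assert(sp == NULL);
    assert(sd == 0.0);
    assert(counter == local);
}
```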
References: K&R2 Sec. 4.9 pp. 85-86
ANSI Sec. 3.5.7, Sec. 4.10.3.1, Sec. 4.10.5.3
ISO Sec. 6.5.7, Sec. 7.10.3.1, Sec. 7.10.5.3
H&S Sec. 4.2.8 pp. 72-3, Sec. 4.6 pp. 92-3, Sec. 4.6.2 pp. 94-5, Sec. 4.6.3 p. 96, Sec. 16.1 p. 386
Null Pointers
5. Null Pointers
5.1 What is this infamous null pointer, anyway?
5.2 How do I get a null pointer in my programs?
5.3 Is the abbreviated pointer comparison ``if(p)'' to test for non-null pointers valid?
5.4 What is NULL and how is it #defined?
5.5 How should NULL be defined on a machine which uses a nonzero bit pattern as the internal
representation of a null pointer?
5.6 If NULL were defined as ``((char *)0),'' wouldn't that make function calls which pass an uncast
NULL work?
5.9 If NULL and 0 are equivalent as null pointer constants, which should I use?
5.10 But wouldn't it be better to use NULL, in case the value of NULL changes?
5.12 I use the preprocessor macro "#define Nullptr(type) (type *)0" to help me build null
pointers of the correct type.
5.13 This is strange. NULL is guaranteed to be 0, but the null pointer is not?
5.14 Why is there so much confusion surrounding null pointers?
5.15 I'm confused. I just can't understand all this null pointer stuff.
5.16 Given all the confusion surrounding null pointers, wouldn't it be easier simply to require them to be
represented internally by zeroes?
5.17 Seriously, have any actual machines really used nonzero null pointers?
5.20 What does a run-time ``null pointer assignment'' error mean?
Question 5.4
What is NULL and how is it #defined?
As a matter of style, many programmers prefer not to have unadorned 0's scattered through their
programs. Therefore, the preprocessor macro NULL is #defined (by <stdio.h> or <stddef.h>)
with the value 0, possibly cast to (void *) (see also question 5.6). A programmer who wishes to make
explicit the distinction between 0 the integer and 0 the null pointer constant can then use NULL
whenever a null pointer is required.
Using NULL is a stylistic convention only; the preprocessor turns NULL back into 0 which is then
recognized by the compiler, in pointer contexts, as before. In particular, a cast may still be necessary
before NULL (as before 0) in a function call argument. The table under question 5.2 above applies for
NULL as well as 0 (an unadorned NULL is equivalent to an unadorned 0).
NULL should only be used for pointers; see question 5.9.
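A variable-length argument list is the classic case in which the cast matters, since the compiler has no prototype information for the trailing arguments. The following sketch uses a hypothetical function, count_strings, whose argument list is terminated by a null char pointer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Count the strings in an argument list ended by a null char pointer. */
int count_strings(const char *first, ...)
{
    va_list ap;
    const char *s;
    int n = 0;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}

int demo_count(void)
{
    /* The cast tells the compiler which kind of null pointer to pass;
       an uncast NULL or 0 might be passed as a plain int. */
    int n = count_strings("a", "b", "c", (char *)NULL);

    assert(n == 3);
    return n;
}
```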
References: K&R2 Sec. 5.4 p. 102
ANSI Sec. 4.1.5, Sec. 3.2.2.3
ISO Sec. 7.1.6, Sec. 6.2.2.3
H&S Sec. 5.3.2 p. 122, Sec. 11.1 p. 292
Question 5.6
If NULL were defined as follows:
#define NULL ((char *)0)
wouldn't that make function calls which pass an uncast NULL work?
Not in general. The problem is that there are machines which use different internal representations for
pointers to different types of data. The suggested definition would make uncast NULL arguments to
functions expecting pointers to characters work correctly, but pointer arguments of other types would
still be problematical, and legal constructions such as
FILE *fp = NULL;
could fail.
Nevertheless, ANSI C allows the alternate definition
#define NULL ((void *)0)
for NULL. Besides potentially helping incorrect programs to work (but only on machines with
homogeneous pointers, thus questionably valid assistance), this definition may catch programs which use
NULL incorrectly (e.g. when the ASCII NUL character was really intended; see question 5.9).
Question 5.9
If NULL and 0 are equivalent as null pointer constants, which should I use?
Many programmers believe that NULL should be used in all pointer contexts, as a reminder that the value
is to be thought of as a pointer. Others feel that the confusion surrounding NULL and 0 is only
compounded by hiding 0 behind a macro, and prefer to use unadorned 0 instead. There is no one right
answer. (See also questions 9.2 and 17.10.) C programmers must understand that NULL and 0 are
interchangeable in pointer contexts, and that an uncast 0 is perfectly acceptable. Any usage of NULL (as
opposed to 0) should be considered a gentle reminder that a pointer is involved; programmers should not
depend on it (either for their own understanding or the compiler's) for distinguishing pointer 0's from
integer 0's.
NULL should not be used when another kind of 0 is required, even though it might work, because doing
so sends the wrong stylistic message. (Furthermore, ANSI allows the definition of NULL to be ((void
*)0), which will not work at all in non-pointer contexts.) In particular, do not use NULL when the
ASCII null character (NUL) is desired. Provide your own definition
#define NUL '\0'
if you must.
References: K&R2 Sec. 5.4 p. 102
Question 5.17
Seriously, have any actual machines really used nonzero null pointers, or different representations for
pointers to different types?
The Prime 50 series used segment 07777, offset 0 for the null pointer, at least for PL/I. Later models used
segment 0, offset 0 for null pointers in C, necessitating new instructions such as TCNP (Test C Null
Pointer), evidently as a sop to all the extant poorly-written C code which made incorrect assumptions.
Older, word-addressed Prime machines were also notorious for requiring larger byte pointers (char
*'s) than word pointers (int *'s).
The Eclipse MV series from Data General has three architecturally supported pointer formats (word,
byte, and bit pointers), two of which are used by C compilers: byte pointers for char * and void *,
and word pointers for everything else.
Some Honeywell-Bull mainframes use the bit pattern 06000 for (internal) null pointers.
The CDC Cyber 180 Series has 48-bit pointers consisting of a ring, segment, and offset. Most users (in
ring 11) have null pointers of 0xB00000000000. It was common on old CDC ones-complement machines
to use an all-one-bits word as a special flag for all kinds of data, including invalid addresses.
The old HP 3000 series uses a different addressing scheme for byte addresses than for word addresses;
like several of the machines above it therefore uses different representations for char * and void *
pointers than for other pointers.
The Symbolics Lisp Machine, a tagged architecture, does not even have conventional numeric pointers; it
uses the pair <NIL, 0> (basically a nonexistent <object, offset> handle) as a C null pointer.
Depending on the ``memory model'' in use, 8086-family processors (PC compatibles) may use 16-bit
data pointers and 32-bit function pointers, or vice versa.
Some 64-bit Cray machines represent int * in the lower 48 bits of a word; char * additionally uses
the upper 16 bits to indicate a byte address within a word.

Question 5.16
Given all the confusion surrounding null pointers, wouldn't it be easier simply to require them to be
represented internally by zeroes?
If for no other reason, doing so would be ill-advised because it would unnecessarily constrain
implementations which would otherwise naturally represent null pointers by special, nonzero bit patterns,
particularly when those values would trigger automatic hardware traps for invalid accesses.
Besides, what would such a requirement really accomplish? Proper understanding of null pointers does
not require knowledge of the internal representation, whether zero or nonzero. Assuming that null
pointers are internally zero does not make any code easier to write (except for a certain ill-advised usage
of calloc; see question 7.31). Known-zero internal pointers would not obviate casts in function calls,
because the size of the pointer might still be different from that of an int. (If ``nil'' were used to request
null pointers, as mentioned in question 5.14, the urge to assume an internal zero representation would not
even arise.)
Question 7.31
What's the difference between calloc and malloc? Is it safe to take advantage of calloc's
zero-filling? Does free work on memory allocated with calloc, or do you need a cfree?
calloc(m, n) is essentially equivalent to

p = malloc(m * n);
memset(p, 0, m * n);
The zero fill is all-bits-zero, and does not therefore guarantee useful null pointer values (see section 5 of
this list) or floating-point zero values. free is properly used to free the memory allocated by calloc.
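When a structure contains pointer or floating-point members, one portable approach is to allocate with malloc and assign each member explicitly, as in this sketch. The structure rec and the function alloc_recs are hypothetical, and overflow in n * sizeof *p is not checked.

```c
#include <assert.h>
#include <stdlib.h>

struct rec {
    int     count;
    double  weight;
    char   *name;
};

/* Allocate n records with every member given a proper zero value,
   rather than relying on an all-bits-zero fill. */
struct rec *alloc_recs(size_t n)
{
    struct rec *p;
    size_t i;

    assert(n > 0);
    p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        p[i].count = 0;
        p[i].weight = 0.0;      /* a true floating-point zero */
        p[i].name = NULL;       /* a true null pointer */
    }
    return p;
}
```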
References: ANSI Sec. 4.10.3 to 4.10.3.2
ISO Sec. 7.10.3 to 7.10.3.2
H&S Sec. 16.1 p. 386, Sec. 16.2 p. 386
PCS Sec. 11 pp. 141,142
Question 7.30
Is it legal to pass a null pointer as the first argument to realloc? Why would you want to?
ANSI C sanctions this usage (and the related realloc(..., 0), which frees), although several earlier
implementations do not support it, so it may not be fully portable. Passing an initially-null pointer to
realloc can make it easier to write a self-starting incremental allocation algorithm.
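A sketch of such a self-starting algorithm follows. The structure intvec, the function intvec_append, and the initial size of 4 are hypothetical choices; the structure is assumed to start out as { NULL, 0, 0 }, so the first call to realloc behaves like malloc.

```c
#include <assert.h>
#include <stdlib.h>

struct intvec {
    int    *data;         /* initially a null pointer */
    size_t  used;         /* number of elements stored */
    size_t  allocated;    /* number of elements there is room for */
};

/* Append value, growing the array as needed. Returns 1 on success,
   0 if realloc fails (in which case the vector is unchanged). */
int intvec_append(struct intvec *v, int value)
{
    assert(v != NULL);
    if (v->used == v->allocated) {
        size_t newsize = v->allocated ? 2 * v->allocated : 4;
        int *newdata = realloc(v->data, newsize * sizeof *newdata);

        if (newdata == NULL)
            return 0;     /* the old block is still valid */
        v->data = newdata;
        v->allocated = newsize;
    }
    v->data[v->used++] = value;
    return 1;
}
```

No special case is needed for the empty vector, which is the convenience the answer above refers to.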
References: ANSI Sec. 4.10.3.4
ISO Sec. 7.10.3.4
H&S Sec. 16.3 p. 388
Question 7.27
So can I query the malloc package to find out how big an allocated block is?
Not portably.
Question 7.26
How does free know how many bytes to free?
The malloc/free implementation remembers the size of each block allocated and returned, so it is not
necessary to remind it of the size when freeing.
Question 7.25
I have a program which mallocs and later frees a lot of memory, but memory usage (as reported by
ps) doesn't seem to go back down.
Most implementations of malloc/free do not return freed memory to the operating system (if there
is one), but merely make it available for future malloc calls within the same program.
Question 7.24
Must I free allocated memory before the program exits?
You shouldn't have to. A real operating system definitively reclaims all memory when a program exits.
Nevertheless, some personal computers are said not to reliably recover memory, and all that can be
inferred from the ANSI/ISO C Standard is that this is a ``quality of implementation issue.''
References: ISO Sec. 7.10.3.2
Question 7.23
I'm allocating structures which contain pointers to other dynamically-allocated objects. When I free a
structure, do I have to free each subsidiary pointer first?
Yes. In general, you must arrange that each pointer returned from malloc be individually passed to
free, exactly once (if it is freed at all).
A good rule of thumb is that for each call to malloc in a program, you should be able to point at the call
to free which frees the memory allocated by that malloc call.
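The rule of thumb can be sketched with a structure holding one subsidiary allocation. The structure person and the functions person_new and person_free are hypothetical; each of the two malloc calls has a matching free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct person {
    char *name;          /* separately allocated */
};

struct person *person_new(const char *name)
{
    struct person *p;

    assert(name != NULL);
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->name = malloc(strlen(name) + 1);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->name, name);
    return p;
}

void person_free(struct person *p)
{
    if (p != NULL) {
        free(p->name);   /* subsidiary object first, while p is still valid */
        free(p);
    }
}
```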
Question 7.22
When I call malloc to allocate memory for a local pointer, do I have to explicitly free it?
Yes. Remember that a pointer is different from what it points to. Local variables are deallocated when
the function returns, but in the case of a pointer variable, this means that the pointer is deallocated, not
what it points to. Memory allocated with malloc always persists until you explicitly free it. In general,
for every call to malloc, there should be a corresponding call to free.
Question 7.21
Why isn't a pointer null after calling free?
How unsafe is it to use (assign, compare) a pointer value after it's been freed?
When you call free, the memory pointed to by the passed pointer is freed, but the value of the pointer
in the caller remains unchanged, because C's pass-by-value semantics mean that called functions never
permanently change the values of their arguments. (See also question 4.8.)
A pointer value which has been freed is, strictly speaking, invalid, and any use of it, even if it is not
dereferenced, can theoretically lead to trouble, though as a quality of implementation issue, most
implementations will probably not go out of their way to generate exceptions for innocuous uses of
invalid pointers.
References: ISO Sec. 7.10.3
Question 4.8
I have a function which accepts, and is supposed to initialize, a pointer:
void f(ip)
int *ip;
{
static int dummy = 5;
ip = &dummy;
}
But when I call it like this:
int *ip;
f(ip);
the pointer in the caller remains unchanged.
Are you sure the function initialized what you thought it did? Remember that arguments in C are passed
by value. The called function altered only the passed copy of the pointer. You'll either want to pass the
address of the pointer (the function will end up accepting a pointer-to-a-pointer), or have the function
return the pointer.
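The two fixes just mentioned might look something like this. The names f_by_address and f_returning are made up for illustration; the original f is shown above.

```c
#include <stddef.h>

/* Fix 1: accept the address of the caller's pointer (a pointer-to-pointer)
   and store through it. */
void f_by_address(int **ipp)
{
    static int dummy = 5;
    *ipp = &dummy;
}

/* Fix 2: return the pointer and let the caller assign it. */
int *f_returning(void)
{
    static int dummy = 5;
    return &dummy;
}
```

The calls then become f_by_address(&ip); and ip = f_returning(); respectively.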
Question 4.9
Can I use a void ** pointer to pass a generic pointer to a function by reference?
Not portably. There is no generic pointer-to-pointer type in C. void * acts as a generic pointer only
because conversions are applied automatically when other pointer types are assigned to and from void
*'s; these conversions cannot be performed (the correct underlying pointer type is not known) if an
attempt is made to indirect upon a void ** value which points at something other than a void *.
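One portable pattern is to hand the function the address of a real void * variable and convert afterwards. The sketch below assumes a hypothetical function set_generic which stores through a void **; both function names are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical function which "returns" a generic pointer by reference. */
void set_generic(void **vpp, void *value)
{
    *vpp = value;
}

/* Portable calling pattern: go through a genuine void * temporary rather
   than casting the address of an int * to void **. */
int *get_int_pointer(int *target)
{
    void *tmp = NULL;
    set_generic(&tmp, target);
    return tmp;     /* void * converts to int * implicitly in C */
}
```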
Question 4.10
I have a function
extern int f(int *);
which accepts a pointer to an int. How can I pass a constant by reference? A call like
f(&5);
doesn't seem to work.
You can't do this directly. You will have to declare a temporary variable, and then pass its address to the
function:
int five = 5;
f(&five);
Question 2.10
How can I pass constant values to functions which accept structure arguments?
C has no way of generating anonymous structure values. You will have to use a temporary structure
variable or a little structure-building function; see question 14.11 for an example. (gcc provides
structure constants as an extension, and the mechanism will probably be added to a future revision of the
C Standard.) See also question 4.10.
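A ``little structure-building function'' might be sketched as follows. The structure type and both function names are assumptions chosen for illustration, not taken from question 14.11.

```c
struct point {
    int x, y;
};

/* Hypothetical structure-building function: returns a struct by value. */
struct point makepoint(int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

/* Some function which accepts a structure argument. */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```

The call is then point_sum(makepoint(3, 4)). (The 1999 revision of the C Standard did add such a mechanism, the ``compound literal,'' so with a C99 compiler you can also write point_sum((struct point){3, 4}).)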
Question 2.6
I came across some code that declared a structure like this:
struct name {
    int namelen;
    char namestr[1];
};
and then did some tricky allocation to make the namestr array act like it had several elements. Is this
legal or portable?
This technique is popular, although Dennis Ritchie has called it ``unwarranted chumminess with the C
implementation.'' An official interpretation has deemed that it is not strictly conforming with the C
Standard. (A thorough treatment of the arguments surrounding the legality of the technique is beyond the
scope of this list.) It does seem to be portable to all known implementations. (Compilers which check
array bounds carefully might issue warnings.)
Another possibility is to declare the variable-size element very large, rather than very small; in the case
of the above example:
...
char namestr[MAXSIZE];
...
where MAXSIZE is larger than any name which will be stored. However, it looks like this technique is
disallowed by a strict interpretation of the Standard as well.
References: Rationale Sec. 3.5.4.2
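For comparison, the 1999 revision of the C Standard introduced the ``flexible array member,'' which expresses the same idea legally. The sketch below uses that C99 form rather than the one-element array in the question, so it assumes a C99 (or later) compiler; makename is a hypothetical function name.

```c
#include <stdlib.h>
#include <string.h>

struct name {
    int namelen;
    char namestr[];     /* C99 flexible array member; must come last */
};

/* Allocates the structure and room for the string in one malloc call.
   Returns NULL on failure; the caller frees the result with free. */
struct name *makename(const char *newname)
{
    size_t len = strlen(newname);
    struct name *ret = malloc(sizeof(struct name) + len + 1);
    if (ret != NULL) {
        ret->namelen = (int)len;
        memcpy(ret->namestr, newname, len + 1);   /* includes the '\0' */
    }
    return ret;
}
```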
Question 2.4
What's the best way of implementing opaque (abstract) data types in C?
One good way is for clients to use structure pointers (perhaps additionally hidden behind typedefs)
which point to structure types which are not publicly defined.
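Here is a minimal sketch of that arrangement, squeezed into one listing. In practice the first part would go in a public header and the second in a private source file; the Counter type and its functions are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* ---- would live in the public header (say, counter.h) ---- */
typedef struct counter Counter;     /* incomplete type: layout is hidden */
Counter *counter_new(void);
void counter_increment(Counter *c);
int counter_value(const Counter *c);
void counter_free(Counter *c);

/* ---- would live in the private implementation file ---- */
struct counter {
    int count;
};

Counter *counter_new(void)
{
    Counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->count = 0;
    return c;
}

void counter_increment(Counter *c) { c->count++; }
int counter_value(const Counter *c) { return c->count; }
void counter_free(Counter *c) { free(c); }
```

Clients which see only the header can pass Counter pointers around, but cannot declare a Counter object or touch its members.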
Question 2.3
Can a structure contain a pointer to itself?
Most certainly. See question 1.14.
Question 2.2
Why doesn't
struct x { ... };
x thestruct;
work?
C is not C++. Typedef names are not automatically generated for structure tags. See also question 2.1.
Structures, Unions, and Enumerations

2.1 What's the difference between struct x1 { ... }; and typedef struct { ... } x2; ?
2.2 Why doesn't "struct x { ... }; x thestruct;" work?
2.3 Can a structure contain a pointer to itself?
2.4 What's the best way of implementing opaque (abstract) data types in C?
2.6 I came across some code that declared a structure with the last member an array of one element, and
then did some tricky allocation to make it act like the array had several elements. Is this legal or
portable?
2.7 I heard that structures could be assigned to variables and passed to and from functions, but K&R1
says not.
2.8 Why can't you compare structures?
2.9 How are structure passing and returning implemented?
2.10 Can I pass constant values to functions which accept structure arguments?
2.11 How can I read/write structures from/to data files?
2.12 How can I turn off structure padding?
2.13 Why does sizeof report a larger size than I expect for a structure type?
2.14 How can I determine the byte offset of a field within a structure?
2.15 How can I access structure fields by name at run time?
2.18 I have a program which works correctly, but dumps core after it finishes. Why?
2.20 Can I initialize unions?
2.22 What is the difference between an enumeration and a set of preprocessor #defines?
2.24 Is there an easy way to print enumeration values symbolically?
Question 20.1
How can I return multiple values from a function?
Either pass pointers to several locations which the function can fill in, or have the function return a
structure containing the desired values, or (in a pinch) consider global variables. See also questions 2.7,
4.8, and 7.5.
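The first two approaches can be sketched with a function that returns both a quotient and a remainder. The function and structure names here are invented for illustration, and the caller is assumed to pass a nonzero divisor.

```c
/* Approach 1: the caller passes pointers to locations to be filled in. */
void divide_ptr(int a, int b, int *quotp, int *remp)
{
    *quotp = a / b;
    *remp = a % b;
}

/* Approach 2: return a structure containing the desired values. */
struct divresult {
    int quot;
    int rem;
};

struct divresult divide_struct(int a, int b)
{
    struct divresult r;
    r.quot = a / b;
    r.rem = a % b;
    return r;
}
```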
Question 7.5
I have a function that is supposed to return a string, but when it returns to its caller, the returned string is
garbage.
Make sure that the pointed-to memory is properly allocated. The returned pointer should be to a statically-allocated buffer, or to a buffer passed in by the caller, or to memory obtained with malloc, but not to a
local (automatic) array. In other words, never do something like
char *itoa(int n)
{
    char retbuf[20];            /* WRONG */
    sprintf(retbuf, "%d", n);
    return retbuf;              /* WRONG */
}
One fix (which is imperfect, especially if the function in question is called recursively, or if several of its
return values are needed simultaneously) would be to declare the return buffer as
static char retbuf[20];
ISO Sec. 6.1.2.4
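The other two alternatives mentioned above (a buffer passed in by the caller, or memory obtained with malloc) might be sketched like this. The names itoa_buf, itoa_alloc, and INT_BUFSIZE are made up for illustration; the buffer size uses the conservative estimate from question 12.21.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Conservative room for an int in decimal, plus sign, plus '\0'. */
#define INT_BUFSIZE ((sizeof(int) * CHAR_BIT + 2) / 3 + 1 + 1)

/* The caller supplies a buffer of at least INT_BUFSIZE characters. */
char *itoa_buf(int n, char *buf)
{
    sprintf(buf, "%d", n);
    return buf;
}

/* The result comes from malloc; the caller must free it.
   Returns NULL if the allocation fails. */
char *itoa_alloc(int n)
{
    char *ret = malloc(INT_BUFSIZE);
    if (ret != NULL)
        sprintf(ret, "%d", n);
    return ret;
}
```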
Question 12.21
How can I tell how much destination buffer space I'll need for an arbitrary sprintf call? How can I
avoid overflowing the destination buffer with sprintf?
There are not (yet) any good answers to either of these excellent questions, and this represents perhaps
the biggest deficiency in the traditional stdio library.
When the format string being used with sprintf is known and relatively simple, you can usually
predict a buffer size in an ad-hoc way. If the format consists of one or two %s's, you can count the fixed
characters in the format string yourself (or let sizeof count them for you) and add in the result of
calling strlen on the string(s) to be inserted. You can conservatively estimate the size that %d will
expand to with code like:
#include <limits.h>
char buf[(sizeof(int) * CHAR_BIT + 2) / 3 + 1 + 1];
sprintf(buf, "%d", n);
(This code computes the number of characters required for a base-8 representation of a number; a base-10 expansion is guaranteed to take as much room or less.)
When the format string is more complicated, or is not even known until run time, predicting the buffer
size becomes as difficult as reimplementing sprintf, and correspondingly error-prone (and
inadvisable). A last-ditch technique which is sometimes suggested is to use fprintf to print the same
text to a bit bucket or temporary file, and then to look at fprintf's return value or the size of the file
(but see question 19.12).
If there's any chance that the buffer might not be big enough, you won't want to call sprintf without
some guarantee that the buffer will not overflow and overwrite some other part of memory. Several
stdio's (including GNU and 4.4bsd) provide the obvious snprintf function, which can be used like
this:
snprintf(buf, bufsize, "You typed \"%s\"", answer);
and we can hope that a future revision of the ANSI/ISO C Standard will include this function.
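As it turned out, snprintf was adopted in the 1999 revision of the C Standard. Under the C99 definition, calling it with a null buffer and a size of 0 returns the number of characters that would have been written, which answers the first question as well. Here is a sketch assuming a C99-conforming snprintf (some older implementations returned -1 on truncation instead); format_answer and ANSWER_FMT are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

#define ANSWER_FMT "You typed \"%s\""

/* Returns a malloc'ed, exactly-sized string, or NULL on failure.
   The caller must free the result. */
char *format_answer(const char *answer)
{
    int needed = snprintf(NULL, 0, ANSWER_FMT, answer);  /* size query */
    char *buf;

    if (needed < 0)
        return NULL;
    buf = malloc((size_t)needed + 1);       /* + 1 for the '\0' */
    if (buf != NULL)
        snprintf(buf, (size_t)needed + 1, ANSWER_FMT, answer);
    return buf;
}
```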

Question 19.12
How can I find out the size of a file, prior to reading it in?
If the ``size of a file'' is the number of characters you'll be able to read from it in C, it is difficult or
impossible to determine this number exactly.
Under Unix, the stat call will give you an exact answer. Several other systems supply a Unix-like
stat which will give an approximate answer. You can fseek to the end and then use ftell, or perhaps
try fstat, but these tend to have the same problems: fstat is not portable, and generally tells you the
same thing stat tells you; ftell is not guaranteed to return a byte count except for binary files. Some
systems provide routines called filesize or filelength, but these are not portable, either.
Are you sure you have to determine the file's size in advance? Since the most accurate way of
determining the size of a file as a C program will see it is to open the file and read it, perhaps you can
rearrange the code to learn the size as it reads.
ISO Sec. 7.9.9.4
H&S Sec. 15.5.1
PCS Sec. 12 p. 213
POSIX Sec. 5.6.2
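One way of ``learning the size as it reads'' is to read the stream into a buffer which grows as needed. The following is only a sketch; read_all is a hypothetical name, and the doubling strategy is just one reasonable choice.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads everything remaining on fp into a malloc'ed, '\0'-terminated
   buffer and stores the character count in *lenp.
   Returns NULL if memory runs out; the caller frees the result. */
char *read_all(FILE *fp, size_t *lenp)
{
    size_t cap = 0, len = 0;
    char *buf = NULL;
    int c;

    while ((c = getc(fp)) != EOF) {
        if (len + 1 >= cap) {               /* room for c and the '\0' */
            size_t newcap = cap ? cap * 2 : 64;
            char *tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        buf[len++] = (char)c;
    }
    if (buf == NULL && (buf = malloc(1)) == NULL)   /* empty input */
        return NULL;
    buf[len] = '\0';
    *lenp = len;
    return buf;
}
```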
Question 19.11
How can I check whether a file exists? I want to warn the user if a requested input file is missing.
It's surprisingly difficult to make this determination reliably and portably. Any test you make can be
invalidated if the file is created or deleted (i.e. by some other process) between the time you make the
test and the time you try to open the file.
Three possible test routines are stat, access, and fopen. (To make an approximate test for file
existence with fopen, just open for reading and close immediately.) Of these, only fopen is widely
portable, and access, where it exists, must be used carefully if the program uses the Unix set-UID
feature.
Rather than trying to predict in advance whether an operation such as opening a file will succeed, it's
often better to try it, check the return value, and complain if it fails. (Obviously, this approach won't work
if you're trying to avoid overwriting an existing file, unless you've got something like the O_EXCL file
opening option available, which does just what you want in this case.)
References: PCS Sec. 12 pp. 189,213
POSIX Sec. 5.3.1, Sec. 5.6.2, Sec. 5.6.3
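The ``try it and complain if it fails'' approach might be sketched as below. open_input is a hypothetical name, and the use of errno to explain the failure is an assumption: the C Standard does not require fopen to set errno, though Unix-like systems do.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Tries to open path for reading; on failure, prints a warning
   to stderr and returns NULL. */
FILE *open_input(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        fprintf(stderr, "can't open %s: %s\n", path, strerror(errno));
    return fp;
}
```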
Question 19.10
How can I do graphics?
Once upon a time, Unix had a fairly nice little set of device-independent plot routines described in plot(3)
and plot(5), but they've largely fallen into disuse.
If you're programming for MS-DOS, you'll probably want to use libraries conforming to the VESA or
BGI standards.
If you're trying to talk to a particular plotter, making it draw is usually a matter of sending it the
appropriate escape sequences; see also question 19.9. The vendor may supply a C-callable library, or you
may be able to find one on the net.
If you're programming for a particular window system (Macintosh, X windows, Microsoft Windows),
you will use its facilities; see the relevant documentation or newsgroup or FAQ list.
References: PCS Sec. 5.4 pp. 75-77
Question 19.9
How do I send escape sequences to control a terminal or other device?
If you can figure out how to send characters to the device at all (see question 19.8), it's easy enough to
send escape sequences. In ASCII, the ESC code is 033 (27 decimal), so code like
fprintf(ofd, "\033[J");
sends the sequence ESC [ J .
Question 19.8
How can I direct output to the printer?
Under Unix, either use popen (see question 19.30) to write to the lp or lpr program, or perhaps open
a special file like /dev/lp. Under MS-DOS, write to the (nonstandard) predefined stdio stream
stdprn, or open the special files PRN or LPT1.
Question 19.30
How can I invoke another program or command and trap its output?
Unix and some other systems provide a popen routine, which sets up a stdio stream on a pipe connected
to the process running a command, so that the output can be read (or the input supplied). (Also,
remember to call pclose.)
If you can't use popen, you may be able to use system, with the output going to a file which you then
open and read.
If you're using Unix and popen isn't sufficient, you can learn about pipe, dup, fork, and exec.
(One thing that probably would not work, by the way, would be to use freopen.)
References: PCS Sec. 11 p. 169
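On a system which has popen (it is a POSIX function, not part of the C Standard), trapping a command's output might look like this sketch. first_line_of is a hypothetical name, and only the first line of output is kept.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* ask for popen/pclose declarations */
#endif
#include <stdio.h>

/* Runs cmd, copies the first line of its output into buf, and returns
   the status reported by pclose (or -1 if the command can't be started). */
int first_line_of(const char *cmd, char *buf, int bufsize)
{
    FILE *pp;

    if (bufsize < 1)
        return -1;
    pp = popen(cmd, "r");
    if (pp == NULL)
        return -1;
    if (fgets(buf, bufsize, pp) == NULL)
        buf[0] = '\0';              /* the command printed nothing */
    return pclose(pp);              /* don't forget this */
}
```

A program that wants all of the output would keep calling fgets (or getc) until it returns NULL (or EOF) before calling pclose.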
Question 19.27
How can I invoke another program (a standalone executable, or an operating system command) from
within a C program?
Use the library function system, which does exactly that. Note that system's return value is the
command's exit status, and usually has nothing to do with the output of the command. Note also that
system accepts a single string representing the command to be invoked; if you need to build up a
complex command line, you can use sprintf. See also question 19.30.
K&R2 Sec. 7.8.4 p. 167, Sec. B6 p. 253
ANSI Sec. 4.10.4.5
ISO Sec. 7.10.4.5
H&S Sec. 19.2 p. 407
PCS Sec. 11 p. 179
Question 19.25
How can I access memory (a memory-mapped device, or graphics memory) located at a certain address?
Set a pointer, of the appropriate type, to the right number (using an explicit cast to assure the compiler
that you really do intend this nonportable conversion):
unsigned int *magicloc = (unsigned int *)0x12345678;
Then, *magicloc refers to the location you want. (Under MS-DOS, you may find a macro like
MK_FP() handy for working with segments and offsets.)
K&R2 Sec. A6.6 p. 199
ANSI Sec. 3.3.4
ISO Sec. 6.3.4
H&S Sec. 6.2.7 pp. 171-2
Question 19.24
What does the error message ``DGROUP data allocation exceeds 64K'' mean, and what can I do about it?
I thought that using large model meant that I could use more than 64K of data!
Even in large memory models, MS-DOS compilers apparently toss certain data (strings, some initialized
global or static variables) into a default data segment, and it's this segment that is overflowing. Either
use less global data, or, if you're already limiting yourself to reasonable amounts (and if the problem is
due to something like the number of strings), you may be able to coax the compiler into not using the
default data segment for so much. Some compilers place only ``small'' data objects in the default data
segment, and give you a way (e.g. the /Gt option under Microsoft compilers) to configure the threshold
for ``small.''
Question 19.23
How can I allocate arrays or structures bigger than 64K?
A reasonable computer ought to give you transparent access to all available memory. If you're not so
lucky, you'll either have to rethink your program's use of memory, or use various system-specific
techniques.
64K is (still) a pretty big chunk of memory. No matter how much memory your computer has available,
it's asking a lot to be able to allocate huge amounts of it contiguously. (The C Standard does not
guarantee that a single object can be larger than 32K.) Often it's a good idea to use data structures which
don't require that all memory be contiguous. For dynamically-allocated multidimensional arrays, you can
use pointers to pointers, as illustrated in question 6.16. Instead of a large array of structures, you can use
a linked list, or an array of pointers to structures.
If you're using a PC-compatible (8086-based) system, and running up against a 640K limit, consider
using ``huge'' memory model, or expanded or extended memory, or malloc variants such as halloc or
farmalloc, or a 32-bit ``flat'' compiler (e.g. djgpp, see question 18.3), or some kind of a DOS
extender, or another operating system.
ISO Sec. 5.2.4.1
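Here is a sketch of the pointers-to-pointers idea, in which each row is a separate (and therefore smaller) allocation. The names alloc_2d and free_2d are hypothetical, and this is not necessarily the code of question 6.16.

```c
#include <stdlib.h>

/* Allocates an nrows-by-ncols array of int as nrows separate rows,
   so that no single allocation has to hold the whole array.
   Returns NULL (having freed any partial work) if memory runs out. */
int **alloc_2d(size_t nrows, size_t ncols)
{
    size_t i;
    int **rows = malloc(nrows * sizeof *rows);

    if (rows == NULL)
        return NULL;
    for (i = 0; i < nrows; i++) {
        rows[i] = malloc(ncols * sizeof *rows[i]);
        if (rows[i] == NULL) {
            while (i > 0)
                free(rows[--i]);
            free(rows);
            return NULL;
        }
    }
    return rows;
}

void free_2d(int **rows, size_t nrows)
{
    size_t i;
    for (i = 0; i < nrows; i++)
        free(rows[i]);
    free(rows);
}
```

Elements are then accessed with ordinary rows[i][j] notation.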
Question 6.1
I had the definition char a[6] in one source file, and in another I declared extern char *a. Why
didn't it work?
The declaration extern char *a simply does not match the actual definition. The type pointer-to-type-T is not the same as array-of-type-T. Use extern char a[].
ISO Sec. 6.5.4.2
CT&P Sec. 3.3 pp. 33-4, Sec. 4.5 pp. 64-5
Question 5.20
What does a run-time ``null pointer assignment'' error mean? How do I track it down?
This message, which typically occurs with MS-DOS compilers (see, therefore, section 19), means that
you've written, via a null (perhaps because uninitialized) pointer, to location 0. (See also question 16.8.)
A debugger may let you set a data breakpoint or watchpoint or something on location 0. Alternatively,
you could write a bit of code to stash away a copy of 20 or so bytes from location 0, and periodically
check that the memory at location 0 hasn't changed.
System Dependencies

19.1 How can I read a single character from the keyboard without waiting for the RETURN key?
19.2 How can I find out how many characters are available for reading, or do a non-blocking read?
19.3 How can I display a percentage-done indication that updates itself in place, or show one of those
``twirling baton'' progress indicators?
19.4 How can I clear the screen, or print things in inverse video, or move the cursor?
19.5 How do I read the arrow keys? What about function keys?
19.6 How do I read the mouse?
19.7 How can I do serial (``comm'') port I/O?
19.8 How can I direct output to the printer?
19.9 How do I send escape sequences to control a terminal or other device?
19.10 How can I do graphics?
19.11 How can I check whether a file exists?
19.12 How can I find out the size of a file, prior to reading it in?
19.13 How can a file be shortened in-place without completely clearing or rewriting it?
19.14 How can I insert or delete a line in the middle of a file?
19.15 How can I recover the file name given an open file descriptor?
19.16 How can I delete a file?
19.17 What's wrong with the call "fopen("c:\newdir\file.dat", "r")"?
19.18 How can I increase the allowable number of simultaneously open files?
19.20 How can I read a directory in a C program?
19.22 How can I find out how much memory is available?
19.23 How can I allocate arrays or structures bigger than 64K?
19.24 What does the error message ``DGROUP exceeds 64K'' mean?
19.25 How can I access memory located at a certain address?
19.27 How can I invoke another program from within a C program?
19.30 How can I invoke another program and trap its output?
19.31 How can my program discover the complete pathname to the executable from which it was
invoked?
19.32 How can I automatically locate a program's configuration files in the same directory as the
executable?
19.33 How can a process change an environment variable in its caller?
19.36 How can I read in an object file and jump to routines in it?
19.37 How can I implement a delay, or time a user's response, with sub-second resolution?
19.38 How can I trap or ignore keyboard interrupts like control-C?
19.39 How can I handle floating-point exceptions gracefully?
19.40 How do I... Use sockets? Do networking? Write client/server applications?
19.40b How do I use BIOS calls? How can I write ISR's? How can I create TSR's?
19.41 But I can't use all these nonstandard, system-dependent functions, because my program has to be
ANSI compatible!
Question 19.1
How can I read a single character from the keyboard without waiting for the RETURN key? How can I
stop characters from being echoed on the screen as they're typed?
Alas, there is no standard or portable way to do these things in C. Concepts such as screens and
keyboards are not even mentioned in the Standard, which deals only with simple I/O ``streams'' of
characters.
At some level, interactive keyboard input is usually collected and presented to the requesting program a
line at a time. This gives the operating system a chance to support input line editing
(backspace/delete/rubout, etc.) in a consistent way, without requiring that it be built into every program.
Only when the user is satisfied and presses the RETURN key (or equivalent) is the line made available to
the calling program. Even if the calling program appears to be reading input a character at a time (with
getchar or the like), the first call blocks until the user has typed an entire line, at which point
potentially many characters become available and many character requests (e.g. getchar calls) are
satisfied in quick succession.
When a program wants to read each character immediately as it arrives, its course of action will depend
on where in the input stream the line collection is happening and how it can be disabled. Under some
systems (e.g. MS-DOS, VMS in some modes), a program can use a different or modified set of OS-level
input calls to bypass line-at-a-time input processing. Under other systems (e.g. Unix, VMS in other
modes), the part of the operating system responsible for serial input (often called the ``terminal driver'')
must be placed in a mode which turns off line-at-a-time processing, after which all calls to the usual
input routines (e.g. read, getchar, etc.) will return characters immediately. Finally, a few systems
(particularly older, batch-oriented mainframes) perform input processing in peripheral processors which
cannot be told to do anything other than line-at-a-time input.
Therefore, when you need to do character-at-a-time input (or disable keyboard echo, which is an
analogous problem), you will have to use a technique specific to the system you're using, assuming it
provides one. Since comp.lang.c is oriented towards topics that C does deal with, you will usually get
better answers to these questions by referring to a system-specific newsgroup such as
comp.unix.questions or comp.os.msdos.programmer, and to the FAQ lists for these groups. Note that the
answers are often not unique even across different variants of a system; bear in mind when answering
system-specific questions that the answer that applies to your system may not apply to everyone else's.
However, since these questions are frequently asked here, here are brief answers for some common
situations.
Some versions of curses have functions called cbreak, noecho, and getch which do what you want.
If you're specifically trying to read a short password without echo, you might try getpass. Under Unix,
you can use ioctl to play with the terminal driver modes (CBREAK or RAW under ``classic'' versions;
ICANON, c_cc[VMIN] and c_cc[VTIME] under System V or POSIX systems; ECHO under all
versions), or in a pinch, system and the stty command. (For more information, see <sgtty.h> and
tty(4) under classic versions, <termio.h> and termio(4) under System V, or <termios.h> and
termios(4) under POSIX.) Under MS-DOS, use getch or getche, or the corresponding BIOS
interrupts. Under VMS, try the Screen Management (SMG$) routines, or curses, or issue low-level
$QIO's with the IO$_READVBLK function code (and perhaps IO$M_NOECHO, and others) to ask for
one character at a time. (It's also possible to set character-at-a-time or ``pass through'' modes in the VMS
terminal driver.) Under other operating systems, you're on your own.
(As an aside, note that simply using setbuf or setvbuf to set stdin to unbuffered will not
generally serve to allow character-at-a-time input.)
If you're trying to write a portable program, a good approach is to define your own suite of three
functions to (1) set the terminal driver or input system into character-at-a-time mode (if necessary), (2)
get characters, and (3) return the terminal driver to its initial state when the program is finished. (Ideally,
such a set of functions might be part of the C Standard, some day.) The extended versions of this FAQ
list (see question 20.40) contain examples of such functions for several popular systems.
References: PCS Sec. 10 pp. 128-9, Sec. 10.1 pp. 130-1
POSIX Sec. 7
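For POSIX systems only, the suite of three functions suggested above might be sketched as follows, using the <termios.h> interface (tcgetattr and tcsetattr). The function names are hypothetical, error handling is minimal, and this is not the code from the extended FAQ list.

```c
#include <termios.h>
#include <unistd.h>

static struct termios saved_modes;
static int have_saved = 0;

/* (1) Put the terminal into character-at-a-time, no-echo mode.
   Returns 0 on success, -1 on failure (e.g. stdin is not a terminal). */
int tty_char_mode(void)
{
    struct termios t;

    if (tcgetattr(STDIN_FILENO, &saved_modes) != 0)
        return -1;
    have_saved = 1;
    t = saved_modes;
    t.c_lflag &= ~(ICANON | ECHO);  /* no line collection, no echo */
    t.c_cc[VMIN] = 1;               /* read returns after 1 character */
    t.c_cc[VTIME] = 0;              /* with no timeout */
    return tcsetattr(STDIN_FILENO, TCSANOW, &t) == 0 ? 0 : -1;
}

/* (2) Get one character; returns -1 on end-of-file or error. */
int tty_getchar(void)
{
    unsigned char c;
    return read(STDIN_FILENO, &c, 1) == 1 ? c : -1;
}

/* (3) Return the terminal to its initial state. */
int tty_restore(void)
{
    if (!have_saved)
        return 0;
    return tcsetattr(STDIN_FILENO, TCSANOW, &saved_modes) == 0 ? 0 : -1;
}
```

A careful program would also arrange to call tty_restore if it exits abnormally, or the user's terminal will be left in an unexpected mode.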
Question 19.2
How can I find out if there are characters available for reading (and if so, how many)? Alternatively, how
can I do a read that will not block if there are no characters available?
These, too, are entirely operating-system-specific. Some versions of curses have a nodelay function.
Depending on your system, you may also be able to use ``nonblocking I/O'', or a system call named
select or poll, or the FIONREAD ioctl, c_cc[VTIME], or kbhit, or rdchk, or the O_NDELAY
option to open or fcntl. See also question 19.1.
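As one example, on systems which provide the POSIX poll call, a ``is there anything to read right now?'' test might be sketched like this. input_ready is a hypothetical name. Note that for a terminal still in line-at-a-time mode, nothing will appear ready until the user presses RETURN.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if a read on fd would find data waiting, 0 if not,
   and -1 on error. Never blocks, because the timeout is 0. */
int input_ready(int fd)
{
    struct pollfd p;
    int r;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    r = poll(&p, 1, 0);
    if (r < 0)
        return -1;
    return (r > 0 && (p.revents & POLLIN)) ? 1 : 0;
}
```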
Question 19.3
How can I display a percentage-done indication that updates itself in place, or show one of those
``twirling baton'' progress indicators?
These simple things, at least, you can do fairly portably. Printing the character '\r' will usually give
you a carriage return without a line feed, so that you can overwrite the current line. The character '\b'
is a backspace, and will usually move the cursor one position to the left.
ISO Sec. 5.2.2
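Both indicators can be sketched in a few lines. The function names are made up; the fflush calls are there because output to a terminal is often line-buffered and nothing would otherwise appear until a newline is printed.

```c
#include <stdio.h>

/* Redraws a percentage at the start of the current line. */
void show_percent(FILE *out, int percent)
{
    fprintf(out, "\r%3d%%", percent);
    fflush(out);
}

/* Draws the next frame of a "twirling baton", then backs up over it
   so that the next call overwrites it. */
void show_baton(FILE *out, unsigned step)
{
    static const char baton[] = "|/-\\";
    putc(baton[step % 4], out);
    putc('\b', out);
    fflush(out);
}
```

A typical caller invokes show_percent(stdout, i) or show_baton(stdout, i) each time around its main loop.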
Question 19.4
How can I clear the screen?
How can I print things in inverse video?
How can I move the cursor to a specific x, y position?
Such things depend on the terminal type (or display) you're using. You will have to use a library such as
termcap, terminfo, or curses, or some system-specific routines, to perform these operations.
For clearing the screen, a halfway portable solution is to print a form-feed character ('\f'), which will
cause some displays to clear. Even more portable would be to print enough newlines to scroll everything
away. As a last resort, you could use system (see question 19.27) to invoke an operating system clear-screen command.
References: PCS Sec. 5.1.4 pp. 54-60, Sec. 5.1.5 pp. 60-62
Question 19.5
How do I read the arrow keys? What about function keys?
Terminfo, some versions of termcap, and some versions of curses have support for these non-ASCII
keys. Typically, a special key sends a multicharacter sequence (usually beginning with ESC, '\033');
parsing these can be tricky. (curses will do the parsing for you, if you call keypad first.)
Under MS-DOS, if you receive a character with value 0 (not '0'!) while reading the keyboard, it's a flag
indicating that the next character read will be a code indicating a special key. See any DOS programming
guide for lists of keyboard codes. (Very briefly: the up, left, right, and down arrow keys are 72, 75, 77,
and 80, and the function keys are 59 through 68.)
References: PCS Sec. 5.1.4 pp. 56-7
Question 19.6
How do I read the mouse?
Consult your system documentation, or ask on an appropriate system-specific newsgroup (but check its
FAQ list first). Mouse handling is completely different under the X window system, MS-DOS, the
Macintosh, and probably every other system.
Question 19.7
How can I do serial (``comm'') port I/O?
It's system-dependent. Under Unix, you typically open, read, and write a device file in /dev, and use the
facilities of the terminal driver to adjust its characteristics. (See also questions 19.1 and 19.2.) Under MS-DOS, you can use the predefined stream stdaux, or a special file like COM1, or some primitive BIOS
interrupts, or (if you require decent performance) any number of interrupt-driven serial I/O packages.
Several netters recommend the book C Programmer's Guide to Serial Communications, by Joe
Campbell.
Question 19.13
How can a file be shortened in-place without completely clearing or rewriting it?
BSD systems provide ftruncate, several others supply chsize, and a few may provide a (possibly
undocumented) fcntl option F_FREESP. Under MS-DOS, you can sometimes use write(fd, "",
0). However, there is no portable solution, nor a way to delete blocks at the beginning. See also question
19.14.
Question 19.14
How can I insert or delete a line (or record) in the middle of a file?
Short of rewriting the file, you probably can't. The usual solution is simply to rewrite the file. (Instead of
deleting records, you might consider simply marking them as deleted, to avoid rewriting.) See also
questions 12.30 and 19.13.
Question 12.30
I'm trying to update a file in place, by using fopen mode "r+", reading a certain string, and writing
back a modified string, but it's not working.
Be sure to call fseek before you write, both to seek back to the beginning of the string you're trying to
overwrite, and because an fseek or fflush is always required between reading and writing in the
read/write "+" modes. Also, remember that you can only overwrite characters with the same number of
replacement characters; see also question 19.14.
ISO Sec. 7.9.5.3
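Here is a sketch of a same-length in-place update: it reads one line from a stream opened in an update ("+") mode, converts it to upper case, seeks back, and overwrites it. upcase_line is a hypothetical name, and the 256-character line limit is an arbitrary assumption.

```c
#include <ctype.h>
#include <stdio.h>

/* Upper-cases the line at the current position of fp, in place.
   Returns 0 on success, -1 on failure. */
int upcase_line(FILE *fp)
{
    char line[256];
    size_t i;
    long start = ftell(fp);         /* remember where the line begins */

    if (start < 0 || fgets(line, sizeof line, fp) == NULL)
        return -1;
    for (i = 0; line[i] != '\0'; i++)
        line[i] = (char)toupper((unsigned char)line[i]);
    if (fseek(fp, start, SEEK_SET) != 0)    /* required before writing */
        return -1;
    if (fputs(line, fp) == EOF)             /* same number of characters */
        return -1;
    return fflush(fp) == 0 ? 0 : -1;        /* required before reading again */
}
```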
Question 12.26
How can I flush pending input so that a user's typeahead isn't read at the next prompt? Will
fflush(stdin) work?
fflush is defined only for output streams. Since its definition of ``flush'' is to complete the writing of
buffered characters (not to discard them), discarding unread input would not be an analogous meaning
for fflush on input streams.
There is no standard way to discard unread characters from a stdio input stream, nor would such a way be
sufficient, since unread characters can also accumulate in other, OS-level input buffers.
ISO Sec. 7.9.5.2
H&S Sec. 15.2
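A partial workaround, which is not a true flush, is simply to read and discard the rest of the current input line. The sketch below (discard_line is a hypothetical name) can do nothing about characters still sitting in OS-level buffers, and it will wait for input if no newline has been typed yet.

```c
#include <stdio.h>

/* Reads and discards characters up to and including the next newline
   (or up to end-of-file). */
void discard_line(FILE *fp)
{
    int c;
    while ((c = getc(fp)) != '\n' && c != EOF)
        ;
}
```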
Question 12.25
What's the difference between fgetpos/fsetpos and ftell/fseek?
What are fgetpos and fsetpos good for?
fgetpos and fsetpos use a special typedef, fpos_t, for representing offsets (positions) in a file.
The type behind this typedef, if chosen appropriately, can represent arbitrarily large offsets, allowing
fgetpos and fsetpos to be used with arbitrarily huge files. ftell and fseek, on the other hand,
use long int, and are therefore limited to offsets which can be represented in a long int. See also
question 1.4.
References: K&R2 Sec. B1.6 p. 248
ANSI Sec. 4.9.1, Secs. 4.9.9.1,4.9.9.3
ISO Sec. 7.9.1, Secs. 7.9.9.1,7.9.9.3
H&S Sec. 15.5 p. 252
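As a small illustration of the pair in use, this sketch saves the current position with fgetpos, reads a character, and then restores the position with fsetpos. peek_char is a hypothetical name; ungetc would do the same job for a single character, but the fpos_t technique works across any amount of reading.

```c
#include <stdio.h>

/* Returns the next character on fp without consuming it,
   or EOF at end-of-file or on error. */
int peek_char(FILE *fp)
{
    fpos_t pos;
    int c;

    if (fgetpos(fp, &pos) != 0)     /* remember where we are */
        return EOF;
    c = getc(fp);
    if (fsetpos(fp, &pos) != 0)     /* go back (also clears end-of-file) */
        return EOF;
    return c;
}
```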
Question 1.4
What should the 64-bit type on new, 64-bit machines be?
Some vendors of C products for 64-bit machines support 64-bit long ints. Others fear that too much
existing code is written to assume that ints and longs are the same size, or that one or the other of
them is exactly 32 bits, and introduce a new, nonstandard, 64-bit long long (or __longlong) type
instead.
Programmers interested in writing portable code should therefore insulate their 64-bit type needs behind
appropriate typedefs. Vendors who feel compelled to introduce a new, longer integral type should
advertise it as being ``at least 64 bits'' (which is truly new, a type traditional C does not have), and not
``exactly 64 bits.''
References: ANSI Sec. F.5.6
ISO Sec. G.5.6
Question 1.1
How do you decide which integer type to use?
If you might need large values (above 32,767 or below -32,767), use long. Otherwise, if space is very
important (i.e. if there are large arrays or many structures), use short. Otherwise, use int. If well-defined overflow characteristics are important and negative values are not, or if you want to steer clear of
sign-extension problems when manipulating bits or bytes, use one of the corresponding unsigned
types. (Beware when mixing signed and unsigned values in expressions, though.)
Although character types (especially unsigned char) can be used as ``tiny'' integers, doing so is
sometimes more trouble than it's worth, due to unpredictable sign extension and increased code size.
(Using unsigned char can help; see question 12.1 for a related problem.)
A similar space/time tradeoff applies when deciding between float and double. None of the above
rules apply if the address of a variable is taken and must have a particular type.
If for some reason you need to declare something with an exact size (usually the only good reason for
doing so is when attempting to conform to some externally-imposed storage layout, but see question
20.5), be sure to encapsulate the choice behind an appropriate typedef.
K&R2 Sec. 2.2 p. 36, Sec. A4.2 pp. 195-6, Sec. B11 p. 257
ANSI Sec. 2.2.4.2.1, Sec. 3.1.2.5
ISO Sec. 5.2.4.2.1, Sec. 6.1.2.5
H&S Secs. 5.1,5.2 pp. 110-114
Question 12.1
What's wrong with this code?
char c;
while((c = getchar()) != EOF) ...
For one thing, the variable to hold getchar's return value must be an int. getchar can return all
possible character values, as well as EOF. By passing getchar's return value through a char, either a
normal character might be misinterpreted as EOF, or the EOF might be altered (particularly if type char
is unsigned) and so never seen.
K&R2 Sec. 1.5.1 p. 16
ANSI Sec. 3.1.2.5, Sec. 4.9.1, Sec. 4.9.7.5
ISO Sec. 6.1.2.5, Sec. 7.9.1, Sec. 7.9.7.5
H&S Sec. 5.1.3 p. 116, Sec. 15.1, Sec. 15.6
CT&P Sec. 5.1 p. 70
PCS Sec. 11 p. 157
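A corrected loop, sketched here as a character-counting function (count_chars is a hypothetical name, and getc on an arbitrary stream stands in for getchar), keeps the return value in an int so that a character with value 255 is not mistaken for EOF:

```c
#include <stdio.h>

/* Counts the characters remaining on fp. */
long count_chars(FILE *fp)
{
    int c;                  /* int, not char: must be able to hold EOF */
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```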
Question 11.35
People keep saying that the behavior of i = i++ is undefined, but I just tried it on an ANSI-conforming compiler, and got the results I expected.
A compiler may do anything it likes when faced with undefined behavior (and, within limits, with
implementation-defined and unspecified behavior), including doing what you expect. It's unwise to
depend on it, though. See also questions 11.32, 11.33, and 11.34.
Question 11.32
Why won't the Frobozz Magic C Compiler, which claims to be ANSI compliant, accept this code? I
know that the code is ANSI, because gcc accepts it.
Many compilers support a few non-Standard extensions, gcc more so than most. Are you sure that the
code being rejected doesn't rely on such an extension? It is usually a bad idea to perform experiments
with a particular compiler to determine properties of a language; the applicable standard may permit
variations, or the compiler may be wrong. See also question 11.35.
Question 11.31
Does anyone have a tool for converting old-style C programs to ANSI C, or vice versa, or for
automatically generating prototypes?
Two programs, protoize and unprotoize, convert back and forth between prototyped and ``old style''
function definitions and declarations. (These programs do not handle full-blown translation between
``Classic'' C and ANSI C.) These programs are part of the FSF's GNU C compiler distribution; see
question 18.3.
The unproto program (/pub/unix/unproto5.shar.Z on ftp.win.tue.nl) is a filter which sits between the
preprocessor and the next compiler pass, converting most of ANSI C to traditional C on-the-fly.
The GNU GhostScript package comes with a little program called ansi2knr.
Before converting ANSI C back to old-style, beware that such a conversion cannot always be made both
safely and automatically. ANSI C introduces new features and complexities not found in K&R C. You'll
especially need to be careful of prototyped function calls; you'll probably need to insert explicit casts.
Several prototype generators exist, many as modifications to lint. A program called CPROTO was
posted to comp.sources.misc in March, 1992. There is another program called ``cextract.'' Many vendors
supply simple utilities like these with their compilers. See also question 18.16. (But be careful when
generating prototypes for old functions with ``narrow'' parameters; see question 11.3.)
Finally, are you sure you really need to convert lots of old code to ANSI C? The old-style function
syntax is still acceptable, and a hasty conversion can easily introduce bugs. (See question 11.3.)
Question 18.3
What's a free or cheap C compiler I can use?
A popular and high-quality free C compiler is the FSF's GNU C compiler, or gcc. It is available by
anonymous ftp from prep.ai.mit.edu in directory pub/gnu, or at several other FSF archive sites. An MS-DOS port, djgpp, is also available; it can be found in the Simtel and Oakland archives and probably many
others, usually in a directory like pub/msdos/djgpp/ or simtel/msdos/djgpp/.
There is a shareware compiler called PCC, available as PCC12C.ZIP .
A very inexpensive MS-DOS compiler is Power C from Mix Software, 1132 Commerce Drive,
Richardson, TX 75801, USA, 214-783-6001.
Another recently-developed compiler is lcc, available for anonymous ftp from ftp.cs.princeton.edu in
pub/lcc.
Archives associated with comp.compilers contain a great deal of information about available compilers,
interpreters, grammars, etc. (for many languages). The comp.compilers archives (including an FAQ list),
maintained by the moderator, John R. Levine, are at iecc.com . A list of available compilers and related
resources, maintained by Mark Hopkins, Steven Robenalt, and David Muir Sharnoff, is at ftp.idiom.com
in pub/compilers-list/. (See also the comp.compilers directory in the news.answers archives at
rtfm.mit.edu and ftp.uu.net; see question 20.40.)
Question 18.2
How can I track down these pesky malloc problems?
A number of debugging packages exist to help track down malloc problems; one popular one is Conor
P. Cahill's ``dbmalloc,'' posted to comp.sources.misc in 1992, volume 32. Others are ``leak,'' available in
volume 27 of the comp.sources.unix archives; JMalloc.c and JMalloc.h in the ``Snippets'' collection; and
MEMDEBUG from ftp.crpht.lu in pub/sources/memdebug . See also question 18.16.
A number of commercial debugging tools exist, and can be invaluable in tracking down malloc-related
and other stubborn problems:
Bounds-Checker for DOS, from Nu-Mega Technologies, P.O. Box 7780, Nashua, NH 03060-7780, USA, 603-889-2386.
CodeCenter (formerly Saber-C) from Centerline Software (formerly Saber), 10 Fawcett Street,
Cambridge, MA 02138-1110, USA, 617-498-3000.
Insight, from ParaSoft Corporation, 2500 E. Foothill Blvd., Pasadena, CA 91107, USA, 818-792-9941, insight@parasoft.com .
Purify, from Pure Software, 1309 S. Mary Ave., Sunnyvale, CA 94087, USA, 800-224-7873, info-home@pure.com .
SENTINEL, from AIB Software, 46030 Manekin Plaza, Dulles, VA 20166, USA, 703-430-9247,
800-296-3000, info@aib.com .
Question 18.1
I need some C development tools.
Here is a crude list of some which are available.

a C cross-reference generator
cflow, cxref, calls, cscope, xscope, or ixfw
a C beautifier/pretty-printer
cb, indent, GNU indent, or vgrind
a revision control or configuration management tool
RCS or SCCS
a C source obfuscator (shrouder)
obfus, shroud, or opqcp
a ``make'' dependency generator
makedepend, or try cc -M or cpp -M
tools to compute code metrics
ccount, Metre, lcount, or csize, or see URL http://www.qucis.queensu.ca:1999/SoftwareEngineering/Cmetrics.html ; there is also a package sold by McCabe and Associates
a C lines-of-source counter
this can be done very crudely with the standard Unix utility wc, and considerably better with
grep -c ";"
a prototype generator
see question 11.31
a tool to track down malloc problems
see question 18.2
a ``selective'' C preprocessor
see question 10.18
language translation tools
see questions 11.31 and 20.26
C verifiers (lint)
see question 18.7
a C compiler!
see question 18.3
(This list of tools is by no means complete; if you know of tools not mentioned, you're welcome to
contact this list's maintainer.)
Other lists of tools, and discussion about them, can be found in the Usenet newsgroups comp.compilers
and comp.software-eng .
Question 10.18
I inherited some code which contains far too many #ifdef's for my taste. How can I preprocess the
code to leave only one conditional compilation set, without running it through the preprocessor and
expanding all of the #include's and #define's as well?
There are programs floating around called unifdef, rmifdef, and scpp (``selective C
preprocessor'') which do exactly this. See question 18.16.
Question 10.16
How can I use a preprocessor #if expression to tell if a machine is big-endian or little-endian?
You probably can't. (Preprocessor arithmetic uses only long integers, and there is no concept of
addressing.) Are you sure you need to know the machine's endianness explicitly? Usually it's better to
write code which doesn't care. See also question 20.9.
ISO Sec. 6.8.1
H&S Sec. 7.11.1 p. 225
Question 20.9
How can I determine whether a machine's byte order is big-endian or little-endian?
One way is to use a pointer:

int x = 1;
if(*(char *)&x == 1)
        printf("little-endian\n");
else
        printf("big-endian\n");
It's also possible to use a union.
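The union approach might be sketched as follows. The function name is_little_endian and the member names are hypothetical; the idea is simply to store 1 in the int member and then inspect the first byte through an overlapping array of unsigned char.

```c
/* Hypothetical helper: returns nonzero on a little-endian machine. */
int is_little_endian(void)
{
    union {
        int i;
        unsigned char bytes[sizeof(int)];
    } u;

    u.i = 1;
    /* On a little-endian machine the least significant byte comes first. */
    return u.bytes[0] == 1;
}
```

This gives the same answer as the pointer version above; which form to use is largely a matter of taste.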
References: H&S Sec. 6.1.2 pp. 163-4
Question 20.8
How can I implement sets or arrays of bits?
Use arrays of char or int, with a few macros to access the desired bit at the proper index. Here are
some simple macros to use with arrays of char:
#include <limits.h>             /* for CHAR_BIT */

#define BITMASK(b) (1 << ((b) % CHAR_BIT))
#define BITSLOT(b) ((b) / CHAR_BIT)
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))
(If you don't have <limits.h>, try using 8 for CHAR_BIT.)
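To show the macros in use, here is a minimal sketch of a Sieve of Eratosthenes that keeps one bit per number. The macros are repeated so that the fragment stands alone. BITNSLOTS, MAXN, and count_primes_below are hypothetical additions not found in the text above, and the array is declared as unsigned char (an assumption made here to sidestep sign questions when the top bit of a byte is set).

```c
#include <limits.h>             /* for CHAR_BIT */
#include <string.h>             /* for memset */

#define BITMASK(b) (1 << ((b) % CHAR_BIT))
#define BITSLOT(b) ((b) / CHAR_BIT)
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))

/* Hypothetical helper: number of array elements needed to hold nb bits. */
#define BITNSLOTS(nb) (((nb) + CHAR_BIT - 1) / CHAR_BIT)

#define MAXN 1000               /* hypothetical upper limit for this sketch */

/* Count the primes less than n (n is clamped to MAXN). */
int count_primes_below(int n)
{
    unsigned char composite[BITNSLOTS(MAXN)];
    int i, j, count = 0;

    if (n > MAXN)
        n = MAXN;
    memset(composite, 0, sizeof composite);     /* all bits start clear */

    for (i = 2; i < n; i++) {
        if (BITTEST(composite, i))
            continue;                           /* already crossed out */
        count++;
        for (j = 2 * i; j < n; j += i)
            BITSET(composite, j);               /* cross out multiples of i */
    }
    return count;
}
```

The bit array for 1000 flags occupies 125 bytes when CHAR_BIT is 8, compared with 1000 bytes for an array of char used one flag per element.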

Question 20.6
If I have a char * variable pointing to the name of a function, how can I call that function?
The most straightforward thing to do is to maintain a correspondence table of names and function
pointers:
int func(), anotherfunc();
struct { char *name; int (*funcptr)(); } symtab[] = {
        "func",         func,
        "anotherfunc",  anotherfunc,
};
Then, search the table for the name, and call via the associated function pointer. See also questions 2.15
and 19.36.
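The search-and-call step might be sketched like this. The function call_by_name, its return convention, and the bodies of func and anotherfunc are all hypothetical, made up so that the fragment is complete; prototypes are used in place of the old-style declarations above.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in functions for illustration only. */
static int func(void)        { return 1; }
static int anotherfunc(void) { return 2; }

static struct {
    const char *name;
    int (*funcptr)(void);
} symtab[] = {
    { "func",        func },
    { "anotherfunc", anotherfunc },
};

/* Look up name in symtab; if found, call the function, store its
   return value in *result, and return 1.  Return 0 if not found. */
int call_by_name(const char *name, int *result)
{
    size_t i;

    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (strcmp(symtab[i].name, name) == 0) {
            *result = symtab[i].funcptr();
            return 1;
        }
    }
    return 0;
}
```

A linear search is adequate for a small table; a large table could be sorted and searched with bsearch instead.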
Question 2.15
How can I access structure fields by name at run time?
Build a table of names and offsets, using the offsetof() macro. The offset of field b in struct a
is
offsetb = offsetof(struct a, b)
If structp is a pointer to an instance of this structure, and field b is an int (with offset as computed
above), b's value can be set indirectly with
*(int *)((char *)structp + offsetb) = value;
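Putting the pieces together, a table of names and offsets might be sketched as follows. The second field c, the table fieldtab, and the function set_int_field are hypothetical, and the sketch assumes that every field listed in the table is an int.

```c
#include <stddef.h>             /* for offsetof and size_t */
#include <string.h>

struct a {
    int b;
    int c;                      /* hypothetical second field */
};

/* Table mapping field names to byte offsets within struct a. */
static struct {
    const char *name;
    size_t offset;
} fieldtab[] = {
    { "b", offsetof(struct a, b) },
    { "c", offsetof(struct a, c) },
};

/* Set the named int field of *structp; return 1 on success, 0 if the
   name is not in the table. */
int set_int_field(struct a *structp, const char *name, int value)
{
    size_t i;

    for (i = 0; i < sizeof fieldtab / sizeof fieldtab[0]; i++) {
        if (strcmp(fieldtab[i].name, name) == 0) {
            *(int *)((char *)structp + fieldtab[i].offset) = value;
            return 1;
        }
    }
    return 0;
}
```

A structure with fields of several types would also need a type code in each table entry, so that the right pointer cast could be chosen.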
Question 2.14
How can I determine the byte offset of a field within a structure?
ANSI C defines the offsetof() macro, which should be used if available; see <stddef.h>. If you
don't have it, one possible implementation is
#define offsetof(type, mem) ((size_t) \
((char *)&((type *)0)->mem - (char *)(type *)0))
This implementation is not 100% portable; some compilers may legitimately refuse to accept it.
See question 2.15 for a usage hint.
ISO Sec. 7.1.6
H&S Sec. 11.1 pp. 292-3
Question 2.13
Why does sizeof report a larger size than I expect for a structure type, as if there were padding at the
end?
Structures may have this padding (as well as internal padding), if necessary, to ensure that alignment
properties will be preserved when an array of contiguous structures is allocated. Even when the structure
is not part of an array, the end padding remains, so that sizeof can always return a consistent size. See
question 2.12.
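The effect can be observed with a small sketch like the one below. The structure padded and the two helper functions are hypothetical, and the amount of padding (if any) depends on the implementation, so no particular number should be expected.

```c
#include <stddef.h>             /* for offsetof and size_t */

struct padded {
    int  i;
    char c;
};

/* Bytes of padding after the last member, c. */
size_t tail_padding(void)
{
    return sizeof(struct padded)
         - (offsetof(struct padded, c) + sizeof(char));
}

/* Distance in bytes between consecutive elements of an array. */
size_t array_stride(void)
{
    struct padded arr[2];

    return (size_t)((char *)&arr[1] - (char *)&arr[0]);
}
```

Whatever tail_padding returns, array_stride always equals sizeof(struct padded), which is exactly why the end padding is counted in the size.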
C Programming FAQs: Frequently Asked Questions
``I think it's safe to say that no person today can hope to achieve basic life competence
without consulting my work on a regular basis.''
--- Cecil Adams
A major revision and expansion of the comp.lang.c FAQ list was published in late 1995 by Addison-Wesley, to wit:
Author: Steve Summit
Title: C Programming FAQs: Frequently Asked Questions
Publisher: Addison-Wesley
Copyright: 1996
ISBN: 0-201-84519-9
Most technical bookstores, as well as several of the large chains, are carrying this book. If you can't find
it, you can order it (in the U.S., at least) direct from Addison-Wesley by calling 800-282-0693. I'm sure
you could also order it on-line from Amazon.com or other on-line booksellers.
Addison-Wesley has web pages describing this book as well as many others. You can also browse the
content corresponding to the on-line version of the FAQ list (but note that this corresponds to only about
half of the actual book!).
As the Preface mentions, ``it can be as hard to eradicate the last error from a large manuscript as it is to
stamp out the last bug in a program.'' If you've already obtained a copy of the book, you'll want to skim
this errata list. (Updated with C99 changes.)
I'd like to publicly thank Addison-Wesley for their support of the FAQ list, for giving me the opportunity
to put it out in book form, and for being very easy to work with. All first-time authors should have it so
lucky.
C Programming Notes
Introductory C Programming Class Notes, Chapter 1
Steve Summit
These notes are part of the UW Experimental College course on Introductory C Programming. They are
based on notes prepared (beginning in Spring, 1995) to supplement the book The C Programming
Language, by Brian Kernighan and Dennis Ritchie, or K&R as the book and its authors are affectionately
known. (The second edition was published in 1988 by Prentice-Hall, ISBN 0-13-110362-8.) These notes
are now (as of Winter, 1995-6) intended to be stand-alone, although the sections are still cross-referenced
to those of K&R, for the reader who wants to pursue a more in-depth exposition.
Chapter 1: Introduction
Chapter 2: Basic Data Types and Operators
Chapter 3: Statements and Control Flow
Chapter 4: More about Declarations (and Initialization)
Chapter 6: Basic I/O
Chapter 7: More Operators
Chapter 8: Strings
Chapter 9: The C Preprocessor
Chapter 10: Pointers
Chapter 11: Memory Allocation
Chapter 13: Reading the Command Line

Chapter 14: What's Next?
C is (as K&R admit) a relatively small language, but one which (to its admirers, anyway) wears well. C's
small, unambitious feature set is a real advantage: there's less to learn; there isn't excess baggage in the
way when you don't need it. It can also be a disadvantage: since it doesn't do everything for you, there's a
lot you have to do yourself. (Actually, this is viewed by many as an additional advantage: anything the
language doesn't do for you, it doesn't dictate to you, either, so you're free to do that something however
you want.)
C is sometimes referred to as a ``high-level assembly language.'' Some people think that's an insult, but
it's actually a deliberate and significant aspect of the language. If you have programmed in assembly
language, you'll probably find C very natural and comfortable (although if you continue to focus too
heavily on machine-level details, you'll probably end up with unnecessarily nonportable programs). If
you haven't programmed in assembly language, you may be frustrated by C's lack of certain higher-level
features. In either case, you should understand why C was designed this way: so that seemingly-simple
constructions expressed in C would not expand to arbitrarily expensive (in time or space) machine
language constructions when compiled. If you write a C program simply and succinctly, it is likely to
result in a succinct, efficient machine language executable. If you find that the executable program
resulting from a C program is not efficient, it's probably because of something silly you did, not because
of something the compiler did behind your back which you have no control over. In any case, there's no
point in complaining about C's low-level flavor: C is what it is.
A programming language is a tool, and no tool can perform every task unaided. If you're building a
house, and I'm teaching you how to use a hammer, and you ask how to assemble rafters and trusses into
gables, that's a legitimate question, but the answer has fallen out of the realm of ``How do I use a
hammer?'' and into ``How do I build a house?''. In the same way, we'll see that C does not have built-in
features to perform every function that we might ever need to do while programming.
As mentioned above, C imposes relatively few built-in ways of doing things on the programmer. Some
common tasks, such as manipulating strings, allocating memory, and doing input/output (I/O), are
performed by calling on library functions. Other tasks which you might want to do, such as creating or
listing directories, or interacting with a mouse, or displaying windows or other user-interface elements,
or doing color graphics, are not defined by the C language at all. You can do these things from a C
program, of course, but you will be calling on services which are peculiar to your programming
environment (compiler, processor, and operating system) and which are not defined by the C standard.
Since this course is about portable C programming, it will also be steering clear of facilities not provided
in all C environments.
Another aspect of C that's worth mentioning here is that it is, to put it bluntly, a bit dangerous. C does
not, in general, try hard to protect a programmer from mistakes. If you write a piece of code which will
(through some oversight of yours) do something wildly different from what you intended it to do, up to
and including deleting your data or trashing your disk, and if it is possible for the compiler to compile it,
it generally will. You won't get warnings of the form ``Do you really mean to...?'' or ``Are you sure you
really want to...?''. C is often compared to a sharp knife: it can do a surgically precise job on some
exacting task you have in mind, but it can also do a surgically precise job of cutting off your finger. It's
up to you to use it carefully.
This aspect of C is very widely criticized; it is also used (justifiably) to argue that C is not a good
teaching language. C aficionados love this aspect of C because it means that C does not try to protect
them from themselves: when they know what they're doing, even if it's risky or obscure, they can do it.
Students of C hate this aspect of C because it often seems as if the language is some kind of a conspiracy
specifically designed to lead them into booby traps and ``gotcha!''s.
This is another aspect of the language which it's fairly pointless to complain about. If you take care and
pay attention, you can avoid many of the pitfalls. These notes will point out many of the obvious (and not
so obvious) trouble spots.
1.1 A First Example
1.2 Second Example
1.3 Program Structure

1.1 A First Example

[This section corresponds to K&R Sec. 1.1]
The best way to learn programming is to dive right in and start writing real programs. This way, concepts
which would otherwise seem abstract make sense, and the positive feedback you get from getting even a
small program to work gives you a great incentive to improve it or write the next one.
Diving in with ``real'' programs right away has another advantage, if only pragmatic: if you're using a
conventional compiler, you can't run a fragment of a program and see what it does; nothing will run until
you have a complete (if tiny or trivial) program. You can't learn everything you'd need to write a
complete program all at once, so you'll have to take some things ``on faith'' and parrot them in your first
programs before you begin to understand them. (You can't learn to program just one expression or
statement at a time any more than you can learn to speak a foreign language one word at a time. If all you
know is a handful of words, you can't actually say anything: you also need to know something about the
language's word order and grammar and sentence structure and declension of articles and verbs.)
Besides the occasional necessity to take things on faith, there is a more serious potential drawback of this
``dive in and program'' approach: it's a small step from learning-by-doing to learning-by-trial-and-error,
and when you learn programming by trial-and-error, you can very easily learn many errors. When you're
not sure whether something will work, or you're not even sure what you could use that might work, and
you try something, and it does work, you do not have any guarantee that what you tried worked for the
right reason. You might just have ``learned'' something that works only by accident or only on your
compiler, and it may be very hard to un-learn it later, when it stops working.
Therefore, whenever you're not sure of something, be very careful before you go off and try it ``just to
see if it will work.'' Of course, you can never be absolutely sure that something is going to work before
you try it, otherwise we'd never have to try things. But you should have an expectation that something is
going to work before you try it, and if you can't predict how to do something or whether something
would work and find yourself having to determine it experimentally, make a note in your mind that
whatever you've just learned (based on the outcome of the experiment) is suspect.
The first example program in K&R is the first example program in any language: print or display a
simple string, and exit. Here is my version of K&R's ``hello, world'' program:
#include <stdio.h>
main()
{
        printf("Hello, world!\n");
        return 0;
}
If you have a C compiler, the first thing to do is figure out how to type this program in and compile it and
run it and see where its output went. (If you don't have a C compiler yet, the first thing to do is to find
one.)
The first line is practically boilerplate; it will appear in almost all programs we write. It asks that some
definitions having to do with the ``Standard I/O Library'' be included in our program; these definitions
are needed if we are to call the library function printf correctly.
The second line says that we are defining a function named main. Most of the time, we can name our
functions anything we want, but the function name main is special: it is the function that will be
``called'' first when our program starts running. The empty pair of parentheses indicates that our main
function accepts no arguments, that is, there isn't any information which needs to be passed in when the
function is called.
The braces { and } surround a list of statements in C. Here, they surround the list of statements making
up the function main.
The line
printf("Hello, world!\n");
is the first statement in the program. It asks that the function printf be called; printf is a library
function which prints formatted output. The parentheses surround printf's argument list: the
information which is handed to it which it should act on. The semicolon at the end of the line terminates
the statement.
(printf's name reflects the fact that C was first developed when Teletypes and other printing terminals
were still in widespread use. Today, of course, video displays are far more common. printf ``prints''
to the standard output, that is, to the default location for program output to go. Nowadays, that's almost
always a video screen or a window on that screen. If you do have a printer, you'll typically have to do
something extra to get a program to print to it.)
printf's first (and, in this case, only) argument is the string which it should print. The string, enclosed
in double quotes "", consists of the words ``Hello, world!'' followed by a special sequence: \n. In
strings, any two-character sequence beginning with the backslash \ represents a single special character.
The sequence \n represents the ``new line'' character, which prints a carriage return or line feed or
whatever it takes to end one line of output and move down to the next. (This program only prints one line
of output, but it's still important to terminate it.)
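To confirm that \n really is one character and not two, the length of the string can be measured with the library function strlen. This is only a sketch, and the function name hello_length is hypothetical.

```c
#include <string.h>             /* for strlen */

/* Hypothetical helper: length of the string printed by the example. */
size_t hello_length(void)
{
    /* 13 visible characters plus a single newline character */
    return strlen("Hello, world!\n");
}
```

The function returns 14, not 15: the backslash and the n in the source text together stand for a single character in the string.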
The second line in the main function is
return 0;
In general, a function may return a value to its caller, and main is no exception. When main returns
(that is, reaches its end and stops functioning), the program is at its end, and the return value from main
tells the operating system (or whatever invoked the program that main is the main function of) whether
it succeeded or not. By convention, a return value of 0 indicates success.
This program may look so absolutely trivial that it seems as if it's not even worth typing it in and trying
to run it, but doing so may be a big (and is certainly a vital) first hurdle. On an unfamiliar computer, it
can be arbitrarily difficult to figure out how to enter a text file containing program source, or how to
compile and link it, or how to invoke it, or what happened after (if?) it ran. The most experienced C
programmers immediately go back to this one, simple program whenever they're trying out a new system
or a new way of entering or building programs or a new way of printing output from within programs. As
Kernighan and Ritchie say, everything else is comparatively easy.
How you compile and run this (or any) program is a function of the compiler and operating system you're
using. The first step is to type it in, exactly as shown; this may involve using a text editor to create a file
containing the program text. You'll have to give the file a name, and all C compilers (that I've ever heard
of) require that files containing C source end with the extension .c. So you might place the program text
in a file called hello.c.
The second step is to compile the program. (Strictly speaking, compilation consists of two steps,
compilation proper followed by linking, but we can overlook this distinction at first, especially because
the compiler often takes care of initiating the linking step automatically.) On many Unix systems, the
command to compile a C program from a source file hello.c is
cc -o hello hello.c
You would type this command at the Unix shell prompt, and it requests that the cc (C compiler) program
be run, placing its output (i.e. the new executable program it creates) in the file hello, and taking its
input (i.e. the source code to be compiled) from the file hello.c.
The third step is to run (execute, invoke) the newly-built hello program. Again on a Unix system, this
is done simply by typing the program's name:
hello
Depending on how your system is set up (in particular, on whether the current directory is searched for
executables, based on the PATH variable), you may have to type
./hello
to indicate that the hello program is in the current directory (as opposed to some ``bin'' directory full
of executable programs, elsewhere).
You may also have your choice of C compilers. On many Unix machines, the cc command is an older
compiler which does not recognize modern, ANSI Standard C syntax. An old compiler will accept the
simple programs we'll be starting with, but it will not accept most of our later programs. If you find
yourself getting baffling compilation errors on programs which you've typed in exactly as they're shown,
it probably indicates that you're using an older compiler. On many machines, another compiler called
acc or gcc is available, and you'll want to use it, instead. (Both acc and gcc are typically invoked the
same as cc; that is, the above cc command would instead be typed, say, gcc -o hello hello.c .)
(One final caveat about Unix systems: don't name your test programs test, because there's already a
standard command called test, and you and the command interpreter will get badly confused if you try
to replace the system's test command with your own, not least because your own almost certainly does
something completely different.)
Under MS-DOS, the compilation procedure is quite similar. The name of the command you type will
depend on your compiler (e.g. cl for the Microsoft C compiler, tc or bcc for Borland's Turbo C, etc.).
You may have to manually perform the second, linking step, perhaps with a command named link or
tlink. The executable file which the compiler/linker creates will have a name ending in .exe (or
perhaps .com), but you can still invoke it by typing the base name (e.g. hello). See your compiler
documentation for complete details; one of the manuals should contain a demonstration of how to enter,
compile, and run a small program that prints some simple output, just as we're trying to describe here.
In an integrated or ``visual'' programming environment, such as those on the Macintosh or under various
versions of Microsoft Windows, the steps you take to enter, compile, and run a program are somewhat
different (and, theoretically, simpler). Typically, there is a way to open a new source window, type
source code into it, give it a file name, and add it to the program (or ``project'') you're building. If
necessary, there will be a way to specify what other source files (or ``modules'') make up the program.
Then, there's a button or menu selection which compiles and runs the program, all from within the
programming environment. (There will also be a way to create a standalone executable file which you
can run from outside the environment.) In a PC-compatible environment, you may have to choose
between creating DOS programs or Windows programs. (If you have troubles pertaining to the printf
function, try specifying a target environment of MS-DOS. Supposedly, some compilers which are
targeted at Windows environments won't let you call printf, because until you call some fancier
functions to request that a window be created, there's no window for printf to print to.) Again, check
the introductory or tutorial manual that came with the programming package; it should walk you through
the steps necessary to get your first program running.

1.2 Second Example

Our second example is of little more practical use than the first, but it introduces a few more
programming language elements:
#include <stdio.h>
/* print a few numbers, to illustrate a simple loop */
main()
{
        int i;
        for(i = 0; i < 10; i = i + 1)
                printf("i is %d\n", i);
        return 0;
}
As before, the line #include <stdio.h> is boilerplate which is necessary since we're calling the
printf function, and main() and the pair of braces {} indicate and delineate the function named
main we're (again) writing.
The first new line is the line
/* print a few numbers, to illustrate a simple loop */
which is a comment. Anything between the characters /* and */ is ignored by the compiler, but may be
useful to a person trying to read and understand the program. You can add comments anywhere you want
to in the program, to document what the program is, what it does, who wrote it, how it works, what the
various functions are for and how they work, what the various variables are for, etc.
The second new line, down within the function main, is
int i;
which declares that our function will use a variable named i. The variable's type is int, which is a plain
integer.
Next, we set up a loop:
for(i = 0; i < 10; i = i + 1)
The keyword for indicates that we are setting up a ``for loop.'' A for loop is controlled by three
expressions, enclosed in parentheses and separated by semicolons. These expressions say that, in this
case, the loop starts by setting i to 0, that it continues as long as i is less than 10, and that after each
iteration of the loop, i should be incremented by 1 (that is, have 1 added to its value).
Finally, we have a call to the printf function, as before, but with several differences. First, the call to
printf is within the body of the for loop. This means that control flow does not pass once through the
printf call, but instead that the call is performed as many times as are dictated by the for loop. In this
case, printf will be called several times: once when i is 0, once when i is 1, once when i is 2, and so
on until i is 9, for a total of 10 times.
A second difference in the printf call is that the string to be printed, "i is %d", contains a percent
sign. Whenever printf sees a percent sign, it indicates that printf is not supposed to print the exact
text of the string, but is instead supposed to read another one of its arguments to decide what to print. The
letter after the percent sign tells it what type of argument to expect and how to print it. In this case, the
letter d indicates that printf is to expect an int, and to print it in decimal. Finally, we see that
printf is in fact being called with another argument, for a total of two, separated by commas. The
second argument is the variable i, which is in fact an int, as required by %d. The effect of all of this is
that each time it is called, printf will print a line containing the current value of the variable i:
i is 0
i is 1
i is 2
...
After several trips through the loop, i will eventually equal 9. After that trip through the loop, the third
control expression i = i + 1 will increment its value to 10. The condition i < 10 is no longer true,
so no more trips through the loop are taken. Instead, control flow jumps down to the statement following
the for loop, which is the return statement. The main function returns, and the program is finished.


We'll have more to say later about program structure, but for now let's observe a few basics. A program
consists of one or more functions; it may also contain global variables. (Our two example programs so
far have contained one function apiece, and no global variables.) At the top of a source file are typically a
few boilerplate lines such as #include <stdio.h>, followed by the definitions (i.e. code) for the
functions. (It's also possible to split up the several functions making up a larger program into several
source files, as we'll see in a later chapter.)
Each function is further composed of declarations and statements, in that order. When a sequence of
statements should act as one (for example, when they should all serve together as the body of a loop)
they can be enclosed in braces (just as for the outer body of the entire function). The simplest kind of
statement is an expression statement, which is an expression (presumably performing some useful
operation) followed by a semicolon. Expressions are further composed of operators, objects (variables),
and constants.
C source code consists of several lexical elements. Some are words, such as for, return, main, and
i, which are either keywords of the language (for, return) or identifiers (names) we've chosen for our
own functions and variables (main, i). There are constants such as 1 and 10 which introduce new
values into the program. There are operators such as =, +, and >, which manipulate variables and values.
There are other punctuation characters (often called delimiters), such as parentheses and squiggly braces
{}, which indicate how the other elements of the program are grouped. Finally, all of the preceding
elements can be separated by whitespace: spaces, tabs, and the ``carriage returns'' between lines.
The source code for a C program is, for the most part, ``free form.'' This means that the compiler does not
care how the code is arranged: how it is broken into lines, how the lines are indented, or whether
whitespace is used between things like variable names and other punctuation. (Lines like #include
<stdio.h> are an exception; they must appear alone on their own lines, generally unbroken. Only
lines beginning with # are affected by this rule; we'll see other examples later.) You can use whitespace,
indentation, and appropriate line breaks to make your programs more readable for yourself and other
people (even though the compiler doesn't care). You can place explanatory comments anywhere in your
program--any text between the characters /* and */ is ignored by the compiler. (In fact, the compiler
pretends that all it saw was whitespace.) Though comments are ignored by the compiler, well-chosen
comments can make a program much easier to read (for its author, as well as for others).
The usage of whitespace is our first style issue. It's typical to leave a blank line between different parts of
the program, to leave a space on either side of operators such as + and =, and to indent the bodies of
loops and other control flow constructs. Typically, we arrange the indentation so that the subsidiary
statements controlled by a loop statement (the ``loop body,'' such as the printf call in our second
example program) are all aligned with each other and placed one tab stop (or some consistent number of
spaces) to the right of the controlling statement. This indentation (like all whitespace) is not required by
the compiler, but it makes programs much easier to read. (However, it can also be misleading, if used
incorrectly or in the face of inadvertent mistakes. The compiler will decide what ``the body of the loop''
is based on its own rules, not the indentation, so if the indentation does not match the compiler's
interpretation, confusion is inevitable.)
To drive home the point that the compiler doesn't care about indentation, line breaks, or other
whitespace, here are a few (extreme) examples: The fragments
for(i = 0; i < 10; i = i + 1)
    printf("%d\n", i);
and
for(i = 0; i < 10; i = i + 1) printf("%d\n", i);
and
for(i=0;i<10;i=i+1)printf("%d\n",i);
and
for(i = 0; i < 10; i = i + 1)
printf("%d\n", i);
and
for        (        i
   =    0   ;   i
<    10     ;
      i  =   i   +
  1   )      printf
(     "%d\n"    ,
        i     )
   ;
and
for
(i=0;
i<10;i=
i+1)printf
("%d\n", i);
are all treated exactly the same way by the compiler.

Some programmers argue forever over the best set of ``rules'' for indentation and other aspects of
programming style, calling to mind the old philosophers' debates about the number of angels that could
dance on the head of a pin. Style issues (such as how a program is laid out) are important, but they're not
something to be too dogmatic about, and there are also other, deeper style issues besides mere layout and
typography. Kernighan and Ritchie take a fairly moderate stance:
Although C compilers do not care about how a program looks, proper indentation and
spacing are critical in making programs easy for people to read. We recommend writing
only one statement per line, and using blanks around operators to clarify grouping. The
position of braces is less important, although people hold passionate beliefs. We have
chosen one of several popular styles. Pick a style that suits you, then use it consistently.
There is some value in having a reasonably standard style (or a few standard styles) for code layout.
Please don't take the above advice to ``pick a style that suits you'' as an invitation to invent your own
brand-new style. If (perhaps after you've been programming in C for a while) you have specific
objections to specific facets of existing styles, you're welcome to modify them, but if you don't have any
particular leanings, you're probably best off copying an existing style at first. (If you want to place your
own stamp of originality on the programs that you write, there are better avenues for your creativity than
inventing a bizarre layout; you might instead try to make the logic easier to follow, or the user interface
easier to use, or the code freer of bugs.)

Chapter 2: Basic Data Types and Operators
The type of a variable determines what kinds of values it may take on. An operator computes new values
out of old ones. An expression consists of variables, constants, and operators combined to perform some
useful computation. In this chapter, we'll learn about C's basic types, how to write constants and declare
variables of these types, and what the basic operators are.
As Kernighan and Ritchie say, ``The type of an object determines the set of values it can have and what
operations can be performed on it.'' This is a fairly formal, mathematical definition of what a type is, but
it is traditional (and meaningful). There are several implications to remember:
1. The ``set of values'' is finite. C's int type can not represent all of the integers; its float type
can not represent all floating-point numbers.
2. When you're using an object (that is, a variable) of some type, you may have to remember what
values it can take on and what operations you can perform on it. For example, there are several
operators which play with the binary (bit-level) representation of integers, but these operators are
not meaningful for and may not be applied to floating-point operands.
3. When declaring a new variable and picking a type for it, you have to keep in mind the values and
operations you'll be needing.
In other words, picking a type for a variable is not some abstract academic exercise; it's closely
connected to the way(s) you'll be using that variable.
2.1 Types
2.2 Constants
2.3 Declarations
2.4 Variable Names
2.5 Arithmetic Operators
2.6 Assignment Operators
2.7 Function Calls

2.1 Types
There are only a few basic data types in C. The first ones we'll be encountering and using are:
char        a character
int         an integer, in the range -32,767 to 32,767
long int    a larger integer (up to +-2,147,483,647)
float       a floating-point number
double      a floating-point number, with more precision and perhaps greater range than float
If you can look at this list of basic types and say to yourself, ``Oh, how simple, there are only a few
types, I won't have to worry much about choosing among them,'' you'll have an easy time with
declarations. (Some masochists wish that the type system were more complicated so that they could
specify more things about each variable, but those of us who would rather not have to specify these extra
things each time are glad that we don't have to.)
The ranges listed above for types int and long int are the guaranteed minimum ranges. On some
systems, either of these types (or, indeed, any C type) may be able to hold larger values, but a program
that depends on extended ranges will not be as portable. Some programmers become obsessed with
knowing exactly what the sizes of data objects will be in various situations, and go on to write programs
which depend on these exact sizes. Determining or controlling the size of an object is occasionally
important, but most of the time we can sidestep size issues and let the compiler do most of the worrying.
(From the ranges listed above, we can determine that type int must be at least 16 bits, and that type
long int must be at least 32 bits. But neither of these sizes is exact; many systems have 32-bit ints,
and some systems have 64-bit long ints.)
You might wonder how the computer stores characters. The answer involves a character set, which is
simply a mapping between some set of characters and some set of small numeric codes. Most machines
today use the ASCII character set, in which the letter A is represented by the code 65, the ampersand & is
represented by the code 38, the digit 1 is represented by the code 49, the space character is represented
by the code 32, etc. (Most of the time, of course, you have no need to know or even worry about these
particular code values; they're automatically translated into the right shapes on the screen or printer when
characters are printed out, and they're automatically generated when you type characters on the keyboard.
Eventually, though, we'll appreciate, and even take some control over, exactly when these
translations--from characters to their numeric codes--are performed.) Character codes are usually
small--the largest printable code value in ASCII is 126, which is the ~ (tilde) character. Characters
usually fit in a byte,
which is usually 8 bits. In C, type char is defined as occupying one byte, so it is usually 8 bits.
Most of the simple variables in most programs are of types int, long int, or double. Typically,
we'll use int and double for most purposes, and long int any time we need to hold integer values
greater than 32,767. As we'll see, even when we're manipulating individual characters, we'll usually use
an int variable, for reasons to be discussed later. Therefore, we'll rarely use individual variables of type
char, although we'll use plenty of arrays of char.

2.2 Constants
A constant is just an immediate, absolute value found in an expression. The simplest constants are
decimal integers, e.g. 0, 1, 2, 123 . Occasionally it is useful to specify constants in base 8 or base 16
(octal or hexadecimal); this is done by prefixing an extra 0 (zero) for octal, or 0x for hexadecimal: the
constants 100, 0144, and 0x64 all represent the same number. (If you're not using these non-decimal
constants, just remember not to use any leading zeroes. If you accidentally write 0123 intending to get
one hundred and twenty three, you'll get 83 instead, which is 123 base 8.)
We write constants in decimal, octal, or hexadecimal for our convenience, not the compiler's. The
compiler doesn't care; it always converts everything into binary internally, anyway. (There is, however,
no good way to specify constants in source code in binary.)
A constant can be forced to be of type long int by suffixing it with the letter L (in upper or lower
case, although upper case is strongly recommended, because a lower case l looks too much like the digit
1).
A constant that contains a decimal point or the letter e (or both) is a floating-point constant: 3.14, 10.,
.01, 123e4, 123.456e7 . The e indicates multiplication by a power of 10; 123.456e7 is 123.456
times 10 to the 7th, or 1,234,560,000. (Floating-point constants are of type double by default.)
We also have constants for specifying characters and strings. (Make sure you understand the difference
between a character and a string: a character is exactly one character; a string is a set of zero or more
characters; a string containing one character is distinct from a lone character.) A character constant is
simply a single character between single quotes: 'A', '.', '%'. The numeric value of a character
constant is, naturally enough, that character's value in the machine's character set. (In ASCII, for
example, 'A' has the value 65.)
A string is represented in C as a sequence or array of characters. (We'll have more to say about arrays in
general, and strings in particular, later.) A string constant is a sequence of zero or more characters
enclosed in double quotes: "apple", "hello, world", "this is a test".
Within character and string constants, the backslash character \ is special, and is used to represent
characters not easily typed on the keyboard or for various reasons not easily typed in constants. The most
common of these ``character escapes'' are:
\n    a ``newline'' character
\b    a backspace
\r    a carriage return (without a line feed)
\'    a single quote (e.g. in a character constant)
\"    a double quote (e.g. in a string constant)
\\    a single backslash
For example, "he said \"hi\"" is a string constant which contains two double quotes, and '\'' is
a character constant consisting of a (single) single quote. Notice once again that the character constant
'A' is very different from the string constant "A".

2.3 Declarations
Informally, a variable (also called an object) is a place you can store a value. So that you can refer to it
unambiguously, a variable needs a name. You can think of the variables in your program as a set of
boxes or cubbyholes, each with a label giving its name; you might imagine that storing a value ``in'' a
variable consists of writing the value on a slip of paper and placing it in the cubbyhole.
A declaration tells the compiler the name and type of a variable you'll be using in your program. In its
simplest form, a declaration consists of the type, the name of the variable, and a terminating semicolon:
char c;
int i;
float f;
You can also declare several variables of the same type in one declaration, separating them with
commas:
int i1, i2;
Later we'll see that declarations may also contain initializers, qualifiers and storage classes, and that we
can declare arrays, functions, pointers, and other kinds of data structures.
The placement of declarations is significant. You can't place them just anywhere (i.e. they cannot be
interspersed with the other statements in your program). They must either be placed at the beginning of a
function, or at the beginning of a brace-enclosed block of statements (which we'll learn about in the next
chapter), or outside of any function. Furthermore, the placement of a declaration, as well as its storage
class, controls several things about its visibility and lifetime, as we'll see later.
You may wonder why variables must be declared before use. There are two reasons:
1. It makes things somewhat easier on the compiler; it knows right away what kind of storage to
allocate and what code to emit to store and manipulate each variable; it doesn't have to try to intuit
the programmer's intentions.
2. It forces a bit of useful discipline on the programmer: you cannot introduce variables willy-nilly;
you must think about them enough to pick appropriate types for them. (The compiler's error
messages to you, telling you that you apparently forgot to declare a variable, are as often helpful
as they are a nuisance: they're helpful when they tell you that you misspelled a variable, or forgot
to think about exactly how you were going to use it.)
Although there are a few places where declarations can be omitted (in which case the compiler will
assume an implicit declaration), making use of these removes the advantages of reason 2 above, so I
recommend always declaring everything explicitly.
Most of the time, I recommend writing one declaration per line. For the most part, the compiler doesn't
care what order declarations are in. You can order the declarations alphabetically, or in the order that
they're used, or so that related declarations are next to each other. Collecting all variables of the same type
together on one line essentially orders declarations by type, which isn't a very useful order (it's only
slightly more useful than random order).
A declaration for a variable can also contain an initial value. This initializer consists of an equals sign
and an expression, which is usually a single constant:
int i = 1;
int i1 = 10, i2 = 20;

2.4 Variable Names

Within limits, you can give your variables and functions any names you want. These names (the formal
term is ``identifiers'') consist of letters, digits, and underscores. For our purposes, names must begin
with a letter. Theoretically, names can be as long as you want, but extremely long ones get tedious to
type after a while, and the compiler is not required to keep track of extremely long ones perfectly. (What
this means is that if you were to name a variable, say,
supercalafragalisticespialidocious, the compiler might get lazy and pretend that you'd
named it supercalafragalisticespialidocio, such that if you later misspelled it
supercalafragalisticespialidociouz, the compiler wouldn't catch your mistake. Nor would
the compiler necessarily be able to tell the difference if for some perverse reason you deliberately
declared a second variable named supercalafragalisticespialidociouz.)
The capitalization of names in C is significant: the variable names variable, Variable, and
VARIABLE (as well as silly combinations like variAble) are all distinct.
A final restriction on names is that you may not use keywords (the words such as int and for which
are part of the syntax of the language) as the names of variables or functions (or as identifiers of any
kind).

2.5 Arithmetic Operators

The basic operators for performing arithmetic are the same in many computer languages:
+    addition
-    subtraction
*    multiplication
/    division
%    modulus (remainder)
The - operator can be used in two ways: to subtract two numbers (as in a - b), or to negate one
number (as in -a + b or a + -b).
When applied to integers, the division operator / discards any remainder, so 1 / 2 is 0 and 7 / 4 is
1. But when either operand is a floating-point quantity (type float or double), the division operator
yields a floating-point result, with a potentially nonzero fractional part. So 1 / 2.0 is 0.5, and 7.0 /
4.0 is 1.75.
The modulus operator % gives you the remainder when two integers are divided: 1 % 2 is 1; 7 % 4 is
3. (The modulus operator can only be applied to integers.)
An additional arithmetic operation you might be wondering about is exponentiation. Some languages
have an exponentiation operator (typically ^ or **), but C doesn't. (To square or cube a number, just
multiply it by itself.)
Multiplication, division, and modulus all have higher precedence than addition and subtraction. The term
``precedence'' refers to how ``tightly'' operators bind to their operands (that is, to the things they operate
on). In mathematics, multiplication has higher precedence than addition, so 1 + 2 * 3 is 7, not 9. In
other words, 1 + 2 * 3 is equivalent to 1 + (2 * 3). C is the same way.
All of these operators ``group'' from left to right, which means that when two or more of them have the
same precedence and participate next to each other in an expression, the evaluation conceptually
proceeds from left to right. For example, 1 - 2 - 3 is equivalent to (1 - 2) - 3 and gives -4, not
+2. (``Grouping'' is sometimes called associativity, although the term is used somewhat differently in
programming than it is in mathematics. Not all C operators group from left to right; a few group from
right to left.)
Whenever the default precedence or associativity doesn't give you the grouping you want, you can
always use explicit parentheses. For example, if you wanted to add 1 to 2 and then multiply the result by
3, you could write (1 + 2) * 3.
By the way, the word ``arithmetic'' as used in the title of this section is an adjective, not a noun, and it's
pronounced differently than the noun: the accent is on the third syllable.


2.6 Assignment Operators
The assignment operator = assigns a value to a variable. For example,
x = 1
sets x to 1, and
a = b
sets a to whatever b's value is. The expression
i = i + 1
is, as we've mentioned elsewhere, the standard programming idiom for increasing a variable's value by 1:
this expression takes i's old value, adds 1 to it, and stores it back into i. (C provides several ``shortcut''
operators for modifying variables in this and similar ways, which we'll meet later.)
We've called the = sign the ``assignment operator'' and referred to ``assignment expressions'' because, in
fact, = is an operator just like + or -. C does not have ``assignment statements''; instead, an assignment
like a = b is an expression and can be used wherever any expression can appear. Since it's an
expression, the assignment a = b has a value, namely, the same value that's assigned to a. This value
can then be used in a larger expression; for example, we might write
c = a = b
which is equivalent to
c = (a = b)
and assigns b's value to both a and c. (The assignment operator, therefore, groups from right to left.)
Later we'll see other circumstances in which it can be useful to use the value of an assignment
expression.
It's usually a matter of style whether you initialize a variable with an initializer in its declaration or with
an assignment expression near where you first use it. That is, there's no particular difference between
int a = 10;
and
int a;
/* later... */
a = 10;

2.7 Function Calls

We'll have much more to say about functions in a later chapter, but for now let's just look at how they're
called. (To review: a function is a piece of code, written by you or by someone else, which
performs some useful, compartmentalizable task.) You call a function by mentioning its name followed
by a pair of parentheses. If the function takes any arguments, you place the arguments between the
parentheses, separated by commas. These are all function calls:
printf("Hello, world!\n")
printf("%d\n", i)
sqrt(144.)
getchar()
The arguments to a function can be arbitrary expressions. Therefore, you don't have to say things like
int sum = a + b + c;
printf("sum = %d\n", sum);
if you don't want to; you can instead collapse it to
printf("sum = %d\n", a + b + c);
Many functions return values, and when they do, you can embed calls to these functions within larger
expressions:
c = sqrt(a * a + b * b)
x = r * cos(theta)
i = f1(f2(j))
The first expression squares a and b, computes the square root of the sum of the squares, and assigns the
result to c. (In other words, it computes a * a + b * b, passes that number to the sqrt function,
and assigns sqrt's return value to c.) The second expression passes the value of the variable theta to
the cos (cosine) function, multiplies the result by r, and assigns the result to x. The third expression
passes the value of the variable j to the function f2, passes the return value of f2 immediately to the
function f1, and finally assigns f1's return value to the variable i.


Statements are the ``steps'' of a program. Most statements compute and assign values or call functions,
but we will eventually meet several other kinds of statements as well. By default, statements are executed
in sequence, one after another. We can, however, modify that sequence by using control flow constructs
which arrange that a statement or group of statements is executed only if some condition is true or false,
or executed over and over again to form a loop. (A somewhat different kind of control flow happens
when we call a function: execution of the caller is suspended while the called function proceeds. We'll
discuss functions in chapter 5.)
My definitions of the terms statement and control flow are somewhat circular. A statement is an element
within a program which you can apply control flow to; control flow is how you specify the order in
which the statements in your program are executed. (A weaker definition of a statement might be ``a part
of your program that does something,'' but this definition could as easily be applied to expressions or
functions.)
3.1 Expression Statements
3.2 if Statements
3.3 Boolean Expressions
3.4 while Loops
3.5 for Loops
3.6 break and continue

3.1 Expression Statements

Most of the statements in a C program are expression statements. An expression statement is simply an
expression followed by a semicolon. The lines
i = 0;
i = i + 1;
and
printf("Hello, world!\n");
are all expression statements. (In some languages, such as Pascal, the semicolon separates statements,
such that the last statement is not followed by a semicolon. In C, however, the semicolon is a statement
terminator; all simple statements are followed by semicolons. The semicolon is also used for a few other
things in C; we've already seen that it terminates declarations, too.)
Expression statements do all of the real work in a C program. Whenever you need to compute new values
for variables, you'll typically use expression statements (and they'll typically contain assignment
operators). Whenever you want your program to do something visible, in the real world, you'll typically
call a function (as part of an expression statement). We've already seen the most basic example: calling
the function printf to print text to the screen. But anything else you might do--read or write a disk file,
talk to a modem or printer, draw pictures on the screen--will also involve function calls. (Furthermore,
the functions you call to do these things are usually different depending on which operating system
you're using. The C language does not define them, so we won't be talking about or using them much.)
Expressions and expression statements can be arbitrarily complicated. They don't have to consist of
exactly one simple function call, or of one simple assignment to a variable. For one thing, many
functions return values, and the values they return can then be used by other parts of the expression. For
example, C provides a sqrt (square root) function, which we might use to compute the hypotenuse of a
right triangle like this:
c = sqrt(a*a + b*b);
To be useful, an expression statement must do something; it must have some lasting effect on the state of
the program. (Formally, a useful statement must have at least one side effect.) The first two sample
expression statements in this section (above) assign new values to the variable i, and the third one calls
printf to print something out, and these are good examples of statements that do something useful.
(To make the distinction clear, we may note that degenerate constructions such as
0;
i;
or
i + 1;
are syntactically valid statements--they consist of an expression followed by a semicolon--but in each
case, they compute a value without doing anything with it, so the computed value is discarded, and the
statement is useless. But if the ``degenerate'' statements in this paragraph don't make much sense to you,
don't worry; it's because they, frankly, don't make much sense.)
It's also possible for a single expression to have multiple side effects, but it's easy for such an expression
to be (a) confusing or (b) undefined. For now, we'll only be looking at expressions (and, therefore,
statements) which do one well-defined thing at a time.

3.2 if Statements
The simplest way to modify the control flow of a program is with an if statement, which in its simplest
form looks like this:
if(x > max)
    max = x;
Even if you didn't know any C, it would probably be pretty obvious that what happens here is that if x is
greater than max, x gets assigned to max. (We'd use code like this to keep track of the maximum value
of x we'd seen--for each new x, we'd compare it to the old maximum value max, and if the new value
was greater, we'd update max.)
More generally, we can say that the syntax of an if statement is:
if( expression )
    statement
where expression is any expression and statement is any statement.
What if you have a series of statements, all of which should be executed together or not at all depending
on whether some condition is true? The answer is that you enclose them in braces:
if( expression )
{
    statement1
    statement2
    statement3
}
As a general rule, anywhere the syntax of C calls for a statement, you may write a series of statements
enclosed by braces. (You do not need to, and should not, put a semicolon after the closing brace, because
the series of statements enclosed by braces is not itself a simple expression statement.)
An if statement may also optionally contain a second statement, the ``else clause,'' which is to be
executed if the condition is not met. Here is an example:
if(n > 0)
    average = sum / n;
else
{
    printf("can't compute average\n");
    average = 0;
}
The first statement or block of statements is executed if the condition is true, and the second statement or
block of statements (following the keyword else) is executed if the condition is not true. In this
example, we can compute a meaningful average only if n is greater than 0; otherwise, we print a message
saying that we cannot compute the average. The general syntax of an if statement is therefore
if( expression )
    statement1
else
    statement2
(where both statement1 and statement2 may be lists of statements
enclosed in braces).
It's also possible to nest one if statement inside another. (For that matter, it's in general possible to nest
any kind of statement or control flow construct within another.) For example, here is a little piece of code
which decides roughly which quadrant of the compass you're walking into, based on an x value which is
positive if you're walking east, and a y value which is positive if you're walking north:
if(x > 0)
{
    if(y > 0)
        printf("Northeast.\n");
    else
        printf("Southeast.\n");
}
else
{
    if(y > 0)
        printf("Northwest.\n");
    else
        printf("Southwest.\n");
}
When you have one if statement (or loop) nested inside another, it's a very good idea to use explicit
braces {}, as shown, to make it clear (both to you and to the compiler) how they're nested and which
else goes with which if. It's also a good idea to indent the various levels, also as shown, to make the
code more readable to humans. Why do both? You use indentation to make the code visually more
readable to yourself and other humans, but the compiler doesn't pay attention to the indentation (since all
whitespace is essentially equivalent and is essentially ignored). Therefore, you also have to make sure
that the punctuation is right.
Here is an example of another common arrangement of if and else. Suppose we have a variable
grade containing a student's numeric grade, and we want to print out the corresponding letter grade.
Here is code that would do the job:
if(grade >= 90)
    printf("A");
else if(grade >= 80)
    printf("B");
else if(grade >= 70)
    printf("C");
else if(grade >= 60)
    printf("D");
else
    printf("F");
What happens here is that exactly one of the five printf calls is executed, depending on which of the
conditions is true. Each condition is tested in turn, and if one is true, the corresponding statement is
executed, and the rest are skipped. If none of the conditions is true, we fall through to the last one,
printing ``F''.
In the cascaded if/else/if/else/... chain, each else clause is another if statement. This may be
more obvious at first if we reformat the example, including every set of braces and indenting each if
statement relative to the previous one:
if(grade >= 90)
{
    printf("A");
}
else
{
    if(grade >= 80)
    {
        printf("B");
    }
    else
    {
        if(grade >= 70)
        {
            printf("C");
        }
        else
        {
            if(grade >= 60)
            {
                printf("D");
            }
            else
            {
                printf("F");
            }
        }
    }
}
By examining the code this way, it should be obvious that exactly one of the printf calls is executed,
and that whenever one of the conditions is found true, the remaining conditions do not need to be
checked and none of the later statements within the chain will be executed. But once you've convinced
yourself of this and learned to recognize the idiom, it's generally preferable to arrange the statements as
in the first example, without trying to indent each successive if statement one tabstop further out.
(Obviously, you'd run into the right margin very quickly if the chain had just a few more cases!)


An if statement like
if(x > max)
max = x;
is perhaps deceptively simple. Conceptually, we say that it checks whether the condition x > max is
``true'' or ``false''. The mechanics underlying C's conception of ``true'' and ``false,'' however, deserve
some explanation. We need to understand how true and false values are represented, and how they are
interpreted by statements like if.
As far as C is concerned, a true/false condition can be represented as an integer. (An integer can
represent many values; here we care about only two values: ``true'' and ``false.'' The study of
mathematics involving only two values is called Boolean algebra, after George Boole, a mathematician
who refined this study.) In C, ``false'' is represented by a value of 0 (zero), and ``true'' is represented by
any value that is nonzero. Since there are many nonzero values (at least 65,534, for values of type int),
when we have to pick a specific value for ``true,'' we'll pick 1.
The relational operators such as <, <=, >, and >= are in fact operators, just like +, -, *, and /. The
relational operators take two values, look at them, and ``return'' a value of 1 or 0 depending on whether
the tested relation was true or false. The complete set of relational operators in C is:
<       less than
<=      less than or equal
>       greater than
>=      greater than or equal
==      equal
!=      not equal
For example, 1 < 2 is 1, 3 > 4 is 0, 5 == 5 is 1, and 6 != 6 is 0.

We've now encountered perhaps the most easy-to-stumble-on ``gotcha!'' in C: the equality-testing
operator is ==, not a single =, which is assignment. If you accidentally write
if(a = 0)
(and you probably will at some point; everybody makes this mistake), it will not test whether a is zero,
as you probably intended. Instead, it will assign 0 to a, and then perform the ``true'' branch of the if
statement if a is nonzero. But a will have just been assigned the value 0, so the ``true'' branch will never
be taken! (This could drive you crazy while debugging--you wanted to do something if a was 0, and after
the test, a is 0, whether it was supposed to be or not, but the ``true'' branch is nevertheless not taken.)
The relational operators work with arbitrary numbers and generate true/false values. You can also
combine true/false values by using the Boolean operators, which take true/false values as operands and
compute new true/false values. The three Boolean operators are:
&&      and
||      or
!       not (takes one operand; ``unary'')
The && (``and'') operator takes two true/false values and produces a true (1) result if both operands are
true (that is, if the left-hand side is true and the right-hand side is true). The || (``or'') operator takes two
true/false values and produces a true (1) result if either operand is true. The ! (``not'') operator takes a
single true/false value and negates it, turning false to true and true to false (0 to 1 and nonzero to 0).
For example, to test whether the variable i lies between 1 and 10, you might use
if(1 < i && i < 10)
...
Here we're expressing the relation ``i is between 1 and 10'' as ``1 is less than i and i is less than 10.''
It's important to understand why the more obvious expression
if(1 < i < 10)          /* WRONG */
would not work. The expression 1 < i < 10 is parsed by the compiler analogously to 1 + i + 10.
The expression 1 + i + 10 is parsed as (1 + i) + 10 and means ``add 1 to i, and then add the
result to 10.'' Similarly, the expression 1 < i < 10 is parsed as (1 < i) < 10 and means ``see if 1
is less than i, and then see if the result is less than 10.'' But in this case, ``the result'' is 1 or 0, depending
on whether i is greater than 1. Since both 0 and 1 are less than 10, the expression 1 < i < 10 would
always be true in C, regardless of the value of i!
Relational and Boolean expressions are usually used in contexts such as an if statement, where
something is to be done or not done depending on some condition. In these cases what's actually checked
is whether the expression representing the condition has a zero or nonzero value. As long as the
expression is a relational or Boolean expression, the interpretation is just what we want. For example,
when we wrote
if(x > max)
the > operator produced a 1 if x was greater than max, and a 0 otherwise. The if statement interprets 0
as false and 1 (or any nonzero value) as true.
But what if the expression is not a relational or Boolean expression? As far as C is concerned, the
controlling expression (of conditional statements like if) can in fact be any expression: it doesn't have to
``look like'' a Boolean expression; it doesn't have to contain relational or logical operators. All C looks at
(when it's evaluating an if statement, or anywhere else where it needs a true/false value) is whether the
expression evaluates to 0 or nonzero. For example, if you have a variable x, and you want to do
something if x is nonzero, it's possible to write
if(x)
statement
and the statement will be executed if x is nonzero (since nonzero means ``true'').
This possibility (that the controlling expression of an if statement doesn't have to ``look like'' a Boolean
expression) is both useful and potentially confusing. It's useful when you have a variable or a function
that is ``conceptually Boolean,'' that is, one that you consider to hold a true or false (actually nonzero or
zero) value. For example, if you have a variable verbose which contains a nonzero value when your
program should run in verbose mode and zero when it should be quiet, you can write things like
if(verbose)
printf("Starting first pass\n");
and this code is both legal and readable, besides which it does what you want. The standard library
contains a function isupper() which tests whether a character is an upper-case letter, so if c is a
character, you might write
if(isupper(c))
...
Both of these examples (verbose and isupper()) are useful and readable.
However, you will eventually come across code like
if(n)
average = sum / n;
where n is just a number. Here, the programmer wants to compute the average only if n is nonzero
(otherwise, of course, the code would divide by 0), and the code works, because, in the context of the if
statement, the trivial expression n is (as always) interpreted as ``true'' if it is nonzero, and ``false'' if it is
zero.
``Coding shortcuts'' like these can seem cryptic, but they're also quite common, so you'll need to be able
to recognize them even if you don't choose to write them in your own code. Whenever you see code like
if(x)
or
if(f())
where x or f() do not have obvious ``Boolean'' names, you can read them as ``if x is nonzero'' or ``if
f() returns nonzero.''

3.4 while Loops

[This section corresponds to half of K&R Sec. 3.5]
Loops generally consist of two parts: one or more control expressions which (not surprisingly) control
the execution of the loop, and the body, which is the statement or set of statements which is executed
over and over.
The most basic loop in C is the while loop. A while loop has one control expression, and executes as
long as that expression is true. This example repeatedly doubles the number 2 (2, 4, 8, 16, ...) and prints
the resulting numbers as long as they are less than 1000:
int x = 2;
while(x < 1000)
{
    printf("%d\n", x);
    x = x * 2;
}
(Once again, we've used braces {} to enclose the group of statements which are to be executed together
as the body of the loop.)
The general syntax of a while loop is
while( expression )
statement
A while loop starts out like an if statement: if the condition expressed by the expression is true, the
statement is executed. However, after executing the statement, the condition is tested again, and if it's
still true, the statement is executed again. (Presumably, the condition depends on some value which is
changed in the body of the loop.) As long as the condition remains true, the body of the loop is executed
over and over again. (If the condition is false right at the start, the body of the loop is not executed at all.)
As another example, if you wanted to print a number of blank lines, with the variable n holding the
number of blank lines to be printed, you might use code like this:
while(n > 0)
{
    printf("\n");
    n = n - 1;
}
After the loop finishes (when control ``falls out'' of it, due to the condition being false), n will have the
value 0.
You use a while loop when you have a statement or group of statements which may have to be
executed a number of times to complete their task. The controlling expression represents the condition
``the loop is not done'' or ``there's more work to do.'' As long as the expression is true, the body of the
loop is executed; presumably, it makes at least some progress at its task. When the expression becomes
false, the task is done, and the rest of the program (beyond the loop) can proceed. When we think about a
loop in this way, we can see an additional important property: if the expression evaluates to ``false''
before the very first trip through the loop, we make zero trips through the loop. In other words, if the task
is already done (if there's no work to do) the body of the loop is not executed at all. (It's always a good
idea to think about the ``boundary conditions'' in a piece of code, and to make sure that the code will
work correctly when there is no work to do, or when there is a trivial task to do, such as sorting an array
of one number. Experience has shown that bugs at boundary conditions are quite common.)

3.5 for Loops

[This section corresponds to the other half of K&R Sec. 3.5]
Our second loop, which we've seen at least one example of already, is the for loop. The first one we
saw was:
for (i = 0; i < 10; i = i + 1)
More generally, the syntax of a for loop is
for( expr<sub>1</sub> ; expr<sub>2</sub> ; expr<sub>3</sub> )
statement
(Here we see that the for loop has three control expressions. As always, the statement can be a brace-enclosed block.)
Many loops are set up to cause some variable to step through a range of values, or, more generally, to set
up an initial condition and then modify some value to perform each succeeding loop as long as some
condition is true. The three expressions in a for loop encapsulate these conditions:
expr<sub>1</sub> sets up the initial condition, expr<sub>2</sub> tests whether another trip
through the loop should be taken, and expr<sub>3</sub> increments or updates things after each trip
through the loop and prior to the next one. In our first example, we had i = 0 as expr<sub>1</sub>,
i < 10 as expr<sub>2</sub>, i = i + 1 as expr<sub>3</sub>, and the call to printf as
statement, the body of the loop. So the loop began by setting i to 0, proceeded as long as i was less than
10, printed out i's value during each trip through the loop, and added 1 to i between each trip through
the loop.
When a for loop is executed, first, expr<sub>1</sub> is evaluated. Then,
expr<sub>2</sub> is evaluated, and if it is true, the body of the loop (statement) is executed. Then,
expr<sub>3</sub> is evaluated to go to the next step, and expr<sub>2</sub> is evaluated again, to
see if there is a next step. During the execution of a for loop, the sequence is:
expr<sub>1</sub>
expr<sub>2</sub>
statement
expr<sub>3</sub>
expr<sub>2</sub>
statement
expr<sub>3</sub>
...
expr<sub>2</sub>
statement
expr<sub>3</sub>
expr<sub>2</sub>
The first thing executed is expr<sub>1</sub>. expr<sub>3</sub> is evaluated after every trip
through the loop. The last thing executed is always expr<sub>2</sub>, because when
expr<sub>2</sub> evaluates false, the loop exits.
All three expressions of a for loop are optional. If you leave out expr<sub>1</sub>, there simply is
no initialization step, and the variable(s) used with the loop had better have been initialized already. If
you leave out expr<sub>2</sub>, there is no test, and the default for the for loop is that another trip
through the loop should be taken (such that unless you break out of it some other way, the loop runs
forever). If you leave out expr<sub>3</sub>, there is no increment step.
The semicolons separate the three controlling expressions of a for loop. (These semicolons, by the way,
have nothing to do with statement terminators.) If you leave out one or more of the expressions, the
semicolons remain. Therefore, one way of writing a deliberately infinite loop in C is
for(;;)
...
It's useful to compare C's for loop to the equivalent loops in other computer languages you might know.
The C loop
for(i = x; i <= y; i = i + z)
is roughly equivalent to:
for I = X to Y step Z       (BASIC)
do 10 i=x,y,z               (FORTRAN)
for i := x to y             (Pascal)
In C (unlike FORTRAN), if the test condition is false before the first trip through the loop, the loop won't
be traversed at all. In C (unlike Pascal), a loop control variable (in this case, i) is guaranteed to retain its
final value after the loop completes, and it is also legal to modify the control variable within the loop, if
you really want to. (When the loop terminates due to the test condition turning false, the value of the
control variable after the loop will be the first value for which the condition failed, not the last value for
which it succeeded.)
It's also worth noting that a for loop can be used in more general ways than the simple, iterative
examples we've seen so far. The ``control variable'' of a for loop does not have to be an integer, and it
does not have to be incremented by an additive increment. It could be ``incremented'' by a multiplicative
factor (1, 2, 4, 8, ...) if that was what you needed, or it could be a floating-point variable, or it could be
another type of variable which we haven't met yet which would step, not over numeric values, but over
the elements of an array or other data structure. Strictly speaking, a for loop doesn't have to have a
``control variable'' at all; the three expressions can be anything, although the loop will make the most
sense if they are related and together form the expected initialize, test, increment sequence.
The powers-of-two example of the previous section does fit this pattern, so we could rewrite it like this:
int x;
for(x = 2; x < 1000; x = x * 2)
printf("%d\n", x);
There is no earth-shaking or fundamental difference between the while and for loops. In fact, given
the general for loop
for(expr<sub>1</sub>; expr<sub>2</sub>; expr<sub>3</sub>)
statement
you could usually rewrite it as a while loop, moving the initialize and increment expressions to
statements before and within the loop:
expr<sub>1</sub> ;
while(expr<sub>2</sub>)
{
    statement
    expr<sub>3</sub> ;
}
Similarly, given the general while loop
while(expr)
statement
you could rewrite it as a for loop:
for(; expr; )
statement
Another contrast between the for and while loops is that although the test expression
(expr<sub>2</sub>) is optional in a for loop, it is required in a while loop. If you leave out the
controlling expression of a while loop, the compiler will complain about a syntax error. (To write a
deliberately infinite while loop, you have to supply an expression which is always nonzero. The most
obvious one would simply be while(1) .)
If it's possible to rewrite a for loop as a while loop and vice versa, why do they both exist? Which one
should you choose? In general, when you choose a for loop, its three expressions should all manipulate
the same variable or data structure, using the initialize, test, increment pattern. If they don't manipulate
the same variable or don't follow that pattern, wedging them into a for loop buys nothing and a while
loop would probably be clearer. (The reason that one loop or the other can be clearer is simply that, when
you see a for loop, you expect to see an idiomatic initialize/test/increment of a single variable, and if the
for loop you're looking at doesn't end up matching that pattern, you've been momentarily misled.)

3.6 break and continue

Sometimes, due to an exceptional condition, you need to jump out of a loop early, that is, before the main
controlling expression of the loop causes it to terminate normally. Other times, in an elaborate loop, you
may want to jump back to the top of the loop (to test the controlling expression again, and perhaps begin
a new trip through the loop) without playing out all the steps of the current loop. The break and
continue statements allow you to do these two things. (They are, in fact, essentially restricted forms of
goto.)
To put everything we've seen in this chapter together, as well as demonstrate the use of the break
statement, here is a program for printing prime numbers between 1 and 100:
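#include <stdio.h>
#include <math.h>
int main()
{
    int i, j;
    printf("%d\n", 2);              /* 2 is handled as a special case */
    for(i = 3; i <= 100; i = i + 1)
    {
        for(j = 2; j < i; j = j + 1)
        {
            if(i % j == 0)
                break;              /* j divides i, so i is not prime */
            if(j > sqrt(i))
            {
                printf("%d\n", i);  /* no divisor found: i is prime */
                break;
            }
        }
    }
    return 0;
}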
The outer loop steps the variable i through the numbers from 3 to 100; the code tests to see if each
number has any divisors other than 1 and itself. The trial divisor j loops from 2 up to i. j is a divisor of
i if the remainder of i divided by j is 0, so the code uses C's ``remainder'' or ``modulus'' operator % to
make this test. (Remember that i % j gives the remainder when i is divided by j.)
If the program finds a divisor, it uses break to break out of the inner loop, without printing anything.
But if it notices that j has risen higher than the square root of i, without its having found any divisors,
then i must not have any divisors, so i is prime, and its value is printed. (Once we've determined that i
is prime by noticing that j > sqrt(i), there's no need to try the other trial divisors, so we use a
second break statement to break out of the loop in that case, too.)
The simple algorithm and implementation we used here (like many simple prime number algorithms)
does not work for 2, the only even prime number, so the program ``cheats'' and prints out 2 no matter
what, before going on to test the numbers from 3 to 100.
Many improvements to this simple program are of course possible; you might experiment with it. (Did
you notice that the ``test'' expression of the inner loop for(j = 2; j < i; j = j + 1) is in a
sense unnecessary, because the loop always terminates early due to one of the two break statements?)

Chapter 4: More about Declarations (and Initialization)
4.1 Arrays
4.2 Visibility and Lifetime (Global Variables, etc.)
4.3 Default Initialization
4.4 Examples

4.1 Arrays
So far, we've been declaring simple variables: the declaration
int i;
declares a single variable, named i, of type int. It is also possible to declare an array of several
elements. The declaration
int a[10];
declares an array, named a, consisting of ten elements, each of type int. Simply speaking, an array is a
variable that can hold more than one value. You specify which of the several values you're referring to at
any given time by using a numeric subscript. (Arrays in programming are similar to vectors or matrices
in mathematics.) We can represent the array a above with a picture like this:
In C, arrays are zero-based: the ten elements of a 10-element array are numbered from 0 to 9. The
subscript which specifies a single element of an array is simply an integer expression in square brackets.
The first element of the array is a[0], the second element is a[1], etc. You can use these ``array
subscript expressions'' anywhere you can use the name of a simple variable, for example:
a[0] = 10;
a[1] = 20;
a[2] = a[0] + a[1];
Notice that the subscripted array references (i.e. expressions such as a[0] and a[1]) can appear on
either side of the assignment operator.
The subscript does not have to be a constant like 0 or 1; it can be any integral expression. For example,
it's common to loop over all elements of an array:
int i;
for(i = 0; i < 10; i = i + 1)
a[i] = 0;
This loop sets all ten elements of the array a to 0.
Arrays are a real convenience for many problems, but there is not a lot that C will do with them for you
automatically. In particular, you can neither set all elements of an array at once nor assign one array to
another; both of the assignments
a = 0;                  /* WRONG */
and
int b[10];
b = a;                  /* WRONG */
are illegal.
To set all of the elements of an array to some value, you must do so one by one, as in the loop example
above. To copy the contents of one array to another, you must again do so one by one:
int b[10];
for(i = 0; i < 10; i = i + 1)
b[i] = a[i];
Remember that for an array declared
int a[10];
there is no element a[10]; the topmost element is a[9]. This is one reason that zero-based loops are
also common in C. Note that the for loop
for(i = 0; i < 10; i = i + 1)
...
does just what you want in this case: it starts at 0, the number 10 suggests (correctly) that it goes through
10 iterations, but the less-than comparison means that the last trip through the loop has i set to 9. (The
comparison i <= 9 would also work, but it would be less clear and therefore poorer style.)
In the little examples so far, we've always looped over all 10 elements of the sample array a. It's
common, however, to use an array that's bigger than necessarily needed, and to use a second variable to
keep track of how many elements of the array are currently in use. For example, we might have an
integer variable
int na;                 /* number of elements of a[] in use */
Then, when we wanted to do something with a (such as print it out), the loop would run from 0 to na,
not 10 (or whatever a's size was):
for(i = 0; i < na; i = i + 1)
printf("%d\n", a[i]);
Naturally, we would have to ensure that na's value was always less than or equal to the number of
elements actually declared in a.
Arrays are not limited to type int; you can have arrays of char or double or any other type.
Here is a slightly larger example of the use of arrays. Suppose we want to investigate the behavior of
rolling a pair of dice. The total roll can be anywhere from 2 to 12, and we want to count how often each
roll comes up. We will use an array to keep track of the counts: a[2] will count how many times we've
rolled 2, etc.
We'll simulate the roll of a die by calling C's random number generation function, rand(). Each time
you call rand(), it returns a different, pseudo-random integer. The values that rand() returns
typically span a large range, so we'll use C's modulus (or ``remainder'') operator % to produce random
numbers in the range we want. The expression rand() % 6 produces random numbers in the range 0
to 5, and rand() % 6 + 1 produces random numbers in the range 1 to 6.
Here is the program:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int i;
    int d1, d2;
    int a[13];          /* uses [2..12] */
    for(i = 2; i <= 12; i = i + 1)
        a[i] = 0;
    for(i = 0; i < 100; i = i + 1)
    {
        d1 = rand() % 6 + 1;
        d2 = rand() % 6 + 1;
        a[d1 + d2] = a[d1 + d2] + 1;
    }
    for(i = 2; i <= 12; i = i + 1)
        printf("%d: %d\n", i, a[i]);
    return 0;
}
We include the header <stdlib.h> because it contains the necessary declarations for the rand()
function. We declare the array of size 13 so that its highest element will be a[12]. (We're wasting
a[0] and a[1]; this is no great loss.) The variables d1 and d2 contain the rolls of the two individual
dice; we add them together to decide which cell of the array to increment, in the line
a[d1 + d2] = a[d1 + d2] + 1;
After 100 rolls, we print the array out. Typically (as craps players well know), we'll see mostly 7's, and
relatively few 2's and 12's.
(By the way, it turns out that using the % operator to reduce the range of the rand function is not always
a good idea. We'll say more about this problem in an exercise.)
4.1.1 Array Initialization
4.1.2 Arrays of Arrays (``Multidimensional'' Arrays)


Although it is not possible to assign to all elements of an array at once using an assignment expression, it
is possible to initialize some or all elements of an array when the array is defined. The syntax looks like
this:
int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
The list of values, enclosed in braces {}, separated by commas, provides the initial values for successive
elements of the array.
(Under older, pre-ANSI C compilers, you could not always supply initializers for ``local'' arrays inside
functions; you could only initialize ``global'' arrays, those outside of any function. Those compilers are
now rare, so you shouldn't have to worry about this distinction any more. We'll talk more about local and
global variables later in this chapter.)
If there are fewer initializers than elements in the array, the remaining elements are automatically
initialized to 0. For example,
int a[10] = {0, 1, 2, 3, 4, 5, 6};
would initialize a[7], a[8], and a[9] to 0. When an array definition includes an initializer, the array
dimension may be omitted, and the compiler will infer the dimension from the number of initializers. For
example,
int b[] = {10, 11, 12, 13, 14};
would declare, define, and initialize an array b of 5 elements (i.e. just as if you'd typed int b[5]).
Only the dimension is omitted; the brackets [] remain to indicate that b is in fact an array.
In the case of arrays of char, the initializer may be a string constant:
char s1[7] = "Hello,";
char s2[10] = "there,";
char s3[] = "world!";
As before, if the dimension is omitted, it is inferred from the size of the string initializer. (We haven't
covered strings in detail yet--we'll do so in chapter 8--but it turns out that all strings in C are terminated
by a special character with the value 0. Therefore, the array s3 will be of size 7, and the explicitly-sized
s1 does need to be of size at least 7. For s2, the last 4 characters in the array will all end up being this
zero-value character.)

[This section is optional and may be skipped.]
When we said that ``Arrays are not limited to type int; you can have arrays of... any other type,'' we
meant that more literally than you might have guessed. If you have an ``array of int,'' it means that you
have an array each of whose elements is of type int. But you can have an array each of whose elements
is of type x, where x is any type you choose. In particular, you can have an array each of whose elements
is another array! We can use these arrays of arrays for the same sorts of tasks as we'd use
multidimensional arrays in other computer languages (or matrices in mathematics). Naturally, we are not
limited to arrays of arrays, either; we could have an array of arrays of arrays, which would act like a 3-dimensional array, etc.
The declaration of an array of arrays looks like this:
int a2[5][7];
You have to read complicated declarations like these ``inside out.'' What this one says is that a2 is an
array of 5 somethings, and that each of the somethings is an array of 7 ints. More briefly, ``a2 is an
array of 5 arrays of 7 ints,'' or, ``a2 is an array of array of int.'' In the declaration of a2, the brackets
closest to the identifier a2 tell you what a2 first and foremost is. That's how you know it's an array of 5
arrays of size 7, not the other way around. You can think of a2 as having 5 ``rows'' and 7 ``columns,''
although this interpretation is not mandatory. (You could also treat the ``first'' or inner subscript as ``x''
and the second as ``y.'' Unless you're doing something fancy, all you have to worry about is that the
subscripts when you access the array match those that you used when you declared it, as in the examples
below.)
To illustrate the use of multidimensional arrays, we might fill in the elements of the above array a2 using
this piece of code:
int i, j;
for(i = 0; i < 5; i = i + 1)
{
    for(j = 0; j < 7; j = j + 1)
        a2[i][j] = 10 * i + j;
}
This pair of nested loops sets a2[1][2] to 12, a2[4][1] to 41, etc. Since the first dimension of a2 is 5,
the first subscripting index variable, i, runs from 0 to 4. Similarly, the second subscript varies from 0 to
6.
We could print a2 out (in a two-dimensional way, suggesting its structure) with a similar pair of nested
loops:
for(i = 0; i < 5; i = i + 1)
{
    for(j = 0; j < 7; j = j + 1)
        printf("%d\t", a2[i][j]);
    printf("\n");
}
(The character \t in the printf string is the tab character.)
Just to see more clearly what's going on, we could make the ``row'' and ``column'' subscripts explicit by
printing them, too:
for(j = 0; j < 7; j = j + 1)
    printf("\t%d:", j);
printf("\n");
for(i = 0; i < 5; i = i + 1)
{
    printf("%d:", i);
    for(j = 0; j < 7; j = j + 1)
        printf("\t%d", a2[i][j]);
    printf("\n");
}
This last fragment would print
        0:      1:      2:      3:      4:      5:      6:
0:      0       1       2       3       4       5       6
1:      10      11      12      13      14      15      16
2:      20      21      22      23      24      25      26
3:      30      31      32      33      34      35      36
4:      40      41      42      43      44      45      46
Finally, there's no reason we have to loop over the ``rows'' first and the ``columns'' second; depending on
what we wanted to do, we could interchange the two loops, like this:
for(j = 0; j < 7; j = j + 1)
{
    for(i = 0; i < 5; i = i + 1)
        printf("%d\t", a2[i][j]);
    printf("\n");
}
Notice that i is still the first subscript and it still runs from 0 to 4, and j is still the second subscript and
it still runs from 0 to 6.


We haven't said so explicitly, but variables are channels of communication within a program. You set a
variable to a value at one point in a program, and at another point (or points) you read the value out
again. The two points may be in adjoining statements, or they may be in widely separated parts of the
program.
How long does a variable last? How widely separated can the setting and fetching parts of the program
be, and how long after a variable is set does it persist? Depending on the variable and how you're using it,
you might want different answers to these questions.
The visibility of a variable determines how much of the rest of the program can access that variable. You
can arrange that a variable is visible only within one part of one function, or in one function, or in one
source file, or anywhere in the program. (We haven't really talked about source files yet; we'll be
exploring them soon.)
Why would you want to limit the visibility of a variable? For maximum flexibility, wouldn't it be handy
if all variables were potentially visible everywhere? As it happens, that arrangement would be too
flexible: everywhere in the program, you would have to keep track of the names of all the variables
declared anywhere else in the program, so that you didn't accidentally re-use one. Whenever a variable
had the wrong value by mistake, you'd have to search the entire program for the bug, because any
statement in the entire program could potentially have modified that variable. You would constantly be
stepping all over yourself by using a common variable name like i in two parts of your program, and
having one snippet of code accidentally overwrite the values being used by another part of the code. The
communication would be sort of like an old party line--you'd always be accidentally interrupting other
conversations, or having your conversations interrupted.
To avoid this confusion, we generally give variables the narrowest or smallest visibility they need. A
variable declared within the braces {} of a function is visible only within that function; variables
declared within functions are called local variables. If another function somewhere else declares a local
variable with the same name, it's a different variable entirely, and the two don't clash with each other.
On the other hand, a variable declared outside of any function is a global variable, and it is potentially
visible anywhere within the program. You use global variables when you do want the communications
path to be able to travel to any part of the program. When you declare a global variable, you will usually
give it a longer, more descriptive name (not something generic like i) so that whenever you use it you
will remember that it's the same variable everywhere.
Another word for the visibility of variables is scope.
How long do variables last? By default, local variables (those declared within a function) have automatic
duration: they spring into existence when the function is called, and they (and their values) disappear
when the function returns. Global variables, on the other hand, have static duration: they last, and the
values stored in them persist, for as long as the program does. (Of course, the values can in general still
be overwritten, so they don't necessarily persist forever.)
Finally, it is possible to split a program up into several source files, for easier maintenance. When several
source files are combined into one program (we'll be seeing how in the next chapter) the compiler must
have a way of correlating the global variables which might be used to communicate between the several
source files. Furthermore, if a global variable is going to be useful for communication, there must be
exactly one of it: you wouldn't want one function in one source file to store a value in one global variable
named globalvar, and then have another function in another source file read from a different global
variable named globalvar. Therefore, a global variable should have exactly one defining instance, in
one place in one source file. If the same variable is to be used anywhere else (i.e. in some other source
file or files), the variable is declared in those other file(s) with an external declaration, which is not a
defining instance. The external declaration says, ``hey, compiler, here's the name and type of a global
variable I'm going to use, but don't define it here, don't allocate space for it; it's one that's defined
somewhere else, and I'm just referring to it here.'' If you accidentally have two distinct defining instances
for a variable of the same name, the compiler (or the linker) will complain that it is ``multiply defined.''
It is also possible to have a variable which is global in the sense that it is declared outside of any
function, but private to the one source file it's defined in. Such a variable is visible to the functions in that
source file but not to any functions in any other source files, even if they try to issue a matching
declaration.
You get any extra control you might need over visibility and lifetime, and you distinguish between
defining instances and external declarations, by using storage classes. A storage class is an extra
keyword at the beginning of a declaration which modifies the declaration in some way. Generally, the
storage class (if any) is the first word in the declaration, preceding the type name. (Strictly speaking, this
ordering has not traditionally been necessary, and you may see some code with the storage class, type
name, and other parts of a declaration in an unusual order.)
We said that, by default, local variables had automatic duration. To give them static duration (so that,
instead of coming and going as the function is called, they persist for as long as the program does), you
precede their declaration with the static keyword:
static int i;
By default, a declaration of a global variable (especially if it specifies an initial value) is the defining
instance. To make it an external declaration, of a variable which is defined somewhere else, you precede
it with the keyword extern:
extern int j;
Finally, to arrange that a global variable is visible only within its containing source file, you precede it
with the static keyword:
static int k;
Notice that the static keyword can do two different things: it adjusts the duration of a local variable
from automatic to static, or it adjusts the visibility of a global variable from truly global to private-to-the-file.
To summarize, we've talked about two different attributes of a variable: visibility and duration. These are
orthogonal, as shown in this table:
                        duration
visibility      automatic                   static
local           normal local variables      static local variables
global          N/A                         normal global variables
We can also distinguish between file-scope global variables and truly global variables, based on the
presence or absence of the static keyword.
We can also distinguish between external declarations and defining instances of global variables, based
on the presence or absence of the extern keyword.


The duration of a variable (whether static or automatic) also affects its default initialization.
If you do not explicitly initialize them, automatic-duration variables (that is, local, non-static ones)
are not guaranteed to have any particular initial value; they will typically contain garbage. It is therefore
a fairly serious error to attempt to use the value of an automatic variable which has never been initialized
or assigned to: the program will either work incorrectly, or the garbage value may just happen to be
``correct'' such that the program appears to work correctly! However, the particular value that the
garbage takes on can vary depending literally on anything: other parts of the program, which compiler
was used, which hardware or operating system the program is running on, the time of day, the phase of
the moon. (Okay, maybe the phase of the moon is a bit of an exaggeration.) So you hardly want to say
that a program which uses an uninitialized variable ``works''; it may seem to work, but it works for the
wrong reason, and it may stop working tomorrow.
Static-duration variables (global and static local), on the other hand, are guaranteed to be initialized
to 0 if you do not use an explicit initializer in the definition.
(Once upon a time, there was another distinction between the initialization of automatic vs. static
variables: you could initialize aggregate objects, such as arrays, only if they had static duration. If your
compiler complains when you try to initialize a local array, it's probably an old, pre-ANSI compiler.
Modern, ANSI-compatible compilers remove this limitation, so it's no longer much of a concern.)

4.4 Examples
Here is an example demonstrating almost everything we've seen so far:
int globalvar = 1;
extern int anotherglobalvar;
static int privatevar;

void f(void)
{
    int localvar;
    int localvar2 = 2;
    static int persistentvar;
}
Here we have six variables, three declared outside and three declared inside of the function f().
globalvar is a global variable. The declaration we see is its defining instance (it happens also to
include an initial value). globalvar can be used anywhere in this source file, and it could be used in
other source files, too (as long as corresponding external declarations are issued in those other source
files).
anotherglobalvar is a second global variable. It is not defined here; the defining instance for it
(and its initialization) is somewhere else.
privatevar is a ``private'' global variable. It can be used anywhere within this source file, but
functions in other source files cannot access it, even if they try to issue external declarations for it. (If
other source files try to declare a global variable called ``privatevar'', they'll get their own; they
won't be sharing this one.) Since it has static duration and receives no explicit initialization,
privatevar will be initialized to 0.
localvar is a local variable within the function f(). It can be accessed only within the function f().
(If any other part of the program declares a variable named ``localvar'', that variable will be distinct
from the one we're looking at here.) localvar is conceptually ``created'' each time f() is called, and
disappears when f() returns. Any value which was stored in localvar last time f() was running
will be lost and will not be available next time f() is called. Furthermore, since it has no explicit
initializer, the value of localvar will in general be garbage each time f() is called.
localvar2 is also local, and everything that we said about localvar applies to it, except that since
its declaration includes an explicit initializer, it will be initialized to 2 each time f() is called.
Finally, persistentvar is again local to f(), but it does maintain its value between calls to f(). It
has static duration but no explicit initializer, so its initial value will be 0.
The defining instances and external declarations we've been looking at so far have all been of simple
variables. There are also defining instances and external declarations of functions, which we'll be looking
at in the next chapter.
(Also, don't worry about static variables for now if they don't make sense to you; they're a relatively
sophisticated concept, which you won't need to use at first.)
The term declaration is a general one which encompasses defining instances and external declarations;
defining instances and external declarations are two different kinds of declarations. Furthermore, either
kind of declaration suffices to inform the compiler of the name and type of a particular variable (or
function). If you have the defining instance of a global variable in a source file, the rest of that source file
can use that variable without having to issue any external declarations. It's only in source files where the
defining instance hasn't been seen that you need external declarations.
You will sometimes hear a defining instance referred to simply as a ``definition,'' and you will sometimes
hear an external declaration referred to simply as a ``declaration.'' These usages are mildly ambiguous, in
that you can't tell out of context whether a ``declaration'' is a generic declaration (that might be a defining
instance or an external declaration) or whether it's an external declaration that specifically is not a
defining instance. (Similarly, there are other constructions that can be called ``definitions'' in C, namely
the definitions of preprocessor macros, structures, and typedefs, none of which we've met.) In these
notes, we'll try to make things clear by using the unambiguous terms defining instance and external
declaration. Elsewhere, you may have to look at the context to determine how the terms ``definition'' and
``declaration'' are being used.

Chapter 5: Functions and Program Structure
[This chapter corresponds to K&R chapter 4.]
A function is a ``black box'' that we've locked part of our program into. The idea behind a function is that
it compartmentalizes part of the program, and in particular, that the code within the function has some
useful properties:
1. It performs some well-defined task, which will be useful to other parts of the program.
2. It might be useful to other programs as well; that is, we might be able to reuse it (and without
having to rewrite it).
3. The rest of the program doesn't have to know the details of how the function is implemented. This
can make the rest of the program easier to think about.
4. The function performs its task well. It may be written to do a little more than is required by the
first program that calls it, with the anticipation that the calling program (or some other program)
may later need the extra functionality or improved performance. (It's important that a finished
function do its job well, otherwise there might be a reluctance to call it, and it therefore might not
achieve the goal of reusability.)
5. By placing the code to perform the useful task into a function, and simply calling the function in
the other parts of the program where the task must be performed, the rest of the program becomes
clearer: rather than having some large, complicated, difficult-to-understand piece of code repeated
wherever the task is being performed, we have a single simple function call, and the name of the
function reminds us which task is being performed.
6. Since the rest of the program doesn't have to know the details of how the function is implemented,
the rest of the program doesn't care if the function is reimplemented later, in some different way
(as long as it continues to perform its same task, of course!). This means that one part of the
program can be rewritten, to improve performance or add a new feature (or simply to fix a bug),
without having to rewrite the rest of the program.
Functions are probably the most important weapon in our battle against software complexity. You'll want
to learn when it's appropriate to break processing out into functions (and also when it's not), and how to
set up function interfaces to best achieve the qualities mentioned above: reusability, information hiding,
clarity, and maintainability.
5.1 Function Basics
5.2 Function Prototypes
5.3 Function Philosophy
5.4 Separate Compilation--Logistics

5.1 Function Basics

So what defines a function? It has a name that you call it by, and a list of zero or more arguments or
parameters that you hand to it for it to act on or to direct its work; it has a body containing the actual
instructions (statements) for carrying out the task the function is supposed to perform; and it may give
you back a return value, of a particular type.
Here is a very simple function, which accepts one argument, multiplies it by 2, and hands that value
back:
int multbytwo(int x)
{
    int retval;
    retval = x * 2;
    return retval;
}
On the first line we see the return type of the function (int), the name of the function (multbytwo),
and a list of the function's arguments, enclosed in parentheses. Each argument has both a name and a
type; multbytwo accepts one argument, of type int, named x. The name x is arbitrary, and is used
only within the definition of multbytwo. The caller of this function only needs to know that a single
argument of type int is expected; the caller does not need to know what name the function will use
internally to refer to that argument. (In particular, the caller does not have to pass the value of a variable
named x.)
Next we see, surrounded by the familiar braces, the body of the function itself. This function consists of
one declaration (of a local variable retval) and two statements. The first statement is a conventional
expression statement, which computes and assigns a value to retval, and the second statement is a
return statement, which causes the function to return to its caller, and also specifies the value which
the function returns to its caller.
The return statement can return the value of any expression, so we don't really need the local retval
variable; the function could be collapsed to
int multbytwo(int x)
{
    return x * 2;
}
How do we call a function? We've been doing so informally since day one, but now we have a chance to
call one that we've written, in full detail. Here is a tiny skeletal program to call multbytwo:
#include <stdio.h>

extern int multbytwo(int);

int main()
{
    int i, j;
    i = 3;
    j = multbytwo(i);
    printf("%d\n", j);
    return 0;
}
This looks much like our other test programs, with the exception of the new line
extern int multbytwo(int);
This is an external function prototype declaration. It is an external declaration, in that it declares
something which is defined somewhere else. (We've already seen the defining instance of the function
multbytwo, but maybe the compiler hasn't seen it yet.) The function prototype declaration contains the
three pieces of information about the function that a caller needs to know: the function's name, return
type, and argument type(s). Since we don't care what name the multbytwo function will use to refer to
its first argument, we don't need to mention it. (On the other hand, if a function takes several arguments,
giving them names in the prototype may make it easier to remember which is which, so names may
optionally be used in function prototype declarations.) Finally, to remind us that this is an external
declaration and not a defining instance, the prototype is preceded by the keyword extern.
The presence of the function prototype declaration lets the compiler know that we intend to call this
function, multbytwo. The information in the prototype lets the compiler generate the correct code for
calling the function, and also enables the compiler to check up on our code (by making sure, for example,
that we pass the correct number of arguments to each function we call).
Down in the body of main, the action of the function call should be obvious: the line
j = multbytwo(i);
calls multbytwo, passing it the value of i as its argument. When multbytwo returns, the return value
is assigned to the variable j. (Notice that the value of main's local variable i will become the value of
multbytwo's parameter x; this is absolutely not a problem, and is a normal sort of affair.)
This example is written out in ``longhand,'' to make each step explicit. The variable i isn't really
needed, since we could just as well call
j = multbytwo(3);
And the variable j isn't really needed, either, since we could just as well call
printf("%d\n", multbytwo(3));
Here, the call to multbytwo is a subexpression which serves as the second argument to printf. The
value returned by multbytwo is passed immediately to printf. (Here, as in general, we see the
flexibility and generality of expressions in C. An argument passed to a function may be an arbitrarily
complex subexpression, and a function call is itself an expression which may be embedded as a
subexpression within arbitrarily complicated surrounding expressions.)
We should say a little more about the mechanism by which an argument is passed down from a caller
into a function. Formally, C is call by value, which means that a function receives copies of the values of
its arguments. We can illustrate this with an example. Suppose, in our implementation of multbytwo,
we had gotten rid of the unnecessary retval variable like this:
int multbytwo(int x)
{
    x = x * 2;
    return x;
}
We might wonder, if we wrote it this way, what would happen to the value of the variable i when we
called
j = multbytwo(i);
When our implementation of multbytwo changes the value of x, does that change the value of i up in
the caller? The answer is no. x receives a copy of i's value, so when we change x we don't change i.
However, there is an exception to this rule. When the argument you pass to a function is not a single
variable, but is rather an array, the function does not receive a copy of the array, and it therefore can
modify the array in the caller. The reason is that it might be too expensive to copy the entire array, and
furthermore, it can be useful for the function to write into the caller's array, as a way of handing back
more data than would fit in the function's single return value. We'll see an example of an array argument
(which the function deliberately writes into) in the next chapter.

5.2 Function Prototypes

In modern C programming, it is considered good practice to use prototype declarations for all functions
that you call. As we mentioned, these prototypes help to ensure that the compiler can generate correct
code for calling the functions, as well as allowing the compiler to catch certain mistakes you might make.
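Strictly speaking, however, prototypes have traditionally been optional. If you call a function for which
the compiler has not seen a prototype, the compiler will do the best it can, assuming that you're calling
the function correctly. (Since the 1999 revision of the C standard, calling a function with no declaration
in scope at all is something compilers are required to diagnose, so in practice you will want the prototype.)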
If prototypes are a good idea, and if we're going to get in the habit of writing function prototype
declarations for functions we call that we've written (such as multbytwo), what happens for library
functions such as printf? Where are their prototypes? The answer is in that boilerplate line
#include <stdio.h>
we've been including at the top of all of our programs. stdio.h is conceptually a file full of external
declarations and other information pertaining to the ``Standard I/O'' library functions, including
printf. The #include directive (which we'll meet formally in a later chapter) arranges that all of the
declarations within stdio.h are considered by the compiler, rather as if we'd typed them all in
ourselves. Somewhere within these declarations is an external function prototype declaration for
printf, which satisfies the rule that there should be a prototype for each function we call. (For other
standard library functions we call, there will be other ``header files'' to include.)
Finally, one more thing about external function prototype declarations. We've said that the distinction
between external declarations and defining instances of normal variables hinges on the presence or
absence of the keyword extern. The situation is a little bit different for functions. The ``defining
instance'' of a function is the function, including its body (that is, the brace-enclosed list of declarations
and statements implementing the function). An external declaration of a function, even without the
keyword extern, looks nothing like a function definition. Therefore, the keyword extern is optional in
function prototype declarations. If you wish, you can write
int multbytwo(int);
and this is just as good an external function prototype declaration as
extern int multbytwo(int);
(In the first form, without the extern, as soon as the compiler sees the semicolon, it knows it's not
going to see a function body, so the declaration can't be a definition.) You may want to stay in the habit
of using extern in all external declarations, including function declarations, since ``extern = external
declaration'' is an easier rule to remember.


5.3 Function Philosophy
What makes a good function? The most important aspect of a good ``building block'' is that it have a
single, well-defined task to perform. When you find that a program is hard to manage, it's often because
it has not been designed and broken up into functions cleanly. Two obvious reasons for moving code
down into a function are because:
1. It appeared in the main program several times, such that by making it a function, it can be written just
once, and the several places where it used to appear can be replaced with calls to the new function.
2. The main program was getting too big, so it could be made (presumably) smaller and more
manageable by lopping part of it off and making it a function.
These two reasons are important, and they represent significant benefits of well-chosen functions, but
they are not sufficient to automatically identify a good function. As we've been suggesting, a good
function has at least these two additional attributes:
3. It does just one well-defined task, and does it well.
4. Its interface to the rest of the program is clean and narrow.
Attribute 3 is just a restatement of two things we said above. Attribute 4 says that you shouldn't have to
keep track of too many things when calling a function. If you know what a function is supposed to do,
and if its task is simple and well-defined, there should be just a few pieces of information you have to
give it to act upon, and one or just a few pieces of information which it returns to you when it's done. If
you find yourself having to pass lots and lots of information to a function, or remember details of its
internal implementation to make sure that it will work properly this time, it's often a sign that the
function is not sufficiently well-defined. (A poorly-defined function may be an arbitrary chunk of code
that was ripped out of a main program that was getting too big, such that it essentially has to have access
to all of that main function's local variables.)
The whole point of breaking a program up into functions is so that you don't have to think about the
entire program at once; ideally, you can think about just one function at a time. We say that a good
function is a ``black box,'' which is supposed to suggest that the ``container'' it's in is opaque--callers
can't see inside it (and the function inside can't see out). When you call a function, you only have to
know what it does, not how it does it. When you're writing a function, you only have to know what it's
supposed to do, and you don't have to know why or under what circumstances its caller will be calling it.
(When designing a function, we should perhaps think about the callers just enough to ensure that the
function we're designing will be easy to call, and that we aren't accidentally setting things up so that
callers will have to think about any internal details.)
Some functions may be hard to write (if they have a hard job to do, or if it's hard to make them do it truly
well), but that difficulty should be compartmentalized along with the function itself. Once you've written
a ``hard'' function, you should be able to sit back and relax and watch it do that hard work on call from
the rest of your program. It should be pleasant to notice (in the ideal case) how much easier the rest of the
program is to write, now that the hard work can be deferred to this workhorse function.
(In fact, if a difficult-to-write function's interface is well-defined, you may be able to get away with
writing a quick-and-dirty version of the function first, so that you can begin testing the rest of the
program, and then go back later and rewrite the function to do the hard parts. As long as the function's
original interface anticipated the hard parts, you won't have to rewrite the rest of the program when you
fix the function.)
What I've been trying to say in the preceding few paragraphs is that functions matter for reasons far
more important than just saving typing. Sometimes, we'll write a function which we only call once,
just because breaking it out into a function makes things clearer and easier.
If you find that difficulties pervade a program, that the hard parts can't be buried inside black-box
functions and then forgotten about; if you find that there are hard parts which involve complicated
interactions among multiple functions, then the program probably needs redesigning.
For the purposes of explanation, we've so far seemed to talk only about ``main programs'' and the
functions they call, and the rationale behind moving some piece of code down out of a ``main program''
into a function. But in reality, there's obviously no need to restrict ourselves to a two-tier scheme. Any
function we find ourselves writing will often be appropriately written in terms of sub-functions,
sub-sub-functions, etc. (Furthermore, the ``main program,'' main(), is itself just a function.)


5.4 Separate Compilation--Logistics
When a program consists of many functions, it can be convenient to split them up into several source
files. Among other things, this means that when a change is made, only the source file containing the
change has to be recompiled, not the whole program.
The job of putting the pieces of a program together and producing the final executable falls to a tool
called the linker. (We may or may not need to invoke the linker explicitly; a compiler often invokes it
automatically, as needed.) The linker looks through all of the pieces making up the program, sorting out
the external declarations and defining instances. The compiler has noted the definitions made by each
source file, as well as the declarations of things used by each source file but (presumably) defined
elsewhere. For each thing (global variable or function) used but not defined by one piece of the program,
the linker looks for another piece which does define that thing.
The logistics of writing a program in several source files, and then compiling and linking all of the
source files together, depend on the programming environment you're using. We'll cover two
possibilities, depending on whether you're using a traditional command-line compiler or a newer
integrated development environment (IDE) or other graphical user interface (GUI) compiler.
When using a command-line compiler, there are usually two main steps involved in building an
executable program from one or more source files. First, each source file is compiled, resulting in an
object file containing the machine instructions (generated by the compiler) corresponding to just the code
in that source file. Second, the various object files are linked together, with each other and with libraries
containing code for functions which you did not write (such as printf), to produce a final, executable
program.
Under Unix, the cc command can perform one or both steps. So far, we've been using extremely simple
invocations of cc such as
cc -o hello hello.c
This invocation compiles a single source file, hello.c, links it, and places the executable in a file
named hello.
Suppose we have a program which we're trying to build from three separate source files, x.c, y.c, and
z.c. We could compile all three of them, and link them together, all at once, with the command
cc -o myprog x.c y.c z.c
Alternatively, we could compile them separately: the -c option to cc tells it to compile only, but not to
link. Instead of building an executable, it merely creates an object file, with a name ending in .o, for
each source file compiled. So the three commands

cc -c x.c
cc -c y.c
cc -c z.c
would compile x.c, y.c, and z.c and create object files x.o, y.o, and z.o. Then, the three object
files could be linked together using
cc -o myprog x.o y.o z.o
When the cc command is given an .o file, it knows that it does not have to compile it (it's an object file,
already compiled); it just sends it through to the link process.
Above we mentioned that the second, linking step also involves pulling in library functions. Normally,
the functions from the Standard C library are linked in automatically. Occasionally, you must request a
library manually; one common situation under Unix is that the math functions tend to be in a separate
math library, which is requested by using -lm on the command line. Since the libraries must typically be
searched after your program's own object files are linked (so that the linker knows which library
functions your program uses), any -l option must appear after the names of your files on the command
line. For example, to link the object file mymath.o (previously compiled with cc -c mymath.c)
together with the math library, you might use
cc -o mymathprog mymath.o -lm
(The l in the -l option is the lower case ell, for library; it is not the digit 1.)
Everything we've said about cc also applies to most other Unix C compilers. (Many of you will be using
gcc, the FSF's GNU C Compiler.)
There are command-line compilers for MS-DOS systems which work similarly. For example, the
Microsoft C compiler comes with a CL (``compile and link'') command, which works almost the same as
Unix cc. You can compile and link in one step:
cl hello.c
or you can compile only:
cl /c hello.c
creating an object file named hello.obj which you can link later.
The preceding has all been about command-line compilers. If you're using some kind of integrated
development environment, such as Borland's Turbo C or the Microsoft Programmer's Workbench or
Visual C or Think C or Codewarrior, most of the mechanical details are taken care of for you. (There's
also less I can say here about these environments, because they're all different.) Typically you define a
``project,'' and there's a way to specify the list of files (modules) which make up your project. The
modules might be source files which you typed in or obtained elsewhere, or they might be source files
which you created within the environment (perhaps by requesting a ``New source file,'' and typing it in).
Typically, the programming environment has a single ``build'' button which does whatever's required to
build (and perhaps even execute) your program. There may also be configuration windows in which you
can specify compiler options (such as whether you'd like it to accept C or C++). ``See your manual for
details.''


So far, we've been using printf to do output, and we haven't had a way of doing any input. In this
chapter, we'll learn a bit more about printf, and we'll begin learning about character-based input and
output.
6.1 printf
6.2 Character Input and Output
6.3 Reading Lines
6.4 Reading Numbers

6.1 printf
printf's name comes from print formatted. It generates output under the control of a format string (its first
argument) which consists of literal characters to be printed and also special character sequences--format specifiers--which request that other arguments be fetched, formatted, and inserted into the string. Our very first program was
nothing more than a call to printf, printing a constant string:
printf("Hello, world!\n");
Our second program also featured a call to printf:
printf("i is %d\n", i);
In that case, whenever printf ``printed'' the string "i is %d", it did not print it verbatim; it replaced the two
characters %d with the value of the variable i.
There are quite a number of format specifiers for printf. Here are the basic ones:
%d     print an int argument in decimal
%ld    print a long int argument in decimal
%c     print a character
%s     print a string
%f     print a float or double argument
%e     same as %f, but use exponential notation
%g     use %e or %f, whichever is better
%o     print an int argument in octal (base 8)
%x     print an int argument in hexadecimal (base 16)
%%     print a single %
It is also possible to specify the width and precision of numbers and strings as they are inserted (somewhat like
FORTRAN format statements); we'll present those details in a later chapter. (Very briefly, for those who are
curious: a notation like %3d means to print an int in a field at least 3 spaces wide; a notation like %5.2f means
to print a float or double in a field at least 5 spaces wide, with two places to the right of the decimal.)
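As a small sketch of the width and precision notations just mentioned: the helper below (format_width_demo is a made-up name, not a library function) uses snprintf, a C99 relative of printf which stores its output in a character array instead of printing it, so that the padding is easy to see between the square brackets.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only: format an int with %3d */
/* and a double with %5.2f into buf, which has room for size chars. */
/* snprintf formats like printf, but stores the result in buf. */
void format_width_demo(char buf[], size_t size, int i, double x)
{
    snprintf(buf, size, "[%3d] [%5.2f]", i, x);
}
```

With i equal to 7 and x equal to 3.14159, buf ends up holding [  7] [ 3.14]: the 7 is padded on the left to three characters, and 3.14 to five. A number too big for its field (say, 1234 with %3d) is not truncated; the field simply grows.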
To illustrate with a few more examples: the call
printf("%c %d %f %e %s %d%%\n", '1', 2, 3.14, 56000000., "eight", 9);
would print
1 2 3.140000 5.600000e+07 eight 9%
The call
printf("%d %o %x\n", 100, 100, 100);
would print
100 144 64
Successive calls to printf just build up the output a piece at a time, so the calls
printf("Hello, ");
printf("world!\n");
would also print Hello, world! (on one line of output).
Earlier we learned that C represents characters internally as small integers corresponding to the characters' values
in the machine's character set (typically ASCII). This means that there isn't really much difference between a
character and an integer in C; most of the difference is in whether we choose to interpret an integer as an integer or
a character. printf is one place where we get to make that choice: %d prints an integer value as a string of digits
representing its decimal value, while %c prints the character corresponding to a character set value. So the lines
char c = 'A';
int i = 97;
printf("c = %c, i = %d\n", c, i);
would print c as the character A and i as the number 97. But if, on the other hand, we called
printf("c = %d, i = %c\n", c, i);
we'd see the decimal value (printed by %d) of the character 'A', followed by the character (whatever it is) which
happens to have the decimal value 97.
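To experiment with this, here is a minimal sketch which formats one and the same integer value both ways. The name format_both_ways is invented for this example, and the results quoted afterwards assume the ASCII character set.

```c
#include <stdio.h>

/* Hypothetical helper: format the same integer value twice into buf, */
/* once as a decimal number (%d) and once as a character (%c). */
void format_both_ways(char buf[], size_t size, int value)
{
    snprintf(buf, size, "%d is '%c'", value, value);
}
```

On an ASCII machine, passing 'A' produces 65 is 'A', and passing 97 produces 97 is 'a'. The value itself never changes; only the format specifier decides how it is displayed.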
You have to be careful when calling printf. It has no way of knowing how many arguments you've passed it or
what their types are other than by looking for the format specifiers in the format string. If there are more format
specifiers (that is, more % signs) than there are arguments, or if the arguments have the wrong types for the format
specifiers, printf can misbehave badly, often printing nonsense numbers or (even worse) numbers which
mislead you into thinking that some other part of your program is broken.
Because of some automatic conversion rules which we haven't covered yet, you have a small amount of latitude in
the types of the expressions you pass as arguments to printf. The argument for %c may be of type char or
int, and the argument for %d may be of type char or int. The string argument for %s may be a string constant,
an array of characters, or a pointer to some characters (though we haven't really covered strings or pointers yet).
Finally, the arguments corresponding to %e, %f, and %g may be of types float or double. But other
combinations do not work reliably: %d will not print a long int or a float or a double; %ld will not print
an int; %e, %f, and %g will not print an int.
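One way to stay out of trouble is to make each argument's type agree with its specifier explicitly. The following is only a sketch (format_matched is a hypothetical name): the long value gets %ld, and the int is converted to double, by writing (double) in front of it, before it is handed to %f.

```c
#include <stdio.h>

/* Hypothetical helper: every argument's type matches its specifier. */
void format_matched(char buf[], size_t size, long big, int count)
{
    /* %ld expects a long; %f expects a double, so the int count */
    /* is explicitly converted to double first. */
    snprintf(buf, size, "%ld %f", big, (double)count);
}
```

Called with 100000L and 3, this stores 100000 3.000000 in buf. Passing count to %f without the conversion would be one of the unreliable combinations listed above.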


Unless a program can read some input, it's hard to keep it from doing exactly the same thing every time
it's run, and thus being rather boring after a while.
The most basic way of reading input is by calling the function getchar. getchar reads one character
from the ``standard input,'' which is usually the user's keyboard, but which can sometimes be redirected
by the operating system. getchar returns (rather obviously) the character it reads, or, if there are no
more characters available, the special value EOF (``end of file'').
A companion function is putchar, which writes one character to the ``standard output.'' (The standard
output is, again not surprisingly, usually the user's screen, although it, too, can be redirected. printf,
like putchar, prints to the standard output; in fact, you can imagine that printf calls putchar to
actually print each of the characters it formats.)
Using these two functions, we can write a very basic program to copy the input, a character at a time, to
the output:
#include <stdio.h>

/* copy input to output */

int main()
{
    int c;

    c = getchar();

    while(c != EOF)
        {
        putchar(c);
        c = getchar();
        }

    return 0;
}
This code is straightforward, and I encourage you to type it in and try it out. It reads one character, and if
it is not the EOF code, enters a while loop, printing one character and reading another, as long as the
character read is not EOF. This is a straightforward loop, although there's one mystery surrounding the
declaration of the variable c: if it holds characters, why is it an int?
We said that a char variable could hold integers corresponding to character set values, and that an int
could hold integers of more arbitrary values (up to +-32767). Since most character sets contain a few
hundred characters (nowhere near 32767), an int variable can in general comfortably hold all char
values, and then some. Therefore, there's nothing wrong with declaring c as an int. But in fact, it's
important to do so, because getchar can return every character value, plus that special, non-character
value EOF, indicating that there are no more characters. Type char is only guaranteed to be able to hold
all the character values; it is not guaranteed to be able to hold this ``no more characters'' value without
possibly mixing it up with some actual character value. (It's like trying to cram five pounds of books into
a four-pound box, or 13 eggs into a carton that holds a dozen.) Therefore, you should always remember to
use an int for anything you assign getchar's return value to.
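Here is a sketch of the same loop shape put to a slightly different use, counting characters instead of copying them. The function name count_chars and its fp parameter are assumptions of this example; it calls getc(fp), the general form of getchar (getchar() reads from the standard input, getc(fp) from whichever open stream it is given), so that it can be tried on a file as easily as on the keyboard.

```c
#include <stdio.h>

/* Hypothetical helper: count the characters remaining on the stream fp. */
long count_chars(FILE *fp)
{
    int c;          /* int, not char, so that EOF stays distinct */
    long n = 0;

    c = getc(fp);
    while (c != EOF) {
        n = n + 1;
        c = getc(fp);
    }
    return n;
}
```

A byte with the value 255 in the input is counted like any other character. Had c been declared as a char, that byte could compare equal to EOF on some machines and end the loop early.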
When you run the character copying program, and it begins copying its input (your typing) to its output
(your screen), you may find yourself wondering how to stop it. It stops when it receives end-of-file
(EOF), but how do you send EOF? The answer depends on what kind of computer you're using. On Unix
and Unix-related systems, it's almost always control-D. On MS-DOS machines, it's control-Z followed by
the RETURN key. Under Think C on the Macintosh, it's control-D, just like Unix. On other systems, you
may have to do some research to learn how to send EOF.
(Note, too, that the character you type to generate an end-of-file condition from the keyboard is not the
same as the special EOF value returned by getchar. The EOF value returned by getchar is a code
indicating that the input system has detected an end-of-file condition, whether it's reading the keyboard or
a file or a magnetic tape or a network connection or anything else. In a disk file, at least, there is not likely
to be any character in the file corresponding to EOF; as far as your program is concerned, EOF indicates
the absence of any more characters to read.)
Another excellent thing to know when doing any kind of programming is how to terminate a runaway
program. If a program is running forever waiting for input, you can usually stop it by sending it an end-of-file, as above, but if it's running forever not waiting for something, you'll have to take more drastic
measures. Under Unix, control-C (or, occasionally, the DELETE key) will terminate the current program,
almost no matter what. Under MS-DOS, control-C or control-BREAK will sometimes terminate the
current program, but by default MS-DOS only checks for control-C when it's looking for input, so an
infinite loop can be unkillable. There's a DOS command,
break on
which tells DOS to look for control-C more often, and I recommend using this command if you're doing
any programming. (If a program is in a really tight infinite loop under MS-DOS, there can be no way of
killing it short of rebooting.) On the Mac, try command-period or command-option-ESCAPE.
Finally, don't be disappointed (as I was) the first time you run the character copying program. You'll type
a character, and see it on the screen right away, and assume it's your program working, but it's only your
computer echoing every key you type, as it always does. When you hit RETURN, a full line of characters
is made available to your program. It then zips several times through its loop, reading and printing all the
characters in the line in quick succession. In other words, when you run this program, it will probably
seem to copy the input a line at a time, rather than a character at a time. You may wonder how a program
could instead read a character right away, without waiting for the user to hit RETURN. That's an excellent
question, but unfortunately the answer is rather complicated, and beyond the scope of our discussion here.
(Among other things, how to read a character right away is one of the things that's not defined by the C
language, and it's not defined by any of the standard library functions, either. How to do it depends on
which operating system you're using.)
Stylistically, the character-copying program above can be said to have one minor flaw: it contains two
calls to getchar, one which reads the first character and one which reads (by virtue of the fact that it's
in the body of the loop) all the other characters. This seems inelegant and perhaps unnecessary, and it can
also be risky: if there were more things going on within the loop, and if we ever changed the way we read
characters, it would be easy to change one of the getchar calls but forget to change the other one. Is
there a way to rewrite the loop so that there is only one call to getchar, responsible for reading all the
characters? Is there a way to read a character, test it for EOF, and assign it to the variable c, all at the
same time?
There is. It relies on the fact that the assignment operator, =, is just another operator in C. An assignment
is not (necessarily) a standalone statement; it is an expression, and it has a value (the value that's assigned
to the variable on the left-hand side), and it can therefore participate in a larger, surrounding expression.
Therefore, most C programmers would write the character-copying loop like this:
while((c = getchar()) != EOF)
    putchar(c);
What does this mean? The function getchar is called, as before, and its return value is assigned to the
variable c. Then the value is immediately compared against the value EOF. Finally, the true/false value of
the comparison controls the while loop: as long as the value is not EOF, the loop continues executing,
but as soon as an EOF is received, no more trips through the loop are taken, and it exits. The net result is
that the call to getchar happens inside the test at the top of the while loop, and doesn't have to be
repeated before the loop and within the loop (more on this in a bit).
Stated another way, the syntax of a while loop is always
while( expression ) ...
A comparison (using the != operator) is of course an expression; the syntax is
expression != expression
And an assignment is an expression; the syntax is

expression = expression
What we're seeing is just another example of the fact that expressions can be combined with essentially
limitless generality and therefore infinite variety. The left-hand side of the != operator (its first
expression) is the (sub)expression c = getchar(), and the combined expression is the expression
needed by the while loop.
The extra parentheses around
(c = getchar())
are important, and are there because the precedence of the != operator is higher than that of the =
operator. If we (incorrectly) wrote
while(c = getchar() != EOF)        /* WRONG */
the compiler would interpret it as

while(c = (getchar() != EOF))
That is, it would assign the result of the != operator to the variable c, which is not what we want.
(``Precedence'' refers to the rules for which operators are applied to their operands in which order, that is,
to the rules controlling the default grouping of expressions and subexpressions. For example, the
multiplication operator * has higher precedence than the addition operator +, which means that the
expression a + b * c is parsed as a + (b * c). We'll have more to say about precedence later.)
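To see what the missing parentheses actually do, here is a small sketch with two made-up functions, wrong_grouping and right_grouping. Each reads one character from a stream fp (using getc, the general form of getchar) and returns whatever ended up in c.

```c
#include <stdio.h>

/* Hypothetical helper: read one character using the WRONG, */
/* unparenthesized form, and return what was stored in c. */
int wrong_grouping(FILE *fp)
{
    int c;

    c = getc(fp) != EOF;    /* parsed as c = (getc(fp) != EOF) */
    return c;               /* 1 or 0, never the character itself */
}

/* Hypothetical helper: the same, with the parentheses in place. */
int right_grouping(FILE *fp)
{
    int c;

    if ((c = getc(fp)) != EOF)
        return c;           /* the character that was read */
    return EOF;
}
```

If the next input character is x, wrong_grouping returns 1 (the ``true'' result of the comparison), while right_grouping returns 'x'. A copying loop written the wrong way would still stop at end-of-file, but it would print the character with value 1 over and over instead of its input.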
The line
while((c = getchar()) != EOF)
epitomizes the cryptic brevity which C is notorious for. You may find this terseness infuriating (and
you're not alone!), and it can certainly be carried too far, but bear with me for a moment while I defend it.
The simple example we've been discussing illustrates the tradeoffs well. We have four things to do:
1. call getchar,
2. assign its return value to a variable,
3. test the return value against EOF, and
4. process the character (in this case, print it out again).

We can't eliminate any of these steps. We have to assign getchar's value to a variable (we can't just use
it directly) because we have to do two different things with it (test, and print). Therefore, compressing the
assignment and test into the same line is the only good way of avoiding two distinct calls to getchar.
You may not agree that the compressed idiom is better for being more compact or easier to read, but the
fact that there is now only one call to getchar is a real virtue.
Don't think that you'll have to write compressed lines like
while((c = getchar()) != EOF)
right away, or in order to be an ``expert C programmer.'' But, for better or worse, most experienced C
programmers do like to use these idioms (whether they're justified or not), so you'll need to be able to at
least recognize and understand them when you're reading other people's code.

6.3 Reading Lines

It's often convenient for a program to process its input not a character at a time but rather a line at a time,
that is, to read an entire line of input and then act on it all at once. The standard C library has a couple of
functions for reading lines, but they have a few awkward features, so we're going to learn more about
character input (and about writing functions in general) by writing our own function to read one line. Here
it is:
#include <stdio.h>

/* Read one line from standard input, */
/* copying it to line array (but no more than max chars). */
/* Does not place terminating \n in line array. */
/* Returns line length, or 0 for empty line, or EOF for end-of-file. */

int getline(char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;            /* leave room for '\0' */

    while((c = getchar()) != EOF)
        {
        if(c == '\n')
            break;

        if(nch < max)
            {
            line[nch] = c;
            nch = nch + 1;
            }
        }

    if(c == EOF && nch == 0)
        return EOF;

    line[nch] = '\0';
    return nch;
}
As the comment indicates, this function will read one line of input from the standard input, placing it into
the line array. The size of the line array is given by the max argument; the function will never write
more than max characters into line.
The main body of the function is a getchar loop, much as we used in the character-copying program. In
the body of this loop, however, we're storing the characters in an array (rather than immediately printing
them out). Also, we're only reading one line of characters, then stopping and returning.
There are several new things to notice here.
First of all, the getline function accepts an array as a parameter. As we've said, array parameters are an
exception to the rule that functions receive copies of their arguments--in the case of arrays, the function
does have access to the actual array passed by the caller, and can modify it. Since the function is
accessing the caller's array, not creating a new one to hold a copy, the function does not have to declare
the argument array's size; it's set by the caller. (Thus, the brackets in ``char line[]'' are empty.)
However, so that we won't overflow the caller's array by reading too long a line into it, we allow the caller
to pass along the size of the array, which we promise not to exceed.
Second, we see an example of the break statement. The top of the loop looks like our earlier character-copying loop--it stops when it reaches EOF--but we only want this loop to read one line, so we also stop
(that is, break out of the loop) when we see the \n character signifying end-of-line. An equivalent loop,
without the break statement, would be
while((c = getchar()) != EOF && c != '\n')
    {
    if(nch < max)
        {
        line[nch] = c;
        nch = nch + 1;
        }
    }
We haven't learned about the internal representation of strings yet, but it turns out that strings in C are
simply arrays of characters, which is why we are reading the line into an array of characters. The end of a
string is marked by the special character, '\0'. To make sure that there's always room for that character,
on our way in we subtract 1 from max, the argument that tells us how many characters we may place in
the line array. When we're done reading the line, we store the end-of-string character '\0' at the end
of the string we've just built in the line array.
Finally, there's one subtlety in the code which isn't too important for our purposes now but which you
may wonder about: it's arranged to handle the possibility that a few characters (i.e. the apparent beginning
of a line) are read, followed immediately by an EOF, without the usual \n end-of-line character. (That's
why we return EOF only if we received EOF and we hadn't read any characters first.)
In any case, the function returns the length (number of characters) of the line it read, not including the \n.
(Therefore, it returns 0 for an empty line.) Like getchar, it returns EOF when there are no more lines to
read. (It happens that EOF is a negative number, so it will never match the length of a line that getline
has read.)
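If you want to experiment with the function, the following variant may be convenient. It is a sketch under two assumptions of mine: it is named read_line, and it takes an extra fp argument saying which open stream to read from (using getc(fp), the general form of getchar). The different name is deliberate: on many current systems, <stdio.h> itself declares an unrelated function named getline (it is part of POSIX), so compiling the function above under its original name may produce a conflicting-declaration error.

```c
#include <stdio.h>

/* Variant of the line-reading function above, for experimentation. */
/* The name read_line and the fp parameter are assumptions of this */
/* sketch; the logic is otherwise unchanged. */
/* Returns line length, or 0 for empty line, or EOF for end-of-file. */
int read_line(FILE *fp, char line[], int max)
{
    int nch = 0;
    int c;

    max = max - 1;              /* leave room for '\0' */

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            break;
        if (nch < max) {
            line[nch] = c;
            nch = nch + 1;
        }
    }

    if (c == EOF && nch == 0)
        return EOF;

    line[nch] = '\0';
    return nch;
}
```

Given input consisting of the line ab, an empty line, and then cd with no final \n, successive calls return 2, 0, 2, and finally EOF. A line longer than the array is cut short, but the rest of that line is still read and discarded, so the next call starts at the beginning of the following line.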
Here is an example of a test program which calls getline, reading the input a line at a time and then
printing each line back out:
#include <stdio.h>

extern int getline(char [], int);

int main()
{
    char line[256];

    while(getline(line, 256) != EOF)
        printf("you typed \"%s\"\n", line);

    return 0;
}
The notation char [] in the function prototype for getline says that getline accepts as its first
argument an array of char. When the program calls getline, it is careful to pass along the actual size
of the array. (You might notice a potential problem: since the number 256 appears in two places, if we
ever decide that 256 is too small, and that we want to be able to read longer lines, we could easily change
one of the instances of 256, and forget to change the other one. Later we'll learn ways of solving--that is,
avoiding--this sort of problem.)

6.4 Reading Numbers

The getline function of the previous section reads one line from the user, as a string. What if we want
to read a number? One straightforward way is to read a string as before, and then immediately convert
the string to a number. The standard C library contains a number of functions for doing this. The simplest
to use are atoi(), which converts a string to an integer, and atof(), which converts a string to a
floating-point number. (Both of these functions are declared in the header <stdlib.h>, so you should
#include that header at the top of any file using these functions.) You could read an integer from the
user like this:
#include <stdlib.h>
char line[256];
int n;
printf("Type an integer:\n");
getline(line, 256);
n = atoi(line);
Now the variable n contains the number typed by the user. (This assumes that the user did type a valid
number, and that getline did not return EOF.)
Reading a floating-point number is similar:
#include <stdlib.h>
char line[256];
double x;
printf("Type a floating-point number:\n");
getline(line, 256);
x = atof(line);
(atof is actually declared as returning type double, but you could also use it with a variable of type
float, because in general, C automatically converts between float and double as needed.)
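Neither atoi nor atof gives you a way of telling whether the string really contained a number. If that matters, one possibility is the library function strtol, also declared in <stdlib.h>. The sketch below is only that: parse_int is a made-up helper, it uses pointers (which these notes have not covered yet), and it does not check whether the value is too large to fit in an int.

```c
#include <stdlib.h>

/* Hypothetical helper: convert the string s to an int, checking that */
/* at least one digit was actually converted. Returns 1 on success */
/* (storing the value in *result), or 0 if s did not begin with a number. */
/* Assumption: the value fits in an int; overflow is not checked. */
int parse_int(const char *s, int *result)
{
    char *end;
    long val = strtol(s, &end, 10);     /* base 10 */

    if (end == s)
        return 0;           /* strtol consumed nothing: no number */
    *result = (int)val;
    return 1;
}
```

After reading a line into the line array, a call such as parse_int(line, &n) would return 1 and set n if the user typed something like 42, and would return 0 for input like abc.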
Another way of reading in numbers, which you're likely to see in other books on C, involves the scanf
function, but it has several problems, so we won't discuss it for now. (Superficially, scanf seems simple
enough, which is why it's often used, especially in textbooks. The trouble is that to perform input reliably
using scanf is not nearly as easy as it looks, especially when you're not sure what the user is going to
type.)


In this chapter we'll meet some (though still not all) of C's more advanced arithmetic operators. The ones
we'll meet here have to do with making common patterns of operations easier.
It's extremely common in programming to have to increment a variable by 1, that is, to add 1 to it. (For
example, if you're processing each element of an array, you'll typically write a loop with an index or
pointer variable stepping through the elements of the array, and you'll increment the variable each time
through the loop.) The classic way to increment a variable is with an assignment like
i = i + 1
Such an assignment is perfectly common and acceptable, but it has a few slight problems:
1. As we've mentioned, it looks a little odd, especially from an algebraic perspective.
2. If the object being incremented is not a simple variable, the idiom can become cumbersome to
type, and correspondingly more error-prone. For example, the expression
a[i+j+2*k] = a[i+j+2*k] + 1
is a bit of a mess, and you may have to look closely to see that the similar-looking expression
a[i+j+2*k] = a[i+j+2+k] + 1
probably has a mistake in it.
3. Since incrementing things is so common, it might be nice to have an easier way of doing it.
In fact, C provides not one but two other, simpler ways of incrementing variables and performing other
similar operations.
7.1 Assignment Operators
7.2 Increment and Decrement Operators
7.3 Order of Evaluation

The first and more general way is that any time you have the pattern
v = v op e
where v is any variable (or anything like a[i]), op is any of the binary arithmetic operators we've seen
so far, and e is any expression, you can replace it with the simplified
v op= e
For example, you can replace the expressions
i = i + 1
j = j - 10
k = k * (n + 1)
a[i] = a[i] / b
with
i += 1
j -= 10
k *= n + 1
a[i] /= b
In an example in a previous chapter, we used the assignment

a[d1 + d2] = a[d1 + d2] + 1;
to count the rolls of a pair of dice. Using +=, we could simplify this expression to
a[d1 + d2] += 1;
As these examples show, you can use the ``op='' form with any of the arithmetic operators (and with
several other operators that we haven't seen yet). The expression, e, does not have to be the constant 1; it
can be any expression. You don't always need as many explicit parentheses when using the op=
operators: the expression
k *= n + 1
is interpreted as
k = k * (n + 1)
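The two helpers below are a minimal sketch for trying this out; their names (scale and bump) are invented for the example.

```c
/* Hypothetical helper: multiply k by (n + 1) using the *= operator. */
int scale(int k, int n)
{
    k *= n + 1;         /* same as k = k * (n + 1), not (k * n) + 1 */
    return k;
}

/* Hypothetical helper: add amount to one cell of an array using +=. */
void bump(int a[], int i, int amount)
{
    a[i] += amount;     /* same as a[i] = a[i] + amount */
}
```

For instance, scale(3, 4) returns 15 (that is, 3 * 5), not 13. And because a[i] is written only once in bump, there is no chance of the two subscripts accidentally disagreeing.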


The assignment operators of the previous section let us replace v = v op e with v op= e, so that we didn't
have to mention v twice. In the most common cases, namely when we're adding or subtracting the
constant 1 (that is, when op is + or - and e is 1), C provides another set of shortcuts: the autoincrement
and autodecrement operators. In their simplest forms, they look like this:
++i    add 1 to i
--j    subtract 1 from j
These correspond to the slightly longer i += 1 and j -= 1, respectively, and also to the fully
``longhand'' forms i = i + 1 and j = j - 1.
The ++ and -- operators apply to one operand (they're unary operators). The expression ++i adds 1 to
i, and stores the incremented result back in i. This means that these operators don't just compute new
values; they also modify the value of some variable. (They share this property--modifying some variable--with the assignment operators; we can say that these operators all have side effects. That is, they have
some effect, on the side, other than just computing a new value.)
The incremented (or decremented) result is also made available to the rest of the expression, so an
expression like
k = 2 * ++i
means ``add one to i, store the result back in i, multiply it by 2, and store that result in k.'' (This is a
pretty meaningless expression; our actual uses of ++ later will make more sense.)
Both the ++ and -- operators have an unusual property: they can be used in two ways, depending on
whether they are written to the left or the right of the variable they're operating on. In either case, they
increment or decrement the variable they're operating on; the difference concerns whether it's the old or
the new value that's ``returned'' to the surrounding expression. The prefix form ++i increments i and
returns the incremented value. The postfix form i++ increments i, but returns the prior, non-incremented
value. Rewriting our previous example slightly, the expression
k = 2 * i++
means ``take i's old value and multiply it by 2, increment i, store the result of the multiplication in k.''
The distinction between the prefix and postfix forms of ++ and -- will probably seem strained at first,
but it will make more sense once we begin using these operators in more realistic situations.
For example, our getline function of the previous chapter used the statements
line[nch] = c;
nch = nch + 1;
as the body of its inner loop. Using the ++ operator, we could simplify this to
line[nch++] = c;
We wanted to increment nch after deciding which element of the line array to store into, so the postfix
form nch++ is appropriate.
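If the distinction still seems slippery, it may help to run both forms from the same starting value and compare. This is only a sketch; increment_demo and append_char are hypothetical names, and the results array is just a convenient way of handing four numbers back to the caller.

```c
/* Hypothetical helper: evaluate 2 * ++i and 2 * i++, each starting */
/* from i == start, and record the outcomes in results[0..3]. */
void increment_demo(int start, int results[])
{
    int i, k;

    i = start;
    k = 2 * ++i;            /* prefix: i is incremented, new value used */
    results[0] = k;
    results[1] = i;

    i = start;
    k = 2 * i++;            /* postfix: old value used, i incremented */
    results[2] = k;
    results[3] = i;
}

/* Hypothetical helper: store c at line[nch], as in the loop body */
/* above, and return the updated count. */
int append_char(char line[], int nch, int c)
{
    line[nch++] = c;        /* stores at the old nch, then nch is one more */
    return nch;
}
```

Starting from 5, the prefix form gives k equal to 12 and the postfix form gives k equal to 10; in both cases i ends up as 6.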
Notice that it only makes sense to apply the ++ and -- operators to variables (or to other ``containers,''
such as a[i]). It would be meaningless to say something like
1++
or
(2+3)++
The ++ operator doesn't just mean ``add one''; it means ``add one to a variable'' or ``make a variable's
value one more than it was before.'' But (2+3) is not a variable, it's an expression; so there's no place for
++ to store the incremented result.
Another unfortunate example is
i = i++;
which some confused programmers sometimes write, presumably because they want to be extra sure that
i is incremented by 1. But i++ all by itself is sufficient to increment i by 1; the extra (explicit)
assignment to i is unnecessary and in fact counterproductive, meaningless, and incorrect. If you want to
increment i (that is, add one to it, and store the result back in i), either use
i = i + 1;
or
i += 1;
or
++i;
or
i++;
Don't try to use some bizarre combination.
Did it matter whether we used ++i or i++ in this last example? Remember, the difference between the
two forms is what value (either the old or the new) is passed on to the surrounding expression. If there is
no surrounding expression, if the ++i or i++ appears all by itself, to increment i and do nothing else,
you can use either form; it makes no difference. (Two ways that an expression can appear ``all by itself,''
with ``no surrounding expression,'' are when it is an expression statement terminated by a semicolon, as
above, or when it is one of the controlling expressions of a for loop.) For example, both the loops
for(i = 0; i < 10; ++i)
    printf("%d\n", i);
and
for(i = 0; i < 10; i++)
    printf("%d\n", i);
will behave exactly the same way and produce exactly the same results. (In real code, postfix increment is
probably more common, though prefix definitely has its uses, too.)
In the preceding section, we simplified the expression
a[d1 + d2] = a[d1 + d2] + 1;
from a previous chapter down to
a[d1 + d2] += 1;
Using ++, we could simplify it still further to
a[d1 + d2]++;
or
++a[d1 + d2];
(Again, in this case, both are equivalent.)
We'll see more examples of these operators in the next section and in the next chapter.


When you start using the ++ and -- operators in larger expressions, you end up with expressions which
do several things at once, i.e., they modify several different variables at more or less the same time.
When you write such an expression, you must be careful not to have the expression ``pull the rug out
from under itself'' by assigning two different values to the same variable, or by assigning a new value to a
variable at the same time that another part of the expression is trying to use the value of that variable.
Actually, we had already started writing expressions which did several things at once even before we met
the ++ and -- operators. The expression
(c = getchar()) != EOF
assigns getchar's return value to c, and compares it to EOF. The ++ and -- operators make it much
easier to cram a lot into a small expression: the example
line[nch++] = c;
from the previous section assigned c to line[nch], and incremented nch. We'll eventually meet
expressions which do three things at once, such as
a[i++] = b[j++];
which assigns b[j] to a[i], and increments i, and increments j.
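As a sketch of where such an expression might appear, here is a made-up function, copy_ints, which copies n values from one array to another. Since i and j are different variables, each is modified only once in the expression, and nothing in it depends on the order in which the two increments happen.

```c
/* Hypothetical helper: copy the first n elements of b into a. */
void copy_ints(int a[], int b[], int n)
{
    int i = 0, j = 0;

    while (j < n)
        a[i++] = b[j++];    /* assigns b[j] to a[i]; increments i and j */
}
```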
If you're not careful, though, it's easy for this sort of thing to get out of hand. Can you figure out exactly
what the expression
a[i++] = b[i++];        /* WRONG */
should do? I can't, and here's the important part: neither can the compiler. We know that the definition of
postfix ++ is that the former value, before the increment, is what goes on to participate in the rest of the
expression, but the expression a[i++] = b[i++] contains two ++ operators. Which of them happens
first? Does this expression assign the old ith element of b to the new ith element of a, or vice versa? No
one knows.
When the order of evaluation matters but is not well-defined (that is, when we can't say for sure which
order the compiler will evaluate the various dependent parts in) we say that the meaning of the
expression is undefined, and if we're smart we won't write the expression in the first place. (Why would
anyone ever write an ``undefined'' expression? Because sometimes, the compiler happens to evaluate it in
the order a programmer wanted, and the programmer assumes that since it works, it must be okay.)
For example, suppose we carelessly wrote this loop:
int i, a[10];
i = 0;
while(i < 10)
    a[i] = i++;         /* WRONG */
It looks like we're trying to set a[0] to 0, a[1] to 1, etc. But what if the increment i++ happens before
the compiler decides which cell of the array a to store the (unincremented) result in? We might end up
setting a[1] to 0, a[2] to 1, etc., instead. Since, in this case, we can't be sure which order things would
happen in, we simply shouldn't write code like this. In this case, what we're doing matches the pattern of
a for loop, anyway, which would be a better choice:
for(i = 0; i < 10; i++)
    a[i] = i;
Now that the increment i++ isn't crammed into the same expression that's setting a[i], the code is
perfectly well-defined, and is guaranteed to do what we want.
In general, you should be wary of ever trying to second-guess the order an expression will be evaluated
in, with two exceptions:
1. You can obviously assume that precedence will dictate the order in which binary operators are
applied. This typically says not just what order things happen in, but also what the
expression actually means. (In other words, the precedence of * over + says more than that the
multiplication ``happens first'' in 1 + 2 * 3; it says that the answer is 7, not 9.)
2. Although we haven't mentioned it yet, it is guaranteed that the logical operators && and || are
evaluated left-to-right, and that the right-hand side is not evaluated at all if the left-hand side
determines the outcome.
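The second guarantee is one you can safely rely on. The sketch below (the function name is_multiple is invented for this example) depends on && evaluating its left-hand side first and skipping the right-hand side when the left is false, so the % operation is never attempted with a zero divisor.

```c
/* Return 1 if n is an exact multiple of d, 0 otherwise.
   Because && evaluates left to right and stops early, n % d is
   never evaluated when d is 0. */
int is_multiple(int n, int d)
{
    return d != 0 && n % d == 0;
}
```

Writing the two tests in the other order, n % d == 0 && d != 0, would perform the division before checking the divisor, which is exactly the kind of ordering mistake the guarantee lets us avoid.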
To look at one more example, it might seem that the code
int i = 7;
printf("%d\n", i++ * i++);
would have to print 56, because no matter which order the increments happen in, 7*8 is 8*7 is 56. But
++ just says that the increment happens later, not that it happens immediately, so this code could print 49
(if the compiler chose to perform the multiplication first, and both increments later). And, it turns out that
ambiguous expressions like this are such a bad idea that the ANSI C Standard does not require compilers
to do anything reasonable with them at all. Theoretically, the above code could end up printing 42, or
8923409342, or 0, or crashing your computer.
Programmers sometimes mistakenly imagine that they can write an expression which tries to do too
much at once and then predict exactly how it will behave based on ``order of evaluation.'' For example,
we know that multiplication has higher precedence than addition, which means that in the expression
i + j * k
j will be multiplied by k, and then i will be added to the result. Informally, we often say that the
multiplication happens ``before'' the addition. That's true in this case, but it doesn't say as much as we
might think about a more complicated expression, such as
i++ + j++ * k++
In this case, besides the addition and multiplication, i, j, and k are all being incremented. We can not
say which of them will be incremented first; it's the compiler's choice. (In particular, it is not necessarily
the case that j++ or k++ will happen first; the compiler might choose to save i's value somewhere and
increment i first, even though it will have to keep the old value around until after it has done the
multiplication.)
In the preceding example, it probably doesn't matter which variable is incremented first. It's not too hard,
though, to write an expression where it does matter. In fact, we've seen one already: the ambiguous
assignment a[i++] = b[i++]. We still don't know which i++ happens first. (We can not assume,
based on the right-to-left behavior of the = operator, that the right-hand i++ will happen first.) But if we
had to know what a[i++] = b[i++] really did, we'd have to know which i++ happened first.
Finally, note that parentheses don't dictate overall evaluation order any more than precedence does.
Parentheses override precedence and say which operands go with which operators, and they therefore
affect the overall meaning of an expression, but they don't say anything about the order of subexpressions
or side effects. We could not ``fix'' the evaluation order of any of the expressions we've been discussing
by adding parentheses. If we wrote
i++ + (j++ * k++)
we still wouldn't know which of the increments would happen first. (The parentheses would force the
multiplication to happen before the addition, but precedence already would have forced that, anyway.) If
we wrote
(i++) * (i++)
the parentheses wouldn't force the increments to happen before the multiplication or in any well-defined
order; this parenthesized version would be just as undefined as i++ * i++ was.
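Since neither precedence nor parentheses can fix these expressions, the practical remedy is to spread the work over several statements. Here is one possible rewrite, assuming that what the programmer wanted from i++ * i++ was ``the old value times the next value.'' That intent is a guess, and the function name multiply_successive is made up for this sketch.

```c
/* A well-defined replacement for i++ * i++, under the assumption
   that the intended product was (old i) * (old i + 1). */
int multiply_successive(int i)
{
    int first, second;
    first = i;          /* the value before any increment */
    i++;
    second = i;         /* the value after the first increment */
    i++;                /* second increment, as in the original expression */
    return first * second;
}
```

Each statement now modifies i at most once, and the order of the statements, unlike the order of subexpressions, is guaranteed.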
There's a line from Kernighan & Ritchie, which I am fond of quoting when discussing these issues [Sec.
2.12, p. 54]:
The moral is that writing code that depends on order of evaluation is a bad programming
practice in any language. Naturally, it is necessary to know what things to avoid, but if you
don't know how they are done on various machines, you won't be tempted to take
advantage of a particular implementation.
The first edition of K&R said
...if you don't know how they are done on various machines, that innocence may help to
protect you.
I actually prefer the first edition wording. Many textbooks encourage you to write small programs to find
out how your compiler implements some of these ambiguous expressions, but it's just one step from
writing a small program to find out, to writing a real program which makes use of what you've just
learned. But you don't want to write programs that work only under one particular compiler, that take
advantage of the way that one compiler (but perhaps no other) happens to implement the undefined
expressions. It's fine to be curious about what goes on ``under the hood,'' and many of you will be curious
enough about what's going on with these ``forbidden'' expressions that you'll want to investigate them,
but please keep very firmly in mind that, for real programs, the very easiest way of dealing with
ambiguous, undefined expressions (which one compiler interprets one way and another interprets another
way and a third crashes on) is not to write them in the first place.

Chapter 8: Strings
Strings in C are represented by arrays of characters. The end of the string is marked with a special
character, the null character, which is simply the character with the value 0. (The null character has no
relation except in name to the null pointer. In the ASCII character set, the null character is named NUL.)
The null or string-terminating character is represented by another character escape sequence, \0. (We've
seen it once already, in the getline function of chapter 6.)
Because C has no built-in facilities for manipulating entire arrays (copying them, comparing them, etc.),
it also has very few built-in facilities for manipulating strings.
In fact, C's only truly built-in string-handling is that it allows us to use string constants (also called string
literals) in our code. Whenever we write a string, enclosed in double quotes, C automatically creates an
array of characters for us, containing that string, terminated by the \0 character. For example, we can
declare and define an array of characters, and initialize it with a string constant:
char string[] = "Hello, world!";
In this case, we can leave out the dimension of the array, since the compiler can compute it for us based
on the size of the initializer (14, including the terminating \0). This is the only case where the compiler
sizes a string array for us, however; in other cases, it will be necessary that we decide how big the arrays
and other data structures we use to hold strings are.
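To see the automatic sizing at work, the following sketch compares the size of such an array (using sizeof) with the length of the string it holds (using the library function strlen, described later in this chapter). The two function names are invented for this illustration.

```c
#include <string.h>

/* Size of the array the compiler creates, including the \0. */
size_t greeting_size(void)
{
    char string[] = "Hello, world!";
    return sizeof string;       /* 13 characters plus the \0 */
}

/* Length of the string itself, not counting the \0. */
size_t greeting_length(void)
{
    char string[] = "Hello, world!";
    return strlen(string);
}
```

The difference of one between the two results is the terminating \0, which occupies a cell of the array but is not counted as part of the string.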
To do anything else with strings, we must typically call functions. The C library contains a few basic
string manipulation functions, and to learn more about strings, we'll be looking at how these functions
might be implemented.
Since C never lets us assign entire arrays, we use the strcpy function to copy one string to another:
#include <string.h>
char string1[] = "Hello, world!";
char string2[20];
strcpy(string2, string1);
The destination string is strcpy's first argument, so that a call to strcpy mimics an assignment
expression (with the destination on the left-hand side). Notice that we had to allocate string2 big
enough to hold the string that would be copied to it. Also, at the top of any source file where we're using
the standard library's string-handling functions (such as strcpy) we must include the line
#include <string.h>
which contains external declarations for these functions.
Since C won't let us compare entire arrays, either, we must call a function to do that, too. The standard
library's strcmp function compares two strings, and returns 0 if they are identical, or a negative number
if the first string is alphabetically ``less than'' the second string, or a positive number if the first string is
``greater.'' (Roughly speaking, what it means for one string to be ``less than'' another is that it would
come first in a dictionary or telephone book, although there are a few anomalies.) Here is an example:
char string3[] = "this is";
char string4[] = "a test";
if(strcmp(string3, string4) == 0)
    printf("strings are equal\n");
else
    printf("strings are different\n");
This code fragment will print ``strings are different''. Notice that strcmp does not return a Boolean,
true/false, zero/nonzero answer, so it's not a good idea to write something like
if(strcmp(string3, string4))
...
because it will behave backwards from what you might reasonably expect. (Nevertheless, if you start
reading other people's code, you're likely to come across conditionals like if(strcmp(a, b)) or
even if(!strcmp(a, b)). The first does something if the strings are unequal; the second does
something if they're equal. You can read these more easily if you pretend for a moment that strcmp's
name were strdiff, instead.)
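One way to keep the three possible outcomes straight is to compare strcmp's result against 0 explicitly each time. The helper below, whose name compare_sign is hypothetical, reduces the result to -1, 0, or 1. Note that the Standard only promises negative, zero, or positive, not any particular magnitude.

```c
#include <string.h>

/* Reduce strcmp's result to -1 (less), 0 (equal), or 1 (greater). */
int compare_sign(const char a[], const char b[])
{
    int result = strcmp(a, b);
    if (result < 0)
        return -1;
    else if (result > 0)
        return 1;
    else
        return 0;
}
```

With the strings from the example above, compare_sign("this is", "a test") is 1, because 't' comes after 'a'.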
Another standard library function is strcat, which concatenates strings. It does not concatenate two
strings together and give you a third, new string; what it really does is append one string onto the end of
another. (If it gave you a new string, it would have to allocate memory for it somewhere, and the
standard library string functions generally never do that for you automatically.) Here's an example:
char string5[20] = "Hello, ";
char string6[] = "world!";
printf("%s\n", string5);
strcat(string5, string6);
printf("%s\n", string5);
The first call to printf prints ``Hello, '', and the second one prints ``Hello, world!'', indicating that the
contents of string6 have been tacked on to the end of string5. Notice that we declared string5
with extra space, to make room for the appended characters.
If you have a string and you want to know its length (perhaps so that you can check whether it will fit in
some other array you've allocated for it), you can call strlen, which returns the length of the string (i.e.
the number of characters in it), not including the \0:
char string7[] = "abc";
int len = strlen(string7);
printf("%d\n", len);
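As a sketch of the ``check whether it will fit'' idea, here is a wrapper around strcat that refuses to append when the destination array is too small. The name append_if_fits and the convention of passing the array size as a separate argument are assumptions made for this example; they are not part of the standard library.

```c
#include <string.h>

/* Append src to dest only if the result, including its \0,
   fits in an array of destsize characters.
   Returns 1 if the append happened, 0 if it would not fit. */
int append_if_fits(char dest[], size_t destsize, const char src[])
{
    if (strlen(dest) + strlen(src) + 1 > destsize)
        return 0;               /* not enough room: leave dest alone */
    strcat(dest, src);
    return 1;
}
```

The + 1 accounts for the terminating \0, which strlen does not count but which must still fit in the array.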
Finally, you can print strings out with printf using the %s format specifier, as we've been doing in
these examples already (e.g. printf("%s\n", string5);).
Since a string is just an array of characters, all of the string-handling functions we've just seen can be
written quite simply, using no techniques more complicated than the ones we already know. In fact, it's
quite instructive to look at how these functions might be implemented. Here is a version of strcpy:
mystrcpy(char dest[], char src[])
{
    int i = 0;
    while(src[i] != '\0')
    {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';
}
We've called it mystrcpy instead of strcpy so that it won't clash with the version that's already in the
standard library. Its operation is simple: it looks at characters in the src string one at a time, and as long
as they're not \0, assigns them, one by one, to the corresponding positions in the dest string. When it's
done, it terminates the dest string by appending a \0. (After exiting the while loop, i is guaranteed to
have a value one greater than the subscript of the last character in src.) For comparison, here's a way of
writing the same code, using a for loop:
for(i = 0; src[i] != '\0'; i++)
    dest[i] = src[i];
dest[i] = '\0';
Yet a third possibility is to move the test for the terminating \0 character out of the for loop header and
into the body of the loop, using an explicit if and break statement, so that we can perform the test
after the assignment and therefore use the assignment inside the loop to copy the \0 to dest, too:
for(i = 0; ; i++)
{
    dest[i] = src[i];
    if(src[i] == '\0')
        break;
}
(There are in fact many, many ways to write strcpy. Many programmers like to combine the
assignment and test, using an expression like (dest[i] = src[i]) != '\0'. This is actually the
same sort of combined operation as we used in our getchar loop in chapter 6.)
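For the curious, here is roughly what the combined form might look like as a complete function. This is only one of the many possible variants, and the name copy_string is made up so as not to clash with the versions above.

```c
/* Copy src to dest, combining the assignment and the test for \0.
   The \0 itself is copied by the assignment that ends the loop. */
void copy_string(char dest[], const char src[])
{
    int i = 0;
    while ((dest[i] = src[i]) != '\0')
        i++;
}
```

The parentheses around the assignment are required, for the same reason they were in the getchar loop: != has higher precedence than =.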
Here is a version of strcmp:
mystrcmp(char str1[], char str2[])
{
    int i = 0;
    while(1)
    {
        if(str1[i] != str2[i])
            return str1[i] - str2[i];
        if(str1[i] == '\0' || str2[i] == '\0')
            return 0;
        i++;
    }
}
Characters are compared one at a time. If two characters in one position differ, the strings are different,
and we are supposed to return a value less than zero if the first string (str1) is alphabetically less than
the second string. Since characters in C are represented by their numeric character set values, and since
most reasonable character sets assign values to characters in alphabetical order, we can simply subtract
the two differing characters from each other: the expression str1[i] - str2[i] will yield a
negative result if the i'th character of str1 is less than the corresponding character in str2. (As it
turns out, this will behave a bit strangely when comparing upper- and lower-case letters, but it's the
traditional approach, which the standard versions of strcmp tend to use.) If the characters are the same,
we continue around the loop, unless the characters we just compared were (both) \0, in which case
we've reached the end of both strings, and they were both equal. Notice that we used what may at first
appear to be an infinite loop--the controlling expression is the constant 1, which is always true. What
actually happens is that the loop runs until one of the two return statements breaks out of it (and the
entire function). Note also that when one string is longer than the other, the first test will notice this
(because one string will contain a real character at the [i] location, while the other will contain \0, and
these are not equal) and the return value will be computed by subtracting the real character's value from
0, or vice versa. (Thus the shorter string will be treated as ``less than'' the longer.)
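The same comparison can also be arranged as a for loop that runs for as long as the characters match. This is a sketch of an alternative arrangement, not the library's actual implementation, and the name compare_strings is hypothetical. It assumes an ASCII-like character set for the upper- versus lower-case behavior mentioned above.

```c
/* Compare two strings character by character.
   Returns a negative, zero, or positive value, like strcmp. */
int compare_strings(const char str1[], const char str2[])
{
    int i;
    for (i = 0; str1[i] == str2[i]; i++) {
        if (str1[i] == '\0')
            return 0;           /* reached the end of both strings */
    }
    return str1[i] - str2[i];   /* first position where they differ */
}
```

With this function, "abc" compares less than "abcd" (the shorter string runs out first), and in ASCII "Zebra" compares less than "apple", because all the upper-case letters have smaller values than the lower-case ones.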
Finally, here is a version of strlen:
int mystrlen(char str[])
{
    int i;
    for(i = 0; str[i] != '\0'; i++)
        {}
    return i;
}
In this case, all we have to do is find the \0 that terminates the string, and it turns out that the three
control expressions of the for loop do all the work; there's nothing left to do in the body. Therefore, we
use an empty pair of braces {} as the loop body. Equivalently, we could use a null statement, which is
simply a semicolon:
for(i = 0; str[i] != '\0'; i++)
    ;
Empty loop bodies can be a bit startling at first, but they're not unheard of.
Everything we've looked at so far has come out of C's standard libraries. As one last example, let's write
a substr function, for extracting a substring out of a larger string. We might call it like this:
char string8[] = "this is a test";
char string9[10];
substr(string9, string8, 5, 4);
The idea is that we'll extract a substring of length 4, starting at character 5 (0-based) of string8, and
copy the substring to string9. Just as with strcpy, it's our responsibility to declare the destination
string (string9) big enough. Here is an implementation of substr. Not surprisingly, it's quite similar
to strcpy:
substr(char dest[], char src[], int offset, int len)
{
    int i;
    for(i = 0; i < len && src[offset + i] != '\0'; i++)
        dest[i] = src[i + offset];
    dest[i] = '\0';
}
If you compare this code to the code for mystrcpy, you'll see that the only differences are that
characters are fetched from src[offset + i] instead of src[i], and that the loop stops when len
characters have been copied (or when the src string runs out of characters, whichever comes first).
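One thing the version above takes on trust is that offset actually lies within the src string; if it does not, src[offset + i] looks at memory beyond the end of the string. A slightly more defensive sketch, under the assumption that an out-of-range offset should simply yield an empty string, might look like this. The name safe_substr and that choice of behavior are this example's own.

```c
#include <string.h>

/* Like substr, but yields an empty string when offset is
   at or beyond the end of src. */
void safe_substr(char dest[], const char src[], size_t offset, size_t len)
{
    size_t i = 0;
    if (offset < strlen(src)) {
        for (; i < len && src[offset + i] != '\0'; i++)
            dest[i] = src[offset + i];
    }
    dest[i] = '\0';
}
```

Called with the string "this is a test", an offset of 5, and a length of 4, it extracts "is a". As before, the caller must still supply a destination array with room for len characters plus the \0.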
In this chapter, we've been careless about declaring the return types of the string functions, and (with the
exception of mystrlen) they haven't returned values. The real string functions do return values, but
they're of type ``pointer to character,'' which we haven't discussed yet.
When working with strings, it's important to keep firmly in mind the differences between characters and
strings. We must also occasionally remember the way characters are represented, and about the relation
between character values and integers.
As we have had several occasions to mention, a character is represented internally as a small integer,
with a value depending on the character set in use. For example, we might find that 'A' had the value
65, that 'a' had the value 97, and that '+' had the value 43. (These are, in fact, the values in the ASCII
character set, which most computers use. However, you don't need to learn these values, because the vast
majority of the time, you use character constants to refer to characters, and the compiler worries about
the values for you. Using character constants in preference to raw numeric values also makes your
programs more portable.)
As we may also have mentioned, there is a big difference between a character and a string, even a string
which contains only one character (other than the \0). For example, 'A' is not the same as "A". To
drive home this point, let's illustrate it with a few examples.
If you have a string:
char string[] = "hello, world!";
you can modify its first character by saying
string[0] = 'H';
(Of course, there's nothing magic about the first character; you can modify any character in the string in
this way. Be aware, though, that it is not always safe to modify strings in-place like this; we'll say more
about the modifiability of strings in a later chapter on pointers.) Since you're replacing a character, you
want a character constant, 'H'. It would not be right to write
string[0] = "H";        /* WRONG */
because "H" is a string (an array of characters), not a single character. (The destination of the
assignment, string[0], is a char, but the right-hand side is a string; these types don't match.)
On the other hand, when you need a string, you must use a string. To print a single newline, you could
call
printf("\n");
It would not be correct to call
printf('\n');           /* WRONG */
printf always wants a string as its first argument. (As one final example, putchar wants a single
character, so putchar('\n') would be correct, and putchar("\n") would be incorrect.)
We must also remember the difference between strings and integers. If we treat the character '1' as an
integer, perhaps by saying
int i = '1';
we will probably not get the value 1 in i; we'll get the value of the character '1' in the machine's
character set. (In ASCII, it's 49.) When we do need to find the numeric value of a digit character (or to go
the other way, to get the digit character with a particular value) we can make use of the fact that, in any
character set used by C, the values for the digit characters, whatever they are, are contiguous. In other
words, no matter what values '0' and '1' have, '1' - '0' will be 1 (and, obviously, '0' - '0'
will be 0). So, for a variable c holding some digit character, the expression
c - '0'
gives us its value. (Similarly, for an integer value i, i + '0' gives us the corresponding digit
character, as long as 0 <= i <= 9.)
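Wrapped up as two tiny functions, the conversions in each direction might be sketched like this. The names digit_value and digit_char are invented here, and both functions assume their argument really is a digit (or a value from 0 to 9); neither checks.

```c
/* Numeric value of a digit character: '7' gives 7. */
int digit_value(char c)
{
    return c - '0';
}

/* Digit character for a value from 0 through 9: 3 gives '3'. */
char digit_char(int i)
{
    return (char)(i + '0');
}
```

Neither function mentions 48 or any other raw character code, so both work in any character set C supports.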
Just as the character '1' is not the integer 1, the string "123" is not the integer 123. When we have a
string of digits, we can convert it to the corresponding integer by calling the standard function atoi:
char string[] = "123";
int i = atoi(string);
int j = atoi("456");
Later we'll learn how to go in the other direction, to convert an integer into a string. (One way, as long as
what you want to do is print the number out, is to call printf, using %d in the format string.)
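To connect atoi with the digit arithmetic above, here is a simplified sketch of how such a conversion might be written by hand. It is not the library's implementation: the name digits_to_int is hypothetical, and this version handles only an unsigned run of digits, with no leading spaces, no sign, and no overflow checking.

```c
/* Convert a leading run of digit characters to an int.
   Stops at the first character that is not a digit. */
int digits_to_int(const char str[])
{
    int i, value = 0;
    for (i = 0; str[i] >= '0' && str[i] <= '9'; i++)
        value = 10 * value + (str[i] - '0');
    return value;
}
```

Each time around the loop, the digits seen so far are shifted one decimal place to the left (multiplied by 10) and the new digit's value is added in.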

Chapter 9: The C Preprocessor
Conceptually, the ``preprocessor'' is a translation phase that is applied to your source code before the
compiler proper gets its hands on it. (Once upon a time, the preprocessor was a separate program, much
as the compiler and linker may still be separate programs today.) Generally, the preprocessor performs
textual substitutions on your source code, in three sorts of ways:
File inclusion: inserting the contents of another file into your source file, as if you had typed it all
in there.
Macro substitution: replacing instances of one piece of text with another.
Conditional compilation: Arranging that, depending on various circumstances, certain parts of
your source code are seen or not seen by the compiler at all.
The next three sections will introduce these three preprocessing functions.
The syntax of the preprocessor is different from the syntax of the rest of C in several respects. First of all,
the preprocessor is ``line based.'' Each of the preprocessor directives we're going to learn about (all of
which begin with the # character) must begin at the beginning of a line, and each ends at the end of the
line. (The rest of C treats line ends as just another whitespace character, and doesn't care how your
program text is arranged into lines.) Secondly, the preprocessor does not know about the structure of
C--about functions, statements, or expressions. It is possible to play strange tricks with the preprocessor to
turn something which does not look like C into C (or vice versa). It's also possible to run into problems
when a preprocessor substitution does not do what you expected it to, because the preprocessor does not
respect the structure of C statements and expressions (but you expected it to). For the simple uses of the
preprocessor we'll be discussing, you shouldn't have any of these problems, but you'll want to be careful
before doing anything tricky or outrageous with the preprocessor. (As it happens, playing tricky and
outrageous games with the preprocessor is considered sporting in some circles, but it rapidly gets out of
hand, and can lead to bewilderingly impenetrable programs.)
9.1 File Inclusion
9.2 Macro Definition and Substitution
9.3 Conditional Compilation

9.1 File Inclusion

[This section corresponds to K&R Sec. 4.11.1]
A line of the form
#include <filename.h>
or
#include "filename.h"
causes the contents of the file filename.h to be read, parsed, and compiled at that point. (After
filename.h is processed, compilation continues on the line following the #include line.) For
example, suppose you got tired of retyping external function prototypes such as
extern int getline(char [], int);
at the top of each source file. You could instead place the prototype in a header file, perhaps
getline.h, and then simply place
#include "getline.h"
at the top of each source file where you called getline. (You might not find it worthwhile to create an
entire header file for a single function, but if you had a package of several related functions, it might be
very useful to place all of their declarations in one header file.) As we may have mentioned, that's exactly
what the Standard header files such as stdio.h are--collections of declarations (including external
function prototype declarations) having to do with various sets of Standard library functions. When you
use #include to read in a header file, you automatically get the prototypes and other declarations it
contains, and you should use header files, precisely so that you will get the prototypes and other
declarations they contain.
The difference between the <> and "" forms is where the preprocessor searches for filename.h. As a
general rule, it searches for files enclosed in <> in central, standard directories, and it searches for files
enclosed in "" in the ``current directory,'' or the directory containing the source file that's doing the
including. Therefore, "" is usually used for header files you've written, and <> is usually used for
headers which are provided for you (which someone else has written).
The extension ``.h'', by the way, simply stands for ``header,'' and reflects the fact that #include
directives usually sit at the top (head) of your source files, and contain global declarations and definitions
which you would otherwise put there. (That extension is not mandatory--you can theoretically name your
own header files anything you wish--but .h is traditional, and recommended.)

As we've already begun to see, the time to put something in a header file, and then use
#include to pull that header file into several different source files, is when the something (whatever it
is) must be declared or defined consistently in all of the source files. If, instead of using a header file, you
typed the something in to each of the source files directly, and the something ever changed, you'd have to
edit all those source files, and if you missed one, your program could fail in subtle (or serious) ways due
to the mismatched declarations (i.e. due to the incompatibility between the new declaration in one source
file and the old one in a source file you forgot to change). Placing common declarations and definitions
into header files means that if they ever change, they only have to be changed in one place, which is a
much more workable system.
What should you put in header files?
External declarations of global variables and functions. We said that a global variable must have
exactly one defining instance, but that it can have external declarations in many places. We said
that it was a grave error to issue an external declaration in one place saying that a variable or
function has one type, when the defining instance in some other place actually defines it with
another type. (If the two places are two source files, separately compiled, the compiler will
probably not even catch the discrepancy.) If you put the external declarations in a header file,
however, and include the header wherever it's needed, the declarations are virtually guaranteed to
be consistent. It's a good idea to include the header in the source file where the defining instance
appears, too, so that the compiler can check that the declaration and definition match. (That is, if
you ever change the type, you do still have to change it in two places: in the source file where the
defining instance occurs, and in the header file where the external declaration appears. But at least
you don't have to change it in an arbitrary number of places, and, if you've set things up correctly,
the compiler can catch any remaining mistakes.)
Preprocessor macro definitions (which we'll meet in the next section).
Structure definitions (which we haven't seen yet).
Typedef declarations (which we haven't seen yet).
However, there are a few things not to put in header files:
Defining instances of global variables. If you put these in a header file, and include the header file
in more than one source file, the variable will end up multiply defined.
Function bodies (which are also defining instances). You don't want to put these in headers for the
same reason--it's likely that you'll end up with multiple copies of the function and hence
``multiply defined'' errors. People sometimes put commonly-used functions in header files and
then use #include to bring them (once) into each program where they use that function, or use
#include to bring together the several source files making up a program, but both of these are
poor ideas. It's much better to learn how to use your compiler or linker to combine together
separately-compiled object files.
Since header files typically contain only external declarations, and should not contain function bodies,
you have to understand just what does and doesn't happen when you #include a header file. The
header file may provide the declarations for some functions, so that the compiler can generate correct
code when you call them (and so that it can make sure that you're calling them correctly), but the header
file does not give the compiler the functions themselves. The actual functions will be combined into your
program at the end of compilation, by the part of the compiler called the linker. The linker may have to
get the functions out of libraries, or you may have to tell the compiler/linker where to find them. In
particular, if you are trying to use a third-party library containing some useful functions, the library will
often come with a header file describing those functions. Using the library is therefore a two-step
process: you must #include the header in the files where you call the library functions, and you must
tell the linker to read in the functions from the library itself.
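The division of labor between a header and a source file can be sketched in a single listing. Everything here is hypothetical: the file names vowels.h and vowels.c and the function count_vowels exist only for this illustration, and the two ``files'' are shown one after the other so that the example can be compiled as one piece.

```c
/* ---- contents of a hypothetical header, vowels.h ---- */
/* Declaration only: tells the compiler how to call the function. */
extern int count_vowels(const char str[]);

/* ---- contents of a hypothetical source file, vowels.c ---- */
/* In a real program this file would begin with
   #include "vowels.h" so that the compiler could check the
   definition below against the declaration above. */
int count_vowels(const char str[])
{
    int i, n = 0;
    for (i = 0; str[i] != '\0'; i++) {
        if (str[i] == 'a' || str[i] == 'e' || str[i] == 'i' ||
            str[i] == 'o' || str[i] == 'u')
            n++;
    }
    return n;
}
```

Any other source file that wanted to call count_vowels would include the header and nothing more; the function body itself would reach the program only through the linker.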

9.2 Macro Definition and Substitution
A preprocessor line of the form
#define name text
defines a macro with the given name, having as its value the given replacement text. After that (for the
rest of the current source file), wherever the preprocessor sees that name, it will replace it with the
replacement text. The name follows the same rules as ordinary identifiers (it can contain only letters,
digits, and underscores, and may not begin with a digit). Since macros behave quite differently from
normal variables (or functions), it is customary to give them names which are all capital letters (or at
least which begin with a capital letter). The replacement text can be absolutely anything--it's not
restricted to numbers, or simple strings, or anything.
The most common use for macros is to propagate various constants around and to make them more
self-documenting. We've been saying things like
char line[100];
...
getline(line, 100);
but this is neither readable nor reliable; it's not necessarily obvious what all those 100's scattered around
the program are, and if we ever decide that 100 is too small for the size of the array to hold lines, we'll
have to remember to change the number in two (or more) places. A much better solution is to use a
macro:
#define MAXLINE 100
char line[MAXLINE];
...
getline(line, MAXLINE);
Now, if we ever want to change the size, we only have to do it in one place, and it's more obvious what
the words MAXLINE sprinkled through the program mean than the magic numbers 100 did.
Since the replacement text of a preprocessor macro can be anything, it can also be an expression,
although you have to realize that, as always, the text is substituted (and perhaps evaluated) later. No
evaluation is performed when the macro is defined. For example, suppose that you write something like
#define A 2
#define B 3
#define C A + B
(this is a pretty meaningless example, but the situation does come up in practice). Then, later, suppose
that you write
int x = C * 2;
If A, B, and C were ordinary variables, you'd expect x to end up with the value 10. But let's see what
happens.
The preprocessor always substitutes text for macros exactly as you have written it. So it first substitutes
the replacement text for the macro C, resulting in
int x = A + B * 2;
Then it substitutes the macros A and B, resulting in
int x = 2 + 3 * 2;
Only when the preprocessor is done doing all this substituting does the compiler get into the act. But
when it evaluates that expression (using the normal precedence of multiplication over addition), it ends
up initializing x with the value 8!
To guard against this sort of problem, it is always a good idea to include explicit parentheses in the
definitions of macros which contain expressions. If we were to define the macro C as
#define C (A + B)
then the declaration of x would ultimately expand to
int x = (2 + 3) * 2;
and x would be initialized to 10, as we probably expected.
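To see the difference side by side, here is a minimal sketch. The macro names SUM_BAD and SUM_OK and the two little functions are made up for this illustration; only A and B come from the example above.

```c
#define A 2
#define B 3
#define SUM_BAD A + B       /* no parentheses: risky inside a larger expression */
#define SUM_OK  (A + B)     /* parenthesized: safe */

/* Expands to 2 + 3 * 2, so the multiplication happens first. */
int twice_bad(void)
{
    return SUM_BAD * 2;
}

/* Expands to (2 + 3) * 2, which is what we meant. */
int twice_ok(void)
{
    return SUM_OK * 2;
}
```

Calling twice_bad() gives 8, while twice_ok() gives 10.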
Notice that there does not have to be (and in fact there usually is not) a semicolon at the end of a
#define line. (This is just one of the ways that the syntax of the preprocessor is different from the rest
of C.) If you accidentally type
#define MAXLINE 100;        /* WRONG */
then when you later declare

char line[MAXLINE];
the preprocessor will expand it to
char line[100;];        /* WRONG */
which is a syntax error. This is what we mean when we say that the preprocessor doesn't know much of
anything about the syntax of C--in this last example, the value or replacement text for the macro
MAXLINE was the 4 characters 1 0 0 ; , and that's exactly what the preprocessor substituted (even
though it didn't make any sense).
Simple macros like MAXLINE act sort of like little variables, whose values are constant (or constant
expressions). It's also possible to have macros which look like little functions (that is, you invoke them
with what looks like function call syntax, and they expand to replacement text which is a function of the
actual arguments they are invoked with) but we won't be looking at these yet.


The last preprocessor directive we're going to look at is #ifdef. If you have the sequence
#ifdef name
program text
#else
more program text
#endif
in your program, the code that gets compiled depends on whether a preprocessor macro by that name is
defined or not. If it is (that is, if there has been a #define line for a macro called name), then
``program text'' is compiled and ``more program text'' is ignored. If the macro is not defined, ``more
program text'' is compiled and ``program text'' is ignored. This looks a lot like an if statement, but it
behaves completely differently: an if statement controls which statements of your program are executed
at run time, but #ifdef controls which parts of your program actually get compiled.
Just as for the if statement, the #else in an #ifdef is optional. There is a companion directive
#ifndef, which compiles code if the macro is not defined (although the ``#else clause'' of an
#ifndef directive will then be compiled if the macro is defined). There is also an #if directive which
compiles code depending on whether a compile-time expression is true or false. (The expressions which
are allowed in an #if directive are somewhat restricted, however, so we won't talk much about #if
here.)
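One place #ifndef turns up is in supplying a default value for a macro that may or may not have been defined already. The sketch below assumes a hypothetical macro LINESIZE and a helper function line_size, both invented here purely to show which value the preprocessor ended up with.

```c
/* If nobody has defined LINESIZE yet, fall back to a default. */
#ifndef LINESIZE
#define LINESIZE 100
#endif

/* Report the value that was chosen at compile time. */
int line_size(void)
{
    return LINESIZE;
}
```

Compiled as is, line_size() returns 100. If LINESIZE had been defined before these lines were reached, the #define between #ifndef and #endif would have been skipped and the earlier definition would stand.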
Conditional compilation is useful in two general classes of situations:
You are trying to write a portable program, but the way you do something is different depending
on what compiler, operating system, or computer you're using. You place different versions of
your code, one for each situation, between suitable #ifdef directives, and when you compile the
program in a particular environment, you arrange to have the macro names defined which select
the variants you need in that environment. (For this reason, compilers usually have ways of letting
you define macros from the invocation command line or in a configuration file, and many also
predefine certain macro names related to the operating system, processor, or compiler in use. That
way, you don't have to change the code to change the #define lines each time you compile it in
a different environment.)
For example, in ANSI C, the function to delete a file is remove. On older Unix systems,
however, the function was called unlink. So if filename is a variable containing the name of
a file you want to delete, and if you want to be able to compile the program under these older
Unix systems, you might write
#ifdef unix
unlink(filename);
#else
remove(filename);
#endif
Then, you could place the line
#define unix
at the top of the file when compiling under an old Unix system. (Since all you're using the macro
unix for is to control the #ifdef, you don't need to give it any replacement text at all. Any
definition for a macro, even if the replacement text is empty, causes an #ifdef to succeed.)
(In fact, in this example, you wouldn't even need to define the macro unix at all, because C
compilers on old Unix systems tend to predefine it for you, precisely so you can make tests like
these.)
You want to compile several different versions of your program, with different features present in
the different versions. You bracket the code for each feature with #ifdef directives, and (as for
the previous case) arrange to have the right macros defined or not to build the version you want to
build at any given time. This way, you can build the several different versions from the same
source code. (One common example is whether you turn debugging statements on or off. You can
bracket each debugging printout with #ifdef DEBUG and #endif, and then turn on
debugging only when you need it.)
For example, you might use lines like this:
#ifdef DEBUG
printf("x is %d\n", x);
#endif
to print out the value of the variable x at some point in your program to see if it's what you
expect. To enable debugging printouts, you insert the line
#define DEBUG
at the top of the file, and to turn them off, you delete that line, but the debugging printouts quietly
remain in your code, temporarily deactivated, but ready to reactivate if you find yourself needing
them again later. (Also, instead of inserting and deleting the #define line, you might use a
compiler flag such as -DDEBUG to define the macro DEBUG from the compiler invocation line.)
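As a rough sketch of the debugging case, here is a small function with a conditionally compiled printout inside its loop. The function sum_cells is hypothetical, made up for this illustration; the point is only that the function computes the same answer whether or not DEBUG is defined.

```c
#include <stdio.h>

/* Adds up the first n cells of a.  The progress report is compiled in
   only when the macro DEBUG is defined. */
int sum_cells(int a[], int n)
{
    int i, total = 0;

    for(i = 0; i < n; i++)
    {
        total += a[i];
#ifdef DEBUG
        printf("after a[%d], total is %d\n", i, total);
#endif
    }
    return total;
}
```

Without DEBUG, the printf call is not merely skipped at run time; it is not part of the compiled program at all.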
Conditional compilation can be very handy, but it can also get out of hand. When large chunks of the
program are completely different depending on, say, what operating system the program is being
compiled for, it's often better to place the different versions in separate source files, and then only use
one of the files (corresponding to one of the versions) to build the program on any given system. Also, if
you are using an ANSI Standard compiler and you are writing ANSI-compatible code, you usually won't
need so much conditional compilation, because the Standard specifies exactly how the compiler must do
certain things, and exactly which library functions it must provide, so you don't have to work so hard to
accommodate the old variations among compilers and libraries.


Pointers are often thought to be the most difficult aspect of C. It's true that many people have various
problems with pointers, and that many programs founder on pointer-related bugs. Actually, though, many
of the problems are not so much with the pointers per se but rather with the memory they point to, and
more specifically, when there isn't any valid memory which they point to. As long as you're careful to
ensure that the pointers in your programs always point to valid memory, pointers can be useful, powerful,
and relatively trouble-free tools. (We'll talk about memory allocation in the next chapter.)
[This chapter is the only one in this series that contains any graphics. If you are using a text-only
browser, there are a few figures you won't be able to see.]
A pointer is a variable that points at, or refers to, another variable. That is, if we have a pointer variable
of type ``pointer to int,'' it might point to the int variable i, or to the third cell of the int array a.
Given a pointer variable, we can ask questions like, ``What's the value of the variable that this pointer
points to?''
Why would we want to have a variable that refers to another variable? Why not just use that other
variable directly? The answer is that a level of indirection can be very useful. (Indirection is just another
word for the situation when one variable refers to another.)
Imagine a club which elects new officers each year. In its clubroom, it might have a set of mailboxes for
each member, along with special mailboxes for the president, secretary, and treasurer. The bank doesn't
mail statements to the treasurer under the treasurer's name; it mails them to ``treasurer,'' and the
statements go to the mailbox marked ``treasurer.'' This way, the bank doesn't have to change the mailing
address it uses every year. The mailboxes labeled ``president,'' ``treasurer,'' and ``secretary'' are a little bit
like pointers--they don't refer to people directly.
If we make the analogy that a mailbox holding letters is like a variable holding numbers, then mailboxes
for the president, secretary, and treasurer aren't quite like pointers, because they're still mailboxes which
in principle could hold letters directly. But suppose that mail is never actually put in those three
mailboxes: suppose each of the officers' mailboxes contains a little marker listing the name of the
member currently holding that office. When you're sorting mail, and you have a letter for the treasurer,
you first go to the treasurer's mailbox, but rather than putting the letter there, you read the name on the
marker there, and put the mail in the mailbox for that person. Similarly, if the club is poorly organized,
and the treasurer stops doing his job, and you're the president, and one day you get a call from the bank
saying that the club's account is in arrears and the treasurer hasn't done anything about it and asking if
you, the president, can look into it; and if the club is so poorly organized that you've forgotten who the
treasurer is, you can go to the treasurer's mailbox, read the name on the marker there, and go to that
mailbox (which is probably overflowing) to find all the treasury-related mail.
We could say that the markers in the mailboxes for the president, secretary, and treasurer were pointers
to other mailboxes. In an analogous way, pointer variables in C contain pointers to other variables or
memory locations.
10.1 Basic Pointer Operations
10.2 Pointers and Arrays; Pointer Arithmetic
10.3 Pointer Subtraction and Comparison
10.4 Null Pointers
10.5 ``Equivalence'' between Pointers and Arrays
10.6 Arrays and Pointers as Function Arguments
10.7 Strings
10.8 Example: Breaking a Line into ``Words''


10.1 Basic Pointer Operations

The first things to do with pointers are to declare a pointer variable, set it to point somewhere, and finally
manipulate the value that it points to. A simple pointer declaration looks like this:
int *ip;
This declaration looks like our earlier declarations, with one obvious difference: that asterisk. The asterisk
means that ip, the variable we're declaring, is not of type int, but rather of type pointer-to-int.
(Another way of looking at it is that *ip, which as we'll see is the value pointed to by ip, will be an
int.)
We may think of setting a pointer variable to point to another variable as a two-step process: first we
generate a pointer to that other variable, then we assign this new pointer to the pointer variable. We can
say (but we have to be careful when we're saying it) that a pointer variable has a value, and that its value
is ``pointer to that other variable''. This will make more sense when we see how to generate pointer
values.
Pointers (that is, pointer values) are generated with the ``address-of'' operator &, which we can also think
of as the ``pointer-to'' operator. We demonstrate this by declaring (and initializing) an int variable i, and
then setting ip to point to it:
int i = 5;
ip = &i;
The assignment expression ip = &i; contains both parts of the ``two-step process'': &i generates a
pointer to i, and the assignment operator assigns the new pointer to (that is, places it ``in'') the variable
ip. Now ip ``points to'' i, which we can illustrate with this picture:
i is a variable of type int, so the value in its box is a number, 5. ip is a variable of type pointer-to-int,
so the ``value'' in its box is an arrow pointing at another box. Referring once again back to the ``two-step
process'' for setting a pointer variable: the & operator draws us the arrowhead pointing at i's box, and the
assignment operator =, with the pointer variable ip on its left, anchors the other end of the arrow in ip's
box.
We discover the value pointed to by a pointer using the ``contents-of'' operator, *. Placed in front of a
pointer, the * operator accesses the value pointed to by that pointer. In other words, if ip is a pointer,
then the expression *ip gives us whatever it is that's in the variable or location pointed to by ip. For
example, we could write something like
printf("%d\n", *ip);
which would print 5, since ip points to i, and i is (at the moment) 5.
(You may wonder how the asterisk * can be the pointer contents-of operator when it is also the
multiplication operator. There is no ambiguity here: it is the multiplication operator when it sits between
two variables, and it is the contents-of operator when it sits in front of a single variable. The situation is
analogous to the minus sign: between two variables or expressions it's the subtraction operator, but in
front of a single variable or expression it's the negation operator. Technical terms you may hear for these
distinct roles are unary and binary: a binary operator applies to two operands, usually on either side of it,
while a unary operator applies to a single operand.)
The contents-of operator * does not merely fetch values through pointers; it can also set values through
pointers. We can write something like
*ip = 7;
which means ``set whatever ip points to to 7.'' Again, the * tells us to go to the location pointed to by ip,
but this time, the location isn't the one to fetch from--we're on the left-hand side of an assignment
operator, so *ip tells us the location to store to. (The situation is no different from array subscripting
expressions such as a[3] which we've already seen appearing on both sides of assignments.)
The result of the assignment *ip = 7 is that i's value is changed to 7, and the picture changes to:
If we called printf("%d\n", *ip) again, it would now print 7.
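Putting the steps so far together, a minimal sketch might look like this. The function name basic_pointer_demo and the variable before are invented for the illustration; i and ip are the ones from the text.

```c
/* Declare a pointer, aim it at i, then read and write i through it.
   Returns 1 if everything came out as described in the text. */
int basic_pointer_demo(void)
{
    int i = 5;
    int *ip;
    int before;

    ip = &i;        /* ip now points to i */
    before = *ip;   /* fetch through the pointer: 5 */
    *ip = 7;        /* store through the pointer: i becomes 7 */

    return before == 5 && i == 7 && *ip == 7;
}
```

Notice that i is never mentioned on the left-hand side of an assignment after its initialization, yet its value changes.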

At this point, you may be wondering why we're going through this rigamarole--if we wanted to set i to 7,
why didn't we do it directly? We'll begin to explore that next, but first let's notice the difference between
changing a pointer (that is, changing what variable it points to) and changing the value at the location it
points to. When we wrote *ip = 7, we changed the value pointed to by ip, but if we declare another
variable j:
int j = 3;
and write
ip = &j;
we've changed ip itself. The picture now looks like this:
We have to be careful when we say that a pointer assignment changes ``what the pointer points to.'' Our
earlier assignment
*ip = 7;
changed the value pointed to by ip, but this more recent assignment
ip = &j;
has changed what variable ip points to. It's true that ``what ip points to'' has changed, but this time, it
has changed for a different reason. Neither i (which is still 7) nor j (which is still 3) has changed. (What
has changed is ip's value.) If we again call
printf("%d\n", *ip);
this time it will print 3.
We can also assign pointer values to other pointer variables. If we declare a second pointer variable:
int *ip2;
then we can say
ip2 = ip;
Now ip2 points where ip does; we've essentially made a ``copy'' of the arrow:
Now, if we set ip to point back to i again:

ip = &i;
the two arrows point to different places:
We can now see that the two assignments

ip2 = ip;
and
*ip2 = *ip;
do two very different things. The first would make ip2 again point to where ip points (in other words,
back to i again). The second would store, at the location pointed to by ip2, a copy of the value pointed
to by ip; in other words (if ip and ip2 still point to i and j respectively) it would set j to i's value, or
7.
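Here is a small sketch of the two kinds of assignment in action. The function copy_demo is hypothetical, and it starts from the state described above (ip pointing to i, which is 7, and ip2 pointing to j, which is 3).

```c
/* Contrasts copying a pointed-to value with copying a pointer.
   Returns 1 if i and j end up as the comments predict. */
int copy_demo(void)
{
    int i = 7, j = 3;
    int *ip = &i, *ip2 = &j;

    *ip2 = *ip;     /* copies the value: j becomes 7; ip2 still points to j */

    ip2 = ip;       /* copies the pointer: now ip2 points to i as well */
    *ip2 = 9;       /* so this changes i, and j keeps its 7 */

    return i == 9 && j == 7;
}
```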
It's important to keep very clear in your mind the distinction between a pointer and what it points to. The
two are like apples and oranges (or perhaps oil and water); you can't mix them. You can't ``set ip to 5'' by
writing something like
ip = 5;        /* WRONG */
5 is an integer, but ip is a pointer. You probably wanted to ``set the value pointed to by ip to 5,'' which
you express by writing
*ip = 5;
Similarly, you can't ``see what ip is'' by writing
printf("%d\n", ip);        /* WRONG */
Again, ip is a pointer-to-int, but %d expects an int. To print what ip points to, use
printf("%d\n", *ip);
Finally, a few more notes about pointer declarations. The * in a pointer declaration is related to, but
different from, the contents-of operator *. After we declare a pointer variable
int *ip;
the expression
ip = &i
sets what ip points to (that is, which location it points to), while the expression
*ip = 5
sets the value of the location pointed to by ip. On the other hand, if we declare a pointer variable and
include an initializer:
int *ip3 = &i;
we're setting the initial value for ip3, which is where ip3 will point, so that initial value is a pointer. (In
other words, the * in the declaration int *ip3 = &i; is not the contents-of operator, it's the indicator
that ip3 is a pointer.)
If you have a pointer declaration containing an initialization, and you ever have occasion to break it up
into a simple declaration and a conventional assignment, do it like this:
int *ip3;
ip3 = &i;
Don't write
int *ip3;
*ip3 = &i;
or you'll be trying to mix oil and water again.
Also, when we write
int *ip;
although the asterisk affects ip's type, it goes with the identifier name ip, not with the type int on the
left. To declare two pointers at once, the declaration looks like
int *ip1, *ip2;
Some people write pointer declarations like this:
int* ip;
This works for one pointer, because C essentially ignores whitespace. But if you ever write
int* ip1, ip2;        /* PROBABLY WRONG */
it will declare one pointer-to-int ip1 and one plain int ip2, which is probably not what you meant.
What is all of this good for? If it was just for changing variables like i from 5 to 7, it would not be good
for much. What it's good for, among other things, is when for various reasons we don't know exactly
which variable we want to change, just like the bank didn't know exactly which club member it wanted to
send the statement to.
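As a small taste of this, consider the following sketch, which zeroes whichever of two variables is larger. The function zero_the_larger is made up for this illustration. The assignment *ip = 0 is written once, without knowing which variable it will end up changing.

```c
/* Sets the larger of x and y to 0 (y if they are equal) and returns
   the sum of the two afterwards, which is whatever was left untouched. */
int zero_the_larger(int x, int y)
{
    int *ip;

    if(x > y)
        ip = &x;
    else
        ip = &y;

    *ip = 0;        /* changes x or y, depending on where ip points */

    return x + y;
}
```

For example, zero_the_larger(3, 8) zeroes y and returns 3, while zero_the_larger(9, 2) zeroes x and returns 2.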


10.2 Pointers and Arrays; Pointer Arithmetic

Pointers do not have to point to single variables. They can also point at the cells of an array. For
example, we can write
int *ip;
int a[10];
ip = &a[3];
and we would end up with ip pointing at the fourth cell of the array a (remember, arrays are 0-based, so
a[0] is the first cell). We could illustrate the situation like this:
We'd use this ip just like the one in the previous section: *ip gives us what ip points to, which in this
case will be the value in a[3].
Once we have a pointer pointing into an array, we can start doing pointer arithmetic. Given that ip is a
pointer to a[3], we can add 1 to ip:
ip + 1
What does it mean to add one to a pointer? In C, it gives a pointer to the cell one farther on, which in this
case is a[4]. To make this clear, let's assign this new pointer to another pointer variable:
ip2 = ip + 1;
Now the picture looks like this:
If we now do
*ip2 = 4;
we've set a[4] to 4. But it's not necessary to assign a new pointer value to a pointer variable in order to
use it; we could also compute a new pointer value and use it immediately:
*(ip + 1) = 5;
In this last example, we've changed a[4] again, setting it to 5. The parentheses are needed because the
unary ``contents of'' operator * has higher precedence than (i.e., binds more tightly than) the addition
operator. If we wrote *ip + 1, without the parentheses, we'd be fetching the value pointed to by ip,
and adding 1 to that value. The expression *(ip + 1), on the other hand, accesses the value one past
the one pointed to by ip.
Given that we can add 1 to a pointer, it's not surprising that we can add and subtract other numbers as
well. If ip still points to a[3], then
*(ip + 3) = 7;
sets a[6] to 7, and
*(ip - 2) = 4;
sets a[1] to 4.
Up above, we added 1 to ip and assigned the new pointer to ip2, but there's no reason we can't add one
to a pointer, and change the same pointer:
ip = ip + 1;
Now ip points one past where it used to (to a[4], if we hadn't changed it in the meantime). The
shortcuts we learned in a previous chapter all work for pointers, too: we could also increment a pointer
using
ip += 1;
or
ip++;
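The following sketch collects the pointer arithmetic we've done so far into one place and checks that each store landed in the cell the text says it should. The function name pointer_arithmetic_demo is hypothetical.

```c
/* Returns 1 if pointer arithmetic reached the expected cells of a. */
int pointer_arithmetic_demo(void)
{
    int a[10] = {0};
    int *ip = &a[3];

    *(ip + 1) = 5;      /* a[4] */
    *(ip + 3) = 7;      /* a[6] */
    *(ip - 2) = 4;      /* a[1] */

    ip++;               /* ip now points to a[4] */

    return a[4] == 5 && a[6] == 7 && a[1] == 4 && *ip == 5;
}
```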
Of course, pointers are not limited to ints. It's quite common to use pointers to other types, especially
char. Here are the innards of the mystrcmp function we saw in a previous chapter, rewritten to use
pointers. (mystrcmp, you may recall, compares two strings, character by character.)
char *p1 = &str1[0], *p2 = &str2[0];

while(1)
{
    if(*p1 != *p2)
        return *p1 - *p2;
    if(*p1 == '\0' || *p2 == '\0')
        return 0;
    p1++;
    p2++;
}
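For completeness, here is how those innards might sit inside a whole function. This is a sketch: the parameter declarations are an assumption about how mystrcmp was declared in the earlier chapter.

```c
/* Compares two strings character by character.  Returns 0 if they are
   equal, a negative value if str1 sorts first, and a positive value if
   str2 sorts first. */
int mystrcmp(char str1[], char str2[])
{
    char *p1 = &str1[0], *p2 = &str2[0];

    while(1)
    {
        if(*p1 != *p2)
            return *p1 - *p2;
        if(*p1 == '\0' || *p2 == '\0')
            return 0;
        p1++;
        p2++;
    }
}
```

With this definition, mystrcmp("abc", "abc") returns 0, and mystrcmp("abc", "abd") returns a negative value.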
The autoincrement operator ++ (like its companion, --) makes it easy to do two things at once. We've
seen idioms like a[i++] which accesses a[i] and simultaneously increments i, leaving it referencing
the next cell of the array a. We can do the same thing with pointers: an expression like *ip++ lets us
access what ip points to, while simultaneously incrementing ip so that it points to the next element. The
preincrement form works, too: *++ip increments ip, then accesses what it points to. Similarly, we can
use notations like *ip-- and *--ip.
As another example, here is the strcpy (string copy) loop from a previous chapter, rewritten to use
pointers:
char *dp = &dest[0], *sp = &src[0];
while(*sp != '\0')
    *dp++ = *sp++;
*dp = '\0';
(One question that comes up is whether the expression *p++ increments p or what it points to. The
answer is that it increments p. To increment what p points to, you can use (*p)++.)
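A tiny sketch makes the difference between *p++ and (*p)++ concrete. The function increment_demo and its array are invented for the illustration.

```c
/* Shows that *p++ moves the pointer, while (*p)++ changes the value
   pointed to.  Returns the sum of the two values it looks at. */
int increment_demo(void)
{
    int a[3] = {10, 20, 30};
    int *p = &a[0];
    int first;

    first = *p++;   /* fetches a[0], which is 10; p now points to a[1] */
    (*p)++;         /* a[1] goes from 20 to 21; p does not move */

    return first + *p;      /* 10 + 21 */
}
```

The function returns 31, and a[0] is still 10 afterwards.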
When you're doing pointer arithmetic, you have to remember how big the array the pointer points into is,
so that you don't ever point outside it. If the array a has 10 elements, you can't access a[50] or a[-1]
or even a[10] (remember, the valid subscripts for a 10-element array run from 0 to 9). Similarly, if a
has 10 elements and ip points to a[3], you can't compute or access ip + 10 or ip - 5. (There is
one special case: you can, in this case, compute, but not access, a pointer to the nonexistent element just
beyond the end of the array, which in this case is &a[10]. This becomes useful when you're doing
pointer comparisons, which we'll look at next.)

10.3 Pointer Subtraction and Comparison

As we've seen, you can add an integer to a pointer to get a new pointer, pointing somewhere beyond the
original (as long as it's in the same array). For example, you might write
ip2 = ip1 + 3;
Applying a little algebra, you might wonder whether
ip2 - ip1 = 3
and the answer is, yes. When you subtract two pointers, as long as they point into the same array, the
result is the number of elements separating them. You can also ask (again, as long as they point into the
same array) whether one pointer is greater or less than another: one pointer is ``greater than'' another if it
points beyond where the other one points. You can also compare pointers for equality and inequality: two
pointers are equal if they point to the same variable or to the same cell in an array, and are (obviously)
unequal if they don't. (When testing for equality or inequality, the two pointers do not have to point into
the same array.)
One common use of pointer comparisons is when copying arrays using pointers. Here is a code fragment
which copies 10 elements from array1 to array2, using pointers. It uses an end pointer, ep, to keep
track of when it should stop copying.
int array1[10], array2[10];
int *ip1, *ip2 = &array2[0];
int *ep = &array1[10];
for(ip1 = &array1[0]; ip1 < ep; ip1++)
    *ip2++ = *ip1;
As we mentioned, there is no element array1[10], but it is legal to compute a pointer to this
(nonexistent) element, as long as we only use it in pointer comparisons like this (that is, as long as we
never try to fetch or store the value that it points to.)
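The same loop can be wrapped up as a function that works for any number of elements. This is only a sketch: the name copy_ints and its parameters are hypothetical. It uses pointer comparison to decide when to stop and pointer subtraction to report how many elements it copied.

```c
/* Copies n ints from src to dest.  Returns the number of elements
   copied, computed by subtracting two pointers into src. */
int copy_ints(int dest[], int src[], int n)
{
    int *ip1, *ip2 = &dest[0];
    int *ep = &src[n];      /* one past the last element; never dereferenced */

    for(ip1 = &src[0]; ip1 < ep; ip1++)
        *ip2++ = *ip1;

    return (int)(ip1 - &src[0]);
}
```

The caller is responsible for making sure that both arrays really have at least n elements.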

10.4 Null Pointers

We said that the value of a pointer variable is a pointer to some other variable. There is one other value a pointer may
have: it may be set to a null pointer. A null pointer is a special pointer value that is known not to point anywhere.
What this means is that no other valid pointer, to any other variable or array cell or anything else, will ever compare
equal to a null pointer.
The most straightforward way to ``get'' a null pointer in your program is by using the predefined constant NULL,
which is defined for you by several standard header files, including <stdio.h>, <stdlib.h>, and
<string.h>. To initialize a pointer to a null pointer, you might use code like
#include <stdio.h>
int *ip = NULL;
and to test it for a null pointer before inspecting the value pointed to you might use code like
if(ip != NULL)
    printf("%d\n", *ip);
It is also possible to refer to the null pointer by using a constant 0, and you will see some code that sets null pointers
by simply doing
int *ip = 0;
(In fact, NULL is a preprocessor macro which typically has the value, or replacement text, 0.)
Furthermore, since the definition of ``true'' in C is a value that is not equal to 0, you will see code that tests for non-null pointers with abbreviated code like
if(ip)
This has the same meaning as our previous example; if(ip) is equivalent to if(ip != 0) and to if(ip !=
NULL).
All of these uses are legal, and although I recommend that you use the constant NULL for clarity, you will come
across the other forms, so you should be able to recognize them.
You can use a null pointer as a placeholder to remind yourself (or, more importantly, to help your program remember)
that a pointer variable does not point anywhere at the moment and that you should not use the ``contents of'' operator
on it (that is, you should not try to inspect what it points to, since it doesn't point to anything). A function that returns
pointer values can return a null pointer when it is unable to perform its task. (A null pointer used in this way is
analogous to the EOF value that functions like getchar return.)
As an example, let us write our own version of the standard library function strstr, which looks for one string
within another, returning a pointer to the string if it can, or a null pointer if it cannot. Here is the function, using the
obvious brute-force algorithm: at every character of the input string, the code checks for a match there of the pattern
string:
#include <stddef.h>
char *mystrstr(char input[], char pat[])
{
    char *start, *p1, *p2;
    for(start = &input[0]; *start != '\0'; start++)
    {                   /* for each position in input string... */
        p1 = pat;       /* prepare to check for pattern string there */
        p2 = start;
        while(*p1 != '\0')
        {
            if(*p1 != *p2)      /* characters differ */
                break;
            p1++;
            p2++;
        }
        if(*p1 == '\0')         /* found match */
            return start;
    }
    return NULL;
}
The start pointer steps over each character position in the input string. At each character, the inner loop checks
for a match there, by using p1 to step over the pattern string (pat), and p2 to step over the input string (starting at
start). We compare successive characters until either (a) we reach the end of the pattern string (*p1 == '\0'),
or (b) we find two characters which differ. When we're done with the inner loop, if we reached the end of the pattern
string (*p1 == '\0'), it means that all preceding characters matched, and we found a complete match for the
pattern starting at start, so we return start. Otherwise, we go around the outer loop again, to try another starting
position. If we run out of those (if *start == '\0'), without finding a match, we return a null pointer.
Notice that the function is declared as returning (and does in fact return) a pointer-to-char.
We can use mystrstr (or its standard library counterpart strstr) to determine whether one string contains
another:
if(mystrstr("Hello, world!", "lo") == NULL)
printf("no\n");
else
printf("yes\n");
In general, C does not initialize pointers to null for you, and it never tests pointers to see if they are null before using
them. If one of the pointers in your programs points somewhere some of the time but not all of the time, an excellent
convention to use is to set it to a null pointer when it doesn't point anywhere valid, and to test to see if it's a null
pointer before using it. But you must use explicit code to set it to NULL, and to test it against NULL. (In other words,
just setting an unused pointer variable to NULL doesn't guarantee safety; you also have to check for the null value
before using the pointer.) On the other hand, if you know that a particular pointer variable is always valid, you don't
have to insert a paranoid test against NULL before using it.
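Here is a minimal sketch of the convention, using a pair of hypothetical functions (neither find_int nor index_of is a standard library function). The first returns a null pointer when it cannot find what it is looking for; the second tests for that null pointer before doing anything else with the result.

```c
#include <stddef.h>     /* for NULL */

/* Looks for value among the first n cells of a.  Returns a pointer to
   the first matching cell, or a null pointer if there is none. */
int *find_int(int a[], int n, int value)
{
    int *ip;

    for(ip = &a[0]; ip < &a[n]; ip++)
    {
        if(*ip == value)
            return ip;
    }
    return NULL;
}

/* Returns the index of value in a, or -1 if it isn't there.  The test
   against NULL comes before any other use of the pointer. */
int index_of(int a[], int n, int value)
{
    int *ip = find_int(a, n, value);

    if(ip == NULL)
        return -1;
    return (int)(ip - &a[0]);
}
```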


10.5 ``Equivalence'' between Pointers and Arrays

There are a number of similarities between arrays and pointers in C. If you have an array
int a[10];
you can refer to a[0], a[1], a[2], etc., or to a[i] where i is an int. If you declare a pointer
variable ip and set it to point to the beginning of an array:
int *ip = &a[0];
you can refer to *ip, *(ip+1), *(ip+2), etc., or to *(ip+i) where i is an int.
There are also differences, of course. You cannot assign two arrays; the code
int a[10], b[10];
a = b;        /* WRONG */
is illegal. As we've seen, though, you can assign two pointer variables:
int *ip1, *ip2;
ip1 = &a[0];
ip2 = ip1;
Pointer assignment is straightforward; the pointer on the left is simply made to point wherever the pointer
on the right does. We haven't copied the data pointed to (there's still just one copy, in the same place);
we've just made two pointers point to that one place.
The similarities between arrays and pointers end up being quite useful, and in fact C builds on the
similarities, leading to what is called ``the equivalence of arrays and pointers in C.'' When we speak of
this ``equivalence'' we do not mean that arrays and pointers are the same thing (they are in fact quite
different), but rather that they can be used in related ways, and that certain operations may be used
between them.
The first such operation is that it is possible to (apparently) assign an array to a pointer:
int a[10];
int *ip;
ip = a;
What can this mean? In that last assignment ip = a, aren't we mixing apples and oranges again? It
turns out that we are not; C defines the result of this assignment to be that ip receives a pointer to the
first element of a. In other words, it is as if you had written
ip = &a[0];
The second facet of the equivalence is that you can use the ``array subscripting'' notation [i] on
pointers, too. If you write
ip[3]
it is just as if you had written
*(ip + 3)
So when you have a pointer that points to a block of memory, such as an array or a part of an array, you
can treat that pointer ``as if'' it were an array, using the convenient [i] notation. In other words, at the
beginning of this section when we talked about *ip, *(ip+1), *(ip+2), and *(ip+i), we could
have written ip[0], ip[1], ip[2], and ip[i]. As we'll see, this can be quite useful (or at least
convenient).
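If you'd like to convince yourself that the notations really are interchangeable, a sketch like the following will do it. The function subscripts_agree is invented for this purpose; it fills an array, points ip at it, and checks that ip[i], *(ip + i), and a[i] all name the same value.

```c
/* Returns 1 if subscripting a pointer agrees with pointer arithmetic
   and with subscripting the array itself, 0 otherwise. */
int subscripts_agree(void)
{
    int a[10];
    int *ip;
    int i;

    for(i = 0; i < 10; i++)
        a[i] = i * i;

    ip = a;     /* same as ip = &a[0] */

    for(i = 0; i < 10; i++)
    {
        if(ip[i] != *(ip + i) || ip[i] != a[i])
            return 0;
    }
    return 1;
}
```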
The third facet of the equivalence (which is actually a more general version of the first one we
mentioned) is that whenever you mention the name of an array in a context where the ``value'' of the
array would be needed, C automatically generates a pointer to the first element of the array, as if you had
written &array[0]. When you write something like
int a[10];
int *ip;
ip = a + 3;
it is as if you had written
ip = &a[0] + 3;
which (and you might like to convince yourself of this) gives the same result as if you had written
ip = &a[3];
For example, if the character array
char string[100];
contains some string, here is another way to find its length:

int len;
char *p;
for(p = string; *p != '\0'; p++)
    ;
len = p - string;
After the loop, p points to the '\0' terminating the string. The expression p - string is equivalent
to p - &string[0], and gives the length of the string. (Of course, we could also call strlen; in
fact here we've essentially written another implementation of strlen.)
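Packaged as a function, the fragment might look like the sketch below. The name mystrlen is hypothetical, and the result is returned as an int for simplicity.

```c
/* Returns the number of characters in string, not counting the
   terminating '\0'. */
int mystrlen(char string[])
{
    char *p;

    for(p = string; *p != '\0'; p++)
        ;

    return (int)(p - string);
}
```

So mystrlen("hello") returns 5, and mystrlen("") returns 0.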


10.6 Arrays and Pointers as Function Arguments

Earlier, we learned that functions in C receive copies of their arguments. (This means that C uses call by
value; it means that a function can modify one of its arguments without modifying the value in the
caller.) We didn't say so at the time, but when a function is called, the copies of the arguments are made
as if by assignment. But since arrays can't be assigned, how can a function receive an array as an
argument? The answer will explain why arrays are an apparent exception to the rule that functions cannot
modify their arguments.
We've been regularly calling a function getline like this:
char line[100];
getline(line, 100);
with the intention that getline read the next line of input into the character array line. But in the
previous paragraph, we learned that when we mention the name of an array in an expression, the
compiler generates a pointer to its first element. So the call above is as if we had written
char line[100];
getline(&line[0], 100);
In other words, the getline function does not receive an array of char at all; it actually receives a
pointer to char!
As we've seen throughout this chapter, it's straightforward to manipulate the elements of an array using
pointers, so there's no particular insurmountable difficulty if getline receives a pointer. One question
remains, though: we had been defining getline with its line parameter declared as an array:
int getline(char line[], int max)
{
...
}
We mentioned that we didn't have to specify a size for the line parameter, with the explanation that
getline really used the array in its caller, where the actual size was specified. But that declaration
certainly does look like an array--how can it work when getline actually receives a pointer?
The answer is that the C compiler does a little something behind your back. It knows that whenever you
mention an array name in an expression, it (the compiler) generates a pointer to the array's first element.
Therefore, it knows that a function can never actually receive an array as a parameter. Therefore,
whenever it sees you defining a function that seems to accept an array as a parameter, the compiler
quietly pretends that you had declared it as accepting a pointer, instead. The definition of getline
above is compiled exactly as if it had been written
int getline(char *line, int max)
{
...
}
Let's look at how getline might be written if we thought of its first parameter (argument) as a pointer,
instead:
int getline(char *line, int max)
{
    int nch = 0;
    int c;
    max = max - 1;              /* leave room for '\0' */

#ifndef FGETLINE
    while((c = getchar()) != EOF)
#else
    while((c = getc(fp)) != EOF)
#endif
    {
        if(c == '\n')
            break;
        if(nch < max)
        {
            *(line + nch) = c;
            nch = nch + 1;
        }
    }
    if(c == EOF && nch == 0)
        return EOF;
    *(line + nch) = '\0';
    return nch;
}
But, as we've learned, we can also use ``array subscript'' notation with pointers, so we could rewrite the
pointer version of getline like this:

int getline(char *line, int max)
{
    int nch = 0;
    int c;
    max = max - 1;              /* leave room for '\0' */

#ifndef FGETLINE
    while((c = getchar()) != EOF)
#else
    while((c = getc(fp)) != EOF)
#endif
    {
        if(c == '\n')
            break;
        if(nch < max)
        {
            line[nch] = c;
            nch = nch + 1;
        }
    }
    if(c == EOF && nch == 0)
        return EOF;
    line[nch] = '\0';
    return nch;
}
But this is exactly what we'd written before (see chapter 6, Sec. 6.3), except that the declaration of the
line parameter is different. In other words, within the body of the function, it hardly matters whether
we thought line was an array or a pointer, since we can use array subscripting notation with both arrays
and pointers.
These games that the compiler is playing with arrays and pointers may seem bewildering at first, and it
may seem faintly miraculous that everything comes out in the wash when you declare a function like
getline that seems to accept an array. The equivalence in C between arrays and pointers can be
confusing, but it does work and is one of the central features of C. If the games which the compiler plays
(pretending that you declared a parameter as a pointer when you thought you declared it as an array)
bother you, you can do two things:
1. Continue to pretend that functions can receive arrays as parameters; declare and use them that
way, but remember that unlike other arguments, a function can modify the copy in its caller of an
argument that (seems to be) an array.
2. Realize that arrays are always passed to functions as pointers, and always declare your functions
as accepting pointers.
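Whichever of the two views you prefer, the following sketch may help show that they come to the same thing. The two functions (whose names are invented for this example) differ only in how the first parameter is declared and accessed; both modify the array that lives in the caller.

```c
/* Declared with array notation; the compiler treats buf as a char *. */
void fill_array_style(char buf[], int n, char c)
{
    int i;
    for(i = 0; i < n; i++)
        buf[i] = c;
}

/* Declared with pointer notation; the behavior is the same. */
void fill_pointer_style(char *buf, int n, char c)
{
    int i;
    for(i = 0; i < n; i++)
        *(buf + i) = c;
}
```

Given char a[4] = "abc";, the calls fill_array_style(a, 3, 'x') and fill_pointer_style(&a[0], 3, 'x') both leave "xxx" in the caller's array, because in each case the function receives a pointer to a[0] rather than a copy of the array.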

10.7 Strings
Because of the ``equivalence'' of arrays and pointers, it is extremely common to refer to and manipulate
strings as character pointers, or char *'s. It is so common, in fact, that it is easy to forget that strings
are arrays, and to imagine that they're represented by pointers. (Actually, in the case of strings, it may not
even matter that much if the distinction gets a little blurred; there's certainly nothing wrong with referring
to a character pointer, suitably initialized, as a ``string.'') Let's look at a few of the implications:
1. Any function that manipulates a string will actually accept it as a char * argument. The caller
may pass an array containing a string, but the function will receive a pointer to the array's
(string's) first element (character).
2. The %s format in printf expects a character pointer.
3. Although you have to use strcpy to copy a string from one array to another, you can use simple
pointer assignment to assign a string to a pointer. The string being assigned might either be in an
array or pointed to by another pointer. In other words, given
char *p1, *p2;
both
p1 = string
and
p2 = p1
are legal. (Remember, though, that when you assign a pointer, you're making a copy of the pointer
but not of the data it points to. In the first example, p1 ends up pointing to the string in string.
In the second example, p2 ends up pointing to the same string as p1. In any case, after a pointer
assignment, if you ever change the string (or other data) pointed to, the change is ``visible'' to both
pointers.)
4. Many programs manipulate strings exclusively using character pointers, never explicitly declaring
any actual arrays. As long as these programs are careful to allocate appropriate memory for the
strings, they're perfectly valid and correct.
When you start working heavily with strings, however, you have to be aware of one subtle fact.
When you initialize a character array with a string constant:
char string[] = "Hello, world!";
you end up with an array containing the string, and you can modify the array's contents to your heart's
content:
string[0] = 'J';
However, it's possible to use string constants (the formal term is string literals) at other places in your
code. Since they're arrays, the compiler generates pointers to their first elements when they're used in
expressions, as usual. That is, if you say
char *p1 = "Hello";
int len = strlen("world");
it's almost as if you'd said
char internal_string_1[] = "Hello";
char internal_string_2[] = "world";
char *p1 = &internal_string_1[0];
int len = strlen(&internal_string_2[0]);
Here, the arrays named internal_string_1 and internal_string_2 are supposed to suggest
the fact that the compiler is actually generating little temporary arrays every time you use a string
constant in your code. However, the subtle fact is that the arrays which are ``behind'' the string constants
are not necessarily modifiable. In particular, the compiler may store them in read-only-memory.
Therefore, if you write
char *p3 = "Hello, world!";
p3[0] = 'J';
your program may crash, because it may try to store a value (in this case, the character 'J') into
nonwritable memory.
The moral is that whenever you're building or modifying strings, you have to make sure that the memory
you're building or modifying them in is writable. That memory should either be an array you've
allocated, or some memory which you've dynamically allocated by the techniques which we'll see in the
next chapter. Make sure that no part of your program will ever try to modify a string which is actually
one of the unnamed, unwritable arrays which the compiler generated for you in response to one of your
string constants. (The only exception is array initialization, because if you write to such an array, you're
writing to the array, not to the string literal which you used to initialize the array.)
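As a rough sketch of the moral, the functions below modify a string only through memory the caller owns. Both names (jellify and make_writable_copy) are invented for this example, and the const qualifier is our own addition: declaring a pointer to a string literal as const char * lets many compilers complain if you later try to write through it.

```c
#include <string.h>

/* Overwrite the first character of a writable string with 'J'.
   The caller must pass an array (or allocated memory), never a literal. */
void jellify(char *s)
{
    if(s[0] != '\0')
        s[0] = 'J';
}

/* Copy src (which may be a string literal) into the caller's array dest,
   which has room for destsize characters including the '\0'.
   Returns 0 on success, or -1 if dest is too small. */
int make_writable_copy(char *dest, int destsize, const char *src)
{
    if((int)strlen(src) + 1 > destsize)
        return -1;
    strcpy(dest, src);
    return 0;
}
```

With char buf[20];, calling make_writable_copy(buf, 20, "Hello") and then jellify(buf) is safe, since the 'J' lands in buf and not in the unnamed array behind the literal.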


In an earlier assignment, an ``extra credit'' version of a problem asked you to write a little checkbook
balancing program that accepted a series of lines of the form
deposit 1000
check 10
check 12.34
deposit 50
check 20
It was a surprising nuisance to do this in an ad hoc way, using only the tools we had at the time. It was
easy to read each line, but it was cumbersome to break it up into the word (``deposit'' or ``check'') and the
amount.
I find it very convenient to use a more general approach: first, break lines like these into a series of
whitespace-separated words, then deal with each word separately. To do this, we will use an array of
pointers to char, which we can also think of as an ``array of strings,'' since a string is an array of char,
and a pointer-to-char can easily point at a string. Here is the declaration of such an array:
char *words[10];
This is the first complicated C declaration we've seen: it says that words is an array of 10 pointers to
char. We're going to write a function, getwords, which we can call like this:
int nwords;
nwords = getwords(line, words, 10);
where line is the line we're breaking into words, words is the array to be filled in with the (pointers to
the) words, and nwords (the return value from getwords) is the number of words which the function
finds. (As with getline, we tell the function the size of the array so that if the line should happen to
contain more words than that, it won't overflow the array).
Here is the definition of the getwords function. It finds the beginning of each word, places a pointer to
it in the array, finds the end of that word (which is signified by at least one whitespace character) and
terminates the word by placing a '\0' character after it. (The '\0' character will overwrite the first
whitespace character following the word.) Note that the original input string is therefore modified by
getwords: if you were to try to print the input line after calling getwords, it would appear to contain
only its first word (because of the first inserted '\0').
#include <stddef.h>
#include <ctype.h>
int getwords(char *line, char *words[], int maxwords)
{
char *p = line;
int nwords = 0;
while(1)
{
while(isspace(*p))
p++;
if(*p == '\0')
return nwords;
words[nwords++] = p;
while(!isspace(*p) && *p != '\0')
p++;
if(*p == '\0')
return nwords;
*p++ = '\0';
if(nwords >= maxwords)
return nwords;
}
}
Each time through the outer while loop, the function tries to find another word. First it skips over
whitespace (which might be leading spaces on the line, or the space(s) separating this word from the
previous one). The isspace function is new: it's in the standard library, declared in the header file
<ctype.h>, and it returns nonzero (``true'') if the character you hand it is a space character (a space or
a tab, or any other whitespace character there might happen to be).
When the function finds a non-whitespace character, it has found the beginning of another word, so it
places the pointer to that character in the next cell of the words array. Then it steps though the word,
looking at non-whitespace characters, until it finds another whitespace character, or the \0 at the end of
the line. If it finds the \0, it's done with the entire line; otherwise, it changes the whitespace character to
a \0, to terminate the word it's just found, and continues. (If it's found as many words as will fit in the
words array, it returns prematurely.)
Each time it finds a word, the function increments the number of words (nwords) it has found. Since
arrays in C start at [0], the number of words the function has found so far is also the index of the cell in
the words array where the next word should be stored. The function actually assigns the next word and
increments nwords in one expression:
words[nwords++] = p;
You should convince yourself that this arrangement works, and that (in this case) the preincrement form
words[++nwords] = p;    /* WRONG */
would not behave as desired.

When the function is done (when it finds the \0 terminating the input line, or when it runs out of cells in
the words array) it returns the number of words it has found.
Here is a complete example of calling getwords:
char line[] = "this is a test";
char *words[10];
int i, nwords;
nwords = getwords(line, words, 10);
for(i = 0; i < nwords; i++)
    printf("%s\n", words[i]);
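Returning to the checkbook problem that motivated this section, here is one way getwords might be put to work. This is only a sketch under some assumptions of ours: the function apply_line and its return convention are invented for the example, amounts are converted with atof, and the copy of getwords below adds (unsigned char) casts in the isspace calls, a common precaution that the version in the text omits.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* getwords, following the text, with casts added for isspace */
int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    while(1)
    {
        while(isspace((unsigned char)*p))
            p++;
        if(*p == '\0')
            return nwords;
        words[nwords++] = p;
        while(!isspace((unsigned char)*p) && *p != '\0')
            p++;
        if(*p == '\0')
            return nwords;
        *p++ = '\0';
        if(nwords >= maxwords)
            return nwords;
    }
}

/* Apply one "deposit AMOUNT" or "check AMOUNT" line to *balance.
   The line is modified in place by getwords.
   Returns 0 on success, or -1 if the line is not in the expected form. */
int apply_line(char *line, double *balance)
{
    char *words[3];
    int nwords = getwords(line, words, 3);

    if(nwords != 2)
        return -1;
    if(strcmp(words[0], "deposit") == 0)
        *balance += atof(words[1]);
    else if(strcmp(words[0], "check") == 0)
        *balance -= atof(words[1]);
    else
        return -1;
    return 0;
}
```

Asking getwords for up to three words, rather than two, is a small design choice: it lets apply_line notice and reject a line that has extra words after the amount.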


In this chapter, we'll meet malloc, C's dynamic memory allocation function, and we'll cover dynamic
memory allocation in some detail.
As we begin doing dynamic memory allocation, we'll begin to see (if we haven't seen it already) what
pointers can really be good for. Many of the pointer examples in the previous chapter (those which used
pointers to access arrays) didn't do all that much for us that we couldn't have done using arrays.
However, when we begin doing dynamic memory allocation, pointers are the only way to go, because
what malloc returns is a pointer to the memory it gives us. (Due to the equivalence between pointers
and arrays, though, we will still be able to think of dynamically allocated regions of storage as if they
were arrays, and even to use array-like subscripting notation on them.)
You have to be careful with dynamic memory allocation. malloc operates at a pretty ``low level''; you
will often find yourself having to do a certain amount of work to manage the memory it gives you. If you
don't keep accurate track of the memory which malloc has given you, and the pointers of yours which
point to it, it's all too easy to accidentally use a pointer which points ``nowhere'', with generally
unpleasant results. (The basic problem is that if you assign a value to the location pointed to by a pointer:
*p = 0;
and if the pointer p points ``nowhere'', well actually it can be construed to point somewhere, just not
where you wanted it to, and that ``somewhere'' is where the 0 gets written. If the ``somewhere'' is
memory which is in use by some other part of your program, or even worse, if the operating system has
not protected itself from you and ``somewhere'' is in fact in use by the operating system, things could get
ugly.)
11.1 Allocating Memory with malloc
11.2 Freeing Memory
11.3 Reallocating Memory Blocks
11.4 Pointer Safety

11.1 Allocating Memory with malloc

[This section corresponds to parts of K&R Secs. 5.4, 5.6, 6.5, and 7.8.5]
A problem with many simple programs, including in particular little teaching programs such as we've
been writing so far, is that they tend to use fixed-size arrays which may or may not be big enough. We
have an array of 100 ints for the numbers which the user enters and wishes to find the average of--what
if the user enters 101 numbers? We have an array of 100 chars which we pass to getline to receive
the user's input--what if the user types a line of 200 characters? If we're lucky, the relevant parts of the
program check how much of an array they've used, and print an error message or otherwise gracefully
abort before overflowing the array. If we're not so lucky, a program may sail off the end of an array,
overwriting other data and behaving quite badly. In either case, the user doesn't get his job done. How can
we avoid the restrictions of fixed-size arrays?
The answers all involve the standard library function malloc. Very simply, malloc returns a pointer to
n bytes of memory which we can do anything we want to with. If we didn't want to read a line of input
into a fixed-size array, we could use malloc, instead. Here's the first step:
#include <stdlib.h>
char *line;
int linelen = 100;
line = malloc(linelen);     /* incomplete -- malloc's return value not checked */
getline(line, linelen);
malloc is declared in <stdlib.h>, so we #include that header in any program that calls malloc.
A ``byte'' in C is, by definition, an amount of storage suitable for storing one character, so the above
invocation of malloc gives us exactly as many chars as we ask for. We could illustrate the resulting
pointer like this:
The 100 bytes of memory (not all of which are shown) pointed to by line are those allocated by
malloc. (They are brand-new memory, conceptually a bit different from the memory which the compiler
arranges to have allocated automatically for our conventional variables. The 100 boxes in the figure don't
have a name next to them, because they're not storage for a variable we've declared.)
As a second example, we might have occasion to allocate a piece of memory, and to copy a string into it
with strcpy:
char *p = malloc(15);

strcpy(p, "Hello, world!");
When copying strings, remember that all strings have a terminating \0 character. If you use strlen to
count the characters in a string for you, that count will not include the trailing \0, so you must add one
before calling malloc:
char *somestring, *copy;
...
copy = malloc(strlen(somestring) + 1);      /* +1 for \0 */
strcpy(copy, somestring);
What if we're not allocating characters, but integers? If we want to allocate 100 ints, how many bytes is
that? If we know how big ints are on our machine (i.e. depending on whether we're using a 16- or 32-bit
machine) we could try to compute it ourselves, but it's much safer and more portable to let C compute it
for us. C has a sizeof operator, which computes the size, in bytes, of a variable or type. It's just what
we need when calling malloc. To allocate space for 100 ints, we could call
int *ip = malloc(100 * sizeof(int));
The use of the sizeof operator tends to look like a function call, but it's really an operator, and it does
its work at compile time. (The one exception, in C99 and later, is when its operand is a variable-length array.)
Since we can use array indexing syntax on pointers, we can treat a pointer variable after a call to malloc
almost exactly as if it were an array. In particular, after the above call to malloc initializes ip to point at
storage for 100 ints, we can access ip[0], ip[1], ... up to ip[99]. This way, we can get the effect
of an array even if we don't know until run time how big the ``array'' should be. (In a later section we'll
see how we might deal with the case where we're not even sure at the point we begin using it how big an
``array'' will eventually have to be.)
Our examples so far have all had a significant omission: they have not checked malloc's return value.
Obviously, no real computer has an infinite amount of memory available, so there is no guarantee that
malloc will be able to give us as much memory as we ask for. If we call malloc(100000000), or if
we call malloc(10) 10,000,000 times, we're probably going to run out of memory.
When malloc is unable to allocate the requested memory, it returns a null pointer. A null pointer,
remember, points definitively nowhere. It's a ``not a pointer'' marker; it's not a pointer you can use. (As
we said in section 9.4, a null pointer can be used as a failure return from a function that returns pointers,
and malloc is a perfect example.) Therefore, whenever you call malloc, it's vital to check the returned
pointer before using it! If you call malloc, and it returns a null pointer, and you go off and use that null
pointer as if it pointed somewhere, your program probably won't last long. Instead, a program should
immediately check for a null pointer, and if it receives one, it should at the very least print an error
message and exit, or perhaps figure out some way of proceeding without the memory it asked for. But it
cannot go on to use the null pointer it got back from malloc in any way, because that null pointer by
definition points nowhere. (``It cannot use a null pointer in any way'' means that the program cannot use
the * or [] operators on such a pointer value, or pass it to any function that expects a valid pointer.)
A call to malloc, with an error check, typically looks something like this:
int *ip = malloc(100 * sizeof(int));
if(ip == NULL)
{
    printf("out of memory\n");
    exit(1);                /* or return to the caller */
}
After printing the error message, this code should return to its caller, or exit from the program entirely; it
cannot proceed with the code that would have used ip.
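Here is a sketch of the same pattern packaged as a function that reports failure to its caller instead of exiting. The name make_sequence and its behavior (filling each element with its own index) are invented for this example.

```c
#include <stdlib.h>

/* Allocate an "array" of n ints and set each element to its own index.
   Returns a null pointer if n is not positive or if malloc fails;
   otherwise the caller owns the returned memory. */
int *make_sequence(int n)
{
    int *ip;
    int i;

    if(n <= 0)
        return NULL;
    ip = malloc(n * sizeof(int));
    if(ip == NULL)
        return NULL;            /* pass the failure along to the caller */
    for(i = 0; i < n; i++)
        ip[i] = i;              /* safe: ip is known to be non-null here */
    return ip;
}
```

A caller would check the result in the same way, for example with if(seq == NULL), before touching seq[0] through seq[n-1].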
Of course, in our examples so far, we've still limited ourselves to ``fixed size'' regions of memory,
because we've been calling malloc with fixed arguments like 10 or 100. (Our call to getline is still
limited to 100-character lines, or whatever number we set the linelen variable to; our ip variable still
points at only 100 ints.) However, since the sizes are now values which can in principle be determined
at run-time, we've at least moved beyond having to recompile the program (with a bigger array) to
accommodate longer lines, and with a little more work, we could arrange that the ``arrays'' automatically
grew to be as large as required. (For example, we could write something like getline which could read
the longest input line actually seen.) We'll begin to explore this possibility in a later section.

11.2 Freeing Memory

Memory allocated with malloc lasts as long as you want it to. It does not automatically disappear when
a function returns, as automatic-duration variables do, but it does not have to remain for the entire
duration of your program, either. Just as you can use malloc to control exactly when and how much
memory you allocate, you can also control exactly when you deallocate it.
In fact, many programs use memory on a transient basis. They allocate some memory, use it for a while,
but then reach a point where they don't need that particular piece any more. Because memory is not
inexhaustible, it's a good idea to deallocate (that is, release or free) memory you're no longer using.
Dynamically allocated memory is deallocated with the free function. If p contains a pointer previously
returned by malloc, you can call
free(p);
which will ``give the memory back'' to the stock of memory (sometimes called the ``arena'' or ``pool'')
from which malloc requests are satisfied. Calling free is sort of the ultimate in recycling: it costs you
almost nothing, and the memory you give back is immediately usable by other parts of your program.
(Theoretically, it may even be usable by other programs.)
(Freeing unused memory is a good idea, but it's not mandatory. When your program exits, any memory
which it has allocated but not freed should be automatically released. If your computer were to somehow
``lose'' memory just because your program forgot to free it, that would indicate a problem or deficiency
in your operating system.)
Naturally, once you've freed some memory you must remember not to use it any more. After calling
free(p);
it is probably the case that p still points at the same memory. However, since we've given it back, it's
now ``available,'' and a later call to malloc might give that memory to some other part of your
program. If the variable p is a global variable or will otherwise stick around for a while, one good way to
record the fact that it's not to be used any more would be to set it to a null pointer:
free(p);
p = NULL;
Now we don't even have the pointer to the freed memory any more, and (as long as we check to see that
p is non-NULL before using it), we won't misuse any memory via the pointer p.
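The free-then-set-to-null habit might be sketched like this for a long-lived pointer. Everything here is hypothetical: the file-scope variable table and the three functions exist only to illustrate the idea.

```c
#include <stdlib.h>

static int *table = NULL;       /* long-lived pointer; null means "not allocated" */

/* Release the table (if any) and record that it is gone. */
void table_destroy(void)
{
    free(table);                /* free(NULL) is defined to do nothing */
    table = NULL;
}

/* Allocate room for n ints. Returns 1 on success, 0 on failure. */
int table_create(int n)
{
    table_destroy();            /* don't lose track of an earlier block */
    if(n <= 0)
        return 0;
    table = malloc(n * sizeof(int));
    return table != NULL;
}

/* Other code can ask whether the table may be used. */
int table_is_usable(void)
{
    return table != NULL;
}
```

Because table_destroy leaves a null pointer behind, calling it twice in a row is harmless, and any code that tests table_is_usable() first will not touch freed memory through table.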
When thinking about malloc, free, and dynamically-allocated memory in general, remember again
the distinction between a pointer and what it points to. If you call malloc to allocate some memory, and
store the pointer which malloc gives you in a local pointer variable, what happens when the function
containing the local pointer variable returns? If the local pointer variable has automatic duration (which
is the default, unless the variable is declared static), it will disappear when the function returns. But
for the pointer variable to disappear says nothing about the memory pointed to! That memory still exists
and, as far as malloc and free are concerned, is still allocated. The only thing that has disappeared is
the pointer variable you had which pointed at the allocated memory. (Furthermore, if it contained the
only copy of the pointer you had, once it disappears, you'll have no way of freeing the memory, and no
way of using it, either. Using memory and freeing memory both require that you have at least one pointer
to the memory!)
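The distinction can be seen in a function that allocates memory and hands it back. In this sketch (copy_string is an invented name, and the const qualifier is our addition) the local variable copy disappears when the function returns, but the memory it pointed to lives on, and the caller becomes responsible for freeing it.

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of s, or a null pointer if malloc fails.
   The caller must eventually pass the result to free. */
char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);     /* +1 for \0 */
    if(copy == NULL)
        return NULL;
    strcpy(copy, s);
    return copy;        /* the pointer value is returned; the variable vanishes */
}
```

If the caller were to discard the returned pointer without saving it, the memory would remain allocated with no way left to use or free it.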


Sometimes you're not sure at first how much memory you'll need. For example, if you need to store a series
of items you read from the user, and if the only way to know how many there are is to read them until the
user types some ``end'' signal, you'll have no way of knowing, as you begin reading and storing the first few,
how many you'll have seen by the time you do see that ``end'' marker. You might want to allocate room for,
say, 100 items, and if the user enters a 101st item before entering the ``end'' marker, you might wish for a
way to say ``uh, malloc, remember those 100 items I asked for? Could I change my mind and have 200
instead?''
In fact, you can do exactly this, with the realloc function. You hand realloc an old pointer (such as
you received from an initial call to malloc) and a new size, and realloc does what it can to give you a
chunk of memory big enough to hold the new size. For example, if we wanted the ip variable from an
earlier example to point at 200 ints instead of 100, we could try calling
ip = realloc(ip, 200 * sizeof(int));
Since you always want each block of dynamically-allocated memory to be contiguous (so that you can treat
it as if it were an array), you and realloc have to worry about the case where realloc can't make the
old block of memory bigger ``in place,'' but rather has to relocate it elsewhere in order to find enough
contiguous space for the new requested size. realloc does this by returning a new pointer. If realloc
was able to make the old block of memory bigger, it returns the same pointer. If realloc has to go
elsewhere to get enough contiguous memory, it returns a pointer to the new memory, after copying your old
data there. (In this case, after it makes the copy, it frees the old block.) Finally, if realloc can't find
enough memory to satisfy the new request at all, it returns a null pointer. Therefore, you usually don't want
to overwrite your old pointer with realloc's return value until you've tested it to make sure it's not a null
pointer. You might use code like this:
int *newp;
newp = realloc(ip, 200 * sizeof(int));
if(newp != NULL)
    ip = newp;
else
{
    printf("out of memory\n");
    /* exit or return */
    /* but ip still points at 100 ints */
}
If realloc returns something other than a null pointer, it succeeded, and we set ip to what it returned.
(We've either set ip to what it used to be or to a new pointer, but in either case, it points to where our data is
now.) If realloc returns a null pointer, however, we hang on to our old pointer in ip which still points at
our original 100 values.
Putting this all together, here is a piece of code which reads lines of text from the user, treats each line as an
integer by calling atoi, and stores each integer in a dynamically-allocated ``array'':
#define MAXLINE 100
char line[MAXLINE];
int *ip;
int nalloc, nitems;

nalloc = 100;
ip = malloc(nalloc * sizeof(int));      /* initial allocation */
if(ip == NULL)
{
    printf("out of memory\n");
    exit(1);
}

nitems = 0;
while(getline(line, MAXLINE) != EOF)
{
    if(nitems >= nalloc)
    {                                   /* increase allocation */
        int *newp;
        nalloc += 100;
        newp = realloc(ip, nalloc * sizeof(int));
        if(newp == NULL)
        {
            printf("out of memory\n");
            exit(1);
        }
        ip = newp;
    }
    ip[nitems++] = atoi(line);
}
We use two different variables to keep track of the ``array'' pointed to by ip. nalloc is how many
elements we've allocated, and nitems is how many of them are in use. Whenever we're about to store
another item in the ``array,'' if nitems >= nalloc, the old ``array'' is full, and it's time to call realloc
to make it bigger.
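The grow-when-full step could also be pulled out into a function of its own, as in the sketch below. The name append_item is invented, and the function takes pointers to the caller's ip, nalloc, and nitems variables (including a pointer to a pointer, which these notes have not formally introduced) so that it can update all three. It also relies on the fact that realloc, when handed a null pointer, behaves like malloc, so the caller may start with ip set to NULL and nalloc set to 0.

```c
#include <stdlib.h>

/* Append value to the "array" *ipp, growing it by 100 elements when full.
   *nallocp is the number of elements allocated, *nitemsp the number in use.
   Returns 1 on success, or 0 if realloc fails, in which case the old
   block and both counts are left untouched. */
int append_item(int **ipp, int *nallocp, int *nitemsp, int value)
{
    if(*nitemsp >= *nallocp)
    {
        int newalloc = *nallocp + 100;
        int *newp = realloc(*ipp, newalloc * sizeof(int));
        if(newp == NULL)
            return 0;
        *ipp = newp;            /* overwrite the old pointer only on success */
        *nallocp = newalloc;
    }
    (*ipp)[(*nitemsp)++] = value;
    return 1;
}
```

A caller might write if(!append_item(&ip, &nalloc, &nitems, atoi(line))) and then print its error message and exit, much as the loop above does.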
Finally, we might ask what the return type of malloc and realloc is, if they are able to return pointers to
char or pointers to int or (though we haven't seen it yet) pointers to any other type. The answer is that
both of these functions are declared (in <stdlib.h>) as returning a type we haven't seen, void * (that is,
pointer to void). We haven't really seen type void, either, but what's going on here is that void * is
specially defined as a ``generic'' pointer type, which may be used with (strictly speaking, assigned to or from) any
pointer type.

11.4 Pointer Safety

At the beginning of the previous chapter, we said that the hard thing about pointers is not so much
manipulating them as ensuring that the memory they point to is valid. When a pointer doesn't point
where you think it does, if you inadvertently access or modify the memory it points to, you can damage
other parts of your program, or (in some cases) other programs or the operating system itself!
When we use pointers to simple variables, as in section 10.1, there's not much that can go wrong. When
we use pointers into arrays, as in section 10.2, and begin moving the pointers around, we have to be more
careful, to ensure that the roving pointers always stay within the bounds of the array(s). When we begin
passing pointers to functions, and especially when we begin returning them from functions (as in the
strstr function of section 10.4) we have to be more careful still, because the code using the pointer
may be far removed from the code which owns or allocated the memory.
One particular problem concerns functions that return pointers. Where is the memory to which the
returned pointer points? Is it still around by the time the function returns? The strstr function returns
either a null pointer (which points definitively nowhere, and which the caller presumably checks for) or it
returns a pointer which points into the input string, which the caller supplied, which is pretty safe. One
thing a function must not do, however, is return a pointer to one of its own, local, automatic-duration
arrays. Remember that automatic-duration variables (which include all non-static local variables),
including automatic-duration arrays, are deallocated and disappear when the function returns. If a
function returns a pointer to a local array, that pointer will be invalid by the time the caller tries to use it.
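One common way around the problem is to have the caller supply the memory, as in this sketch. The function format_int is invented for the example; the broken variant appears only inside a comment, so that it is never compiled.

```c
#include <stdio.h>

/* WRONG (shown as a comment only):
 *
 *     char *format_int(int n)
 *     {
 *         char buf[20];
 *         sprintf(buf, "%d", n);
 *         return buf;         -- buf disappears when the function returns
 *     }
 */

/* Safer: write the decimal form of n into the caller's array and return
   a pointer to it. buf must have room for the digits, a possible minus
   sign, and the '\0'; 20 characters is enough for 32-bit ints. */
char *format_int(int n, char *buf)
{
    sprintf(buf, "%d", n);
    return buf;
}
```

The returned pointer is just the caller's own buf, so it remains valid for as long as the caller's array does. Another option, at the cost of requiring the caller to call free later, would be to return memory obtained from malloc.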
Finally, when we're doing dynamic memory allocation with malloc, realloc, and free, we have to
be most careful of all. Dynamic allocation gives us a lot more flexibility in how our programs use
memory, although with that flexibility comes the responsibility that we manage dynamically allocated
memory carefully. The possibilities for misdirected pointers and associated mayhem are greatest in
programs that make heavy use of dynamic memory allocation. You can reduce these possibilities by
designing your program in such a way that it's easy to ensure that pointers are used correctly and that
memory is always allocated and deallocated correctly. (If, on the other hand, your program is designed in
such a way that meeting these guarantees is a tedious nuisance, sooner or later you'll forget or neglect to,
and maintenance will be a nightmare.)


So far, we've been calling printf to print formatted output to the ``standard output'' (wherever that is).
We've also been calling getchar to read single characters from the ``standard input,'' and putchar to
write single characters to the standard output. ``Standard input'' and ``standard output'' are two predefined
I/O streams which are implicitly available to us. In this chapter we'll learn how to take control of input
and output by opening our own streams, perhaps connected to data files, which we can read from and
write to.
12.1 File Pointers and fopen
12.2 I/O with File Pointers
12.3 Predefined Streams
12.4 Closing Files
12.5 Example: Reading a Data File

12.1 File Pointers and fopen

How will we specify that we want to access a particular data file? It would theoretically be possible to
mention the name of a file each time it was desired to read from or write to it. But such an approach
would have a number of drawbacks. Instead, the usual approach (and the one taken in C's stdio library) is
that you mention the name of the file once, at the time you open it. Thereafter, you use some little token--in this case, the file pointer--which keeps track (both for your sake and the library's) of which file you're
talking about. Whenever you want to read from or write to one of the files you're working with, you
identify that file by using its file pointer (that is, the file pointer you obtained when you opened the file).
As we'll see, you store file pointers in variables just as you store any other data you manipulate, so it is
possible to have several files open, as long as you use distinct variables to store the file pointers.
You declare a variable to store a file pointer like this:
FILE *fp;
The type FILE is predefined for you by <stdio.h>. It is a data structure which holds the information
the standard I/O library needs to keep track of the file for you. For historical reasons, you declare a
variable which is a pointer to this FILE type. The name of the variable can (as for any variable) be
anything you choose; it is traditional to use the letters fp in the variable name (since we're talking about
a file pointer). If you were reading from two files at once you'd probably use two file pointers:
FILE *fp1, *fp2;
If you were reading from one file and writing to another you might declare an input file pointer and an
output file pointer:
FILE *ifp, *ofp;
Like any pointer variable, a file pointer isn't any good until it's initialized to point to something.
(Actually, no variable of any type is much good until you've initialized it.) To actually open a file, and
receive the ``token'' which you'll store in your file pointer variable, you call fopen. fopen accepts a
file name (as a string) and a mode value indicating among other things whether you intend to read or
write this file. (The mode variable is also a string.) To open the file input.dat for reading you might
call
ifp = fopen("input.dat", "r");
The mode string "r" indicates reading. Mode "w" indicates writing, so we could open output.dat
for output like this:

ofp = fopen("output.dat", "w");
The other values for the mode string are less frequently used. The third major mode is "a" for append.
(If you use "w" to write to a file which already exists, its old contents will be discarded.) You may also
add a + character to the mode string to indicate that you want to both read and write, or a b character to
indicate that you want to do ``binary'' (as opposed to text) I/O.
One thing to beware of when opening files is that it's an operation which may fail. The requested file
might not exist, or it might be protected against reading or writing. (These possibilities ought to be
obvious, but it's easy to forget them.) fopen returns a null pointer if it can't open the requested file, and
it's important to check for this case before going off and using fopen's return value as a file pointer.
Every call to fopen will typically be followed with a test, like this:
ifp = fopen("input.dat", "r");
if(ifp == NULL)
{
    printf("can't open file\n");
    /* exit or return */
}
If fopen returns a null pointer, and you store it in your file pointer variable and go off and try to do I/O
with it, your program will typically crash.
It's common to collapse the call to fopen and the assignment in with the test:
if((ifp = fopen("input.dat", "r")) == NULL)
{
    printf("can't open file\n");
    /* exit or return */
}
You don't have to write these ``collapsed'' tests if you're not comfortable with them, but you'll see them
in other people's code, so you should be able to read them.
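As a small, complete illustration of the open-then-test pattern, here is a sketch of a helper that simply reports whether a file can be opened for reading. The name can_open and its return convention are assumptions made for this example, not part of the standard library; fclose, which it uses to release the file again, is described in section 12.4.

```c
#include <stdio.h>

/* Hypothetical helper: report whether a file can be opened for reading. */
/* Returns 1 on success, 0 (after printing a message) on failure. */
int can_open(const char *filename)
{
    FILE *ifp = fopen(filename, "r");
    if(ifp == NULL)
    {
        printf("can't open file\n");
        return 0;
    }
    fclose(ifp);    /* release the file again; see section 12.4 */
    return 1;
}
```

A caller might write if(!can_open("input.dat")) and then exit or return, exactly as in the fragments above.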


12.2 I/O with File Pointers

For each of the I/O library functions we've been using so far, there's a companion function which accepts
an additional file pointer argument telling it where to read from or write to. The companion function to
printf is fprintf, and the file pointer argument comes first. To print a string to the output.dat
file we opened in the previous section, we might call
fprintf(ofp, "Hello, world!\n");
The companion function to getchar is getc, and the file pointer is its only argument. To read a
character from the input.dat file we opened in the previous section, we might call
int c;
c = getc(ifp);
The companion function to putchar is putc, and the file pointer argument comes last. To write a
character to output.dat, we could call
putc(c, ofp);
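Combining getc and putc gives the classic character-copying loop, now working on arbitrary streams. The following is a minimal sketch; the function name copy_stream and the idea of returning a character count are choices made for this example.

```c
#include <stdio.h>

/* Copy every character from ifp to ofp; return the number copied. */
/* (copy_stream is a name chosen for this sketch.) */
long copy_stream(FILE *ifp, FILE *ofp)
{
    int c;          /* int, not char, so that EOF can be distinguished */
    long n = 0;

    while((c = getc(ifp)) != EOF)
    {
        putc(c, ofp);
        n++;
    }
    return n;
}
```

Calling copy_stream(ifp, ofp) with the two file pointers opened in the previous section would copy input.dat to output.dat.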
Our own getline function calls getchar and so always reads the standard input. We could write a
companion fgetline function which reads from an arbitrary file pointer:
#include <stdio.h>

/* Read one line from fp, */
/* copying it to line array (but no more than max chars). */
/* Does not place terminating \n in line array. */
/* Returns line length, or 0 for empty line, or EOF for end-of-file. */

int fgetline(FILE *fp, char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;          /* leave room for '\0' */

    while((c = getc(fp)) != EOF)
    {
        if(c == '\n')
            break;
        if(nch < max)
        {
            line[nch] = c;
            nch = nch + 1;
        }
    }
    if(c == EOF && nch == 0)
        return EOF;
    line[nch] = '\0';
    return nch;
}
Now we could read one line from ifp by calling
char line[MAXLINE];
...
fgetline(ifp, line, MAXLINE);


12.3 Predefined Streams

Besides the file pointers which we explicitly open by calling fopen, there are also three predefined
streams. stdin is a constant file pointer corresponding to standard input, and stdout is a constant file
pointer corresponding to standard output. Both of these can be used anywhere a file pointer is called for;
for example, getchar() is the same as getc(stdin) and putchar(c) is the same as putc(c,
stdout). The third predefined stream is stderr. Like stdout, stderr is typically connected to
the screen by default. The difference is that stderr is not redirected when the standard output is
redirected. For example, under Unix or MS-DOS, when you invoke
program > filename
anything printed to stdout is redirected to the file filename, but anything printed to stderr still
goes to the screen. The intent behind stderr is that it is the ``standard error output''; error messages
printed to it will not disappear into an output file. For example, a more realistic way to print an error
message when a file can't be opened would be
if((ifp = fopen(filename, "r")) == NULL)
{
    fprintf(stderr, "can't open file %s\n", filename);
    /* exit or return */
}
where filename is a string variable indicating the file name to be opened. Not only is the error
message printed to stderr, but it is also more informative in that it mentions the name of the file that
couldn't be opened. (We'll see another example in the next chapter.)

12.4 Closing Files

Although you can open multiple files, there's a limit to how many you can have open at once. If your
program will open many files in succession, you'll want to close each one as you're done with it;
otherwise the standard I/O library could run out of the resources it uses to keep track of open files.
Closing a file simply involves calling fclose with the file pointer as its argument:
fclose(fp);
Calling fclose arranges that (if the file was open for output) any last, buffered output is finally written
to the file, and that those resources used by the operating system (and the C library) for this file are
released. If you forget to close a file, it will be closed automatically when the program exits.
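Because fclose is the point at which the last buffered output is actually written, it can fail, and it reports failure by returning EOF. A sketch of a function that takes this into account might look like the following; the name write_line and its 0/-1 return convention are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: write one line of text to the named file. */
/* Returns 0 on success, -1 if the file can't be opened or written. */
int write_line(const char *filename, const char *text)
{
    FILE *ofp = fopen(filename, "w");
    if(ofp == NULL)
        return -1;

    fprintf(ofp, "%s\n", text);

    /* fclose flushes buffered output and returns EOF if that fails */
    if(fclose(ofp) == EOF)
        return -1;
    return 0;
}
```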


12.5 Example: Reading a Data File

Suppose you had a data file consisting of rows and columns of numbers:
1    2    34
5    6    78
9    10   112
Suppose you wanted to read these numbers into an array. (Actually, the array will be an array of arrays,
or a ``multidimensional'' array; see section 4.1.2.) We can write code to do this by putting together
several pieces: the fgetline function we just showed, and the getwords function from chapter 10.
Assuming that the data file is named input.dat, the code would look like this:
#define MAXLINE 100
#define MAXROWS 10
#define MAXCOLS 10
int array[MAXROWS][MAXCOLS];
char *filename = "input.dat";
FILE *ifp;
char line[MAXLINE];
char *words[MAXCOLS];
int nrows = 0;
int n;
int i;
ifp = fopen(filename, "r");
if(ifp == NULL)
{
    fprintf(stderr, "can't open %s\n", filename);
    exit(EXIT_FAILURE);
}
while(fgetline(ifp, line, MAXLINE) != EOF)
{
    if(nrows >= MAXROWS)
    {
        fprintf(stderr, "too many rows\n");
        exit(EXIT_FAILURE);
    }
    n = getwords(line, words, MAXCOLS);
    for(i = 0; i < n; i++)
        array[nrows][i] = atoi(words[i]);
    nrows++;
}
Each trip through the loop reads one line from the file, using fgetline. Each line is broken up into
``words'' using getwords; each ``word'' is actually one number. The numbers are however still
represented as strings, so each one is converted to an int by calling atoi before being stored in the
array. The code checks for two different error conditions (failure to open the input file, and too many
lines in the input file) and if one of these conditions occurs, it prints an error message, and exits. The
exit function is a Standard library function which terminates your program. It is declared in
<stdlib.h>, and accepts one argument, which will be the exit status of the program.
EXIT_FAILURE is a code, also defined by <stdlib.h>, which indicates that the program failed.
Success is indicated by a code of EXIT_SUCCESS, or simply 0. (These values can also be returned from
main(); calling exit with a particular status value is essentially equivalent to returning that same
status value from main.)
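The code above depends on getwords from chapter 10, which isn't repeated here. If you want to experiment without it, one alternative (a substitute technique, not the one used above) is to let the standard strtol function both skip the whitespace and do the conversion. The function name parse_row is hypothetical.

```c
#include <stdlib.h>

/* Convert the whitespace-separated integers in line into row[], */
/* storing at most maxcols of them.  Returns the number stored. */
int parse_row(const char *line, int row[], int maxcols)
{
    int n = 0;
    char *end;

    while(n < maxcols)
    {
        long val = strtol(line, &end, 10);
        if(end == line)     /* no digits found: end of the numbers */
            break;
        row[n++] = (int)val;
        line = end;         /* continue after the number just read */
    }
    return n;
}
```

Inside the reading loop, a call such as n = parse_row(line, array[nrows], MAXCOLS) would then fill one row of the array.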


We've mentioned several times that a program is rarely useful if it does exactly the same thing every time
you run it. Another way of giving a program some variable input to work on is by invoking it with
command line arguments.
(We should probably admit that command line user interfaces are a bit old-fashioned, and currently
somewhat out of favor. If you've used Unix or MS-DOS, you know what a command line is, but if your
experience is confined to the Macintosh or Microsoft Windows or some other Graphical User Interface,
you may never have seen a command line. In fact, if you're learning C on a Mac or under Windows, it
can be tricky to give your program a command line at all. Think C for the Macintosh provides a way; I'm
not sure about other compilers. If your compilation environment doesn't provide an easy way of
simulating an old-fashioned command line, you may skip this chapter.)
C's model of the command line is that it consists of a sequence of words, typically separated by
whitespace. Your main program can receive these words as an array of strings, one word per string. In
fact, the C run-time startup code is always willing to pass you this array, and all you have to do to receive
it is to declare main as accepting two parameters, like this:
int main(int argc, char *argv[])
{
...
}
When main is called, argc will be a count of the number of command-line arguments, and argv will
be an array (``vector'') of the arguments themselves. Since each word is a string which is represented as a
pointer-to-char, argv is an array-of-pointers-to-char. Since we are not defining the argv array, but
merely declaring a parameter which references an array somewhere else (namely, in main's caller, the
run-time startup code), we do not have to supply an array dimension for argv. (Actually, since functions
never receive arrays as parameters in C, argv can also be thought of as a pointer-to-pointer-to-char, or
char **. But multidimensional arrays and pointers to pointers can be confusing, and we haven't
covered them, so we'll talk about argv as if it were an array.) (Also, there's nothing magic about the
names argc and argv. You can give main's two parameters any names you like, as long as they have
the appropriate types. The names argc and argv are traditional.)
The first program to write when playing with argc and argv is one which simply prints its arguments:
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;
    for(i = 0; i < argc; i++)
        printf("arg %d: %s\n", i, argv[i]);
    return 0;
}
(This program is essentially the Unix or MS-DOS echo command.)
If you run this program, you'll discover that the set of ``words'' making up the command line includes the
command you typed to invoke your program (that is, the name of your program). In other words,
argv[0] typically points to the name of your program, and argv[1] is the first argument.
There are no hard-and-fast rules for how a program should interpret its command line. There is one set of
conventions for Unix, another for MS-DOS, another for VMS. Typically you'll loop over the arguments,
perhaps treating some as option flags and others as actual arguments (input files, etc.), interpreting or
acting on each one. Since each argument is a string, you'll have to use strcmp or the like to match
arguments against any patterns you might be looking for. Remember that argc contains the number of
words on the command line, and that argv[0] is the command name, so if argc is 1, there are no
arguments to inspect. (You'll never want to look at argv[i], for i >= argc, because it will be a null
or invalid pointer.)
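As a sketch of the kind of loop just described, here is a helper that looks for one particular option flag among the arguments. The name has_flag is hypothetical, and real programs usually do something more elaborate, but the use of strcmp and the starting index of 1 are the important parts.

```c
#include <string.h>

/* Hypothetical helper: return 1 if the option flag appears among the */
/* command-line arguments, 0 if not.  argv[0] (the program name) is skipped. */
int has_flag(int argc, char *argv[], const char *flag)
{
    int i;
    for(i = 1; i < argc; i++)
    {
        if(strcmp(argv[i], flag) == 0)
            return 1;
    }
    return 0;
}
```

In main, you might write int verbose = has_flag(argc, argv, "-v"); to find out whether the user asked for verbose output.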
As another example, also illustrating fopen and the file I/O techniques of the previous chapter, here is a
program which copies one or more input files to its standard output. Since ``standard output'' is usually
the screen by default, this is therefore a useful program for displaying files. (It's analogous to the
obscurely-named Unix cat command, and to the MS-DOS type command.) You might also want to
compare this program to the character-copying program of section 6.2.
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;
    FILE *fp;
    int c;

    for(i = 1; i < argc; i++)
    {
        fp = fopen(argv[i], "r");
        if(fp == NULL)
        {
            fprintf(stderr, "cat: can't open %s\n", argv[i]);
            continue;
        }
        while((c = getc(fp)) != EOF)
            putchar(c);
        fclose(fp);
    }
    return 0;
}
As a historical note, the Unix cat program is so named because it can be used to concatenate two files
together, like this:
cat a b > c
This illustrates why it's a good idea to print error messages to stderr, so that they don't get redirected.
The ``can't open file'' message in this example also includes the name of the program as well as the name
of the file.
Yet another piece of information which it's usually appropriate to include in error messages is the reason
why the operation failed, if known. For operating system problems, such as inability to open a file, a
code indicating the error is often stored in the global variable errno. The standard library function
strerror will convert an errno value to a human-readable error message string. Therefore, an even
more informative error message printout would be
fp = fopen(argv[i], "r");
if(fp == NULL)
fprintf(stderr, "cat: can't open %s: %s\n",
argv[i], strerror(errno));
If you use code like this, you can #include <errno.h> to get the declaration for errno, and
<string.h> to get the declaration for strerror().

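Wrapped up as a function with its headers, the errno and strerror technique might be sketched as follows. The name open_or_complain and its parameters are assumptions made for this example. Note also that while fopen sets errno on failure on most systems, the C standard does not strictly require it to.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

/* Hypothetical helper: try to open a file for reading; on failure, */
/* print the file name and the system's reason to stderr. */
/* Returns the file pointer, or NULL if the open failed. */
FILE *open_or_complain(const char *progname, const char *filename)
{
    FILE *fp = fopen(filename, "r");
    if(fp == NULL)
    {
        /* errno must be inspected right away, before another */
        /* library call has a chance to change it */
        fprintf(stderr, "%s: can't open %s: %s\n",
                progname, filename, strerror(errno));
    }
    return fp;
}
```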

This last handout contains a brief list of the significant topics in C which we have not covered, and which
you'll want to investigate further if you want to know all of C.
Types and Declarations
Operators
Statements
Functions
C Preprocessor
Standard Library Functions


Types and Declarations
We have not talked about the void, short int, and long double types. void is a type with no
values, used as a placeholder to indicate functions that do not return values or that accept no arguments,
and in the ``generic'' pointer type void * that can point to anything. short int is an integer type
that might use less space than a plain int; long double is a floating-point type that might have even
more range or precision than plain double.
The char type and the various sizes of int also have ``unsigned'' versions, which are declared using the
keyword unsigned. Unsigned types cannot hold negative values but have guaranteed properties on
overflow. (Whether a plain char is signed or unsigned is implementation-defined; you can use the
keyword signed to force a character type to contain signed characters.) Unsigned types are also useful
when manipulating individual bits and bytes, when ``sign extension'' might otherwise be a problem.
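Two short functions can illustrate both properties: the guaranteed wrap-around of unsigned arithmetic, and the use of unsigned char to sidestep sign extension. This is only a sketch, and the function names are made up for the example.

```c
#include <limits.h>

/* Unsigned arithmetic wraps around modulo UINT_MAX + 1. */
unsigned int next_wrapping(unsigned int u)
{
    return u + 1u;      /* UINT_MAX + 1 is well-defined: it yields 0 */
}

/* Extract the value of a byte as a number from 0 to UCHAR_MAX. */
/* Converting through unsigned char avoids sign extension when */
/* plain char happens to be signed. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

By contrast, overflowing a signed int gives no such guarantee.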
Two additional type qualifiers const and volatile allow you to declare variables (or pointers to
data) which you promise not to change, or which might change in unexpected ways behind the program's
back.
There are user-defined structure and union types. A structure or struct is a ``record'' consisting of one
or more values of one or more types concreted together into one entity which can be manipulated as a
whole. A union is a type which, at any one time, can hold a value from one of a specified set of types.
There are user-defined enumeration types (``enum'') which are like integers but which always contain
values from some fixed, predefined set, and for which the values are referred to by name instead of by
number.
Pointers can point to functions as well as to data types.
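As a brief taste of what a pointer to a function looks like, here is a sketch in which one function is handed another function to call. All three names are invented for the example.

```c
/* Two ordinary functions with the same signature. */
int twice(int x)  { return 2 * x; }
int square(int x) { return x * x; }

/* apply accepts a pointer to any function taking and returning int. */
int apply(int (*fp)(int), int x)
{
    return (*fp)(x);    /* call through the pointer */
}
```

The call apply(twice, 5) ends up calling twice(5), while apply(square, 5) calls square(5); the caller decides which.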
Types can be arbitrarily complicated, when you start using multiple levels of pointers, arrays, functions,
structures, and/or unions. Eventually, it's important to understand the concept of a declarator: in the
declaration
int i, *ip, *fpi();
we have the base type int and three declarators i, *ip, and *fpi(). The declarator gives the name of
a variable (or function) and also indicates whether it is a simple variable or a pointer, array, function, or
some more elaborate combination (array of pointers, function returning pointer, etc.). In the example, i
is declared to be a plain int, ip is declared to be a pointer to int, and fpi is declared to be a function
returning pointer to int. (Complicated declarators may also contain parentheses for grouping, since
there's a precedence hierarchy in declarators as well as expressions: [] for arrays and () for functions
have higher precedence than * for pointers.)
We have not said much about pointers to pointers, or arrays of arrays (i.e. multidimensional arrays), or
the ramifications of array/pointer equivalence on multidimensional arrays. (In particular, a reference to
an array of arrays does not generate a pointer to a pointer; it generates a pointer to an array. You cannot
pass a multidimensional array to a function which accepts pointers to pointers.)
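To make the parenthetical point concrete, here is a sketch of a function which does accept a two-dimensional array. Its parameter must spell out the column count, because what it really receives is a pointer to an array of that many ints. The names sum2d and NCOLS are chosen for this example.

```c
#define NCOLS 3

/* The parameter is a pointer to an array of NCOLS ints, which is */
/* what a two-dimensional array turns into when passed. */
int sum2d(int a[][NCOLS], int nrows)
{
    int i, j, sum = 0;
    for(i = 0; i < nrows; i++)
        for(j = 0; j < NCOLS; j++)
            sum += a[i][j];
    return sum;
}
```

Given int m[2][NCOLS], the call sum2d(m, 2) is fine, but m could not be passed to a function declared as accepting int **.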
Variables can be declared with a hint that they be placed in high-speed CPU registers, for efficiency.
(These hints are rarely needed or used today, because modern compilers do a good job of register
allocation by themselves, without hints.)
A mechanism called typedef allows you to define user-defined aliases (i.e. new and perhaps
more-convenient names) for other types.

Operators
The bitwise operators &, |, ^, and ~ operate on integers thought of as binary numbers or strings of bits.
The & operator is bitwise AND, the | operator is bitwise OR, the ^ operator is bitwise exclusive-OR
(XOR), and the ~ operator is a bitwise negation or complement. (&, |, and ^ are ``binary'' in that they
take two operands; ~ is unary.) These operators let you work with the individual bits of a variable; one
common use is to treat an integer as a set of single-bit flags. You might define the 3rd (2**2) bit as the
``verbose'' flag bit by defining
#define VERBOSE 4
Then you can ``turn the verbose bit on'' in an integer variable flags by executing
flags = flags | VERBOSE;
or
flags |= VERBOSE;
and turn it off with
flags = flags & ~VERBOSE;
or
flags &= ~VERBOSE;
and test whether it's set with
if(flags & VERBOSE)
The left-shift and right-shift operators << and >> let you shift an integer left or right by some number of
bit positions; for example, value << 2 shifts value left by two bits.
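The flag manipulations above can be collected into three tiny functions, which makes them easy to try out. This is only a sketch: the function names and the second flag, QUIET, are invented for the example.

```c
#define VERBOSE 4               /* 1 << 2, the third bit */
#define QUIET   (1 << 3)        /* hypothetical second flag, value 8 */

/* Return flags with the given bit turned on. */
unsigned int set_flag(unsigned int flags, unsigned int bit)
{
    return flags | bit;
}

/* Return flags with the given bit turned off. */
unsigned int clear_flag(unsigned int flags, unsigned int bit)
{
    return flags & ~bit;
}

/* Return 1 if the given bit is set in flags, 0 if not. */
int test_flag(unsigned int flags, unsigned int bit)
{
    return (flags & bit) != 0;
}
```

Defining flag values with shifts, as QUIET does, makes it easier to see which bit position each flag occupies.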
The ?: or conditional operator (also called the ``ternary operator'') essentially lets you embed an
if/then statement in an expression. The assignment
a = expr ? b : c;
is roughly equivalent to
if(expr)
    a = b;
else
    a = c;
Since you can use ?: anywhere in an expression, it can do things that if/then can't, or that would be
cumbersome with if/then. For example, the function call
f(a, b, c ? d : e);
is roughly equivalent to
if(c)
    f(a, b, d);
else
    f(a, b, e);
(Exercise: what would the call

g(a, b, c ? d : e, h ? i : j, k);
be equivalent to?)
The comma operator lets you put two separate expressions where one is required; the expressions are
executed one after the other. The most common use for comma operators is when you want multiple
variables controlling a for loop, for example:
for(i = 0, j = 10; i < j; i++, j--)
A cast operator allows you to explicitly force conversion of a value from one type to another. A cast
consists of a type name in parentheses. For example, you could convert an int to a double by typing
int i = 10;
double d;
d = (double)i;
(In this case, though, the cast is redundant, since this is a conversion that C would have performed for
you automatically, i.e. if you'd just said d = i .) You use explicit casts in those circumstances where C
does not do a needed conversion automatically. One example is division: if you're dividing two integers
and you want a floating-point result, you must explicitly force at least one of the operands to
floating-point; otherwise C will perform an integer division and will discard the remainder. The code
int i = 1, j = 2;
double d = i / j;
will set d to 0, but
d = (double)i / j;
will set d to 0.5. You can also ``cast to void'' to explicitly indicate that you're ignoring a function's
return value, as in
(void)fclose(fp);
or
(void)printf("Hello, world!\n");
(Usually, it's a bad idea to ignore return values, but in some cases it's essentially inevitable, and the
(void) cast keeps some compilers from issuing warnings every time you ignore a value.)
There's a precise, mildly elaborate set of rules which C uses for converting values automatically, in the
absence of explicit casts.
The . and -> operators let you access the members (components) of structures and unions.

Statements
The switch statement allows you to jump to one of a number of numeric case labels depending on the
value of an expression; it's more convenient than a long if/else chain. (However, you can use
switch only when the expression is integral and all of the case labels are compile-time constants.)
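Here is a sketch of what a switch statement looks like in practice. The function classify and its return codes are invented for the example. Notice that several case labels can share one block of code, and that the default label catches everything else.

```c
/* Hypothetical example: classify a character. */
/* Returns 'v' for a lower-case vowel, 'd' for a digit, '?' otherwise. */
int classify(int c)
{
    switch(c)
    {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 'v';
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        return 'd';
    default:
        return '?';
    }
}
```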
The do/while loop is a loop that tests its controlling expression at the bottom of the loop, so that the
body of the loop always executes once even if the condition is initially false. (C's do/while loop is
therefore like Pascal's repeat/until loop, while C's while loop is like Pascal's while/do loop.)
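Counting the digits in a number is one small problem where the test-at-the-bottom behavior is just what's wanted, since even 0 has one digit. The following sketch uses a made-up function name, count_digits.

```c
/* Count the decimal digits in n.  The body must run at least once */
/* so that 0 is reported as having one digit. */
int count_digits(unsigned int n)
{
    int digits = 0;
    do
    {
        digits++;
        n /= 10;
    } while(n != 0);
    return digits;
}
```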
Finally, when you really need to write ``spaghetti code,'' C does have the all-purpose goto statement,
and labels to go to.

Functions
Functions can't return arrays, and it's tricky to write a function as if it returns an array (perhaps by
simulating the array with a pointer) because you have to be careful about allocating the memory that the
returned pointer points to.
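One of the simpler ways around the memory question (certainly not the only one) is to have the caller supply the array, so that the function merely fills it in and hands back a pointer to it. The function below is a hypothetical sketch of that approach.

```c
/* Fill the caller's array with the first n squares (0, 1, 4, ...) */
/* and return a pointer to it, so the call can be used in an expression. */
int *fill_squares(int result[], int n)
{
    int i;
    for(i = 0; i < n; i++)
        result[i] = i * i;
    return result;
}
```

Because the array belongs to the caller, it is still valid after fill_squares returns, which would not be true of an ordinary local array declared inside the function.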
The functions we've written have all accepted a well-defined, fixed number of arguments. printf
accepts a variable number of arguments (depending on how many % signs there are in the format string)
but we haven't seen how to declare and write functions that do this.

C Preprocessor
If you're careful, it's possible (and can be useful) to use #include within a header file, so that you end
up with ``nested header files.''
It's possible to use #define to define ``function-like'' macros that accept arguments; the expansion of
the macro can therefore depend on the arguments it's ``invoked'' with.
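Two small function-like macros, with names picked for this example, might be sketched as:

```c
/* The extra parentheses keep the expansion correct when the */
/* arguments are themselves expressions. */
#define SQUARE(x)     ((x) * (x))
#define LARGER(a, b)  ((a) > (b) ? (a) : (b))
```

Since the preprocessor works by textual substitution, SQUARE(1 + 2) expands to ((1 + 2) * (1 + 2)). It also means that an argument may be evaluated more than once, so an invocation like SQUARE(i++) is asking for trouble.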
Two special preprocessing operators # and ## let you control the expansion of macro arguments in
fancier ways.
The preprocessor directive #if lets you conditionally include (or, with #else, conditionally not
include) a section of code depending on some arbitrary compile-time expression. (#if can also do the
same macro-definedness tests as #ifdef and #ifndef, because the expression can use a defined()
operator.)
Other preprocessing directives are #elif, #error, #line, and #pragma.
There are a few predefined preprocessor macros, some required by the C standard, others perhaps
defined by particular compilation environments. These are useful for conditional compilation (#ifdef,
#ifndef).


Standard Library Functions
C's standard library contains many features and functions which we haven't seen.
We've seen many of printf's formatting capabilities, but not all. Besides format specifier characters for
a few types we haven't seen, you can also control the width, precision, justification (left or right) and a
few other attributes of printf's format conversions. (In their full complexity, printf formats are
about as elaborate and powerful as FORTRAN format statements.)
A scanf function lets you do ``formatted input'' analogous to printf's formatted output. scanf reads
from the standard input; a variant fscanf reads from a specified file pointer.
The sprintf and sscanf functions let you ``print'' and ``read'' to and from in-memory strings instead
of files. We've seen that atoi lets you convert a numeric string into an integer; the inverse operation can
be performed with sprintf:
int i = 10;
char str[10];
sprintf(str, "%d", i);
We've used printf and fprintf to write formatted output, and getchar, getc, putchar, and
putc to read and write characters. There are also functions gets, fgets, puts, and fputs for
reading and writing lines (though we rarely need these, especially if we're using our own getline and
maybe fgetline), and also fread and fwrite for reading or writing arbitrary numbers of
characters.
It's possible to ``un-read'' a character, that is, to push it back on an input stream, with ungetc. (This is
useful if you accidentally read one character too far, and would prefer that some other part of your
program read that character instead.)
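Reading a number one character at a time is a typical case: you can't know the number has ended until you've read the character after it. Here is a sketch, with a made-up function name, which uses ungetc to put that extra character back.

```c
#include <stdio.h>
#include <ctype.h>

/* Read an unsigned decimal number from fp.  The first character */
/* that isn't a digit is pushed back for the next reader. */
/* (A sketch: no overflow check, and leading whitespace isn't skipped.) */
int read_number(FILE *fp)
{
    int c, n = 0;

    while((c = getc(fp)) != EOF && isdigit(c))
        n = 10 * n + (c - '0');

    if(c != EOF)
        ungetc(c, fp);      /* we read one character too far */
    return n;
}
```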
You can use the ftell, fseek, and rewind functions to jump around in files, performing random
access (as opposed to sequential) I/O.
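One common use of these functions is to find out how long a file is, by seeking to its end and asking where that is. The sketch below assumes a stream opened in binary mode on a system where seeking to the end is supported (it is on common systems, though the C standard does not promise it for every binary stream). The name stream_length is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: return the length in bytes of an open */
/* binary stream, leaving the stream positioned at its beginning. */
/* Returns -1 if the seek fails. */
long stream_length(FILE *fp)
{
    long len;

    if(fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    len = ftell(fp);        /* offset of the end, i.e. the length */
    rewind(fp);
    return len;
}
```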
The feof and ferror functions will tell you whether you got EOF due to an actual end-of-file
condition or due to a read error of some sort. You can clear errors and end-of-file conditions with
clearerr.
You can open files in ``binary'' mode, or for simultaneous reading and writing. (These options involve
extra characters appended to fopen's mode string: b for binary, + for read/write.)
There are several more string functions in <string.h>. A second set of string functions strncpy,
strncat, and strncmp all accept a third argument telling them to stop after n characters if they
haven't found the \0 marking the end of the string. A third set of ``mem'' functions, including memcpy
and memcmp, operate on blocks of memory which aren't necessarily strings and where \0 is not treated
as a terminator. The strchr and strrchr functions find characters in strings. There is a motley
collection of ``span'' and ``scan'' functions, strspn, strcspn, and strpbrk, for searching out or
skipping over sequences of characters all drawn from a specified set of characters. The strtok function
aids in breaking up a string into words or ``tokens,'' much like our own getwords function.
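A sketch of a word splitter built on strtok might look like this. It is written to resemble the calling pattern of getwords, but it is not that function, and the name split_words is made up.

```c
#include <string.h>

/* Split line into whitespace-separated words, storing pointers to */
/* them in words[].  The line is modified: strtok overwrites each */
/* separator with '\0'.  Returns the number of words found. */
int split_words(char *line, char *words[], int maxwords)
{
    int n = 0;
    char *p = strtok(line, " \t\n");

    while(p != NULL && n < maxwords)
    {
        words[n++] = p;
        p = strtok(NULL, " \t\n");  /* NULL means "continue with same string" */
    }
    return n;
}
```

Because strtok modifies the string it is scanning, the line passed in must be a writable array, not a string literal.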
The header file <ctype.h> contains several functions which let you classify and manipulate
characters: check for letters or digits, convert between upper- and lower-case, etc.
A host of mathematical functions are defined in the header file <math.h>. (As we've mentioned,
besides including <math.h>, you may on some Unix systems have to ask for a special library
containing the math functions while compiling/linking.)
There's a random-number generator, rand, and a way to ``seed'' it, srand. rand returns integers from
0 up to RAND_MAX (where RAND_MAX is a constant #defined in <stdlib.h>). One way of getting
random integers from 1 to n is to call
(int)(rand() / (RAND_MAX + 1.0) * n) + 1
Another way is
rand() / (RAND_MAX / n + 1) + 1
It seems like it would be simpler to just say
rand() % n + 1
but this method is imperfect (or rather, it's imperfect if n is a power of two and your system's
implementation of rand() is imperfect, as all too many of them are).
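Wrapped in a function, the second of the recommended expressions might be sketched as follows. The function name is invented, and the stated range restriction on n is an assumption of this sketch.

```c
#include <stdlib.h>

/* Return a pseudo-random integer from 1 to n, using the */
/* division-based method so the result depends on the high-order */
/* bits of rand()'s value.  Assumes 1 <= n <= RAND_MAX. */
int random_1_to_n(int n)
{
    return rand() / (RAND_MAX / n + 1) + 1;
}
```

A program simulating a die would call srand once at startup and then random_1_to_n(6) for each roll.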
Several functions let you interact with the operating system under which your program is running. The
exit function returns control to the operating system immediately, terminating your program and
returning an ``exit status.'' The getenv function allows you to read your operating system's or process's
``environment variables'' (if any). The system function allows you to invoke an operating-system
command (i.e. another program) from within your program.
The qsort function allows you to sort an array (of any type); you supply a comparison function (via a
function pointer) which knows how to compare two array elements, and qsort does the rest. The
bsearch function allows you to search for elements in sorted arrays; it, too, operates in terms of a
caller-supplied comparison function.
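For an array of int, the comparison function and the call to qsort might be sketched like this. The names compare_ints and sort_ints are chosen for the example.

```c
#include <stdlib.h>

/* Comparison function in the form qsort and bsearch expect: */
/* negative, zero, or positive as *a is less than, equal to, */
/* or greater than *b. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort an array of n ints into ascending order. */
void sort_ints(int array[], size_t n)
{
    qsort(array, n, sizeof array[0], compare_ints);
}
```

The comparison is written as (x > y) - (x < y) rather than the tempting x - y, because the subtraction can overflow for widely separated values. The same compare_ints can be handed to bsearch once the array is sorted.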
Several functions--time, ctime, gmtime, localtime, asctime, mktime, difftime, and
strftime--allow you to determine the current date and time, print dates and times, and perform other
date/time manipulations. For example, to print today's date in a program, you can write
#include <time.h>
time_t now;
now = time((time_t *)NULL);
printf("It's %.24s", ctime(&now));
The header file <stdarg.h> lets you manipulate variable-length function argument lists (such as the
ones printf is called with). Additional members of the printf family of functions let you write your
own functions which accept printf-like format specifiers and variable numbers of arguments but call
on the standard printf to do most of the work.
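As a sketch of how these pieces fit together, here is an error-reporting function which accepts a printf-like format and hands the real work to vfprintf, one of the additional members of the printf family. The name report and its "error: " prefix are assumptions made for this example.

```c
#include <stdio.h>
#include <stdarg.h>

/* Hypothetical error reporter: prints "error: " and then a */
/* printf-style message to the given stream.  Returns the number */
/* of characters vfprintf wrote for the message part. */
int report(FILE *fp, const char *fmt, ...)
{
    va_list argp;
    int n;

    fputs("error: ", fp);
    va_start(argp, fmt);
    n = vfprintf(fp, fmt, argp);    /* printf's work, done on our list */
    va_end(argp);
    putc('\n', fp);
    return n;
}
```

It could be called as report(stderr, "can't open %s", filename), with as many extra arguments as the format string calls for.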
There are facilities for dealing with multibyte and ``wide'' characters and strings, for use with
multinational character sets.

Copyright
This collection of hypertext pages is Copyright 1995-1997 by Steve Summit. This material may be freely
redistributed and used but may not be republished or sold without permission.
C Programming Notes
Intermediate C Programming Class Notes, Chapter 15
Steve Summit
Chapter 15: User-Defined Data Structures

Chapter 16: The Standard I/O (stdio) Library
Chapter 17: Data Files
Chapter 18: Miscellaneous C Features
Chapter 19: Returning Arrays
Chapter 20: More About the Preprocessor
Chapter 21: Pointer Allocation Strategies
Chapter 22: Pointers to Pointers
Chapter 23: Two-Dimensional (and Multidimensional) Arrays
Chapter 24: Pointers To Functions
Chapter 25: Variable-Length Argument Lists

So far, we have been using C's basic types--char, int, long int, double, etc.--and a few derived
types--arrays of basic types, pointers to basic types, and functions returning basic types. In this chapter,
we'll learn about another way to derive types: by building user-defined types.
There's an old joke about Henry Ford saying that you could get the Model T in any color you wanted, as
long as it was black. User-defined data types have a little bit of the same restriction: you don't have
ultimate flexibility; you can define your own data types any way you want, as long as they're collections
of other types. (What you couldn't define would be data types that held, say, tractors or teddy bears or
Model T Fords, because of course computers have no way of holding those objects. You're ultimately
restricted to the primitive types of data which computers can represent, and C's basic types cover most of
those.)
Very roughly speaking, a structure is a little bit like an array. An array is a collection of associated
values, all of the same type. A structure is a collection of associated values, but the values can all have
different types.
15.1: Structures
15.2: Accessing Members of Structures
15.3: Operations on Structures
15.4: Pointers to Structures
15.5: Linked Data Structures

15.1: Structures
The basic user-defined data type in C is the structure, or struct. (C structures are analogous to the
records found in some other languages.) Defining structures is a two-step process: first you define a
``template'' which describes the new type, then you declare variables having the new type (or functions
returning the new type, etc.).
As a simple example, suppose we wanted to define our own type for representing complex numbers. (If
you're blissfully ignorant of these beasts, a complex number consists of a ``real'' and an ``imaginary'' part,
where the imaginary part is some multiple of the square root of negative 1. You don't have to understand
complex numbers to understand this example; you can think of the real and imaginary parts as the x and y
coordinates of a point on a plane.) FORTRAN has a built-in complex type, but C does not. How might
we add one? Since a complex number consists of a real and imaginary part, we need a way of holding
both these quantities in one data type, and a structure will do just the trick. Here is how we might declare
our complex type:
struct complex
{
    double real;
    double imag;
};
A structure declaration consists of up to four parts, of which we can see three in the example above. The
first part is the keyword struct which indicates that we are talking about a structure. The second part
is a name or tag by which this structure (that is, this new data type) will be known. The third part is a list
of the structure's members (also called components or fields). This list is enclosed in braces {}, and
contains what look like the declarations of ordinary variables. Each member has a name and a type, just
like ordinary variables, but here we are not declaring variables; we are setting up the structure of the
structure by defining the collection of data types which will make up the structure. Here we see that the
complex structure will be made up of two members, both of type double, one named real and one
named imag.
It's important to understand that what we've defined here is just the new data type; we have not yet
declared any variables of this new type! The name complex (the second part of the structure
declaration) is not the name of a variable; it's the name of the structure type. The names real and imag
are not the names of variables; they're identifiers for the two components of the structure.
We declare variables of our new complex type with declarations like these:
struct complex c1;

or
struct complex c2, c3;
These look almost like our previous declarations of variables having basic types, except that instead of a
type keyword like int or double, we have the two-word type name struct complex. The
keyword struct indicates that we're talking about a structure, and the identifier complex is the name
for the particular structure we're talking about. c1, c2, and c3 will all be declared as variables of type
struct complex; each one of them will have real and imaginary parts buried inside them. (We'll see
how to get at those parts in the next section.) Using our graphic, ``labeled box'' notation, we could draw
representations of c1, c2, and c3 like this:
Actually, these pictures are a bit misleading; the outer box indicating each composite structure suggests
that there might be more inside them than just the two members, real and imag (that is, more than the
two values of type double). A simpler but more representative picture would be:
The only memory allocated is for two values of type double (the two boxes); all the names are just for
our convenience and the compiler's reference; none are typically stored in the program's memory at run
time.
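We can ask the compiler to confirm this with sizeof. The sketch below repeats the struct complex definition so that it stands alone; the helper names cpx_size and cpx_imag_offset are made up for illustration. The size is at least that of two doubles (a compiler is allowed to add padding, although with two members of the same type it usually has no reason to).

```c
#include <stddef.h>     /* for size_t, offsetof */

struct complex
{
    double real;
    double imag;
};

/* hypothetical helper: number of bytes one struct complex occupies */
size_t cpx_size(void)
{
    return sizeof(struct complex);
}

/* hypothetical helper: byte offset of the imag member within the structure */
size_t cpx_imag_offset(void)
{
    return offsetof(struct complex, imag);
}
```

Neither function looks at a member name at run time; sizeof and offsetof are both worked out by the compiler.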
Notice that when we define structures in this way we have not quite defined a new type on a par with
int or double. We can not say
complex c1;     /* WRONG */
The name complex does not become a full-fledged type name like int or double; it's just the name
of a particular structure, and we must use the keyword struct and the name of a particular structure
(e.g. complex) to talk about that structure type. (There is a way to define new full-fledged type names
like int and double, and in C++ a new structure does automatically become a full-fledged type, but
we won't worry about these wrinkles for now.)
I said that a structure definition consisted of up to four parts. We saw the first three of them in the first
example; the fourth part of a full structure declaration is simply a list of variables, which are to be
declared as having the structure type at the same time as the structure itself is defined. For example, if we
had written
struct complex
{
    double real;
    double imag;
} c1, c2, c3;
we would have defined the type struct complex, and right away declared three variables c1, c2,
and c3 all of type struct complex.
In fact, three of the four parts of a structure declaration (all but the keyword struct) are optional. If a
declaration contains the keyword struct, a structure tag, and a brace-enclosed list of members (as in
the first structure definition we saw), it's a definition of the structure itself (that is, just the template). If a
declaration contains the keyword struct, a structure tag, and a list of variable names (as in the first
declarations of c1, c2, and c3 we saw), it's a declaration of those variables having that structure type
(the structure type itself must of course be declared elsewhere). If a declaration contains all four
elements (as in the second declaration of c1, c2, and c3 we saw), it's a definition of the structure type
and a declaration of some variables. It's also possible to use the first, third, and fourth parts:
struct
{
    double real;
    double imag;
} c1, c2, c3;
Here we declare c1, c2, and c3 as having a structure type with no tag name, which is not usually very
useful, because without a tag name we won't be able to declare any other variables or functions of this
type for c1, c2, and c3 to play with. (Finally, it's also possible to declare just the tag name, leaving both
the list of members and any declarations of variables for later, but this is only needed in certain fairly rare
and obscure situations.)
Because a structure definition can also declare variables, it's important not to forget the semicolon at the
end of a structure definition. If you accidentally write
struct complex
{
    double real;
    double imag;
}
without a semicolon, the compiler will keep looking for something later in the file and try to declare it as
being of type struct complex, which will either result in a confusing error message or (if the
compiler succeeds) a confusing misdeclaration.
15.2: Accessing Members of Structures


We said that a structure was a little bit like an array: a collection of members (elements). We access the
elements of an array by using a numeric subscript in square brackets []. We access the elements of a
structure by name, using the structure selection operator which is a dot (a period). The structure
selection operator is a little like the other binary operators we've seen, but much more restricted: on its
left must be a variable or object of structure type, and on its right must be the name of one of the
members of that structure. For example, if c1 is a variable of type struct complex as declared in
the previous section, then c1.real is its real part and c1.imag is its imaginary part.
Like subscripted array references, references to the members of structure variables (using the structure
selection operator) can appear anywhere, either on the right or left side of assignment operators. We
could say
c1.real = 1
to set the real part of c1 (that is, the real member within c1) to 1, or
c1.imag = c2.imag
to fetch the imaginary part of c2 and assign it to the imaginary part of c1, or
c1.real = c2.real + c3.real
to take the real parts of c2 and c3, add them together, and assign the result to the real part of c1.
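Here is a small sketch that puts member references on both sides of an assignment. It repeats the struct complex definition so that it stands alone; the function names cpx_make and cpx_norm2 are invented for this example and are not part of any library.

```c
struct complex
{
    double real;
    double imag;
};

/* hypothetical helper: build a struct complex from two doubles */
struct complex cpx_make(double re, double im)
{
    struct complex c;
    c.real = re;        /* a member on the left of an assignment */
    c.imag = im;
    return c;
}

/* hypothetical helper: squared magnitude, real*real + imag*imag */
double cpx_norm2(struct complex c)
{
    return c.real * c.real + c.imag * c.imag;   /* members used as values */
}
```

A caller might write struct complex c = cpx_make(3, 4); and then use cpx_norm2(c), which for these values works out to 25.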

15.3: Operations on Structures

[This section corresponds roughly to K&R Sec. 6.2]
There is a relatively small number of operations which C directly supports on structures. As we've seen,
we can define structures, declare variables of structure type, and select the members of structures. We
can also assign entire structures: the expression
c1 = c2
would assign all of c2 to c1 (both the real and imaginary parts, assuming the preceding declarations).
We can also pass structures as arguments to functions, and declare and define functions which return
structures. But to do anything else, we typically have to write our own code (often as functions). For
example, we could write a function to add two complex numbers:
struct complex
cpx_add(struct complex c1, struct complex c2)
{
    struct complex sum;
    sum.real = c1.real + c2.real;
    sum.imag = c1.imag + c2.imag;
    return sum;
}
We could then say things like
c1 = cpx_add(c2, c3)
(Two footnotes: Typically, you do need a temporary variable like sum in a function like cpx_add
above that returns a structure. In C++, it's possible to couple your own functions to the built-in operators,
so that you could write c1 = c2 + c3 and have it call your complex add function.)
One more thing you can do with a structure is initialize a structure variable while declaring it. As for
array initializations, the initializer consists of a comma-separated list of values enclosed in braces {}:
struct complex c1 = {1, 2};
struct complex c2 = {3, 4};
Of course, the type of each initializer in the list must be compatible with the type of the corresponding
structure member.
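Two more operations we'd have to write for ourselves are comparison and multiplication. C lets us assign whole structures, but it does not let us compare them with ==, so a comparison has to go member by member. The sketch below assumes the same struct complex as before; cpx_equal and cpx_mul are names I've made up in the spirit of cpx_add.

```c
struct complex
{
    double real;
    double imag;
};

/* == does not work on whole structures, so compare member by member */
int cpx_equal(struct complex a, struct complex b)
{
    return a.real == b.real && a.imag == b.imag;
}

/* (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
struct complex cpx_mul(struct complex a, struct complex b)
{
    struct complex prod;
    prod.real = a.real * b.real - a.imag * b.imag;
    prod.imag = a.real * b.imag + a.imag * b.real;
    return prod;
}
```

With c1 and c2 initialized to {1, 2} and {3, 4} as above, cpx_mul(c1, c2) has a real part of -5 and an imaginary part of 10.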
15.4: Pointers to Structures


Pointers in C are general; we can have pointers to any type. It turns out that pointers to structures are
particularly useful.
We declare pointers to structures the same way we declare any other pointers: by preceding the variable
name with an asterisk in the declaration. We could declare two pointers to struct complex with
struct complex *p1, *p2;
And, as before, we could set these pointers to point to actual variables of type struct complex:
p1 = &c2;
p2 = &c3;
Then,
*p1 = *p2
would copy the structure pointed to by p2 to the structure pointed to by p1 (i.e. c3 to c2), and
p1 = p2
would set p1 to point wherever p2 points. (None of this is new; these are the obvious analogs of how all
pointer assignments work.) If we wanted to access the member of a pointed-to structure, it would be a
tiny bit messy--first we'd have to use * to get the structure pointed to, then . to access the desired
member. Furthermore, since . has higher precedence than *, we'd have to use parentheses:
(*p1).real
(Without the parentheses, i.e. if we wrote simply *p1.real, we'd be taking the structure p1, selecting
its member named real, and accessing the value that p1.real points to, which would be doubly
nonfunctional, since the real member is in our ongoing example not a pointer, and p1 is not a structure
but rather a pointer to a structure, so the . operator won't work on it.)
Since pointers to structures are common, and the parentheses in the example above are a nuisance, there's
another structure selection operator which works on pointers to structures. If p is a pointer to a structure
and m is a member of that structure, then
p->m
selects that member of the pointed-to structure. The expression p->m is therefore exactly equivalent to
(*p).m
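One common use of a pointer to a structure is to let a function modify the caller's structure rather than a copy of it. The sketch below uses both spellings of the member access so you can see that they mean the same thing; cpx_scale is a hypothetical name, and the struct complex definition is repeated so that the example stands alone.

```c
struct complex
{
    double real;
    double imag;
};

/* hypothetical helper: multiply both parts of *p by k, in place */
void cpx_scale(struct complex *p, double k)
{
    p->real *= k;       /* same as (*p).real *= k */
    (*p).imag *= k;     /* same as p->imag *= k */
}
```

A call such as cpx_scale(&c1, 2) doubles both parts of c1 itself, because the function receives c1's address rather than its value.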


15.5: Linked Data Structures
One reason that pointers to structures are useful and common is that they can be used to build linked data
structures, in which a structure contains a pointer to another instance of the same structure (or perhaps a
different structure). The simplest example is a singly-linked list, which we might declare like this:
struct listnode
{
    char *item;
    struct listnode *next;
};
This structure describes one node in a list; a list may consist of many nodes, one for each item in the list.
Here, each item in the list (the field we've named item) will be a string, represented (i.e. pointed to) by
a char *. Each node in the list is linked to its successor by its next field, which is a pointer to another
struct listnode. (The compiler is perfectly happy to place a pointer to a structure inside that very
same structure; it would only complain if you tried to stuff an entire struct listnode into a
struct listnode, which would tend to make the struct listnode explode.) We'll use a null
pointer as the next field of the last node in the list, since by definition a null pointer doesn't point
anywhere.
We could set up a tiny list with these declarations:
struct listnode node2 = {"world", NULL};
struct listnode node1 = {"hello", &node2};
struct listnode *head = &node1;
A box-and-arrows picture of the resulting list would look like this:
The list has two nodes, allocated by the compiler in response to the first two declarations we gave. We've
also allocated a pointer variable, head, which points at the ``head'' of the list.
Once we've built a list, we'll naturally want to do things with it. One of the simplest operations is to print
the list back out. The code to do so is very simple. We declare another list pointer lp and then cause it to
step over each node in the list, in turn:
struct listnode *lp;

for(lp = head; lp != NULL; lp = lp->next)
    printf("%s\n", lp->item);
This for loop deserves some attention, especially if you haven't seen one like it before. Many for loops
step an int variable (often called i) through a series of integer values. However, the three controlling
expressions of a for loop are not limited to that pattern; you may in fact use any expressions at all,
although it's best if they conform to the expected initialize;test;increment pattern. The list-printing loop
above certainly does: the expression lp = head initializes lp to point to the head of the list; the
expression lp != NULL tests whether lp still points to a real node (or whether it has reached the null
pointer which marks the end of the list); and the expression lp = lp->next sets lp to point to the
next node in the list, one past where it did before.
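The same loop shape works for any walk over the list, not just printing. As a sketch, here is a function that counts the nodes; the name list_length is my own invention, and the struct listnode definition is repeated so that the example stands alone.

```c
#include <stddef.h>     /* for NULL */

struct listnode
{
    char *item;
    struct listnode *next;
};

/* hypothetical helper: count the nodes reachable from head */
int list_length(struct listnode *head)
{
    struct listnode *lp;
    int n = 0;

    for(lp = head; lp != NULL; lp = lp->next)
        n++;
    return n;
}
```

For the two-node list declared above, list_length(head) returns 2. If head is a null pointer, the test lp != NULL fails immediately and the function returns 0.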
The two-element list above is pretty useless; its only worth is as a first example. The real power of linked
lists (and other linked data structures) is that they can grow on demand, in response to the data that your
program finds itself working with. For a linked list to grow on demand, however, we'll have to allocate
its nodes dynamically, because we won't know in advance how many of them we'll need. (In the first
example, we had two static nodes, because we knew in advance, at compile time, that our list would have
two elements. But that static allocation won't do for a dynamic list. How would we know how many
struct listnode variables to allocate?)
The general solution, of course, is to call malloc. Here is a scrap of code which inserts a new word at
the head of a list:
#include <stdio.h>      /* for fprintf, stderr */
#include <stdlib.h>     /* for malloc, exit */

char *newword = "test";
struct listnode *newnode = malloc(sizeof(struct listnode));
if(newnode == NULL)
{
    fprintf(stderr, "out of memory\n");
    exit(1);
}

newnode->item = newword;
newnode->next = head;
head = newnode;
The expression sizeof(struct listnode) in the call to malloc asks the compiler to compute
the number of bytes required to store one struct listnode, and that's exactly how many bytes of
memory we ask for from malloc. Make sure you see how the last two lines work to splice the new node
in at the head of the list, by making the new node's next pointer point at the old head of the list, and
then resetting the head of the list to be the new node. (Another word for a list where we always add new
items at the beginning is a stack.)
Naturally, we'd like to encapsulate this operation of prepending an item to a list as a function. Doing so is
just a little bit tricky, because the list's head pointer is modified every time. There are several ways to
achieve this modification; the way we'll do it is to have our list-prepending function return a pointer to
the new head of the list. Here is the function:
#include <stdio.h>      /* for fprintf, stderr */
#include <stdlib.h>     /* for malloc, exit */
#include <string.h>     /* for strlen, strcpy */

struct listnode *prepend(char *newword, struct listnode *oldhead)
{
    struct listnode *newnode = malloc(sizeof(struct listnode));
    if(newnode == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }

    newnode->item = malloc(strlen(newword) + 1);
    if(newnode->item == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    strcpy(newnode->item, newword);

    newnode->next = oldhead;
    return newnode;
}
Since we want this to be a general-purpose function, we also allocate new space for the new string (word,
item) being stored. Otherwise, we'd be depending on the caller to arrange that the pointer to the new
string remain valid for as long as the list was in use. As we'll see, that's not always a safe assumption. By
allocating our own memory, which ``belongs'' to the list, we ensure that the list isn't dependent on the
caller in this way. (Notice, too, that the number of bytes we ask for is strlen(newword) + 1.)
(As an aside, it's a mild blemish on the above code that it contains two identical calls to fprintf,
complaining about two separate potential failures in two calls to malloc. It's quite possible to combine
these two cases, and many C programmers prefer to do so, although the expression may be a bit
scary-looking at first:
struct listnode *newnode;

if((newnode = malloc(sizeof(struct listnode))) == NULL ||
        (newnode->item = malloc(strlen(newword) + 1)) == NULL)
{
    fprintf(stderr, "out of memory\n");
    exit(1);
}
How does this work? First it calls malloc(sizeof(struct listnode)), and assigns the result
to newnode. Then it calls malloc(strlen(newword) + 1), and assigns the result to
newnode->item. This code relies on two special guarantees of C's || operator, namely that it always
evaluates its left-hand side first, and that if the left-hand side evaluates as true, it doesn't bother to
evaluate the right-hand side, because once the left-hand side is true, the final result is definitely going to
be true. Therefore, we're guaranteed that we'll allocate space for newnode to point to before we try to fill
in newnode->item, but if the first call to malloc fails, we won't call malloc a second time or try to fill in
newnode->item at all. These guarantees--of left-to-right evaluation, and of skipping the evaluation of
the right-hand side if the left-hand side determines the result--are special to the || and && operators
(the ?: and comma operators make similar ordering guarantees). It's
perfectly acceptable to rely on these guarantees when using || and &&, but don't assume that other
operators will operate deterministically from left to right, because most of them don't.)
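If you'd like to see the short-circuit guarantee in action, here is a small experiment. Everything in it (the counter calls and the functions bump and calls_for_or) is made up for the demonstration. It simply counts how many of the two operands of || actually get evaluated.

```c
static int calls = 0;   /* how many times bump() has run */

/* hypothetical helper: record the call, then return its argument */
int bump(int value)
{
    calls++;
    return value;
}

/* returns the number of bump() calls made while evaluating bump(a) || bump(b) */
int calls_for_or(int a, int b)
{
    calls = 0;
    (void)(bump(a) || bump(b));     /* result unused; we only count calls */
    return calls;
}
```

When the left-hand side is true, as in calls_for_or(1, 0), only one call is made. When it is false, as in calls_for_or(0, 1), both sides are evaluated, left to right.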
Now that we have our prepend function, we can build a list by calling it several times in succession:
struct listnode *head = NULL;
head = prepend("world", head);
head = prepend("hello", head);
This code builds essentially the same list as our first, static example. Notice how we initialize the list
head pointer with a null pointer, which is synonymous with an empty list. (Notice also that the code we
wrote up above, for printing a list, would also deal correctly with an empty list.)
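Since every node (and every string) in such a list came from malloc, a tidy program will eventually want to hand the memory back. The notes above don't cover this, so treat the following as a sketch under my own assumptions: freelist is a hypothetical name, and prepend is repeated (with its out-of-memory messages) so that the example stands alone.

```c
#include <stdio.h>      /* for fprintf, stderr */
#include <stdlib.h>     /* for malloc, free, exit */
#include <string.h>     /* for strlen, strcpy */

struct listnode
{
    char *item;
    struct listnode *next;
};

struct listnode *prepend(char *newword, struct listnode *oldhead)
{
    struct listnode *newnode = malloc(sizeof(struct listnode));
    if(newnode == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }

    newnode->item = malloc(strlen(newword) + 1);
    if(newnode->item == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    strcpy(newnode->item, newword);

    newnode->next = oldhead;
    return newnode;
}

/* hypothetical helper: free every node and its string;
   returns the number of nodes freed */
int freelist(struct listnode *head)
{
    int n = 0;

    while(head != NULL)
    {
        struct listnode *next = head->next;     /* save before freeing */
        free(head->item);
        free(head);
        head = next;
        n++;
    }
    return n;
}
```

The important detail is that we fetch head->next before calling free(head), because once a node has been freed we may no longer look inside it. This only works for lists built with prepend; the static two-node list from the first example was not allocated with malloc and must not be passed to freelist.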
Using static calls to prepend is hardly more interesting than building a static list by hand. To make
things truly interesting, let's read words or strings typed by the user (or redirected from a file), and
prepend those to a list. The code is not hard:
#define MAXLINE 200

char line[MAXLINE];
struct listnode *head = NULL;
struct listnode *lp;

while(getline(line, MAXLINE) != EOF)
    head = prepend(line, head);

for(lp = head; lp != NULL; lp = lp->next)
    printf("%s\n", lp->item);
(getline is the line-reading function we've been using. If you don't have a copy handy, you can use the
fgets function from the standard library.)
If you type in this code and run it, you will find that it prints lines back out in the reverse order that you
typed them. (In doing so, of course, it slurps all the lines into memory, so you might run out of memory
if you tried to use this technique for reversing all the lines in a huge file.) Notice that when we call
prepend in this way, it is important that prepend allocate memory for, and stash away, each string.
Can you see what would happen if prepend did not, that is, if it simply said newnode->item =
newword?
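If you don't have getline handy, the reading loop can be written with fgets instead. Here is one way it might look, wrapped up as a function that reads from any stream; the name read_lines is hypothetical, and prepend is repeated so that the example stands alone. Unlike our getline, fgets leaves the \n at the end of each line, so this sketch strips it off.

```c
#include <stdio.h>      /* for fgets, fprintf, stderr */
#include <stdlib.h>     /* for malloc, exit */
#include <string.h>     /* for strlen, strcpy, strcspn */

#define MAXLINE 200

struct listnode
{
    char *item;
    struct listnode *next;
};

struct listnode *prepend(char *newword, struct listnode *oldhead)
{
    struct listnode *newnode;

    if((newnode = malloc(sizeof(struct listnode))) == NULL ||
            (newnode->item = malloc(strlen(newword) + 1)) == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    strcpy(newnode->item, newword);
    newnode->next = oldhead;
    return newnode;
}

/* hypothetical helper: read lines from fp until end-of-file,
   prepending each one; returns the head of the resulting list.
   (A line longer than MAXLINE-1 characters arrives in several pieces.) */
struct listnode *read_lines(FILE *fp)
{
    char line[MAXLINE];
    struct listnode *head = NULL;

    while(fgets(line, MAXLINE, fp) != NULL)
    {
        line[strcspn(line, "\n")] = '\0';   /* fgets keeps the \n; strip it */
        head = prepend(line, head);
    }
    return head;
}
```

Calling read_lines(stdin) builds the same reversed list as the getline loop. Notice that line is a single array which is overwritten on every trip through the loop, which is exactly why prepend has to make its own copy of each string.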
Linked lists are only the simplest example of linked data structures. There are also queues, doubly-linked
lists, trees, and circular lists. We'll see more concrete examples of linked structures in the adventure
game example. The set of objects sitting in a room (or held by the player) will be represented by a linked
list of objects, and the rooms will be linked to each other to indicate the passages between rooms.

Chapter 16: The Standard I/O (stdio) Library
In chapter 12, we met several of the functions in C's Standard I/O library (often called stdio,
sometimes pronounced ``studio'', named after the header file <stdio.h> which declares its routines).
In this chapter, we'll describe most of the functions and other facilities available in <stdio.h>, and
explain how they're useful.
(Note: this is an uncharacteristically long and complete chapter. It tries to describe just about all of the
Standard I/O library, including features you aren't likely to be using for a while. Don't feel you have to
understand every word in this chapter--when you get to an obscure part, just skim through it to get an
idea of what's available, and come back and read it again as you have occasion to use the feature.)
16.1: Files and Streams
16.2: Opening and Closing Files (fopen, fclose, etc.)
16.3: Character Input and Output (getchar, putchar, etc.)
16.4: Line Input and Output (fgets, fputs, etc.)
16.5: Formatted Output (printf and friends)
16.6: Formatted Input (scanf)
16.7: Arbitrary Input and Output (fread, fwrite)
16.8: EOF and Errors
16.9: Random Access (fseek, ftell, etc.)
16.10: File Operations (remove, rename, etc.)
16.11: Redirection (freopen)
16.1: Files and Streams


Since the beginning, we've been using ``standard input'' and ``standard output,'' two predefined I/O
streams which are available to every C program. The disposition of these streams is left deliberately
unclear: the program can assume that they're connected to the ``right place''; usually (for an interactive
program) to the user's keyboard and screen, respectively. However, since a program typically doesn't
know exactly where they go, it's possible to redirect them, behind the program's back, and thereby to
apply a program to some noninteractive input or to capture its output, without rewriting the program or
doing any special I/O programming. (This ability is a cornerstone of the Unix ``toolkit'' methodology. In
Unix and several other systems, you can redirect the input or output of a program as you invoke it from
the shell command line using the < or > characters.)
Standard input is assumed by functions like getchar, and standard output is assumed by functions like
putchar and printf.
Of course, it's also possible to open files (or other I/O sources) explicitly. We can open files using the
function fopen; certain systems may also provide specialized ways of opening streams connected to I/O
devices or set up in more exotic ways. A successful call to fopen returns a pointer of type FILE *,
that is, ``pointer to FILE,'' where FILE is a special type defined by <stdio.h>. A FILE * (also
called ``file pointer'') is the handle by which we refer to an I/O stream in C. I/O functions which do not
assume standard input or standard output all accept a FILE * argument telling them which stream to
read from or write to. (Examples are getc, putc, and fprintf.) Notice that a file pointer which has
been opened on a file is not the same thing as the file itself. A file pointer is a data structure which helps
us access or manipulate the file.
Occasionally it is necessary to refer to the standard input or standard output in a situation which calls for
a general-purpose FILE *. To handle these cases, there are two predefined constants: stdin and
stdout. Both of these are of type FILE *, are declared in <stdio.h>, and can be used wherever a
FILE * is required. (For example, we could simulate--or, in fact, implement--getchar as
getc(stdin).)
There is also a third predefined stream, the ``standard error output.'' It has its own constant, stderr. By
default, stderr is typically connected to the same output device as stdout; the difference between
stdout and stderr is that stderr is not redirected when stdout is. stderr is, as its name
implies, intended for error messages: if your program printed its error messages to stdout (e.g. by
calling printf), they would disappear into the output file if the user redirected the standard output.
Therefore, it's customary to print error messages (and also prompts, or anything else that shouldn't be
redirected) to stderr, often by calling fprintf(stderr, message, ...).
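Because stdin, stdout, and stderr are ordinary FILE * values, a function that takes a FILE * argument can be pointed at any of them, or at a file we've opened ourselves. Here is a tiny sketch of an error-reporting function along those lines; the name complain and the "error: " prefix are my own choices, not anything defined by <stdio.h>.

```c
#include <stdio.h>

/* hypothetical helper: write "error: msg\n" to the given stream;
   returns the number of characters written, or a negative value on failure */
int complain(FILE *fp, const char *msg)
{
    return fprintf(fp, "error: %s\n", msg);
}
```

A program would normally call it as complain(stderr, "can't open input file"), so that the message still reaches the user even when the standard output has been redirected to a file.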

16.2: Opening and Closing Files (fopen, fclose, etc.)
As mentioned, the fopen function opens a file (or perhaps some other I/O object, if the operating
system permits devices to be treated as if they were files) and returns a stream (FILE *) to be used with
later I/O calls. fopen's prototype is
FILE *fopen(char *filename, char *mode)
For the rest of this chapter, we'll often use prototype notations like these to describe functions, since a
prototype gives us just the information we need about a function: its name, its return type, and the types
of its arguments (perhaps along with identifying names for the arguments).
fopen's prototype tells us that it returns a FILE *, as we expect, and that it takes two arguments, both
of type char * (i.e. ``string''). The first string is the file name, which can be any string (simple filename
or complicated pathname) which is acceptable to the underlying operating system. The second string is
the mode in which the file should be opened. The simple mode arguments are
r    open for reading
w    open for writing (truncate file before writing, if it exists already)
a    open for writing, appending to file if it exists already
You can also tack two optional modifier characters onto the mode string:
+    open for both reading and writing
b    open for ``binary'' I/O
Modes "r+" and "w+" let you read and write to the file. You can't read and write at the same time;
between stints of reading and stints of writing you must explicitly reposition the read/write indicator (see
section 16.9 below).
``Binary'' or "b" mode means that no translations are done by the stdio library when reading or
writing the file. Normally, the newline character \n is translated to and from some
operating-system-dependent end-of-line representation (LF on Unix, CR on the Macintosh, CRLF on MS-DOS, or an
end-of-record mark on record-oriented operating systems). On MS-DOS systems, without binary mode, a
control-Z character in a file is treated as an end-of-file indication; neither the control-Z nor any
characters following it will be seen by a program reading the file. In binary mode, on the other hand,
characters are read and written verbatim; newline characters (and, in MS-DOS, control-Z characters) are
not treated specially. You need to use binary mode when what you're reading and writing is arbitrary byte
values which are not to be interpreted as text characters.
Of course, it's possible to use both optional modes: "r+b", "w+b", etc. (For maximum portability, it's
preferable to put + before b in these cases.)
If, for any reason, fopen can't open the requested file in the requested mode, it returns a null pointer
(NULL). Whenever you call fopen, you must check that the returned file pointer is not null before
attempting to use the pointer for any I/O.
Most operating systems let you keep only a limited number of files open at a time. Also, many versions
of the stdio library allocate only a limited number of FILE structures for fopen to return pointers to.
Therefore, if a program opens many files in sequence, it's important for it to close them as it finishes with
them. Closing a file fp simply requires calling fclose(fp). (Any open streams are automatically
closed when the program exits normally.)
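Putting these pieces together, the usual pattern is: open, check for a null pointer, do the I/O, close. The function below is a sketch of that pattern; the name count_lines and its convention of returning -1 on failure are assumptions of mine, chosen just for this example.

```c
#include <stdio.h>

/* hypothetical helper: count the newline characters in the named file;
   returns -1 if the file can't be opened */
long count_lines(const char *filename)
{
    FILE *fp = fopen(filename, "r");
    long n = 0;
    int c;

    if(fp == NULL)
        return -1;      /* never use a null file pointer for I/O */

    while((c = getc(fp)) != EOF)
    {
        if(c == '\n')
            n++;
    }

    fclose(fp);         /* done with the file; give the stream back */
    return n;
}
```

A caller might print a message to stderr when count_lines returns -1. Notice that the function closes the file on the one path where the open succeeded, and has nothing to close on the path where it failed.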
The standard I/O library normally buffers characters--that is, when you're writing, it saves up a chunk of
characters and then writes them to the actual file all at once; and when you're reading, it reads a chunk of
characters from the file all at once and then parcels them out to the program one at a time (or as many
characters at a time as the program asks for). The reasons for buffering have to do with efficiency--the
calls to the underlying operating system which request it to read and write files may be inefficient if
called once for each character or each few characters, and may be much more efficient if they're always
called for large blocks of characters. Normally, buffering is transparent to your program, but occasionally
it's necessary to ensure that some characters have actually been written. (One example is when you've
printed a prompt to the standard output, and you want to be sure that it's actually been written to the
screen.) In these cases, you can call fflush(fp), which flushes a stream's buffered output to the
underlying file (or screen, or other device). Naturally, the library automatically flushes output when you
call fclose on a stream, and also when your program exits.
(fflush is only defined for output streams. There is no standard way to discard unread, buffered input.)
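For the prompt case mentioned above, the call to fflush goes right after the output call. Here is a minimal sketch; the function name show_prompt is hypothetical.

```c
#include <stdio.h>

/* hypothetical helper: print a prompt and make sure it has really
   been written; returns 0 on success, EOF if the write or flush failed */
int show_prompt(FILE *out, const char *text)
{
    if(fputs(text, out) == EOF)
        return EOF;
    return fflush(out);
}
```

A program would call show_prompt(stdout, "name? ") just before reading the user's reply. Since the prompt doesn't end with \n, it might otherwise sit in the buffer while the program waits for input.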

16.3: Character Input and Output (getchar, putchar, etc.)

Character-at-a-time input and output is simple and straightforward. The getchar function reads the next character from
the standard input; getc(fp) reads the next character from the stream fp. Both return the next character or, if the next
character can't be read, the non-character constant EOF, which is defined in <stdio.h>. (Usually the reason that the
next character can't be read is that the input stream has reached end-of-file, but it's also possible that there's been some
I/O error.) Since the value EOF is distinct from all character values, it's important that the return value from getc and
getchar be assigned to a variable of type int, not char. Don't declare the variable to hold getc's or getchar's
return value as a char; don't try to read characters directly into a character array with code like
while(i < max && (a[i] = getc(fp)) != EOF)     /* WRONG, for char a[] */
The code may seem to work at first, but some day it will get confused when it reads a real character with a value which
seems to equal that which results when the non-char value EOF is crammed into a char.
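For contrast, here is the safe pattern, sketched as a function that counts the characters remaining on a stream (count_chars is a made-up name). The character that tends to cause trouble is the one with value 255: on machines where char is signed and EOF is -1, it turns into -1 when crammed into a char and so looks just like EOF. With an int variable, it stays 255 and the loop carries on.

```c
#include <stdio.h>

/* hypothetical helper: count the characters remaining on fp */
long count_chars(FILE *fp)
{
    int c;              /* int, not char, so EOF stays distinguishable */
    long n = 0;

    while((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

If you do need the characters in a char array, read each one into an int first, test it against EOF, and only then store it in the array.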
One more reminder about getchar: although it returns and therefore seems to read one character at a time, it typically
delivers characters from internal buffers which may hold more characters which will be delivered later. For example,
most command-line-based operating systems let you type an entire line of input, and wait for you to type the RETURN
or ENTER key before making any of those characters available to a program (even if the program thought it was doing
character-at-a-time input with calls to getchar). There are, of course, ways to read characters immediately (without
waiting for the RETURN key), but they differ from operating system to operating system.
Writing single characters is just as easy as reading: putchar(c) writes the character c to standard output; putc(c,
fp) writes the character c to the stream fp. (The character c must be a real character. If you want to ``send'' an end-of-file
condition to a stream, that is, cause the program reading the stream to ``get'' end-of-file, you do that by closing the
stream, not by trying to write EOF to it.)
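Putting getc and putc together gives the classic character-copying loop. The sketch below assumes two streams which are already open; copy_stream is a made-up name, and the ferror call it uses to tell end-of-file from a read error is described in section 16.8.

```c
#include <stdio.h>

/* Copy everything from in to out, one character at a time.
   Returns 0 on success, or EOF if a read or write error occurred.
   (copy_stream is a hypothetical name.) */
int copy_stream(FILE *in, FILE *out)
{
    int c;                      /* int, not char, to hold EOF */

    while((c = getc(in)) != EOF)
        if(putc(c, out) == EOF)
            return EOF;         /* write error */
    return ferror(in) ? EOF : 0;
}
```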
Occasionally, when reading characters, you find that you've read a bit too far. For example, if one part of
your code is supposed to read a number--a string of digits--from a file, leaving the characters after the digits on the input
stream for some other part of the program to read, the digit-reading part of the program won't know that it has read all
the digits until it has read a non-digit, at which point it's too late. (The situation recalls Dave Barry's recipe for ``food
heated up'': ``Put the food in a pot on the stove on medium heat until just before the kitchen fills with black smoke.'')
When reading characters with the standard I/O library, at least, we have an escape: the ungetc function ``un-reads'' one
character, pushing it back on the input stream for a later call to getc (or some other input function) to read. The
prototype for ungetc is
int ungetc(int c, FILE *fp)
where c is the character which is to be pushed back onto the stream fp. For example, here is a code scrap that reads
digits from a stream (and converts them to the corresponding integer), stopping at the first non-digit character and
leaving it on the input stream:
#include <ctype.h>
int c, n = 0;
while((c = getchar()) != EOF && isdigit(c))
    n = 10 * n + (c - '0');
if(c != EOF)
    ungetc(c, stdin);
It's only guaranteed that you can push one character back, but that's usually all you need.

16.4: Line Input and Output (fgets, fputs, etc.)

The function
char *gets(char *line)
reads the next line of text (i.e. up to the next \n) from the standard input and places the characters
(except for the \n) in the character array pointed to by line. It returns a pointer to the line (that is, it
returns the same pointer value you gave it), unless it reaches end-of-file, in which case it returns a null
pointer. It is assumed that line points to enough memory to hold all of the characters read, plus a
terminating \0 (so that the line will be usable as a string). Since there's usually no way for anyone to
guarantee that the array is big enough, and no way for gets to check it, gets is actually a useless
function, and no serious program should call it.
The function
char *fgets(char *line, int max, FILE *fp)
is somewhat more useful. It reads the next line of input from the stream fp and places the characters,
including the \n, in the character array pointed to by line. The second argument, max, gives the
maximum number of characters to be written to the array, and is usually the size of the array. Like gets,
fgets returns a pointer to the line it reads, or a null pointer if it reaches end-of-file. Unlike gets,
fgets does include the \n in the string it copies to the array. Therefore, the number of characters in the
line, plus the \n, plus the \0, will always be less than or equal to max. (If fgets reads max-1
characters without finding a \n, it stops reading there, copies a \0 to the last element of the array, and
leaves the rest of the line to be read next time.) Since fgets does let you guarantee that the line being
read won't go off the end of the array, you should always use fgets instead of gets. (If you want to
read a line from standard input, you can just pass the constant stdin as the third argument.) If you'd
rather not have the \n retained in the input line, you can either remove it right after calling fgets
(perhaps by calling strchr and overwriting the \n with a \0), or maybe call the getline or
fgetline function we've been using instead. (See chapters 6 and 12; these functions are also handy in
that they return the length of the line read. They differ from fgets in their treatment of overlong lines,
though.)
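The strchr approach to removing the \n might be sketched as follows. The name read_line is made up; the function simply wraps fgets and strips the newline if one is there.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from fp into line[] (of size max) and strip the
   trailing \n, if any.  Returns line, or NULL at end-of-file or error.
   (read_line is a hypothetical name.) */
char *read_line(char *line, int max, FILE *fp)
{
    char *p;

    if(fgets(line, max, fp) == NULL)
        return NULL;
    p = strchr(line, '\n');
    if(p != NULL)
        *p = '\0';              /* overwrite the \n with a \0 */
    return line;
}
```

If strchr finds no \n, either the line was too long for the array (and the rest of it is still waiting to be read) or the last line of the file had no newline; this sketch does not distinguish the two cases.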
The function
int puts(char *line)
writes the string pointed to by line to the standard output, and writes a \n to terminate it. It returns a
nonnegative value (we don't really care what the value is) unless there's some kind of a write error, in
which case it returns EOF.
Finally, the function
int fputs(char *line, FILE *fp)
writes the string pointed to by line to the stream fp. Like puts, fputs returns a nonnegative value
or EOF on error. Unlike puts, fputs does not automatically append a \n.

16.5: Formatted Output (printf and friends)

C's venerable printf function, which we've been using since day one, prints or writes formatted output
to the standard output. As we've seen (by example, if not formally), printf's operation is controlled by
its first, ``format'' argument, which is either a simple string to be printed or a string containing percent
signs and other characters which cause the formatted values of printf's other arguments to be
interspersed with the other text (if any) of the format string.
So far, we've been using simple format specifiers such as %d, %f, and %s. But format specifiers can
actually consist of several parts. You can specify a ``field width''; for example, %5d prints an integer in a
field five characters wide, padding the integer's value with extra characters if necessary so that at least
five characters are printed. You can specify a ``precision''; for example, %.2f formats a floating-point
number with two digits printed after the decimal point. You can also add certain characters which specify
various options, such as how a too-narrow field is to be padded out to its field width, or what type of
number you're printing. For example, %-5d indicates that the padding characters should be added after
the field's value (so that it's left-justified), and %ld indicates that you're printing a long instead of a
plain int.
Formally, then, the complete framework for a printf format specifier looks like
% flags width . precision modifier character
where all of the parts except the % and the final character are optional.
The width gives the minimum overall width of the output (the field) generated by this format specifier. If
the output (the number of digits or characters) would be less than the width, it will be padded on the left
(or on the right, if the - flag is present), usually with spaces. If the output for the field ends up being larger than
the specified width, however, the field essentially overflows or grows; the output is not truncated or
anything. That is, printf("%2d", 12345) prints 12345.
The precision is either:
The number of digits printed after the decimal point, for the floating-point formats %e, %f, and
%g; or
The maximum number of characters to be printed, for %s; or
The minimum number of digits to be printed, for the integer formats %d, %o, %x, and %u.
For example, printf("%.3s", "Hello, world!") prints Hel, and printf("%.5d", 12)
prints 00012.
Either the width or the precision (or both) can be specified as *, which indicates that the next int
argument from the argument list should be used as the field width or precision. For example,
printf("%.*f", 2, 76.54321) prints 76.54.
The flags are a few optional characters which modify the conversion in some way. They are:
- Force left adjustment, by padding (out to the field width) on the right.
0 Use 0 as the padding character, instead of a space.
space For numeric formats, if the converted number is positive, leave an extra space before it (so
that it will line up with negative numbers if printed in columns).
+ Print positive numbers with a leading + sign.
# Use an ``alternate form'' of the conversion. (The details of the ``alternate forms'' are described
below, under the individual format characters.)
The modifier specifies the size of the corresponding argument: l for long int, h for short int, L
for long double.
Finally, the format character controls the overall appearance of the conversion (and, along with the
modifier, specifies the type of the corresponding argument). We've seen many of these already. The
complete list of format characters is:
c Print a single character. The corresponding argument is an int (or, by the default argument
promotions, a char or short int).
d Print a decimal integer. The corresponding argument is an int, or a long int if the l
modifier appears, or a short int if the h modifier appears. If the number is negative, it is
preceded by a -. If the space flag appears and the number is positive, it is preceded by a space.
If the + flag appears and the number is positive, it is preceded by a +.
e Print a floating-point number in scientific notation: [-]m.nnnnnne[+-]nn. The
corresponding argument is either a float or a double or, if the L modifier appears, a long
double. The precision gives the number of places after the decimal point; the default is 6. If the
# flag appears, a decimal point will be printed even if the precision is 0.
E Like e, but use a capital E to set off the exponent.
f Print a floating-point decimal number (mmm.nnnnnn). The corresponding argument is either a
float or a double or, if the L modifier appears, a long double. The precision gives the
number of places after the decimal point; the default is 6. If the # flag appears, a decimal point
will be printed even if the precision is 0.
g Use either e or f, whichever works best given the range and precision of the number. (Roughly
speaking, ``works best'' means to display the most precision in the least space.) If the # flag
appears, don't strip trailing 0's.
G Like g, but use E instead of e.
i Just like d.
n The corresponding argument is an int *. Rather than printing anything, %n stores the number
of characters printed so far (by this call to printf) into the integer pointed to by the
corresponding argument. For example, if the variable np is an int, printf("%s %n%s!",
"Hello", &np, "world") stores 6 into np.
o Print an unsigned integer, in octal (base 8). The corresponding argument is an unsigned
int, or an unsigned long int if the l modifier appears, or an unsigned short int if
the h modifier appears. If the # flag appears, and the number is nonzero, it will be preceded by an
extra 0, to make it look like a C octal constant.
p Print a pointer value (the pointer, not what it points to), in some implementation-defined format.
The corresponding argument is a void *. Usually, the value printed is the numeric value of the
pointer--that is, the memory address pointed to--in hexadecimal. For the segmented architecture of
the IBM PC, pointers are often printed using a segment:offset notation.
s Print a string. The corresponding argument is a char * (which may result from an array of
char). If the optional precision is present, at most that many characters of the string will be
printed (if the \0 isn't encountered first).
u Print an unsigned decimal integer. The corresponding argument is an unsigned int, or an
unsigned long int if the l modifier appears, or an unsigned short int if the h
modifier appears.
x Print an unsigned integer, in hexadecimal (base 16). The corresponding argument is an
unsigned int, or an unsigned long int if the l modifier appears, or an unsigned
short int if the h modifier appears. If the # flag appears, and the number is nonzero, it will be
preceded by the characters 0x, to make it look like a C hexadecimal constant.
X Like x, except that the capital letters A, B, C, D, E, and F are used for the hexadecimal digits 10-15.
(Also, the # flag leads to a leading 0X.)
% Print a single % sign. There is no corresponding argument.
When you want to print to an arbitrary stream, instead of standard output, just use fprintf, which
takes a leading FILE * argument:
fprintf(stderr, "Syntax error on line %d\n", lineno);
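To see flags, width, precision, and a size modifier working together, here is a small sketch which prints one row of a table to an arbitrary stream. The function print_row and its column layout are invented for this example.

```c
#include <stdio.h>

/* Print one row of a three-column table to fp.
   Returns whatever fprintf returns (the number of characters written,
   or a negative value on error).  (print_row is a hypothetical name.) */
int print_row(FILE *fp, const char *name, long qty, double price)
{
    /* %-10.10s  left-justify name in a 10-character field, and print
                 at most 10 characters of it
       %5ld      right-justify a long in a field at least 5 wide
       %8.2f     at least 8 characters, 2 digits after the decimal point */
    return fprintf(fp, "%-10.10s|%5ld|%8.2f\n", name, qty, price);
}
```

Calling print_row(stdout, "widget", 42L, 3.5) prints widget, four spaces of padding, then |   42|    3.50. Only the name is truncated (by its precision); the numeric fields would simply grow if their values were too wide.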
Sometimes, it's useful to do some printf-like formatting, but not output the string right away. The
sprintf function is a printf variant which ``prints'' to an in-memory string rather than to a FILE *
stream. For example, one way to convert an integer to a string (the opposite of atoi) is:
int i = 123;
char string[20];
sprintf(string, "%d", i);
One thing to be careful of with sprintf, though, is that it's up to you to make sure that the destination
string is big enough.
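If your compiler supports the C99 standard, the snprintf function accepts the size of the destination and never writes past it. The sketch below assumes C99 is available; int_to_string is a made-up name.

```c
#include <stdio.h>

/* Convert i to a decimal string in buf, which holds size characters.
   Returns 0 on success, or -1 if the result did not fit.
   (int_to_string is a hypothetical name; snprintf requires C99.) */
int int_to_string(char *buf, size_t size, int i)
{
    int n = snprintf(buf, size, "%d", i);

    /* snprintf returns the length the complete result would have had,
       not counting the \0, so n >= size means it was truncated */
    if(n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Even when the result is truncated, snprintf still terminates the string with a \0 (as long as size is nonzero).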

16.6: Formatted Input (scanf)

Just as putchar has its getchar and fputs has its fgets, there's an input analog to printf,
namely scanf. scanf reads characters from standard input, under control of a format string, perhaps
converting some components of the string and storing them into variables. For example, just as you could
use the call
printf("(%d, %d)", x, y);
to print two integer values and some surrounding punctuation, you could use the call
scanf("(%d, %d)", &x, &y);
to attempt to extract two integer values from some input containing similar punctuation.
scanf interprets a format string, much like printf, with the first difference being that scanf
attempts to read characters and match them against the format string, rather than printing under control of
the format string. For each ordinary character in the format string, scanf expects to see that character
on the input; if not, it fails. For each format specifier in the format string, scanf attempts to match and
convert a string appropriate to the format specifier, storing the converted result into a variable pointed to
by the corresponding argument. If it can't find any characters matching the format specifier, it fails.
Since scanf ``returns'' many values (one for each format specifier in the format string), it must do so
using pointers which the caller passes. For each value to be converted, the caller passes a pointer to the
variable (or other location) where scanf should write the converted value. All arguments passed to
scanf must be pointers.
The format strings used by scanf are similar to those used by printf, but there are several
differences.
The optional width gives the maximum number of characters to read while performing the conversion
requested by a particular format specifier. (If there are many adjacent characters which could satisfy a
request--many digits for one of the numeric conversions, or many characters for %s conversion--the
width keeps scanf from gobbling all of them up at once.)
There is no equivalent to the precision modifier.
If the * flag appears, it indicates that the converted value should be discarded, not written to a location
pointed to by one of the pointers in the argument list. (In other words, there is no corresponding
argument.) Since * is usurped for this function, there is no way to use a variable field width from the
argument list with scanf. There are no other flags.
The modifier characters are more significant. An h indicates that the corresponding integer pointer
argument (for %d, %u, %o, or %x) is a short int * or unsigned short int *. An l indicates
that the corresponding integer pointer argument (for %d, %u, %o, or %x) is a long int * or
unsigned long int *, or that the floating-point pointer argument (for %e, %f, or %g) is a
double * rather than a float *. (Similarly, an L indicates a long double *.)
The %c format will read more than one character if an explicit width greater than 1 is specified. The
corresponding argument must be a pointer to enough space to hold all the characters read.
The %e, %f, and %g formats all read strings in either scientific notation or conventional decimal fraction
m.n notation. (In other words, the three formats act just the same.) However, they assume a float *
argument unless the l modifier appears, in which case they expect a double *. (This is in contrast to
printf, which accepts either float or double arguments for %e, %f, and %g, due to the default
argument promotions.)
The %i format will read a number in decimal, octal, or hexadecimal, taking a leading 0 to indicate octal
and a leading 0x (or 0X) to indicate hexadecimal, i.e. the same rules as used by C constants.
The %n format causes the number of characters read so far (by this call to scanf) to be stored in the
integer pointed to by the corresponding argument.
The %s format will read a string, up to the next whitespace character, and copy the string, terminated by
a \0, to the corresponding argument, which must be a char *. The caller must ensure (perhaps by
using an explicit width) that there is enough space to hold the received characters.
scanf has a special format specifier %[...], which matches any string composed of characters specified
in the []. For example, %[abc] would match any string composed of a's, b's, and c's. The
corresponding argument is a char *; the matched string is written to the location pointed to, followed
by a \0. The caller must ensure (perhaps by using an explicit width) that there is enough space to hold
the received characters. A second form, %[^...], matches a string of characters not found in the set. For
example, scanf("(%[^)])", s) reads, into the string s, a string of characters (possibly including
whitespace) from an input in which the string appears enclosed in parentheses. It may also be possible to
specify ranges of characters (e.g. %[a-z], %[0-9], etc.), but these are not as portable.
With the exception of %c, %n, and %[, all of the conversion specifiers skip any leading whitespace
(spaces, tabs, or newlines) which might precede the value or string converted. Also, any whitespace
character in the format string matches any number of whitespace characters in the input. Therefore, the
format "%d %d" would match the input "12 34" or "12    34" or "12\t34". However, the format
"%d%d" would match all of these inputs as well, since the second %d first scans past any whitespace
preceding the 34.
scanf returns the number of items it successfully converts and stores. It will return a number less than
expected (less than the number of format specifiers not containing *, or less than the number of
corresponding pointer arguments) if the conversion fails at any point, and it will leave any unrecognized
characters (i.e. the ones that caused the last match to fail) waiting in the input for next time. scanf
returns EOF if it encounters end-of-file before converting anything.
If you want to read characters from an arbitrary stream, you can use fscanf, which takes an initial
FILE * argument.
You can scan and convert characters from a string (rather than from a stream) using sscanf. For
example,
int x, y;
sscanf("12 34", "%d %d", &x, &y);
would place 12 in x and 34 in y.
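In real code you would want to check the return value, to find out whether the conversions actually succeeded. Here is a sketch of two small parsers built on sscanf; the names parse_point and parse_parens are made up, and the second one uses an explicit width to protect its array.

```c
#include <stdio.h>

/* Parse a string such as "(12, 34)" into *x and *y.
   Returns 1 if both numbers were converted, 0 otherwise.
   (parse_point is a hypothetical name.) */
int parse_point(const char *s, int *x, int *y)
{
    return sscanf(s, "(%d, %d)", x, y) == 2;
}

/* Copy the text between parentheses into word[], which must hold
   at least 20 characters; the width 19 leaves room for the \0.
   Returns 1 on success, 0 otherwise.  (Also a hypothetical name.) */
int parse_parens(const char *s, char *word)
{
    return sscanf(s, "(%19[^)])", word) == 1;
}
```

Notice that the return value counts only conversions, so neither function can tell whether the closing parenthesis was actually present.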
scanf and fscanf are seductively useful, but they have a number of drawbacks in practice. They
seem to make it very easy to, say, prompt the user for a number:
int x;
printf("Type a number:\n");
scanf("%d", &x);
But what happens if the user fumbles, and types something other than a number? Even if the code checks
scanf's return value, and prompts the user again if scanf returns 0, the non-numeric input remains on
the input, and will be encountered by the next call to scanf unless some other steps are taken. (That is,
scanf will rediscover the user's old, bad input before it gets to any new input.) It's also easy to write
things like
scanf("%d\n", &x);
but this code does not work as intended; the \n in the format string is a whitespace character, which asks
scanf to discard one or more whitespace characters, so it will keep reading characters as long as they
are whitespace characters, that is, it will read characters until it finds something that is not a whitespace
character. It won't read that eventual whitespace character once it finds it, but in the process of looking
for it it will seem to jam your program, since the call to scanf won't return right after the user types a
number.
Therefore, it's much better to read interactive user input a line at a time, and then use functions like atoi
(or perhaps sscanf) to interpret the line that the user typed.
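A line-at-a-time number reader along these lines might look like the following sketch. The name read_int and the 100-character line limit are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Read one line from fp and try to convert it to an int.
   Returns 1 on success, 0 if the line did not begin with a number,
   or EOF at end-of-file.  (read_int is a hypothetical name.) */
int read_int(FILE *fp, int *x)
{
    char line[100];

    if(fgets(line, sizeof line, fp) == NULL)
        return EOF;
    return sscanf(line, "%d", x) == 1;
}
```

Since the whole line is consumed whether or not it contained a number, a caller which gets 0 back can simply prompt again and call read_int again; the bad input is gone. (This sketch accepts a line like 42abc as 42, and a line longer than 99 characters would be read in pieces.)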

16.7: Arbitrary Input and Output (fread, fwrite)

Sometimes, you want to read a chunk of characters, without treating it as a ``line'' (as gets and fgets
do) and certainly without doing any scanf-like parsing. Similarly, you may want to write an arbitrary
chunk of characters, not as a string or a line. (Furthermore, the chunk might contain one or more \0
characters which would otherwise terminate a string.) In these situations, you want fread and fwrite.
fread's prototype is
size_t fread(void *buf, size_t sz, size_t n, FILE *fp)
Remember, void * is a ``generic'' pointer type (the type returned by malloc), which can point to
anything. It may make it easier to think about fread at first if you imagine that its first argument were
char *. size_t is a type we haven't met yet; it's a type that's guaranteed to be able to hold the size of
any object (i.e. as returned by the sizeof operator); you can imagine for the moment that it's
unsigned int. fread reads up to n objects, each of size sz, from the stream fp, and copies them to
the buffer pointed to by buf. It reads them as a stream of bytes, without doing any particular formatting
or other interpretation. (However, the default underlying stdio machinery may still translate newline
characters unless the stream is open in binary or "b" mode.) fread returns the number of items read. It
returns 0 (not EOF) at end-of-file.
Similarly, the prototype for fwrite is
size_t fwrite(void *buf, size_t sz, size_t n, FILE *fp)
fread and fwrite are intended to read and write chunks or ``arrays'' of items, with the interpretation that there
are n items each of size sz. If what you want to do is read n characters, you can call fread with sz as
1, and buf pointing to an array of at least n characters. The return value will be in units of characters.
(Of course, you could write n characters by using similar arguments with fwrite.)
Besides reading and writing ``blocks'' of characters, you can use fread and fwrite to do ``binary''
I/O. For example, if you have an array of int values:
int array[N];
you could write them all out at once by calling
fwrite(array, sizeof(int), N, fp);
This would write them all out in a byte-for-byte way, i.e. as a block copy of bytes from memory to the
output stream, i.e. not as strings of digits as printf %d would. Since some of the bytes within the array
of int might have the same value as the \n character, you would want to make sure that you had
opened the stream in binary or "wb" mode when calling fopen.
Later, you could try to read the integers in by calling
fread(array, sizeof(int), N, fp);
Similarly, if you had a variable of some structure type:
struct somestruct x;
you could write it out all at once by calling
fwrite(&x, sizeof(struct somestruct), 1, fp);
and read it in by calling
fread(&x, sizeof(struct somestruct), 1, fp);
Although this ``binary'' I/O using fwrite and fread looks easy and convenient, it has a number of
drawbacks, some of which we'll discuss in the next chapter.
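Whatever you read or write this way, it's worth checking the return values, since they tell you how many items were actually transferred. Here is a sketch of a matched pair of functions for arrays of int; the names save_ints and load_ints are made up, and the stream is assumed to have been opened in binary mode.

```c
#include <stdio.h>

/* Write n ints to fp; returns 1 if all were written, else 0.
   (save_ints is a hypothetical name.) */
int save_ints(FILE *fp, const int *array, size_t n)
{
    return fwrite(array, sizeof(int), n, fp) == n;
}

/* Read up to n ints from fp; returns the number actually read,
   which is less than n at end-of-file or on error.
   (load_ints is a hypothetical name.) */
size_t load_ints(FILE *fp, int *array, size_t n)
{
    return fread(array, sizeof(int), n, fp);
}
```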

16.8: EOF and Errors

When a function returns EOF (or, occasionally, 0 or NULL, as in the case of fread and fgets
respectively), we commonly say that we have reached ``end of file,'' but it turns out that it's also possible
that there's been some kind of I/O error. When you want to distinguish between end-of-file and error, you
can do so with the feof and ferror functions. feof(fp) returns nonzero (that is, ``true'') if end-of-file
has been reached on the stream fp, and ferror(fp) returns nonzero if there has been an error.
Notice the past tense and passive voice: feof returns nonzero if end-of-file has been reached. It does
not tell you that the next attempt to read from the stream will reach end-of-file, but rather that the
previous attempt (by some other function) already did. (If you know Pascal, you may notice that the end-of-file
detection situation in C is therefore quite different from Pascal.) Therefore, you would never write
a loop like
while(!feof(fp))
    fgets(line, max, fp);
Instead, check the return value of the input function directly:
while(fgets(line, max, fp) != NULL)
With a very few possible exceptions, you don't use feof to detect end-of-file; you use feof or
ferror to distinguish between end-of-file and error. (You can also use ferror to diagnose error
conditions on output files.)
Since the end-of-file and error conditions tend to persist on a stream, it's sometimes necessary to clear
(reset) them, which you can do with clearerr(FILE *fp).
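The usual pattern, then, is to loop until the input function reports failure, and only afterwards ask why it failed. Here is a sketch which counts the lines on a stream; count_lines is a made-up name.

```c
#include <stdio.h>

/* Count the newline characters on fp.
   Returns the count, or -1 if reading stopped because of an error.
   (count_lines is a hypothetical name.) */
long count_lines(FILE *fp)
{
    int c;
    long n = 0;

    while((c = getc(fp)) != EOF)
        if(c == '\n')
            n++;
    if(ferror(fp))
        return -1;              /* the loop ended because of an error */
    return n;                   /* the loop ended at end-of-file */
}
```

After count_lines returns successfully, feof(fp) is true, and stays true until something like clearerr or a file-positioning function resets it.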
What should your program do if it detects an I/O error? Certainly, it cannot continue as usual; usually, it
will print an error message. The simplest error messages are of the form
fp = fopen(filename, "r");
if(fp == NULL)
{
    fprintf(stderr, "can't open file\n");
    return;
}
or
while(fgets(line, max, fp) != NULL)
{
    ... process input ...
}
if(ferror(fp))
    fprintf(stderr, "error reading input\n");
or
fprintf(ofp, "%d %d %d\n", a, b, c);
if(ferror(ofp))
    fprintf(stderr, "output write error\n");
Error messages are much more useful, however, if they include a bit more information, such as the name
of the file for which the operation is failing, and if possible why it is failing. For example, here is a more
polite way to report that a file could not be opened:
#include <stdio.h>      /* for fopen */
#include <errno.h>      /* for errno */
#include <string.h>     /* for strerror */
fp = fopen(filename, "r");
if(fp == NULL)
{
    fprintf(stderr, "can't open %s for reading: %s\n",
        filename, strerror(errno));
    return;
}
errno is a global variable, declared in <errno.h>, which may contain a numeric code indicating the
reason for a recent system-related error such as inability to open a file. The strerror function takes an
errno code and returns a human-readable string such as ``No such file'' or ``Permission denied''.
An even more useful error message, especially for a ``toolkit'' program intended to be used in
conjunction with other programs, would include in the message text the name of the program reporting
the error.
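One way to arrange this is sketched below. The function open_input and its progname parameter are invented for the example; progname would typically be taken from argv[0]. (Strictly speaking, the C Standard doesn't promise that fopen sets errno, although most systems do.)

```c
#include <stdio.h>
#include <errno.h>
#include <string.h>

/* Open filename for reading.  On failure, print a message which names
   the program, the file, and the reason, and return NULL.
   (open_input and progname are hypothetical names.) */
FILE *open_input(const char *progname, const char *filename)
{
    FILE *fp = fopen(filename, "r");

    if(fp == NULL)
        fprintf(stderr, "%s: can't open %s for reading: %s\n",
            progname, filename, strerror(errno));
    return fp;
}
```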

16.9: Random Access (fseek, ftell, etc.)

Normally, files and streams (that is, anything accessed via a FILE *) are read and written sequentially.
However, it's also possible to jump to a certain position in a file.
To jump to a position, it's generally necessary to have ``been there'' once already. First, you use the
function ftell to find out what your position in the file is; then, later, you can use the function fseek
to get back to a saved position.
File positions are stored as long ints. To record a position, you would use code like
long int pos;
pos = ftell(fp);
Later, you could ``seek'' back to that position with
fseek(fp, pos, SEEK_SET);
The third argument to fseek is a code telling it (in this case) to set the position with respect to the
beginning of the file; this is the mode of operation you need when you're seeking to a position returned
by ftell.
As an example, suppose we were writing a file, and one of the lines in it contained the words ``This file
is n lines long'', where n was supposed to be replaced by the actual number of lines in the file. At the time
when we wrote that line, we might not know how many lines we'd eventually write. We could resolve the
difficulty by writing a placeholder line, remembering where it was, and then going back and filling in the
right number later. The first part of the code might look like this:
long int nlinespos = ftell(fp);
fprintf(fp, "This file is %4d lines long\n", 0);
Later, when we'd written the last line to the file, we could seek back and rewrite the ``number-of-lines''
line like this:
fseek(fp, nlinespos, SEEK_SET);
fprintf(fp, "This file is %4d lines long\n", nlines);
There's no way to insert or delete characters in a file after the fact, so we have to make sure that if we
overwrite part of a file in this way, the overwritten text is exactly the same length as the previous text.
That's why we used %4d, so that the number would always be printed in a field 4 characters wide.
(However, since the field width in a printf format specifier is a minimum width, with this choice of
width, the code would fail if a file ever had more than 9999 lines in it.)
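Assembled into one function, the placeholder technique might look like the sketch below. The name write_counted and its parameters are made up; this version counts the header line itself as one of the file's lines, and it remembers the end of the file so that later writes would continue from there.

```c
#include <stdio.h>

/* Write a header line and then the given lines to fp, going back
   afterwards to fill in the line count.  Returns 0 on success, -1 on
   error.  (write_counted and its parameters are hypothetical.) */
int write_counted(FILE *fp, const char *lines[], int nlines)
{
    long int nlinespos, endpos;
    int i;

    nlinespos = ftell(fp);              /* where the header starts */
    if(nlinespos == -1L)
        return -1;
    fprintf(fp, "This file is %4d lines long\n", 0);    /* placeholder */

    for(i = 0; i < nlines; i++)
        fprintf(fp, "%s\n", lines[i]);

    endpos = ftell(fp);                 /* remember the end */
    if(endpos == -1L || fseek(fp, nlinespos, SEEK_SET) != 0)
        return -1;
    /* same length as the placeholder, as long as the count is < 10000 */
    fprintf(fp, "This file is %4d lines long\n", nlines + 1);
    if(fseek(fp, endpos, SEEK_SET) != 0)
        return -1;
    return ferror(fp) ? -1 : 0;
}
```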
Three other file-positioning functions are rewind, which rewinds a file to its beginning, and fgetpos
and fsetpos, which are like ftell and fseek except that they record positions in a special type,
fpos_t, which may be able to record positions in huge files for which even a long int might not be
sufficient.
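As a small illustration of fgetpos and fsetpos, here is a sketch of a function which ``peeks'' at the next line of a stream by saving the position, reading, and then putting the position back. The name peek_line is made up, and the stream must be one on which positioning is possible (a file, not a keyboard or pipe).

```c
#include <stdio.h>

/* Read the next line from fp into line[] without consuming it: the
   position is saved first and restored afterwards.  Returns line, or
   NULL at end-of-file or on error.  (peek_line is a hypothetical name.) */
char *peek_line(char *line, int max, FILE *fp)
{
    fpos_t pos;
    char *ret;

    if(fgetpos(fp, &pos) != 0)
        return NULL;
    ret = fgets(line, max, fp);
    if(fsetpos(fp, &pos) != 0)
        return NULL;
    return ret;
}
```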
If you're ever using one of the ``read/write'' modes ("r+" or "w+"), you must use a call to a file-positioning
function (fseek, rewind, or fsetpos) before switching from reading to writing or vice
versa. (You can also call fflush while writing and then switch to reading, or reach end-of-file while
reading and then switch back to writing.)
In binary ("b") mode, the file positions returned by ftell and used by fseek are byte offsets, such
that it's possible to compute an fseek target without having to have it returned by an earlier call to
ftell. On many systems (including Unix, the Macintosh, and to some extent MS-DOS), file
positioning works this way in text mode as well. Code that relies on this isn't as portable, though, so it's
not a good idea to treat ftell/fseek positions in text files as byte offsets unless you really have to.

16.10: File Operations (remove, rename, etc.)

You can delete a file by calling
int remove(char *filename)
You can rename a file by calling
int rename(char *oldname, char *newname)
Both of these functions return zero if they succeed and a nonzero value if they fail.
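A common use for the pair is to update a file by writing a new copy under a scratch name and then renaming it into place. The sketch below shows one way this might be done; save_text and both of its file-name parameters are made up. Since what rename does when the new name already exists is implementation-defined, the sketch removes the old file first.

```c
#include <stdio.h>

/* Replace the contents of filename with text by writing a scratch
   file first and renaming it into place.  Returns 0 on success, -1 on
   failure.  (save_text and its parameters are hypothetical.) */
int save_text(const char *filename, const char *tmpname, const char *text)
{
    FILE *fp = fopen(tmpname, "w");

    if(fp == NULL)
        return -1;
    if(fputs(text, fp) == EOF)
    {
        fclose(fp);
        remove(tmpname);
        return -1;
    }
    if(fclose(fp) == EOF)
    {
        remove(tmpname);
        return -1;
    }
    remove(filename);           /* fails harmlessly if there's no old file */
    if(rename(tmpname, filename) != 0)
    {
        remove(tmpname);
        return -1;
    }
    return 0;
}
```

There is a brief moment between the remove and the rename when neither version of the file exists under its final name, so this is a sketch of the idea rather than a bulletproof technique.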
There are no standard C functions for dealing with directories (e.g. listing or creating them). On many
systems, you will find functions mkdir for making directories and rmdir for removing them, and a
suite of functions opendir, readdir, and closedir for listing them. Since these functions aren't
standard, however, we won't talk about them here. (They exist on most Unix systems, but they're not
standard under MS-DOS or Macintosh compilers, although you can find implementations on the net.)

16.11: Redirection (freopen)

For some programs, standard input and standard output are enough, and these programs can get by using
just getchar, putchar, printf, etc., and letting any input/output redirection be handled by the user
and the operating system (perhaps using command-line redirection such as < and >). Other programs
handle all file manipulation themselves, opening files with fopen and maintaining file pointer (FILE
*) variables recording the streams to which all input and output is done (with getc, putc, fprintf,
etc.).
Occasionally, a program has to be rewritten in a hurry, to allow it to read or write a named file without
manipulating file pointers and changing every call to getchar to getc, every call to printf to
fprintf, etc. In these cases, the function freopen comes in handy: it reopens an existing stream on a
new file. The prototype is
FILE *freopen(char *filename, char *mode, FILE *fp)
freopen is about like fopen, except that rather than allocating a new stream, it uses (and returns) the
caller-supplied stream fp. For example, to redirect a program's output to a file ``from within,'' you could
call
freopen(filename, "w", stdout);
A disadvantage of freopen is that there's generally no way to undo it; you can't change your mind later
and make stdin or stdout go back to where they had been before you called freopen. In situations
where you want to be able to switch back and forth between streams, it's much better if you can chase
down and change every call to getchar to getc, every call to printf to fprintf, etc., and then
use some FILE * variable under your control (typically with a name like ifp or ofp) so that you can
set it to point to a file by calling fopen, and later back to stdin or stdout by simply reassigning it.
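The FILE * approach might look something like the sketch below. The function names print_totals and report_twice are invented for the example; the point is only that the output routine takes its stream as a parameter, so the caller can aim it at a file or at stdout.

```c
#include <stdio.h>

/* Writes its output to whatever stream the caller hands it. */
void print_totals(FILE *ofp, int a, int b)
{
    fprintf(ofp, "a = %d\n", a);
    fprintf(ofp, "b = %d\n", b);
    fprintf(ofp, "total = %d\n", a + b);
}

/* Sends one report to a named file and a second one to stdout.
   Returns 0 on success, -1 if the file can't be opened. */
int report_twice(const char *filename)
{
    FILE *ofp = fopen(filename, "w");
    if(ofp == NULL)
        return -1;
    print_totals(ofp, 1, 2);    /* goes to the file */
    fclose(ofp);

    ofp = stdout;               /* switch back by simply reassigning */
    print_totals(ofp, 3, 4);    /* goes to standard output */
    return 0;
}
```

Nothing here needs to be undone: stdout itself was never touched, so switching back is just an assignment.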


Many programs need to read and write data files. A program might read data files to initialize its
configuration or to receive data from another program; a program might write data files to save its state
or to send data to another program. In this chapter we'll explore techniques for reading and writing data
files, and for designing data file formats so that they are functional, useful, and convenient.
What makes a good data file? There are many desirable attributes which we might want to achieve or to
trade off against one another. We might want data files to be small (to save disk space) or to be efficient
to read and write (to save computer time) or to be easy to read and write (that is, easy to implement, to save
programmer time). We might want them to be human readable, to make them easier to debug or modify,
or so that they could be created ``by hand'' (i.e. all using standard file-manipulation tools). On the other
hand, if the files are to contain sensitive data, we might prefer that they not be human-readable. We
might want the files to be portable across different machine architectures (if we will be moving data files
from machine to machine). We might want to ensure that if the data file format ever changes (perhaps to
add new information), newer versions of our software (that is, the software that reads and writes the data
files) can still read the old files, and perhaps even that old versions of the software can at least partially
read the new files. We'll see ways of achieving all of these attributes.
Roughly speaking, there are two large classes of data file formats: ``text'' and ``binary''. Text files, as
their name implies, contain human-readable text; that is, if you were to read one into a text editor or
dump one to your screen, it would consist of strings of printable characters, arranged into lines. (By
``printable characters'' we mean characters which display nicely on the screen, as opposed to ``control
characters.'' Generally speaking, the only control characters a text file will contain will be CR or LF or
CRLF combinations to mark the ends of lines, and perhaps horizontal tabs. C represents the end-of-line
character(s) by \n, and tabs by \t.)
Binary files, on the other hand, contain arbitrary patterns of bits and bytes, arranged for the computer's
convenience, not the human's. The bytes making up a binary file are not intended to be interpreted as
characters or text; if you dump one to the screen, you get all sorts of garbage. Some of the bit patterns
will happen to represent printable characters, but others will be control characters, others may be special
graphics characters, and still others may end up representing sequences which will switch the display into
inverse video, clear the screen, etc. (Depending on your display environment, printing arbitrary binary
characters may confuse the display so badly that it becomes unusable and must be reset.)
In a text file, we might represent the integer 12345 as the five characters 1 2 3 4 5 (that is, as the text
string "12345"). In a binary file, on the other hand, we might represent it as two bytes with values
0x30 and 0x39, since 12345 base 10 is 3039 base 16. (Just to confuse the issue, it happens that in the
ASCII character set the values 0x30 and 0x39 represent the characters '0' and '9', but this is sheer
coincidence; the character values 0 and 9 of course have no meaningful relationship to the value 12345
that we're storing.)
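To make the two representations concrete, here is a small sketch that produces both forms of an integer in memory. The names to_text and to_binary are made up for this example, and the binary version assumes a value between 0 and 65535, written most significant byte first.

```c
#include <stdio.h>

/* Text form: the decimal digits of n, as characters.
   Returns the number of characters stored (not counting the \0). */
int to_text(int n, char *buf)
{
    return sprintf(buf, "%d", n);
}

/* Binary form: two bytes, most significant byte first.
   Assumes 0 <= n <= 65535. */
void to_binary(int n, unsigned char bytes[2])
{
    bytes[0] = (unsigned char)(n / 256);
    bytes[1] = (unsigned char)(n % 256);
}
```

For 12345, to_text stores the five characters 1 2 3 4 5, while to_binary stores the two byte values 0x30 and 0x39.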
17.1: Text Data Files

17.2: Binary Data Files


Text data files, it must be admitted, are not always as compact or as efficient to read and write as binary files.
It can be a bit more work to set up the code which reads and writes them. But they have some powerful
advantages: any time you need to, you can look at them using ordinary text editors and other tools. If
program A is writing a data file which program B is supposed to be able to read but cannot, you can
immediately look at the file to see if it's in the correct format and so determine whether it's program A's or
B's fault. If program A has not been written yet, you can easily create a data file by hand to test program B
with. Text files are automatically portable between machines, even those where integers and other data types
are of different sizes or are laid out differently in memory. Because they're not expected to have the rigid
formats of binary files, it tends to be more natural to arrange text files so that as the data file format changes
slightly, newer (or older) versions of the software can read older (or newer) versions of the data file. Text
data files are the focus of this chapter; they're what I use all the time, and they're what I recommend you use
unless you have compelling reasons not to.
When we're using text data files, we acknowledge that the internal and external representations of our data
are quite different. For example, a value of type int will usually be represented internally as a 2- or 4-byte
(16- or 32-bit) piece of memory. Externally, though, that integer will be represented as a string of characters
representing its decimal or hexadecimal value. Converting back and forth between the internal and external
representations is easy enough. To go from the internal representation to the external, we'll almost always
use printf or fprintf; for example, to convert an int we might use %d or %x format. To convert from
the external representation back to the internal, we could use scanf or fscanf, or read the characters in
some other way and then use functions like atoi, strtol, or sscanf.
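As a hedged sketch of the two directions, the functions below convert an int to text with sprintf and back again with strtol. The names int_to_text and text_to_int are invented for this example. Unlike atoi, this use of strtol lets us notice input that isn't a number at all.

```c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

/* Internal to external: store the decimal text for n in buf. */
void int_to_text(int n, char *buf)
{
    sprintf(buf, "%d", n);
}

/* External to internal: convert text to an int, rejecting garbage.
   Returns 1 on success, 0 on failure. */
int text_to_int(const char *text, int *result)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(text, &end, 10);
    if(end == text)
        return 0;               /* no digits at all */
    if(errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return 0;               /* out of range for an int */
    while(*end == ' ' || *end == '\t' || *end == '\n')
        end++;                  /* allow trailing whitespace */
    if(*end != '\0')
        return 0;               /* trailing junk */
    *result = (int)val;
    return 1;
}
```

The trailing-whitespace loop is there so that a line read by fgets, with its \n still attached, is accepted.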
We have a great many options when it comes to performing this mapping, that is, when converting between
the internal and external representations. Our choice may be determined by the layout we want the data file
to have, or by what's easiest to implement, or by some combination of these factors. Some of the choices are
pretty arbitrary; but in any case, what matters most is obviously that the reading and writing code ``match'',
that is, that the data file writing code write the data in the right format such that the data file reading code can
accurately read it. For the rest of this section, we'll explore several ways of writing and reading data to and
from text data files, using various combinations of the stdio functions (and perhaps one or two of our own).
Suppose we had an array of integers:
int a[10];
and suppose it had been filled up with values, and suppose we wanted to write them out to a data file. We
could write them all on one line, separated by spaces:
fprintf(ofp, "%d %d %d %d %d %d %d %d %d %d\n",
    a[0], a[1], a[2], a[3], a[4], a[5],
    a[6], a[7], a[8], a[9]);
We could write them on 10 separate lines:

for(i = 0; i < 10; i++)
    fprintf(ofp, "%d\n", a[i]);
Realizing that the loop is easier and more flexible, we could go back to writing them all on one line, using a
loop:
for(i = 0; i < 10; i++)
    fprintf(ofp, "%d ", a[i]);
fprintf(ofp, "\n");
If we were worried about that trailing space at the end of the line, we could arrange to eliminate it:
for(i = 0; i < 10; i++)
{
    if(i > 0)
        fprintf(ofp, " ");
    fprintf(ofp, "%d", a[i]);
}
fprintf(ofp, "\n");
Recognizing that fprintf is overkill for printing single, fixed characters, we could replace two of the calls
with putc:
for(i = 0; i < 10; i++)
{
    if(i > 0)
        putc(' ', ofp);
    fprintf(ofp, "%d", a[i]);
}
putc('\n', ofp);
When it came time to read the numbers in, we would have at least as many choices. We could read the ten
values all at once, using fscanf:
int r = fscanf(ifp, "%d %d %d %d %d %d %d %d %d %d",
    &a[0], &a[1], &a[2], &a[3], &a[4], &a[5],
    &a[6], &a[7], &a[8], &a[9]);
if(r != 10)
    fprintf(stderr, "error in data file\n");
Since the scanf family treats all whitespace (spaces, tabs, and newlines) the same, this code would read
either the format with all the numbers on one line, or the format with one number per line. Notice that we
check fscanf's return value, to make sure that it successfully read in all the numbers we expected it to.
Since data files come in from the outside world, it's possible for them to be corrupted, and programs should
not blindly read them assuming that they're perfect. A program that crashes when it attempts to read a
damaged data file is terribly frustrating; a program that diagnoses the problem is much more polite.
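The same idea can be written with a loop, as in the sketch below. The function name read_values is made up for the example. Because %d skips leading whitespace of any kind, the one function reads both the one-line layout and the one-per-line layout, and it reports failure rather than carrying on with garbage.

```c
#include <stdio.h>

/* Reads exactly n integers from ifp into a[].
   Returns 0 on success, -1 if the file runs out or contains a non-number. */
int read_values(FILE *ifp, int a[], int n)
{
    int i;

    for(i = 0; i < n; i++)
    {
        if(fscanf(ifp, "%d", &a[i]) != 1)
        {
            fprintf(stderr, "error in data file\n");
            return -1;
        }
    }
    return 0;
}
```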
We could also read the data file a line at a time, converting the text to integers via other means. If the
integers were stored one per line, we could use code like this:
#define MAXLINE 200
char line[MAXLINE];
for(i = 0; i < 10; i++)
{
    if(fgets(line, MAXLINE, ifp) == NULL)
    {
        fprintf(stderr, "error in data file\n");
        break;
    }
    a[i] = atoi(line);
}
(We could also use our own getline or fgetline function instead of fgets.) If the integers were
stored all on one line, we could use the getwords function from chapter 10 to separate the numbers at the
whitespace boundaries:
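char *av[10];
if(fgets(line, MAXLINE, ifp) == NULL)
    fprintf(stderr, "error in data file\n");
else if(getwords(line, av, 10) != 10)
    fprintf(stderr, "error in data file\n");
else
{
    for(i = 0; i < 10; i++)
        a[i] = atoi(av[i]);
}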
Suppose, now, that there were not always 10 elements in the array a; suppose we had a separate integer
variable na to record how many elements the array a currently contains. When writing the data out, we
would certainly then use a loop; we might also want to precede the data by the count, in case that will make
it easier for the reading program:
fprintf(ofp, "%d\n", na);
for(i = 0; i < na; i++)
    fprintf(ofp, "%d\n", a[i]);
We could also print all of the numbers on one line:
fprintf(ofp, "%d", na);
for(i = 0; i < na; i++)
    fprintf(ofp, " %d", a[i]);
fprintf(ofp, "\n");
(Notice that the presence of the extra value at the beginning of the line makes the space separator game
easier to play.)
Now, when reading the data in, we would simply read the count first, then the data. Using fscanf:
if(fscanf(ifp, "%d", &na) != 1)
{
    fprintf(stderr, "error in data file\n");
    return;
}
if(na > 10)
{
    fprintf(stderr, "too many items in data file\n");
    return;
}
for(i = 0; i < na; i++)
{
    if(fscanf(ifp, "%d", &a[i]) != 1)
    {
        fprintf(stderr, "error in data file\n");
        return;
    }
}
(Here we assume that the code to read the array from the data file is part of a function, and that when we
detect an error, we return early from the function. In practice, we would probably return some error code to
the caller.)
If we chose to use fgets (or fgetline), the code might look like this for data on separate lines:
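if(fgets(line, MAXLINE, ifp) == NULL)
{
    fprintf(stderr, "error in data file\n");
    return;
}
na = atoi(line);
if(na > 10)
{
    fprintf(stderr, "too many items in data file\n");
    return;
}
for(i = 0; i < na; i++)
{
    if(fgets(line, MAXLINE, ifp) == NULL)
    {
        fprintf(stderr, "error in data file\n");
        return;
    }
    a[i] = atoi(line);
}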
Or, if the data were all on one line, like this:
int ac;
char *av[11];
if(fgets(line, MAXLINE, ifp) == NULL)
{
    fprintf(stderr, "error in data file\n");
    return;
}
ac = getwords(line, av, 11);
if(ac < 1)
{
    fprintf(stderr, "error in data file\n");
    return;
}
na = atoi(av[0]);
if(na > 10)
{
    fprintf(stderr, "too many items in data file\n");
    return;
}
if(na != ac - 1)
{
    fprintf(stderr, "error in data file\n");
    return;
}
for(i = 0; i < na; i++)
    a[i] = atoi(av[i+1]);
But sometimes, you don't need to save the count (na) explicitly; the reading program can deduce the number
of items from the number of items in the file. If the file contains only the integers in this array, then we can
simply read integers until we reach end-of-file. For example, using fscanf:
na = 0;
while(na < 10 && fscanf(ifp, "%d", &a[na]) == 1)
    na++;
(This code is deceptively simple; we haven't carefully dealt with appropriate error messages for a data file
with more than 10 values, or a data file with a non-numeric ``value'' for which fscanf returns 0.)
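One way of handling those cases is sketched below. The function name read_all and the three READ_ codes are made up for this example. The idea is to look at why the loop stopped: fscanf returns EOF when the input ran out, 0 when it found something that wasn't a number, and 1 if there was still another number waiting when the array was already full.

```c
#include <stdio.h>

#define READ_OK       0     /* hypothetical status codes */
#define READ_TOOMANY  1
#define READ_BADVALUE 2

/* Reads integers until end-of-file, storing at most max of them in a[]
   and the count in *na.  Returns one of the READ_ codes above. */
int read_all(FILE *ifp, int a[], int max, int *na)
{
    int r = EOF;
    int extra;

    *na = 0;
    while(*na < max && (r = fscanf(ifp, "%d", &a[*na])) == 1)
        (*na)++;
    if(*na == max)
    {
        /* the array is full; see whether anything is left over */
        r = fscanf(ifp, "%d", &extra);
        if(r == 1)
            return READ_TOOMANY;
    }
    if(r == EOF)
        return READ_OK;
    return READ_BADVALUE;
}
```

The caller can then print whichever message is appropriate. (fscanf also returns EOF after a read error, which this sketch does not distinguish from a normal end-of-file.)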
Again, we could also use fgets. If the data is on separate lines:
na = 0;
while(na < 10 && fgets(line, MAXLINE, ifp) != NULL)
    a[na++] = atoi(line);
If the data is all on one line:
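if(fgets(line, MAXLINE, ifp) == NULL)
{
    fprintf(stderr, "error in data file\n");
    return;
}
na = getwords(line, av, 10);
if(na > 10)
{
    fprintf(stderr, "too many items in data file\n");
    return;
}
for(i = 0; i < na; i++)
    a[i] = atoi(av[i]);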
Notice that this last implementation does not require that the file consist of only data for the array a. One line
of the file consists of data for the array a, but other lines of the file could contain other data.
We could also scatter a's data on multiple lines, without using an explicit count, and with the ability for the
file to contain other data as well, if we marked the end of the array data with an explicit marker in the file,
rather than assuming that the array's data continued until end-of-file. For example, we could write the data
out like this:

for(i = 0; i < na; i++)
    fprintf(ofp, "%d\n", a[i]);
fprintf(ofp, "end\n");
and read it like this:
na = 0;
while(fgets(line, MAXLINE, ifp) != NULL)
{
    if(strncmp(line, "end", 3) == 0)
        break;
    if(na >= 10)
    {
        fprintf(stderr, "too many items in data file\n");
        return;
    }
    a[na++] = atoi(line);
}
(There's just one nuisance here in checking for the ``end'' marker: fgets leaves the \n in the line it reads,
so a simple strcmp against "end" would fail. Here we use strncmp, which compares at most n
characters, and we pass the third argument, n, as 3. Other solutions would be to use strcmp against the
string "end\n", or to strip the \n somehow, or to use our old getline or fgetline functions, since
they strip the \n for us.)
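Here is a sketch of the ``strip the \n somehow'' alternative, packaged as a matching writer and reader. The function names write_array and read_array are invented for the example. The reader removes the newline with the standard function strcspn and can then use a plain strcmp, which has the side benefit that a line such as ``endless'' is not mistaken for the marker.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINE 200

/* Writes na values, one per line, followed by an "end" marker line. */
void write_array(FILE *ofp, const int a[], int na)
{
    int i;

    for(i = 0; i < na; i++)
        fprintf(ofp, "%d\n", a[i]);
    fprintf(ofp, "end\n");
}

/* Reads values up to the "end" marker, storing at most max of them.
   Returns the count, or -1 if there are too many values or no marker. */
int read_array(FILE *ifp, int a[], int max)
{
    char line[MAXLINE];
    int na = 0;

    while(fgets(line, MAXLINE, ifp) != NULL)
    {
        line[strcspn(line, "\n")] = '\0';   /* strip the \n, if any */
        if(strcmp(line, "end") == 0)
            return na;
        if(na >= max)
            return -1;
        a[na++] = atoi(line);
    }
    return -1;      /* reached end-of-file without seeing "end" */
}
```

Since read_array stops at the marker, whatever follows it in the file is left unread for some other piece of code.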
Now that we've seen many (too many!) options for writing and reading the array, how do you decide which
to use? Should you use fscanf, or the slightly more ad hoc methods involving fgets, getwords, atoi,
etc? It's largely a matter of personal preference. In the code fragments we've looked at so far, the ones using
fscanf have seemed shorter, although in some cases that was because they weren't doing as much error
checking as the ones that used fgets. In general, the methods using fgets will allow somewhat more
flexibility, as we saw when checking for the explicit ``end'' marker, which would have been difficult or
impossible using scanf or fscanf.
Now let's move to another example, a user-defined data structure. Suppose we have this structure:
struct s
{
    int i;
    float f;
    char s[20];
};
To write an instance of this structure out, we could simply print its fields on one line:
struct s x;
...
fprintf(ofp, "%d %g %s\n", x.i, x.f, x.s);
or on several lines:
fprintf(ofp, "%d\n", x.i);
fprintf(ofp, "%g\n", x.f);
fprintf(ofp, "%s\n", x.s);
or simply
fprintf(ofp, "%d\n%g\n%s\n", x.i, x.f, x.s);
(We use %g format for the float field because %g tends to print the most accurate representation in the
smallest space, e.g. 1.23e6 instead of 1230000 and 1.23e-6 instead of 0.00000123 or 0.000001.)
To read this structure back in, we could again either use fscanf, or fgets and some other functions. As
before, fscanf seems easier:
if(fscanf(ifp, "%d %g %s", &x.i, &x.f, x.s) != 3)
{
    fprintf(stderr, "error in data file\n");
    return;
}
Here we have a problem, though: what if the third, string field contains a space? In the scanf family, the
%s format stops reading at whitespace, so if x.s had contained the string "Hello, world!", it would be
read back in as "Hello,". As it happens, we could fix it by using the less-obvious format string "%d %g
%[^\n]", where %[^\n] means ``match any string of characters not including \n''. But we also have
another problem: what if the string is longer than the 19 characters (plus the terminating \0) that fit in the
s field? We could fix this by using %19s or %19[^\n] (the field width does not count the \0, so it must be
one less than the array size), although we'd have to remember to change the scanf format string if
we ever changed the size of the array.
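Putting the scanset and a width limit together gives something like the following sketch. The function name read_struct is made up for the example. The width is 19, one less than the size of the array, because the scanf family stores a terminating \0 in addition to the characters it matches.

```c
#include <stdio.h>

struct s
{
    int i;
    float f;
    char s[20];
};

/* Reads one structure written as "%d %g %s\n".
   Returns 0 on success, -1 on a malformed line. */
int read_struct(FILE *ifp, struct s *x)
{
    /* %19[^\n] matches up to 19 characters, spaces included, stopping at \n */
    if(fscanf(ifp, "%d %g %19[^\n]", &x->i, &x->f, x->s) != 3)
        return -1;
    return 0;
}
```

A string longer than 19 characters is cut short rather than overflowing the array; the leftover characters stay in the input, so a real program would also have to decide what to do with them.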
Let's leave fscanf for a moment and look at our other alternatives. If we'd printed the data all on one line,
we could use
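#include <stdlib.h>     /* for atof() */
char *av[3];
if(fgets(line, MAXLINE, ifp) == NULL)
{
    fprintf(stderr, "error in data file\n");
    return;
}
if(getwords(line, av, 3) != 3)
{
    fprintf(stderr, "error in data file\n");
    return;
}
x.i = atoi(av[0]);
x.f = atof(av[1]);
strcpy(x.s, av[2]);     /* XXX */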
Here we luck out on the question of what happens if the string contains a space, because it happens that our
version of getwords (see chapter 10, p. 13) leaves the remaining words in the last ``word'' if there are more
words in the string than we told it to find, i.e. more than the third argument to getwords which gives the
size of the av array. Here, we told it it could only look for 3 words, so if the string contains spaces, making
the line appear to have 4 or more words, words 3, 4, etc. will all be pointed to by av[2]. However, we still
have the problem that we haven't guarded against overflow of x.s if the third (plus fourth, etc.) word on the
data line is longer than 20 characters. (The comment /* XXX */ is a traditional marker which means ``this
line is inadequate and definitely won't work reliably in all situations but for one reason or another the person
writing it is not going to take the trouble to do it right just yet.'')
If the data is written on three lines, on the other hand, we obviously have to call fgets three times to read
it:
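if(fgets(line, MAXLINE, ifp) == NULL)
    { fprintf(stderr, "error in data file\n"); return; }
x.i = atoi(line);
if(fgets(line, MAXLINE, ifp) == NULL)
    { fprintf(stderr, "error in data file\n"); return; }
x.f = atof(line);
if(fgets(line, MAXLINE, ifp) == NULL)
    { fprintf(stderr, "error in data file\n"); return; }
strcpy(x.s, line);      /* XXX */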
Now the last line has two problems: besides the lingering problem of overflow (if the line is more than 18
characters long), we have the problem that fgets retains the \n (which is why x.s will overflow if the
line is longer than 18 characters, not 19). In this case, one way to fix the overflow problem would be to have
fgets read into x.s directly:
if(fgets(x.s, 20, ifp) == NULL)
    { fprintf(stderr, "error in data file\n"); return; }
If we didn't want to have to remember to change that 20 in the call to fgets if we ever re-sized the array,
we could get clever and write fgets(x.s, sizeof(x.s), ifp). Also, we might as well figure out
how to get rid of that pesky \n. One way is by calling the standard library function strchr, which searches
for a certain character in a string. This will require that we #include <string.h>, and declare an extra
char * variable:
#include <string.h>
char *p;
p = strchr(x.s, '\n');
if(p != NULL)
    *p = '\0';
strchr returns a pointer to the character that it finds, or a null pointer if it doesn't find the character. If
there's a \n in the line at all, we know it's at the end, so it's safe to overwrite it with a \0, making the string
one character shorter. (Since we know that the \n is at the end, we could also call the function strrchr,
which finds a character starting from the right.)
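The fgets and strchr steps can be combined into one small function, sketched below. The name read_field is made up for the example. It also deals with one case the fragments above leave open: if no \n is found, the line was too long for the buffer, and the unread remainder would otherwise be picked up by the next call as if it were a separate line.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (of the given size), without the \n.
   An overlong line is truncated and the rest of it discarded.
   Returns 1 if a line was read, 0 at end-of-file. */
int read_field(char *buf, int size, FILE *ifp)
{
    char *p;
    int c;

    if(fgets(buf, size, ifp) == NULL)
        return 0;
    p = strchr(buf, '\n');
    if(p != NULL)
        *p = '\0';
    else
    {
        /* no \n seen: skip whatever is left of this line */
        while((c = getc(ifp)) != EOF && c != '\n')
            ;
    }
    return 1;
}
```

With the structure from this section, the call would be read_field(x.s, sizeof(x.s), ifp).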
For any of the methods we've been using so far, what if one day we add a new field to the structure s?
Obviously, we'll have to rewrite the code which writes the structure out and also the code which reads it in.
Also, unless we're careful, the modified code won't be able to read in any data files we might happen to have
lying around which were written before the structure was changed. Depending on the nature of the data file
and the way it's used, this can be a real problem. (In principle, it's possible to write a utility program to
convert the old data files to the new format, but it can be a nuisance to write that program, and it can be a
real nuisance to track down all of the old data files that need converting.)
Therefore, when a data file format must be changed, it's often a good idea if the new, improved data file
reader can be made to automatically detect and read old-format files as well. (Automatic detection isn't a
strict necessity, but it's certainly a nicety.) Furthermore, it's much easier to write a new & improved data file
reader, that can read both old and new formats, if the possibility was thought of back when the original data
file format was designed.
One thing that helps a lot is if data file formats have version numbers, and if each data file begins with a
number, in a simple format and known location which won't change even if the rest of the format changes,
indicating which version of the format this file uses. Having a file format version number at the beginning of
each data file leads to two immediate advantages:
1. Whenever a new program reads a data file, it can immediately and unambiguously decide how it's
going to read it, whether it can use its new & improved reading routines or whether it might have to
fall back on its backwards-compatibility, old-style reader.
2. If there is a suite of several programs, all of which read the same data files, and if for some reason
there's an old version of one of the programs still in use, the old program can print an unambiguous
message along the lines of ``this is a new data file which I am too old to read'', rather than printing the
(misleading, in this case) ``error in data file'' (or crashing).
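A version check might be sketched as follows. Everything specific here is an assumption made for the example: the first line of the file has the form ``version N'', the current version is 2, and the names write_header and read_header are invented.

```c
#include <stdio.h>

#define MAXLINE 200
#define CURRENT_VERSION 2       /* hypothetical current format version */

/* Writes the version line that begins every data file. */
void write_header(FILE *ofp)
{
    fprintf(ofp, "version %d\n", CURRENT_VERSION);
}

/* Reads and checks the version line.
   Returns the file's version number, -1 if the line is missing or
   malformed, or -2 if the file is newer than this program. */
int read_header(FILE *ifp)
{
    char line[MAXLINE];
    int version;

    if(fgets(line, MAXLINE, ifp) == NULL ||
            sscanf(line, "version %d", &version) != 1)
    {
        fprintf(stderr, "error in data file\n");
        return -1;
    }
    if(version > CURRENT_VERSION)
    {
        fprintf(stderr, "this is a new data file which I am too old to read\n");
        return -2;
    }
    return version;
}
```

The caller can then switch on the returned version number to choose between its new reading routines and its old-style, backwards-compatibility reader.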
Another technique which can be immensely useful and which we'll explore next is to define a data file
format in such a way that the overall format doesn't change even if new data is added to it.
It's easy to see why the simple data file fragments we've been looking at so far are not resilient in the face of
newly-introduced data fields. In the case of struct s, the reader always assumed that the first field in the
data file was i, the second field was f, and the third field was s. If we ever add any new fields, unless we're
careful to add them at the end of the file (and lucky on top of that), the simpleminded reader will get
confused.
One powerful way of getting around this problem is to tag each piece of data in the file, so that the reader
knows unambiguously what it is. For example, suppose that we wrote instances of our struct s out like
this:
fprintf(ofp, "i %d\n", x.i);
fprintf(ofp, "f %g\n", x.f);
fprintf(ofp, "s %s\n", x.s);
Now, each line begins with a little code which identifies it. (The code in the data file happens to match the
name of the corresponding structure member, but that's not necessary, nor is there any way of getting the
compiler to make any correspondence automatically.)
If we simply modified one of our previous file-reading code fragments to read this new, tagged format, we
might quickly end up with a mess. We'd be continually checking the tag on the line we just read against the
tag we expected to read, and constantly printing error messages or trying to resynchronize. But in fact, there's
no reason to expect the lines to come in a certain order, and it turns out that it's easier to read such a file a
line at a time, without that assumption, taking each line as it comes and not worrying what order the lines
come in. Here is how we might do it:
x.i = 0; x.f = 0.0; x.s[0] = '\0';
while(fgets(line, MAXLINE, ifp) != NULL)
{
    if(*line == '#')
        continue;
    ac = getwords(line, av, 2);
    if(ac == 0)
        continue;
    if(strcmp(av[0], "i") == 0)
        x.i = atoi(av[1]);
    else if(strcmp(av[0], "f") == 0)
        x.f = atof(av[1]);
    else if(strcmp(av[0], "s") == 0)
        strcpy(x.s, av[1]);     /* XXX */
}
This example also throws in a few new little features: a line beginning with # is ignored, so we will be able
to place comment lines in data files by beginning them with #. The code also ignores blank lines (those for
which getwords returns 0).
We're now treating the ``data file'' almost like a ``command file''--the first word on each line is almost like a
``command'' telling us to do something: i means store this value in x.i; f means store this value in x.f,
etc. Since we don't have any easy way of telling whether we ever got around to setting a particular field, we
initialize each one to an appropriate default value before we start. Notice that we did not have a last line in
the if/else/if/else chain saying
else
    fprintf(stderr, "error in data file\n");
Instead, we quietly ignore lines we don't recognize! This strategy is admittedly on the simpleminded side,
and it would not be adequate under all circumstances, but it means that an old program can read a new data
file containing fields it's never heard of. The old program will still be able to pluck out the data it does
recognize and can use, while (deliberately) ignoring the (new) data it doesn't know about.
This code is not perfect. We still have the same sorts of problems with that string field, s: it might contain
spaces, which we get around (this time) by calling getwords with a third argument of 2, so that all but
the first word on the line end up ``in'' av[1]. Also, the code does not check to see that there actually was a
second word on the line before using it to set x.i, x.f, or x.s. (In this case, we could fix that by
complaining if getwords did not return 2.)
Finally, we still have the potential for overflow, and we might as well grit our teeth now and figure out how
to fix it. Since we already initialized x.s to the empty string with the assignment x.s[0] = '\0', one
way around the problem is to replace the call to strcpy with a call to strncat:
...
else if(strcmp(av[0], "s") == 0)
strncat(x.s, av[1], 19);
(or, again, perhaps strncat(x.s, av[1], sizeof(x.s)-1)). The strcat and strncat
functions are slightly misleadingly named: what they actually do is append the second string you hand them
(i.e. the second argument) to the first, in place. In the case of strncat, it never copies more than n
characters, where n is its third argument, although it does always append a \0, which is why we tell it to
copy at most 19 characters, not 20. (Since x.s starts out empty, there's definitely room for 19, although we
would still have to worry about the possibility of a corrupted data file which contained two s lines. You
might wonder why we couldn't simply use strncpy, but it turns out that, for obscure historical reasons,
strncpy does not always append the \0.)
Although it has a few imperfections (which are easily remedied, and are left as exercises) this last example
(using fgets, getwords, and an if/strcmp/else... chain) is an excellent basis for a flexible, robust
data file reader.
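For reference, here is one way the pieces might fit together with the remaining gaps closed; it is a sketch, not the only way to do the exercises. Since getwords is our own function from chapter 10 and not a library routine, this version uses a stand-in, split_tag, written with the standard functions strspn and strcspn; both split_tag and read_tagged are names invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINE 200

struct s
{
    int i;
    float f;
    char s[20];
};

/* Stand-in for getwords(line, av, 2): points *tag at the first word and
   *value at the rest of the line.  Returns the number of parts found. */
static int split_tag(char *line, char **tag, char **value)
{
    char *p = line;

    p[strcspn(p, "\n")] = '\0';     /* strip the \n */
    p += strspn(p, " \t");          /* skip leading whitespace */
    if(*p == '\0')
        return 0;
    *tag = p;
    p += strcspn(p, " \t");         /* find the end of the tag */
    if(*p == '\0')
        return 1;
    *p++ = '\0';
    p += strspn(p, " \t");
    if(*p == '\0')
        return 1;
    *value = p;
    return 2;
}

/* Reads tagged lines until end-of-file.  Returns 0, or -1 on a bad line. */
int read_tagged(FILE *ifp, struct s *x)
{
    char line[MAXLINE];
    char *tag = NULL, *value = NULL;
    int ac;

    x->i = 0; x->f = 0.0; x->s[0] = '\0';
    while(fgets(line, MAXLINE, ifp) != NULL)
    {
        if(*line == '#')
            continue;
        ac = split_tag(line, &tag, &value);
        if(ac == 0)
            continue;
        if(ac != 2)
        {
            fprintf(stderr, "error in data file\n");
            return -1;
        }
        if(strcmp(tag, "i") == 0)
            x->i = atoi(value);
        else if(strcmp(tag, "f") == 0)
            x->f = atof(value);
        else if(strcmp(tag, "s") == 0)
        {
            x->s[0] = '\0';     /* a repeated s line replaces the old value */
            strncat(x->s, value, sizeof(x->s) - 1);
        }
        /* lines with unrecognized tags are deliberately ignored */
    }
    return 0;
}
```

Resetting x->s before each strncat is one answer to the ``two s lines'' worry: the later line wins and the array still cannot overflow.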
One footnote about the troublesome string field, s: to get around the problem of fixed-size arrays, you might
one day decide to declare the s field of struct s as a pointer rather than a fixed-size array. You would
have to be careful while reading, however. It might seem that you could just write, for example,
x.s = av[1];    /* assumes char *s, but also WRONG */
but this would not work; remember that whenever you use pointers you have to worry about memory
allocation. If you assigned x.s in that way, where would be the memory that it points to? It would be
wherever av[1] points, which is back into the line array. Not only is that (probably) a local array, valid
only while the file-reading functions are active, but it's also overwritten with each new line in the data file.
You'll obviously want x.s to retain a useful pointer value pointing to the text read from the file, which
means that you'll still have to make a copy, after allocating some memory. In this case, you might do
x.s = malloc(strlen(av[1]) + 1);
if(x.s == NULL)
    { fprintf(stderr, "out of memory\n"); return; }
strcpy(x.s, av[1]);
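Since this allocate-and-copy step tends to be needed in several places, it is convenient to wrap it in a function. The sketch below uses the made-up name save_string; it does nothing more than the three lines above.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated copy of str, or NULL if out of memory.
   The caller is responsible for eventually calling free() on the result. */
char *save_string(const char *str)
{
    char *copy = malloc(strlen(str) + 1);   /* +1 for the \0 */

    if(copy == NULL)
        return NULL;
    strcpy(copy, str);
    return copy;
}
```

With this, the assignment becomes x.s = save_string(av[1]), followed by the same check for a null pointer. The copy is unaffected when the line array is later overwritten.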
To some extent, the problems we've been having with field s are fundamental. In particular, any time you
use text formats which are based on whitespace-separated ``words,'' string fields which might contain spaces
are always tricky to handle.


Normally, when writing notes like these, I progress from the easy to the hard, or the boring to the
interesting, or the deficient to the recommended. This chapter is the reverse; I heartily recommend that
you use the text data files of the previous section whenever possible. This section on binary data files is
included for completeness, and you're welcome to skip it if you're not interested in using binary data files
or if it doesn't make sense.
We've already seen two examples of writing and reading binary data files, in section 16.7 of the previous
chapter. To write out an array of integers, we called
fwrite(array, sizeof(int), na, fp);
To read them back in, we called
na = fread(array, sizeof(int), 10, fp);
To write out a structure, we called
fwrite(&x, sizeof(struct s), 1, fp);
To read it back in, we called
fread(&x, sizeof(struct s), 1, fp);
(which returns 1 if it succeeds).
These examples certainly seem attractive: they will result in compact data files, they will probably be
quite efficient, and they are certainly simple for the programmer to write. However, data files created in
this way fare quite badly when evaluated against our other criteria. They will not be human-readable;
they will contain sets of inscrutable byte values which are exact copies of the memory regions used to
contain the data structures. They will not be at all portable; they cannot be correctly read (at least, not
with the simple calls to fread) on machines where basic types such as int have different sizes, or
where the basic types are laid out differently in memory (e.g. ``big endian'' vs. ``little endian'', or
different floating-point representations). They may not even be able to be read by the same code
compiled under a different compiler on the same machine, since different compilers may use different
sizes for integers, or lay out the fields of structures differently in memory. (The fields will always be in
the order you expect, but different compilers may, for various reasons, leave different amounts of empty
space or ``padding'' between certain fields.) These binary files will have no provision whatsoever for
backwards or forwards compatibility; any change to the structure definition will completely change the
implied format of the data file, with no hope of reading older (or newer) files. The only other benefit
these files have is that if the data is for any reason sensitive, it will certainly be a bit better concealed
from prying eyes.
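The padding issue is easy to observe for yourself. In the sketch below, the structure struct padded and the three function names are made up for the example; the functions simply report sizes so that they can be printed and compared.

```c
#include <stddef.h>

/* A small structure in which many compilers insert padding after c. */
struct padded
{
    char c;
    int i;
};

/* Size the members would occupy with no padding at all. */
size_t packed_size(void)
{
    return sizeof(char) + sizeof(int);
}

/* Size the compiler actually uses, which is what fwrite would write. */
size_t actual_size(void)
{
    return sizeof(struct padded);
}

/* Offset of member i from the start of the structure. */
size_t offset_of_i(void)
{
    return offsetof(struct padded, i);
}
```

On many machines actual_size() is larger than packed_size(), but the exact numbers depend on the compiler and its options, which is precisely why a file of raw structures is not portable.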
We can get around these disadvantages of binary data files, but in so doing we'll lose many of the
advantages, such as blinding efficiency or programmer convenience. If we care about data file portability
or backwards or forwards compatibility, we will have to write structures one field at a time, not in one
fell swoop. Furthermore, if we have an int to write, we may choose not to write it using fwrite:
fwrite(&i, sizeof(int), 1, fp);
but rather a byte at a time, using putc:
putc(i / 256, fp);
putc(i % 256, fp);
In this way, we'd have precise control over the order in which the two halves of the int are being
written. (We're assuming here that there's no more than two bytes' worth of data in the int, which is a
safe assumption if we're portably assuming that ints can only hold up to +-32767.) When it came time
to read the int back in, we might do that a byte at a time, too:
i = getc(fp);
i = 256 * i + getc(fp);
(We could not collapse this to i = 256 * getc(fp) + getc(fp), because we wouldn't know
which order the two calls to getc would occur in.)
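Wrapped up as a pair of functions, and with checks for end-of-file added, the byte-at-a-time technique might look like this sketch. The names put16 and get16 are made up, and as in the text the value is assumed to fit in two bytes and to be nonnegative.

```c
#include <stdio.h>

/* Writes n as two bytes, most significant byte first.
   Assumes 0 <= n <= 65535.  Returns 0 on success, -1 on a write error. */
int put16(int n, FILE *fp)
{
    if(putc((n / 256) % 256, fp) == EOF)
        return -1;
    if(putc(n % 256, fp) == EOF)
        return -1;
    return 0;
}

/* Reads two bytes written by put16 and stores the value in *n.
   Returns 0 on success, -1 if the file ends too soon. */
int get16(int *n, FILE *fp)
{
    int hi, lo;

    hi = getc(fp);          /* separate statements, so the order is certain */
    if(hi == EOF)
        return -1;
    lo = getc(fp);
    if(lo == EOF)
        return -1;
    *n = 256 * hi + lo;
    return 0;
}
```

Because the byte order is fixed by the code and not by the machine, a file written this way reads back the same on big-endian and little-endian systems, provided it was opened in binary mode.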
We might also choose to use tags to mark the various ``fields'' within our binary data file; the fields
would be more likely to be byte codes such as 0x00, 0x01, and 0x02 than the character or string codes
we used in the tagged text data file of the previous section.
If you do choose to use binary data files, you must open them for writing with fopen mode "wb" and
for reading with "rb" (or perhaps one of the + modes; the point is that you do need the b). Remember
that, in the default mode, the standard I/O functions all assume text files, and translate between \n and
the operating system's end-of-line representation. If you try to read or write a binary data file in text
mode, whenever your internal data happens to contain a byte which matches the code for \n, or your
external data happens to contain bytes which match the operating system's end-of-line representation,
they may be translated out from under you, screwing up your data.

This chapter goes back and fills in details on several C language features which this series of notes (and
perhaps other introductory works) has downplayed or omitted.
18.1: Types
18.2: More Operators
18.3: More Statements

18.1: Types
So far, we've seen the basic types char, int, long int, float, and double. This section
introduces the last few basic types: void, short int, long double, and the unsigned types.
Also, we'll meet storage classes, typedef, and the type qualifiers const and volatile.
18.1.1: void
18.1.2: short int
18.1.3: unsigned integers
18.1.4: long double
18.1.5: Storage Classes
18.1.6 Type Definitions (typedef)
18.1.7: Type Qualifiers

18.1.1: void
From time to time we have seen the keyword void lurking in various programs and code samples.
void is sort of a ``placeholder'' type, used in circumstances where you need a type name but there's not
really any one right type to use. Formally, we can say that void is a type with no values.
There are three main uses of the void type:
1. As the return type of a function which does not return a value. A function declared as
void f()
is declared as ``void'' or ``returning void'' which actually means that it returns no value. The
compiler will not complain if you ``fall off the end'' of a void function without executing a
return statement; the compiler will complain if you execute a return statement that specifies a
value to be returned. (As far as the low-level syntax of the return statement is concerned, the
expression is optional; but the expression is required in functions that return values and
disallowed in void functions.)
2. As the argument list in the prototype of a function that accepts no parameters. In a function
definition such as
int f()
{
...
}
the empty parentheses indicate that the function accepts no parameters. But (for historical
reasons) in an external function prototype declaration such as
extern int f();
the empty parentheses indicate that we don't know how many (or what type of) parameters the
function accepts. In either case, we can make the fact that the function accepts zero parameters
explicit by using the keyword void in the parameter list:
extern int f(void);
int f(void)
{
...
}
For obvious reasons, if void appears in a parameter list, it must be the first and only parameter,
and it must not declare an argument name. (That is, prototypes like int f(int, void) and
int f(void x) are meaningless and illegal.)
3. As a pointer type, to indicate a ``generic pointer'' which might in fact point to any type. We need
``generic pointers'' when we're using functions like malloc. malloc returns a pointer to n bytes
of memory, which we may use as any type of pointer we wish. Normally, it is an error to use a
value of one pointer type where another pointer type is required. For example, the fragments
int i;
double *dp = &i; /* WRONG */
and
int *ip = dp; /* WRONG */
would both generate errors, because you can't assign back and forth between int pointers and
double pointers. However, the type void * (``pointer to void'') is special: it is legal to assign
a value of type pointer-to-void to a variable of some other pointer type, and vice versa. (In case
the pointer types have different sizes or representations, the compiler will automatically perform
conversions. We'll say more about type conversions, including pointer type conversions, in a later
section.) So the lines
#include <stdlib.h>
char *cp = malloc(10);
int *ip = malloc(sizeof(int));
double *dp = malloc(sizeof(double));
are all legal, since <stdlib.h> declares malloc as returning void *, indicating that an
assignment of malloc's return value to any pointer type is permissible.
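The three uses of void can be seen together in one small sketch. The function names here (fill, answer, make_filled) are invented for the illustration.

```c
#include <stdlib.h>

/* Use 1: returns no value, so falling off the end is fine. */
void fill(int *a, int n, int value)
{
    int i;
    for(i = 0; i < n; i++)
        a[i] = value;
}

/* Use 2: (void) says explicitly that the function takes no parameters. */
int answer(void)
{
    return 42;
}

/* Use 3: malloc returns void *, which converts to int * without a cast.
   The caller must check for NULL and eventually free the result. */
int *make_filled(int n, int value)
{
    int *a = malloc(n * sizeof(int));
    if(a != NULL)
        fill(a, n, value);
    return a;
}
```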

18.1.2: short int

Another type we haven't met is short int. A short int has the same guarantees as a plain int: it
will hold integers in at least the range +-32,767. The difference between short int and plain int is
that short int might be smaller. Remember, the definition of both these types (like that of all C types) is
that they have at least the specified range. On some machines, plain int will hold numbers greater than
32,767. (On 32-bit machines, for example, it's common for plain int to be 32 bits, and to hold
+-2,147,483,647. Yes, this is all the way up to the minimum range for a long int.) You might use a
short int when you had a lot of them and were worried about saving memory. If you had a large
array of integers all less than 32,768, or a large number of structures with one or more members holding
integers all less than 32,768, you might declare the array or the structure members as short int, to
avoid devoting 4 bytes to each of them on 32-bit machines.
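To get a feel for the possible savings, here is a small sketch comparing two arrays of the same length. The array names and NREADINGS are made up for the example, and the actual sizes depend on your machine; on some machines short int and int are the same size and nothing is saved.

```c
#include <stddef.h>

#define NREADINGS 1000

short int compact[NREADINGS];   /* each element only needs +-32,767 */
int roomy[NREADINGS];

/* Bytes saved by using short int; may be 0 where short and int are the same size. */
size_t bytes_saved(void)
{
    return sizeof(roomy) - sizeof(compact);
}
```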

18.1.3: unsigned integers

For each of the integral types, there is a corresponding unsigned type. Thus, we have unsigned
char, unsigned short int, unsigned int, and unsigned long int. There are two
differences between the unsigned types and the default, signed types:
1. They do not hold negative numbers; their range is from 0 up to some maximum.
2. They have guaranteed properties on overflow. If the range of unsigned int on some machine
is 0-65,535, and if an unsigned int variable contains the value 65,535, then adding 1 to it
will wrap around to 0. Whenever a calculation involving unsigned integers overflows, or tries
to go negative, the result is the remainder that would be obtained if the true (mathematical) result
were divided by the range of the unsigned type. In other words, if UINT_MAX is the largest
value that will fit in an unsigned int (that is, if the range is 0-UINT_MAX), then the results of
calculations that would overflow, such as
65535 + 1
and
5 - 10
are actually
(65535 + 1) % (UINT_MAX+1)
and
(5 - 10) % (UINT_MAX+1)
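Here is a minimal sketch of the wraparound rule, using the real UINT_MAX from <limits.h> rather than assuming 65,535. The function names are invented for the illustration. (The % in the formulas above is the mathematical remainder, which is never negative.)

```c
#include <limits.h>

/* Adding 1 to the largest unsigned int wraps around to 0. */
unsigned int wrap_up(void)
{
    unsigned int u = UINT_MAX;
    return u + 1;
}

/* 5 - 10 cannot go negative; the result is (UINT_MAX + 1) - 5. */
unsigned int wrap_down(void)
{
    unsigned int a = 5, b = 10;
    return a - b;
}
```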
The guaranteed minimum ranges of the unsigned types are:
unsigned char        0 - 255
unsigned short int   0 - 65535
unsigned int         0 - 65535
unsigned long int    0 - 4294967295
These multiword type names can also be abbreviated. Instead of writing long int, you can write
long. Instead of writing short int, you can write short. Instead of writing unsigned int, you
can write unsigned. Instead of writing unsigned long int, you can write unsigned long.
Instead of writing unsigned short int, you can write unsigned short.
In the absence of the unsigned keyword, types short int, int, and long int are all signed.
However, depending on the particular compiler you're using, plain type char might be signed or
unsigned. If you need an explicitly signed character type, you can use signed char.

18.1.4: long double

The two common floating-point types in C are float and double. We haven't said much about the
differences between them, because there isn't much to say: double generally gives you more precision
(more digits' worth of significance), and perhaps more range (an ability to use numbers with larger
exponents) than float. Continuing this progression, ANSI C added a third floating-point type, long
double, which may give you even more range or even more precision. If you're using a machine with
an extended-precision floating-point format, long double will let you access that format. But if your
machine has only two floating-point formats, float and double will probably map to those, and
long double won't end up being any better than plain double.
The printf and scanf formats for long double are %Le, %Lf, and %Lg.
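As a small sketch of the L modifier in use (format_ld is a made-up helper name), the function below formats a long double with two digits after the decimal point. Leaving out the L and passing a long double to plain %f would be an error.

```c
#include <stdio.h>

/* Format a long double into buf using %Lf; the L modifier is required.
   buf must be large enough for the result. Returns the number of characters written. */
int format_ld(char *buf, long double x)
{
    return sprintf(buf, "%.2Lf", x);
}
```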

18.1.5: Storage Classes

A full-blown declaration in C consists of several parts: the storage class, the base type, any type
qualifiers, and a list of declarators, where each declarator consists of: an identifier, additional characters
possibly indicating that it is a pointer, array, or function, and finally an optional initializer. We've met
each of these parts at various points along the way, although we have never explicitly mentioned or
defined the storage class. The storage class is optional; it generally appears at the beginning of the
declaration (before the base type) if it appears at all. At most one storage class may appear in any one
declaration.
We've seen two storage classes so far, extern and static. extern marks a declaration as an
external declaration, indicating that the identifier declared has its defining instance somewhere else
(where ``somewhere else'' might be somewhere else in the same source file, or in a different source file).
static is used in two different ways, (1) to indicate that a global (``file scope'') variable is private to
one source file, and cannot be accessed (even with external declarations) from other source files, or (2) to
indicate that a local variable should have static duration, such that it does not come and go as the function
is called and returns, and so that its value persists between invocations of the function.
Besides these two, there are three other storage classes. register indicates that the programmer
believes that the variable will be heavily used, and that it should be assigned to a high-speed CPU
register (rather than an ordinary memory location) if possible. Explicit register declarations are rare
these days, because modern compilers generally do an excellent job, all by themselves and without any
hints, of deciding which variables belong in machine registers. A limitation of register variables is
that you cannot generate pointers to them using the & operator. (This is because, on most machines,
pointers are implemented as memory addresses, and CPU registers usually do not have memory
addresses.)
The fourth storage class is auto, which indicates that a local variable should have automatic duration.
(Automatic duration, remember, means that variables are automatically allocated when a function is
called and automatically deallocated when it returns.) Since automatic duration is the default for local
variables anyway, auto is virtually never used; it's a relic from C's past.
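The sketch below shows both uses of static, plus a (redundant) auto, in a hypothetical ticket counter; the names are invented for the example.

```c
static int total = 0;   /* file scope + static: private to this source file */

int next_ticket(void)
{
    static int ticket = 0;  /* static duration: initialized once, value persists */
    auto int step = 1;      /* auto is the default for locals; the keyword is redundant */

    ticket += step;
    total += step;
    return ticket;
}

int tickets_issued(void)
{
    return total;
}
```

Successive calls to next_ticket return 1, 2, 3, and so on, because ticket is not reinitialized each time the function is entered.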
The fifth storage class, typedef, is described in the next section.

18.1.6 Type Definitions (typedef)

When the storage class is typedef, as in
typedef int count;
a declaration means something completely different than it usually does. Instead of declaring a variable
named count, we are declaring a new type named count. (Actually, we're just declaring a new name
for an old type; you can think of typedef names as type aliases.) Having declared this new type count,
we can now use it as a base type in other declarations, such as
count napples, noranges;
The types FILE and size_t which we've seen at various points along the way (which we've described
as being ``new types defined by the header file <stdio.h>'' [or also several other headers in the case of
size_t]) are typically both defined using typedef.
You can use typedef to define new names for complicated types, too. You could define
typedef char *string;
typedef struct listnode list, *nodeptr;
after which you can declare several strings (char *'s) by saying
string string1, string2;
or several lists (struct listnode) by saying
list list1, list2;
or several pointers to list nodes by saying
nodeptr np1, np2;
typedef provides a way to simplify structure declarations. In a previous section, we saw that we had to
declare new variables of type struct complex using the syntax
struct complex c1, c2;
Using typedef, however, we can introduce a single-word complex type, after all:
typedef struct complex complextype;
complextype c1, c2;
It's also possible to define a structure and a typedef tag for it at the same time:
typedef struct complex
{
double real;
double imag;
} complextype;
Furthermore, when using typedef names, you may not need the structure tag at all; you can also write
typedef struct
{
double real;
double imag;
} complextype;
(At this point, of course, you could use the cleaner name ``complex'' for the typedef, instead of
``complextype''. Actually, it turns out that you could have done this all along. Structure tags and
typedef names live in separate namespaces, so the declaration
typedef struct complex
{
double real;
double imag;
} complex;
is legal, though possibly confusing.)
Defining new type names is done mostly for convenience, or to make the code more self-documenting, or
to make it possible to change the actual base type used for a lot of variables without rewriting the
declarations of all those variables.
A typedef declaration is a little bit like a preprocessor #define directive. We could imagine writing
#define count int
#define string char *
in an attempt to accomplish the same thing. This won't work nearly as well, however: given the macro
definition, the line
string string1, string2;
would expand to
char * string1, string2;
which would declare string1 as a char * but string2 as a plain char. The typedef declaration,
however, would work correctly.
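The difference is easy to check with sizeof. In this sketch the macro is named STRING_MACRO (a made-up name) so that it can coexist with the typedef, and the two helper functions exist only to report the sizes.

```c
#include <stddef.h>

typedef char *string;           /* a real type alias */
#define STRING_MACRO char *     /* textual substitution only */

string t1, t2;                  /* both are char * */
STRING_MACRO m1, m2;            /* expands to: char * m1, m2;  so m2 is a plain char */

size_t size_of_t2(void) { return sizeof(t2); }
size_t size_of_m2(void) { return sizeof(m2); }
```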
Some programmers capitalize typedef names to make them stand out a little better, and others use the
convention of ending all typedef names with the characters ``_t''.

18.1.7: Type Qualifiers

[Type qualifiers are a fairly advanced feature which not all programs need. You may skip this section.]
Any type can be qualified by the type qualifiers const or volatile. Both of these were new with
ANSI C, and there is a lot of older code which does not use them. Even in new code, you will see const
fairly rarely, and volatile even less often.
In simple declarations, the type qualifier is simply another keyword in the type name, along with the
basic type and the storage class. For example,
const int i;
const float f;
extern volatile unsigned long int ul;
are all declarations involving type qualifiers.
A const value is one you promise not to modify. The compiler may therefore be able to make certain
optimizations, such as placing a const-qualified variable in read-only memory. However, a
const-qualified variable is not a true constant; that is, it does not qualify as a constant expression which C
requires in certain situations, such as array dimensions, case labels (see section 18.3.1 below), and
initializers for variables with static duration (globals and static locals).
A volatile value is one that might change unexpectedly. This situation generally only arises when
you're directly accessing special hardware registers, usually when writing device drivers. The compiler
should not assume that a volatile-qualified variable contains the last value that was written to it, or
that reading it again would yield the same result that reading it last time did. The compiler should
therefore avoid making any optimizations which would suppress seemingly-redundant accesses to a
volatile-qualified variable. Examples of volatile locations would be a clock register (which
always gave an up-to-date time value each time you read it), or a device control/status register, which
caused some peripheral device to perform an action each time the register was written to.
Type qualifiers become more interesting (or at least more complicated or confusing) when they modify
pointer types. The placement of the qualifier in a pointer declaration determines whether it is the pointer
itself, or the location pointed to, that is qualified. The declarations
int const *ci1;
and
const int *ci2;
declare pointers to constant ints, which means that although the pointers can be modified (to point to
different locations), the locations pointed to (that is, *ci1 and *ci2) can not be modified. The
declaration
int * const cp;
on the other hand, declares a pointer which cannot be modified (it cannot be set to point anywhere else),
although the value it points to (*cp) can be modified.
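A short sketch may help keep the two placements straight. The variables a and b and the function name are invented for the example; the commented-out lines are the ones a compiler would reject.

```c
int a = 1, b = 2;

const int *ci = &a;     /* pointer to const int: ci may change, *ci may not */
int * const cp = &a;    /* const pointer to int: *cp may change, cp may not */

int demo_const_pointers(void)
{
    ci = &b;            /* fine: make ci point somewhere else */
    *cp = 10;           /* fine: modify a through cp */
    /* *ci = 5;            error: *ci is read-only */
    /* cp = &b;            error: cp is read-only */
    return *ci + *cp;   /* b + a */
}
```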
Pointers to constants (such as ci1 and ci2 above) have a particularly important use: they can be used to
document (and enforce) pointer parameters which a function promises not to use to modify locations in
the caller.
Normally, C uses pass-by-value. A function receives copies of its arguments, which means that it cannot
modify any variables in the caller (since copies of those variables were passed). If a function receives a
pointer, however (including the pointer that results when the caller seems to ``pass'' an array), it can use
that pointer (more precisely, it can use its copy of the pointer) to modify locations in the caller.
Sometimes, this is just what is desired: when the caller ``passes'' an array which it wishes the function to
fill in, or when the function wants to return one or more values via pointers rather than as the
conventional return value, the function's modification of locations in the caller is deliberate and
understood by the caller. However, when a function receives a pointer argument for some other reason,
under circumstances in which the caller might not want the function to use the pointer to modify
anything in the caller, the caller might appreciate a guarantee that the pointer (within the function) won't
be used to modify anything. To make that guarantee, the function can declare the pointer as pointer-to-const.
For example, our old friend printf never scribbles on the string it's given as its format argument; it
merely uses it to decide what to print. Therefore, the prototype for printf is
int printf(const char *fmt, ...)
where the ... represents printf's optional arguments. If a caller writes something like
char mystring[] = "Hello, world!\n";
printf(mystring);
it knows, from printf's prototype, that printf won't be scribbling on mystring. Furthermore, with
that prototype for printf in scope, the actual author of the printf code couldn't accidentally write a
(buggy) version which inadvertently modified the format argument--since it's declared as const char
*, the compiler will complain if any attempt is made to write to the location(s) it points to.
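We can make the same promise in our own functions. Here is a sketch of a hypothetical count_char function which only reads the string it is handed:

```c
#include <stddef.h>

/* Counts occurrences of c in s. The const promises that s is only read. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for(; *s != '\0'; s++) {    /* modifying our copy of the pointer is fine */
        if(*s == c)
            n++;
        /* *s = ' ';  would be rejected by the compiler: *s is const */
    }
    return n;
}
```

A caller can pass an ordinary (modifiable) char array; the conversion from char * to const char * is automatic.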
const and volatile can also be used in combination. Theoretically, it's possible to have a single
variable which is both:
const volatile int x;

Also, both a pointer and what it points to can be qualified:
const char * const cpc;
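Finally, as in several other situations, C has historically assumed type int when no base type is given, so
older code sometimes saves a bit of typing by writing
const i;
instead of
const int i;
etc. (This ``implicit int'' shorthand was removed from the language in C99, so new code should spell out
the int.)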

18.2: More Operators

The operators we haven't met include the bitwise operators, the cast operators, the comma operator, and
the conditional (or ``ternary'') operator.
18.2.1: Bitwise Operators
18.2.2: Cast Operators
18.2.3: Default Type Promotions and Conversions
18.2.4: The Comma Operator
18.2.5: The Conditional Operator

18.2.1: Bitwise Operators

The bitwise operators operate on numbers (always integers) as if they were sequences of binary bits
(which, of course, internally to the computer they are). These operators will make the most sense,
therefore, if we consider integers as represented in binary, octal, or hexadecimal (bases 2, 8, or 16), not
decimal (base 10). Remember, you can use octal constants in C by prefixing them with an extra 0 (zero),
and you can use hexadecimal constants by prefixing them with 0x (or 0X).
The & operator performs a bitwise AND on two integers. Each bit in the result is 1 only if both
corresponding bits in the two input operands are 1. For example, 0x56 & 0x32 is 0x12, because (in
binary):
  0 1 0 1 0 1 1 0
& 0 0 1 1 0 0 1 0
  ---------------
  0 0 0 1 0 0 1 0
The | (vertical bar) operator performs a bitwise OR on two integers. Each bit in the result is 1 if either of
the corresponding bits in the two input operands is 1. For example, 0x56 | 0x32 is 0x76, because:
  0 1 0 1 0 1 1 0
| 0 0 1 1 0 0 1 0
  ---------------
  0 1 1 1 0 1 1 0
The ^ (caret) operator performs a bitwise exclusive-OR on two integers. Each bit in the result is 1 if one,
but not both, of the corresponding bits in the two input operands is 1. For example, 0x56 ^ 0x32 is
0x64:
  0 1 0 1 0 1 1 0
^ 0 0 1 1 0 0 1 0
  ---------------
  0 1 1 0 0 1 0 0
The ~ (tilde) operator performs a bitwise complement on its single integer operand. (The ~ operator is
therefore a unary operator, like ! and the unary -, &, and * operators.) Complementing a number means
to change all the 0 bits to 1 and all the 1s to 0s. For example, assuming 16-bit integers, ~0x56 is
0xffa9:
~ 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0
  -------------------------------
  1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 1
The << operator shifts its first operand left by a number of bits given by its second operand, filling in
new 0 bits at the right. Similarly, the >> operator shifts its first operand right. If the first operand is
unsigned, >> fills in 0 bits from the left, but if the first operand is signed, >> might fill in 1 bits if the
high-order bit was already 1. (Uncertainty like this is one reason why it's usually a good idea to use all
unsigned operands when working with the bitwise operators.) For example, 0x56 << 2 is 0x158:
0 1 0 1 0 1 1 0 << 2
-------------------
0 1 0 1 0 1 1 0 0 0
And 0x56 >> 1 is 0x2b:
0 1 0 1 0 1 1 0 >> 1
-------------
0 1 0 1 0 1 1
For both of the shift operators, bits that scroll ``off the end'' are discarded; they don't wrap around.
(Therefore, 0x56 >> 3 is 0x0a.)
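If you'd like to check these results on your own machine, here is a sketch which wraps each example in a tiny function (the function names are made up). Unsigned constants are used throughout, and the complement is masked to its low 16 bits so that the answer matches the 16-bit example even where unsigned int is wider.

```c
unsigned int and_demo(void) { return 0x56u & 0x32u; }       /* 0x12 */
unsigned int or_demo(void)  { return 0x56u | 0x32u; }       /* 0x76 */
unsigned int xor_demo(void) { return 0x56u ^ 0x32u; }       /* 0x64 */
unsigned int not_demo(void) { return ~0x56u & 0xffffu; }    /* 0xffa9 (low 16 bits) */
unsigned int shl_demo(void) { return 0x56u << 2; }          /* 0x158 */
unsigned int shr_demo(void) { return 0x56u >> 3; }          /* 0x0a */
```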
The bitwise operators will make more sense if we take a look at some of the ways they're typically used.
We can use & to test if a certain bit is 1 or not. For example, 0x56 & 0x40 is 0x40, but 0x32 &
0x40 is 0x00:
  0 1 0 1 0 1 1 0
& 0 1 0 0 0 0 0 0
  ---------------
  0 1 0 0 0 0 0 0

  0 0 1 1 0 0 1 0
& 0 1 0 0 0 0 0 0
  ---------------
  0 0 0 0 0 0 0 0
Since any nonzero result is considered ``true'' in C, we can use an expression involving & directly to test
some condition, for example:
if(x & 0x04)
do something ;
(If we didn't like testing against the bitwise result, we could equivalently say if((x & 0x04) !=
0) . The extra parentheses are important, as we'll explain below.)
Notice that the value 0x40 has exactly one 1 bit in its binary representation, which makes it useful for
testing for the presence of a certain bit. Such a value is often called a bit mask. Often, we'll define a series
of bit masks, all targeting different bits, and then treat a single integer value as a set of flags. A ``flag'' is
an on-off, yes-no condition, so we only need one bit to record it, not the 16 or 32 bits (or more) of an
int. Storing a set of flags in a single int does more than just save space, it also makes it convenient to
assign a set of flags all at once from one flag variable to another, using the conventional assignment
operator =. For example, if we made these definitions:
#define DIRTY   0x01
#define OPEN    0x02
#define VERBOSE 0x04
#define RED     0x08
#define SEASICK 0x10
we would have set up 5 different bits as keeping track of those 5 different conditions (``dirty,'' ``open,''
etc.). If we had a variable
unsigned int flags;
which contained a set of these flags, we could write tests like
if(flags & DIRTY)
{ /* code for dirty case */ }
if(!(flags & OPEN))
{ /* code for closed case */ }
if(flags & VERBOSE)
{ /* code for verbose case */ }
else
{ /* code for quiet case */ }
A condition like if(flags & DIRTY) can be read as ``if the DIRTY bit is on''.
These bitmasks would also be useful for setting the flags. To ``turn on the DIRTY bit,'' we'd say
flags = flags | DIRTY; /* set DIRTY bit */
How would we ``turn off'' a bit? The way to do it is to leave on every bit but the one we're turning off, if
they were on already. We do this with the & and ~ operators:
flags = flags & ~DIRTY; /* clear DIRTY bit */
This may be easier to see if we look at it in binary. If the DIRTY, RED, and SEASICK bits were already
on, flags would be 0x19, and we'd have
  0 0 0 1 1 0 0 1
& 1 1 1 1 1 1 1 0
  ---------------
  0 0 0 1 1 0 0 0
As you can see, both the | operator when turning bits on and the & (plus ~) operator when turning bits
off have no effect if the targeted bit were already on or off, respectively.
The definition of the exclusive-OR operator means that you can use it to toggle a bit, that is, to turn it to
1 if it was 0 and to 0 if it was one:
flags = flags ^ VERBOSE; /* toggle VERBOSE bit */
It's common to use the ``op='' shorthand forms when doing all of these operations:
flags |= DIRTY; /* set DIRTY bit */
flags &= ~OPEN; /* clear OPEN bit */
flags ^= VERBOSE; /* toggle VERBOSE bit */
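If you find yourself doing a lot of this, one option is to wrap the four basic flag operations in small functions. The function names below are invented for this sketch; only the three masks come from the definitions above.

```c
#define DIRTY   0x01
#define OPEN    0x02
#define VERBOSE 0x04

/* Each function returns the new flag word; the caller assigns it back. */
unsigned int set_flag(unsigned int flags, unsigned int mask)    { return flags | mask; }
unsigned int clear_flag(unsigned int flags, unsigned int mask)  { return flags & ~mask; }
unsigned int toggle_flag(unsigned int flags, unsigned int mask) { return flags ^ mask; }

/* Returns 1 if any bit in mask is on in flags, 0 otherwise. */
int test_flag(unsigned int flags, unsigned int mask)
{
    return (flags & mask) != 0;
}
```

Since the masks are single bits, several can be combined with |, as in set_flag(flags, DIRTY | VERBOSE).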
We can also use the bitwise operators to extract subsets of bits from the middle of an integer. For
example, to extract the second-to-last hexadecimal ``digit,'' we could use
(i & 0xf0) >> 4
If i was 0x56, we have:
  i         0 1 0 1 0 1 1 0
& 0xf0    & 1 1 1 1 0 0 0 0
            ---------------
            0 1 0 1 0 0 0 0
and shifting this result right by 4 bits gives us 0 1 0 1, or 5, as we wished. Replacing (or overwriting)
a subset of bits is a bit more complicated; we must first use & and ~ to clear all of the destination bits,
then use << and | to ``OR in'' the new bits. For example, to replace that second-to-last hexadecimal digit
with some new bits, we might use:
(i & ~0xf0) | (newbits << 4)
If i was still 0x56 and newbits was 6, this would give us

  i                  0 1 0 1 0 1 1 0
& ~0xf0            & 0 0 0 0 1 1 1 1
                     ---------------
                     0 0 0 0 0 1 1 0
| (newbits << 4)   | 0 1 1 0 0 0 0 0
                     ---------------
                     0 1 1 0 0 1 1 0
resulting in 0x66, as desired.
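The extract and replace expressions can be packaged as a pair of functions. This is only a sketch: get_digit1 and set_digit1 are made-up names, and set_digit1 adds one precaution not in the expression above, masking newbits to four bits so that a too-large value cannot spill into higher bits.

```c
/* Extract bits 4-7 (the second-to-last hexadecimal digit) of i. */
unsigned int get_digit1(unsigned int i)
{
    return (i & 0xf0u) >> 4;
}

/* Return i with bits 4-7 replaced by the low four bits of newbits. */
unsigned int set_digit1(unsigned int i, unsigned int newbits)
{
    return (i & ~0xf0u) | ((newbits & 0x0fu) << 4);
}
```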

We've been using extra parentheses in several of these bitwise expressions because it turns out that (for
the usual, hoary sort of ``historical reasons'') the precedence of the bitwise &, |, and ^ operators is low,
usually lower than we'd want. (The reason that they're low is that, once upon a time, C didn't have the
logical operators && and ||, and the bitwise operators & and | did double duty.) However, since the
precedence of & and | (and ^) is lower than ==, !=, <<, and >>, expressions like
if(a & 0x04 != 0) /* WRONG */
and
i & 0xf0 >> 4 /* WRONG */
would not work as desired; these last two would be equivalent to
if(a & (0x04 != 0))
i & (0xf0 >> 4)
and would not do the bit test or subset extraction that we wanted.
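To see the difference in action, here is a sketch with the grouping written out explicitly in both functions (the function names are invented). The second function computes what the unparenthesized test really means: since 0x04 != 0 is 1, it ends up testing the low-order bit instead of the 0x04 bit.

```c
/* What we meant: is the 0x04 bit set in a? */
int bit_test_right(unsigned int a)
{
    return (a & 0x04) != 0;
}

/* How "a & 0x04 != 0" actually groups: a & (0x04 != 0), that is, a & 1. */
int bit_test_wrong(unsigned int a)
{
    return a & (0x04 != 0);
}
```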
[The rest of this section is somewhat advanced and may be skipped.]
Because of the nature of base-2 arithmetic, it turns out that shifting left and shifting right are equivalent
to multiplying and dividing by two. These operations are equivalent for the same reason that tacking
zeroes on to the right of a number in base 10 is the same as multiplying by 10, and deleting digits from
the right is the same as dividing by 10. You can convince yourself that 0x56 << 2 is the same as
0x56 * 4, and that 0x56 >> 1 is the same as 0x56 / 2. It's also the case that masking off all but
the low-order bits is the same as taking a remainder; for example, 0x56 & 0x07 is the same as 0x56
% 8. Some programmers therefore use <<, >>, and & in preference to *, /, and % when powers of two
are involved, on the grounds that the bitwise operators are ``more efficient.'' Usually it isn't worth
worrying about this, though: most compilers are smart enough to perform these optimizations
anyway (that is, if you write x * 4, the compiler might generate a left shift instruction all by itself),
and the bitwise forms are not always as readable, and not always correct for negative numbers.
The issue of negative numbers, by the way, explains why the right-shift operator >> is not precisely
defined when the high-order bit of the value being shifted is 1. For signed values, if the high-order bit is a
1, the number is negative. (This is true for 1's complement, 2's complement, and sign-magnitude
representations.) If you were using a right shift to implement division, you'd want a negative number to
stay negative, so on some computers, under some compilers, when you shift a signed value right and the
high-order bit is 1, new 1 bits are shifted in at the left instead of 0s. However, you can't depend on this,
because not all computers and compilers implement right shift this way. In any case, shifting negative
numbers to the right (even if the high-order 1 bit propagates) gives you an incorrect answer if there's a
remainder involved: in 2's complement, 16-bit arithmetic, -15 is 0xfff1, so -15 >> 1 might give you
0xfff8, which is -8. But integer division is supposed to discard the remainder, so -15 / 2
would have given you -7. (If you're having trouble seeing the way the shift worked, 0xfff1 is
1111111111110001 in binary and 0xfff8 is 1111111111111000. The low-order 1 bit got shifted off
to the right, but because the high-order bit was 1, a 1 got shifted in at the left.)
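The safe half of this story can be checked with a short sketch (the function names are made up). For unsigned values the shift and mask forms always agree with the arithmetic ones. For signed values the sketch simply uses /; C99 and later guarantee that signed division truncates toward zero, while older compilers were allowed to round the other way.

```c
/* For unsigned values, shifts and masks agree with *, /, and %.
   Returns 1 if all three equivalences hold for u. */
int shifts_match_arithmetic(unsigned int u)
{
    return (u << 2) == u * 4
        && (u >> 1) == u / 2
        && (u & 0x07) == u % 8;
}

/* Signed division discards the remainder (truncating toward zero in C99
   and later), so halve(-15) is -7 even where -15 >> 1 would give -8. */
int halve(int n)
{
    return n / 2;
}
```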

18.2.2: Cast Operators

[This section corresponds to the second half of K&R Sec. 2.7]
Most of the time, C performs conversions between related types automatically. (See section 18.2.3 for
the complete story.) When you assign an int value to a float variable or vice versa, or perform
calculations involving mixtures of arithmetic types, the types are converted automatically, as necessary.
C even performs some pointer conversions automatically: malloc returns type void * (pointer-to-void),
but a void * is automatically converted to whatever pointer type you assign (say) malloc's
return value to.
Occasionally, you need to request a type conversion explicitly. Consider the code
int i = 1, j = 2;
float f;
f = i / j;
Recall that the division operator / results in an integer division, discarding the remainder, when both
operands are integral. It performs a floating-point division, yielding a possibly fractional result, when one
or both operands have floating-point types. What happens here? Both operands are int, but the result of
the division is assigned to a float, which would be able to hold a fractional result. Is the compiler
smart enough to notice, and perform a floating-point division? No, it is not. The rule is, ``if both
operands are integral, division is integer division and discards any remainder'', and this is the rule the
compiler follows. In this case, then, we must manually and explicitly force one of the operands to be of
floating-point type.
Explicit type conversions are requested in C using a cast operator. (The name of the operator comes
from the term typecast; ``typecasting'' is another term for explicit type conversion, and some languages
have ``typecast operators.'' Yet another term for type conversion is coercion.) A cast operator consists of
a type name, in parentheses. One way to fix the example above would be to rewrite it as
f = (float)i / j;
The construction (float)i involves a cast; it says, ``take i's value, and convert it to a float.'' (The
only thing being converted is the value fetched from i; we're not changing i's type or anything.) Now,
one operand of the / operator is floating-point, so we perform a floating-point division, and f receives
the value 0.5.
Equivalently, we could write
f = i / (float)j;
or
f = (float)i / (float)j;
It's sufficient to use a cast on one of the operands, but it certainly doesn't hurt to cast both.
A similar situation is
int i = 32000, j = 32000;
long int li;
li = i + j;
An int is only guaranteed to hold values up to 32,767. Here, the result i + j is 64,000, which is not
guaranteed to fit into an int. Even though the eventual destination is a long int, the compiler does not
look ahead to see this. The addition is performed using int arithmetic, and it may overflow. Again, the
solution is to use a cast to explicitly convert one of the operands to a long int:
li = (long int)i + j;
Now, since one of the operands is a long int, the addition is performed using long int arithmetic,
and does not overflow.
Cast operators do not have to involve simple types; they can also involve pointer or structure or more
complicated types. Once upon a time, before the void * type had been invented, malloc returned a
char *, which had to be converted to the type you were using. For example, one used to write things
like
int *iarray = (int *)malloc(100 * sizeof(int));
and
struct list *lp = (struct list *)malloc(sizeof(struct list));
These casts are not necessary under an ANSI C compiler (because malloc returns void * which the
compiler converts automatically), but you may still see them in older code.


[This section corresponds to the first half of K&R Sec. 2.7]
In many cases, C performs type conversions automatically when values of differing types participate in
expressions. For most programming, you don't have to memorize these rules exactly, but it's a good idea to
have a general understanding of how they work, so that you won't be surprised by any of the default
conversions, and so that you'll know to use explicit conversions (as described in the previous section) in
those few cases where C would not perform a needed conversion automatically.
The default conversion rules serve two purposes. One is purely selfish on the compiler's part: it does not
want to have to know how to generate code to add, say, a floating-point number to an integer. The
compiler would much prefer if all operations operated on two values of the same type: two integers, two
floating-point numbers, etc. (Indeed, few processors have an instruction for adding a floating-point
number to an integer; most have instructions for adding two integers, or two floating-point numbers.)
The other purpose for the default conversions is the programmer's convenience: the mentality that ``the
computer and the compiler are stupid, we programmers must specify everything in excruciating detail''
can be carried too far, and it's reasonable to define the language such that certain conversions are
performed implicitly and automatically by the compiler, when it's unambiguous and safe to do so.
The rules, then (which you can also find on page 44 of K&R2, or in section 6.2.1 of the newer ANSI/ISO
C Standard) are approximately as follows:
1. First, in most circumstances, values of type char and short int are converted to int right
off the bat.
2. If an operation involves two operands, and one of them is of type long double, the other one
is converted to long double.
3. If an operation involves two operands, and one of them is of type double, the other one is
converted to double.
4. If an operation involves two operands, and one of them is of type float, the other one is
converted to float.
5. If an operation involves two operands, and one of them is of type long int, the other one is
converted to long int.
6. If an operation involves both signed and unsigned integers, the situation is a bit more complicated.
If the unsigned operand is smaller (perhaps we're operating on unsigned int and long
int), such that the larger, signed type could represent all values of the smaller, unsigned type,
then the unsigned value is converted to the larger, signed type, and the result has the larger, signed
type. Otherwise (that is, if the signed type can not represent all values of the unsigned type), both
values are converted to a common unsigned type, and the result has that unsigned type.
7. Finally, when a value is assigned to a variable using the assignment operator, it is automatically
converted to the type of the variable if (a) both the value and the variable have arithmetic type
(that is, integer or floating point), or (b) both the value and the variable are pointers, and one or
the other of them is of type void *.

(This is not a precise statement of these rules. If you need to understand a complicated type conversion
situation perfectly, you may have to consult a more definitive reference. In particular, the first five of
these rules are usually described as being applied in order, in the order 2, 3, 4, 1, 5. Rule 6 is especially
complicated, and although it is intended to prevent surprises, it still manages to introduce some.)


Once in a while, you find yourself in a situation in which C expects a single expression, but you have two
things you want to say. The most common (and in fact the only common) example is in a for loop,
specifically the first and third controlling expressions. What if (for example) you want to have a loop in
which i counts up from 0 to 10 at the same time that j is counting down from 10 to 0? You could
manipulate i in the loop header and j ``by hand'':
j = 10;
for(i = 0; i < 10; i++)
{
... rest of loop ...
j--;
}
but here it's harder to see the parallel nature of i and j, and it also turns out that this won't work right if
the loop contains a continue statement. (A continue would jump back to the top of the loop, and i
would be incremented but j would not be decremented.) You could compute j in terms of i:
for(i = 0; i < 10; i++)
{
j = 10 - i;
}
but this also makes j needlessly subservient. The usual way to write this loop in C would be
for(i = 0, j = 10; i < 10; i++, j--)
{
}
Here, the first (initialization) expression is
i = 0, j = 10
The comma is the comma operator, which simply evaluates the first subexpression i = 0, then the
second j = 10. The third controlling expression,
i++, j--
also contains a comma operator, and again, performs first i++ and then j--.
Precisely stated, the meaning of the comma operator in the general expression
e1 , e2
is ``evaluate the subexpression e1, then evaluate e2; the value of the expression is the value of e2.''
Therefore, e1 had better involve an assignment or an increment ++ or decrement -- or function call or
some other kind of side effect, because otherwise it would calculate a value which would be discarded.
There's hardly any reason to use a comma operator anywhere other than in the first and third controlling
expressions of a for loop, and in fact most of the commas you see in C programs are not comma
operators. In particular, the commas between the arguments in a function call are not comma operators;
they are just punctuation which separate several argument expressions. It's pretty easy to see that they
cannot be comma operators, otherwise in a call like
printf("Hello, %s!\n", "world");
the action would be ``evaluate the string "Hello, %s!\n", discard it, and pass only the string
"world" to printf.'' This is of course not what we want; we expect both strings to be passed to
printf as two separate arguments (which is, of course, what happens).


C has one last operator which we haven't seen yet. It's called the conditional or ``ternary'' or ?: operator,
and in action it looks something like this:
average = (n > 0) ? sum / n : 0
The syntax of the conditional operator is
e1 ? e2 : e3
and what happens is that e1 is evaluated, and if it's true then e2 is evaluated and becomes the result of the
expression, otherwise e3 is evaluated and becomes the result of the expression. In other words, the
conditional expression is sort of an if/else statement buried inside of an expression. The above
computation of average could be written out in a longer form using an if statement:
if(n > 0)
average = sum / n;
else
average = 0;
The conditional operator, however, forms an expression and can therefore be used wherever an
expression can be used. This makes it more convenient to use when an if statement would otherwise
cause other sections of code to be needlessly repeated. For example, suppose we were trying to write a
complicated function call
func(a, b + 1, c + d, xx, (g + h + i) / 2);
where xx was supposed to be f if e was true and 0 if it was not. Using an if statement, we'd have to
write:
if(e)
    func(a, b + 1, c + d, f, (g + h + i) / 2);
else
    func(a, b + 1, c + d, 0, (g + h + i) / 2);
We could write this more compactly, more readably, and more safely (it's easier both to see and to
guarantee that the other arguments are always the same) by writing
func(a, b + 1, c + d, e ? f : 0, (g + h + i) / 2);
(The obscure name ``ternary,'' by the way, comes from the fact that the conditional operator is neither
unary nor binary; it takes three operands.)


We'll round out this section by looking at three more statements: switch, do/while, and goto.
18.3.1: switch
18.3.2: do/while
18.3.3: goto

18.3.1: switch
A frequent sort of pattern is exemplified by the sequence
if(x == e1)
/* some code */
else if(x == e2)
/* other code */
else if(x == e3)
/* some more code */
else if(x == e4)
/* yet more code */
else
/* default code */
Depending on the value of x, we have one of several chunks of code to execute, which we select with a long
if/else/if/else... chain. When the value we're selecting on is an integer, and when the values we're selecting
among are all constant, we can use a switch statement, instead. The switch statement evaluates an expression
and matches the result against a series of ``case labels''. The code beginning with the matching case label (if
any) is executed. A switch statement can also have a default case which is executed if none of the explicit
cases match.
A switch statement looks like this:
switch( expr )
{
    case c1 :
        ... code ...
        break;
    case c2 :
        ... code ...
        break;
    case c3 :
        ... code ...
        break;
    ...
    default:
        ... code ...
        break;
}
The expression expr is evaluated. If one of the case labels (c1, c2, c3, etc., which must all be integral constants)
matches, execution jumps to there, and continues until the next break statement. Otherwise, if there is a
default label, execution jumps to there (and continues to the next break statement). Otherwise, none of the
code in the switch statement is executed. (Yes, the break statement is also used to break out of loops. It breaks
out of the nearest enclosing loop or switch statement it finds itself in.)
The switch statement only works on integral arguments and expressions (char, the various sizes of int, and
enums, though we haven't met enums yet). There is no direct way to switch on strings, or on floating-point values.
The target case labels must be specified explicitly; there is no general way to specify a case which corresponds to
a range of values.
One peculiarity of the switch statement is that the break at the end of one case's block of code is optional. If
you leave it out, control will ``fall through'' from one case to the next. Occasionally, this is what you want, but
usually not, so remember to put a break statement after most cases. (Since falling through is so rare, many
programmers highlight it, when they do mean to use it, with a comment like /* FALL THROUGH */, to indicate
that it's not a mistake.) One way to make use of ``fallthrough'' is when you have a small set or range of cases which
should all map to the same code. Since the case labels are just labels, and since there doesn't have to be a
statement immediately following a case label, you can associate several case labels with one block of code:
switch(x)
{
case 1:
... code ...
break;
case 2:
... code ...
break;
case 3:
case 4:
case 5:
... code ...
break;
default:
... code ...
break;
}
Here, the same chunk of code is executed when x is 3, 4, or 5.
The case labels do not have to be in any particular order; the compiler is smart enough to find the matching one if
it's there. The default case doesn't have to go at the end, either.
It's common to switch on characters:
switch(c)
{
case '+':
/* code for + */
break;
case '-':
/* code for - */
break;
case '\n':
/* code for newline */
/* FALL THROUGH */
case ' ':
case '\t':
/* code for other whitespace */
break;
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
/* code for digits */
break;
default:
/* code for all other characters */
break;
}
It's also common to have a set of #defined values, and to switch on those:
#define APPLE    1
#define ORANGE   2
#define CHERRY   3
#define BROCCOLI 4
...
switch(fruit)
{
case APPLE:
printf("turnover"); break;
case ORANGE:
printf("marmalade"); break;
case CHERRY:
printf("pie"); break;
case BROCCOLI:
printf("wait a minute... that's not a fruit"); break;
}
This page by Steve Summit // Copyright 1996-1999
18.3.2: do/while
Briefly stated, a do/while loop is like a while loop, except that the body of the loop is always
executed at least once, even if the condition is initially false. We'll motivate the usefulness of this loop
with a slightly long example.
We know that the digit character '1' is not the same as the int value 1, and that the string "123" is
not the same as the int value 123. We've learned that the atoi function will convert a string
(containing digits) to the corresponding integer, and that we can use the sprintf function to generate a
string of digits corresponding to an integer. Now let's see how we could convert an integer to a string of
digits by hand, if for some reason we couldn't use sprintf but had to do it ourselves.
If the number were less than 10 and not negative, it would be easy. Since we know that the digit
characters '0' to '9' have consecutive character set values, the expression i + '0' gives the
character corresponding to i's value if i is an integer between 0 and 9, inclusive. So our very first stab at
an integer-to-string routine, which would only work for one-digit numbers, might look like this:
char string[2];
string[0] = i + '0';
string[1] = '\0';
(Remember, the null character \0 is required to terminate strings in C.)
The limitation to single-digit numbers is obviously not acceptable. Suppose we went a little further, and
arranged to handle numbers less than 100, by using an if statement to choose between the 1-digit case
and the 2-digit case:
char string[3];
if(i < 10)
{
    string[0] = i + '0';
    string[1] = '\0';
}
else
{
    string[0] = (i / 10) + '0';
    string[1] = (i % 10) + '0';
    string[2] = '\0';
}
In the two-digit case, the subexpression i % 10 gives us the value of the low-order (1's) digit of the
result, and i / 10 gives us the high-order (10's) digit.

We've still got a pretty limited piece of code, and if we kept extending it in this way, with explicit if
statements depending on how many digits the number could have, we'd duplicate a lot of code and end
up with quite a mess, and we wouldn't necessarily know how many cases we'll need (at least 5, because
type int is guaranteed to hold integers up to at least 32,767, but on some systems it can hold more). The
right solution to this problem, therefore, involves a loop.
One way of thinking about if statements and while loops is that an if statement allows you to select a
chunk of code which, if required, will complete some step towards the accomplishment of an overall
task, while a while loop selects a chunk of code that will whittle away at some task or subtask, but
without necessarily completing it on the first go, such that several trips through the loop might be
required. Since the operation i % 10 does give us one digit of our answer, but since we may end up
having many digits, our next attempt is to wrap the i % 10 and i / 10 code up in a while loop:
char string[25];
int j = 24;
string[j] = '\0';
while(i > 0)
{
string[--j] = (i % 10) + '0';
i /= 10;
}
Here we use an auxiliary variable j to keep track of which element of the string array we're filling in.
We fill in the array from the end back towards the beginning, because successive remainders when
dividing i by 10 give us digits in the reverse order (the reverse, that is, of the order we'd write the digits
left-to-right). In this code, j holds the index of the element we've just filled in, so we use the
predecrement form --j to decrement j before filling in the next digit. When we're done, string[j]
will be the first (leftmost) digit of our result. (For the string array as declared, i had better have fewer
than 25 digits, but this is a safe assumption even for 64-bit machines.)
The third try just above, using a while loop, will work just fine except in the case when i == 0. If i
is 0, the controlling expression i > 0 of the while loop will immediately be false, and no trips
through the loop will be taken. This means that the integer 0 will be converted to the empty string "",
not the string "0". In this case, we would like to take one trip through the loop (to generate the digit 0)
even though the condition is initially false.
For loops like these, C has the do/while loop, which tests the condition at the bottom of the loop, after
making the first trip through without checking. The syntax of the do/while loop is
do statement
while( expression );
The statement is almost always a brace-enclosed block of statements, because a do/while loop without
braces looks odd even if there's only one statement in the body. Notice that there is a semicolon after the
close parenthesis after the controlling expression.
Using a do/while loop, we can write our final version of the integer-to-string converter:
char string[25];
int j = 24;
string[j] = '\0';
do
{
string[--j] = (i % 10) + '0';
i /= 10;
} while(i > 0);
This version is now almost perfect; its only deficiency is that it has no provision for negative numbers.
C's do/while loop is analogous to the repeat/until loop in Pascal. (C's while loop, on the other
hand, is like Pascal's while/do loop.)

18.3.3: goto
Finally, C has a goto statement, and labels to go to. You will hear many people disparage goto
statements, and without taking a stand on whether they are inherently evil, I will say that most code can
be written, and written pretty cleanly, without them.
You can attach a label to any statement:
statement1;
label1: statement2;
statement3;
label2: statement4;
statement5;
A label is simply an identifier followed by a colon. The names for labels follow the same rules as for
variables and other identifiers (they consist of letters, digit characters, and underscores, generally
beginning with a letter, in any case not beginning with a digit). A label can in principle have the same
name as a variable; variables and labels are quite distinct, so the compiler can keep them separate. Label
names must obviously be unique within a function.
Anywhere you want, you can say
goto label ;
where label is one of the labels in the current function, and control flow will jump to that point. You can
only jump around within the same function; you can't go to a label in some other function. (If goto isn't
enough for you, if for some reason you must jump back to another function, there's a library function,
longjmp, which will do so under certain circumstances.) If you jump into a brace-enclosed block of
statements, and if that block has local, automatic variables which have initializers (that is, attached to
their declarations), the initializers will not take effect. (In other words, initializers for block-local
variables take effect only when the block is entered normally, from the top.)
For the vast majority of functions, the more ``structured'' if/else, switch, and loop statements,
perhaps augmented with break and continue statements, will accomplish the required control flow,
in a clean and obvious way. (Without practice, it may not always be immediately obvious how to
structure a piece of code as, say, a clean loop, but the more important point is that once it's structured as a
clean loop, it will be more obvious to a later reader what it's doing.) Remember, too, that function calls
and return statements also accomplish control flow (analogous to the GOSUB in BASIC) in a clean
and structured way. The complaint about goto is that when it's used without restraint, a tangled mess of
unwieldy ``spaghetti code'' often results. There are really only two times you ever need to use a goto
statement in real programs:

1. To break out of several loops at once. (When you have nested loops, the break statement only
breaks you out of the innermost loop it's in.)
2. To jump to the end of a function, to perform some cleanup code or something, but bypassing the
rest of the function. Usually, such a jump-to-the-end happens as a result of some error condition
which the function has detected.
It's customary for an author to claim that, although goto exists, ``it has not been used in this book'', or
that the author can count on the fingers of one hand the number of times he's ever used goto in C, and in
fact I think I can make these claims myself.


Arrays are ``second-class citizens'' in C. Related to the fact that arrays can't be assigned is the fact that
they can't be returned by functions, either; that is, there is no such type as ``function returning array of
...''. In this chapter we'll study three workarounds, three ways to implement a function which attempts to
return a string (that is, an array of char) or an array of some other type.
In the last chapter, we looked at some code for converting an integer into a string of digits representing
its value. This operation is the inverse of the function performed by the standard function atoi. Suppose
we wanted to wrap our digit-generating code up in a function and call it itoa. How would it return the
generated string of digits? We'll use this example to demonstrate all three techniques. For simplicity,
though, we won't repeat the do/while loop in each example function; instead, we'll simply call
sprintf. (In fact, since calling sprintf is so easy, most C programs call it directly when they need
to convert integers to strings, and consequently there is no standard itoa function.)
First, let's look at the way that won't work, so that we can set it aside and make sure we never use it.
What if we wrote itoa like this?
char *itoa(int n)
{
    char retbuf[25];
    sprintf(retbuf, "%d", n);
    return retbuf;
}
This looks superficially reasonable, and it might well be what we'd write at first if we weren't being
careful. (It might even seem to work, at first.) However, it has a serious, fatal flaw: let's think about that
local array, retbuf. Since it's a regular local variable, it has automatic duration, which means that it
springs into existence when the function is called and disappears when the function returns. Therefore,
the pointer that this version of itoa returns is to an array which no longer exists by the time the caller
receives the pointer. (Remember that the statement return retbuf; returns a pointer to the first
character in retbuf; by the ``equivalence of arrays and pointers,'' the mention of the array retbuf in
this context is equivalent to &retbuf[0].) When the caller tries to use the pointer, the string created by
itoa might still be there, or the memory might have been re-used by some other function. Therefore,
this first version of itoa is not adequate and not acceptable. Functions must never return pointers to
local, automatic-duration arrays.
Since the problem with returning a pointer to a local array is that the array has automatic duration by
default, the simplest fix to the above non-functional version of itoa, and the first of our three working
methods of returning arrays from functions, is to declare the array static, instead:
char *itoa(int n)
{
    static char retbuf[25];
    sprintf(retbuf, "%d", n);
    return retbuf;
}
Now, the retbuf array does not disappear when itoa returns, so the pointer is still valid by the time
the caller uses it.
Returning a pointer to a static array is a practical and popular solution to the problem of ``returning''
an array, but it has one drawback. Each time you call the function, it re-uses the same array and returns
the same pointer. Therefore, when you call the function a second time, whatever information it
``returned'' to you last time will be overwritten. (More precisely, the information, that the function
returned a pointer to, will be overwritten.) For example, suppose we had occasion to save the pointer
returned by itoa for a little while, with the intention of using it later, after calling itoa again in the
meantime:
int i = 23;
char *p1, *p2;
p1 = itoa(i);
i = i + 10;
p2 = itoa(i);
printf("old i = %s, new i = %s\n", p1, p2);
But this won't work as we expect--the second call to itoa will overwrite the string (stored in itoa's
static retbuf array) which was stored by the first call. Instead of printing i's old and new value, the
last line will print the new value, twice. Both p1 and p2 will point to the same place, to the retbuf
array down inside itoa, because each call to itoa always returns the same pointer to that same array.
We can see the same problem in an even simpler example. Suppose we had never heard of the %d format
specifier in printf. We might try to call something like this:
printf("i = %s, j = %s\n", itoa(i), itoa(j));
where i and j are two different int variables. What will happen? Either the compiler will make the first
call to itoa first, or the second. (It turns out that it's not specified which order the compiler will use;
different compilers behave differently in this respect.) Whichever call to itoa happens second will be
the one that gets to keep its return value in retbuf. The printf call will either print i's value twice,
or j's value twice, but it won't be able to print two distinct values.
The moral is that although the static return array technique will work, the caller has to be a little bit
careful, and must never expect the return pointer from one call to the function to be usable after a later
call to the function. Sometimes this restriction is a real problem; other times it's perfectly acceptable.
(Some of the functions in the standard C library use this technique; one example is ctime, which
converts timestamp values to printable strings. When you see a cryptic sentence like ``The returned
pointer is to static data which is overwritten with each call'' in the documentation for a library function, it
means that the function is using this technique.) When this restriction would be too onerous on the caller,
we should use one of the other two techniques, described next.
If the function can't use a local or local static array to hold the return value, the next option is to have
the caller allocate an array, and use that. In this case, the function accepts at least one additional
argument (in addition to any data to be operated on): a pointer to the location to write the result back to.
Our familiar getline function has worked this way all along. If we rewrote itoa along these lines, it
might look like this:
char *itoa(int n, char buf[])
{
sprintf(buf, "%d", n);
return buf;
}
Now the caller must pass an int value to be converted and an array to hold the converted result:
int i = 23;
char buf[25];
char *str = itoa(i, buf);
There are two differences between this version of itoa and our old getline function. (Well, three,
really; of course the two functions do totally different things.) One difference is that getline accepted
another extra argument which was the size of the array in the caller, so that getline could promise not
to overflow that array. Our latest version of itoa does not accept such an argument, which is a
deficiency. If the caller ever passes an array which is too small to hold all the digits of the converted
integer, itoa (actually, sprintf) will sail off the end of the array and scribble on some other part of
memory. (Needless to say, this can be a disaster.)
Another difference is that the return value of this latest version of itoa isn't terribly useful. The pointer
which this version of itoa returns is always the same as the pointer you handed it. Even if this version
of itoa didn't return anything as its formal return value, you could still get your hands on the string it
created, since it would be sitting right there in your own array (the one that you passed to itoa). In the
case of getline, we had a second thing to return as the formal return value, namely the length of the
line we'd just read.
However, this second strategy is also popular and workable. Besides our own getline function, the
standard library functions fgets and fread both use this technique.
When the limit of a single static return array within the function would be unacceptable, and when it
would be a nuisance for the caller to have to declare or otherwise allocate return arrays, a third option is
for the function to dynamically allocate some memory for the returned array by calling malloc. Here is
our last version of itoa, demonstrating this technique:
char *itoa(int n)
{
    char *retbuf = malloc(25);
    if(retbuf == NULL)
        return NULL;
    sprintf(retbuf, "%d", n);
    return retbuf;
}
Now the caller can go back to saying simple things like
char *p = itoa(i);
and it no longer has to worry about the possibility that a later call to itoa will overwrite the results of
the first. However, the caller now has two new things to worry about:
1. This version of itoa returns a null pointer if malloc fails to return the memory that itoa
needs. The caller should really be checking for this null pointer return each time it calls itoa,
before using the pointer.
2. If the caller calls itoa 10,000 times, we'll have allocated 25 * 10,000 = 250,000 bytes of
memory, or a quarter of a meg. Unless someone is careful to call free to deallocate all of that
memory, it will be wasted. Few programs can afford to waste that much memory. (Once upon a
time, few programs could get that much memory, period.) The ``someone'' who is going to have
to call free isn't itoa; it has no idea when the caller is done with the memory returned by a
previous call to itoa, and in fact itoa might never get called again. So it will be the caller's
responsibility to keep track of each pointer returned by itoa, and to free it when it's no longer
needed, or else memory will gradually leak away.
We can work around the first problem--if we expect that there will usually be enough memory, such that
the call to malloc will rarely if ever fail, and if all the caller would do in an out-of-memory situation is
print an error message and abort, we can move the test down into the function:
char *retbuf = malloc(25);
if(retbuf == NULL)
{
    fprintf(stderr, "out of memory\n");
    exit(EXIT_FAILURE);
}
Now the function never returns a null pointer, so the caller doesn't have to check. (When malloc fails,
the function doesn't return at all.)
In summary, we've seen three ways of ``returning'' arrays from functions, none of which is perfect. The
static array technique is usually convenient for the caller, but only for functions that the caller is
unlikely to call several times while retaining each of the return values. (The static array
technique is also definitely imperfect in that it violates the notion that calling code shouldn't need to
know about the inner, implementation details of a called function.) The caller-passes-an-array technique
is useful when the caller might have a number of calls to the function active, but when that number is
small and fixed, so that the caller can easily declare and keep track of a number of return arrays (if
necessary). Finally, when there might be an arbitrary number of calls to the function, or when maximum
flexibility is otherwise needed, the function-calls-malloc technique is appropriate, but with its extra
flexibility come some costs, the most important of which is that the caller must remember to free the
returned pointers.

Chapter 20: More About the Preprocessor
In this chapter we'll learn about two mildly advanced uses of the preprocessor: ``function-like''
preprocessor macros (also called ``macros with arguments'') and ``nested header files'' (also known as
``nested #include files'').
20.1: Function-Like Preprocessor Macros
20.2: Nested Header Files


So far, we've been defining simple preprocessor macros with simple values, such as
#define MAXLINE 200
and
#define DATAFILE "data.dat"
These macros always expand to constant text (in these examples, the integer constant 200 and the string literal
"data.dat", respectively) wherever they're used. However, it's also possible to define macros which expand to
text which is different each time, depending on some subsidiary text which you specify. These macros take
arguments, in much the same way that functions take arguments. In either case, the outcome (the expansion of the
macro, like the action of the function) depends in some way on the particular values passed to it as arguments. The
basic syntax of a function-like macro definition is
#define macroname( args ) expansion
There must be no space between macroname and the open parenthesis.
We will illustrate the use of function-like macros with several examples.
In a previous chapter, we used the ``bitwise'' operators &, |, and ~ to manipulate individual bits within an integer
value or ``flags word.'' In one application, we defined several simple macros whose values were ``bitmasks'':
#define DIRTY   0x01
#define OPEN    0x02
#define VERBOSE 0x04
Then we used code like
flags |= DIRTY;
to ``set the DIRTY bit,'' and code like
flags &= ~DIRTY;
to clear the DIRTY bit, and code like
if(flags & DIRTY)
to test it. With enough practice, these idioms become familiar enough that they can be read immediately, but
suppose we wanted to make them less cryptic. Using the preprocessor, we'll be able to set up macros so that we can
write
SETBIT(flags, DIRTY);
and
CLEARBIT(flags, DIRTY);
and
if(TESTBIT(flags, DIRTY))
The definition of the SETBIT() macro might look like this:
#define SETBIT(x, b) x |= b
When a function-like macro is expanded, the preprocessor keeps track of the ``arguments'' it was ``called'' with.
When we write
SETBIT(flags, DIRTY);
we're invoking the SETBIT() macro with a first argument of flags and a second argument of DIRTY. Within
the definition of the macro, those arguments were known as x and b. So in the replacement text of the macro, x
|= b, everywhere that x appears it will be further replaced by (in this case) flags, and everywhere that b
appears it will be replaced by DIRTY. So the invocation
SETBIT(flags, DIRTY);
will result in the expansion
flags |= DIRTY;
Notice that the semicolon had nothing to do with macro expansion; it appeared following the close parenthesis of
the invocation, and so shows up following the final expansion.
Similarly, we can define the CLEARBIT() and TESTBIT() macros like this:
#define CLEARBIT(x, b) x &= ~b
#define TESTBIT(x, b) x & b
Convince yourself that the invocations
CLEARBIT(flags, DIRTY);
and
if(TESTBIT(flags, DIRTY))
will result in the expansions
flags &= ~DIRTY;
and
if(flags & DIRTY)
as desired.
Just as for a regular function, parameter names such as x and b in a function-like macro definition are arbitrary;
they're just used to indicate where in the replacement text the actual argument ``values'' should be plugged in. Also,
those parameter names are not looked for within character or string constants. If you had a macro like
#define XX(a, b) printf("%s is a %s\n", a, b)
then the invocation
XX("John", "pumpkin-head");
would result in
printf("%s is a %s\n", "John", "pumpkin-head");
It would not result in
printf("%s is "John" %s\n", "John", "pumpkin-head");
which (in this case, anyway) would not have been at all what you wanted.
If we remember that (other than being careful not to expand macro arguments inside of string and character
constants) the preprocessor is otherwise pretty dumb and literal-minded, we can see why there must not be a space
between the macro name and the open parenthesis in a function-like macro definition. If we wrote
#define SETBIT (x, b) x |= b
the preprocessor would think we were defining a simple macro, named SETBIT, with the (rather meaningless)
replacement text (x, b) x |= b, and every time it saw SETBIT, it would replace it with (x, b) x |= b.
(It would ignore any parentheses and arguments that the invocation of SETBIT happened to be followed with; that
is, after the incorrect definition, the invocation
SETBIT(flags, DIRTY);
would expand to
(x, b) x |= b(flags, DIRTY);
where the (flags, DIRTY) part passed through without modification, along with the trailing semicolon.)
There are a few potential pitfalls associated with preprocessor macros, and with function-like ones in particular. To
illustrate these, let's look at another example. C has no built-in exponentiation operator; if you want to square
something, the easiest way is usually to multiply it by itself. Suppose that you got tired of writing
x * x
and
a * a + b * b
and
(x + 1) * (x + 1)
Knowing about function-like preprocessor macros, you might be inspired to define a SQUARE() macro:
#define SQUARE(z) z * z
Now you can write things like SQUARE(x) and SQUARE(a) + SQUARE(b), and this seems like it will be
workable and convenient. But wait: what about that third example? If you write
y = SQUARE(x + 1);
the simpleminded preprocessor will expand it to
y = x + 1 * x + 1;
Remember, the preprocessor doesn't evaluate arguments the same way a function call would, it just performs
textual substitutions. So in this last example, the ``value'' of the macro parameter z is x + 1, and everywhere that
a z had appeared in the replacement text, the preprocessor fills in x + 1. But when the rest of the compiler sees
the result, it will give multiplication higher precedence, as usual, and it will interpret the result as if you had
written
y = x + (1 * x) + 1;
which will not usually give you the result you wanted!
How can we fix this problem? We could forbid ourselves to ever ``call'' the SQUARE() macro on an argument that
wasn't a single constant or variable name, but this seems like a harsh restriction. A better solution is to play with
the definition of the macro itself: since the expansion we want is
(x + 1) * (x + 1)
we can achieve that by defining the macro like this:
#define SQUARE(z) (z) * (z)
Now
y = SQUARE(x + 1);
expands to
y = (x + 1) * (x + 1);
as we wished.
There's another problem, though: what if we write
q = 1 / SQUARE(r);
Now we get
q = 1 / (r) * (r);
and the rest of the compiler interprets this as
q = (1 / (r)) * (r);
(Multiplication and division have the same precedence, and by default they go from left to right.) What can we do
this time? We could enclose the invocation of the SQUARE() macro in extra parentheses, like this:
q = 1 / (SQUARE(r));
but that seems like a real nuisance to remember. A better solution is to build those extra parentheses into the
definition of the macro, too:
#define SQUARE(z) ((z) * (z))
Now the code 1 / SQUARE(r) expands to 1 / ((r) * (r)) and we have a macro that's safe against all
of the troublesome invocations we've tried so far.
There's a third potential problem, though: suppose we write

y = SQUARE(x++);
Even with all of our parentheses, this expands to
y = ((x++) * (x++));
and this is a distinct no-no, because we're incrementing x twice within the same expression. We might end up with
y containing the value x * x, as we wanted, but it's somewhat more likely that we'll end up with (x + 1) * x
or x * (x + 1), instead. (We're now worried not just about what the macro expands to, but what the resultant
expression evaluates to.) Furthermore, since expressions like x++ * x++ are undefined according to the
ANSI/ISO C Standard, they can actually result in anything, even complete nonsense. So SQUARE(x++) simply
isn't going to work. (The explicit parentheses, by the way, don't make the expression any less undefined.)
There's no good fix for this third problem. We are going to have to remember that when we invoke function-like
macros, the macro might expand one of its arguments multiple times, so we had better not ever give it an argument
with a side effect, such as x++, or else the side effect might end up happening multiple times, with undefined
results. (That's one reason we always use capital letters for macro names, to remind ourselves that they are special,
and that we might have to be careful when invoking them.)
The other way around the third problem is not to use a function-like preprocessor macro at all, but instead to use a
genuine function. If we defined
int square(int x)
{
    return x * x;
}
then we wouldn't have any of these problems. (Of course, then we'd have the limitation that we could only use this
square function on arguments of a certain type, in this case, int. We could declare it as accepting and returning
type double, but then we might worry that it was doing needless floating-point conversions in the cases where
we handed it integer values...)
When should you use a function-like macro and when should you use a real function? In most cases, it's safer to
use real functions. Generally, you use function-like macros only when the code they expand to is quite small and
simple, and when defining and using a real function would for some reason be awkward, or when the code will be
executed so often that the overhead of calling a real function would significantly impact the program's efficiency.
As an example of how a real function might be awkward, notice that we couldn't write SETBIT() and
CLEARBIT() as conventional functions taking the same arguments, because C passes arguments by value, so a function can't modify its caller's variables (unless it is handed pointers to them), yet SETBIT() and
CLEARBIT() are supposed to. (That is, SETBIT(flags, DIRTY) modifies flags.)
To summarize the important rules of this section, whenever defining a function-like macro, remember:
1. Put parentheses around each instance of each macro parameter in the replacement text.
2. Put parentheses around the entire replacement text.

3. Capitalize the macro name to remind yourself that it is a macro, so that you won't call it on arguments with
side effects.
Remember, too, not to put a space between the macro name and the open parenthesis in the definition.
Rewriting our first three examples to follow these rules, we'd have:
#define SETBIT(x, b)   ((x) |= (b))
#define CLEARBIT(x, b) ((x) &= ~(b))
#define TESTBIT(x, b)  ((x) & (b))
(It's harder to see how SETBIT() and CLEARBIT() might fail if they weren't parenthesized, but unless you're
really sure of yourself, there usually isn't a reason not to use the extra parentheses.)
A few final notes about function-like preprocessor macros: Sometimes, people try to write function-like macros
which are even more like functions in that they expand to multiple statements; however, this is considerably
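trickier than it looks (at least, if it's not to fall victim to additional sets of pitfalls). Also, people sometimes wish for
macros that take a variable number of arguments (in much the same way that the printf function accepts a
variable number of arguments). The original ANSI C Standard provided no way to do this; the 1999 revision of the Standard (C99) added variadic macros, which are declared with an ellipsis in the parameter list and which refer to the extra arguments as __VA_ARGS__.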


Suppose you have written a little set of functions which you expect that other parts of your program (or
other parts of other people's programs) will call. And, so that it will be easier for you (and them) to call
the functions correctly, suppose that you have written a header file containing external prototype
declarations for the functions. And, suppose that the prototypes look like this:
extern int f1(int);
extern double f2(int, double);
extern int f3(int, FILE *);
You might put these three declarations in a file called funcs.h.
For now, we don't need to worry about what these three functions might do, other than to notice that f3
obviously reads from or writes to a FILE * stdio stream.
Now, suppose that you have a source file containing a function which calls f1 and/or f2. At the top of
that source file, you would put the line
#include "funcs.h"
However, if you were unlucky, the compiler would get down to the line
extern int f3(int, FILE *);
within funcs.h and complain, because it would not know what a FILE is and so would not know how
to think about a function that accepts a pointer to one. If the calling program (that is, the source file that
included "funcs.h") didn't call f3 or printf or fopen or any of the other stdio functions, it would
have no reason to include <stdio.h>, and FILE would remain undefined. (If, on the other hand, the
source file in question did happen to include <stdio.h>, and if it included it before it included
"funcs.h", there would be no problem.)
What's the right thing to do here? We could say that anyone who included "funcs.h" always had to
include <stdio.h>, first. But you can think of header files a little bit like you think of functions: it's
nice if they're ``black boxes'', if you don't have to worry about what's inside them, if you don't have to
worry about including them in a certain order.
Another way to think about the situation is this: since the prototype for f3 inside of funcs.h needs
stdio.h, maybe we should put the line
#include <stdio.h>
right there at the top of funcs.h! Is that legal? Can the preprocessor handle seeing an #include
directive when it's already in the middle of processing another #include directive? The answer is that
yes, it can; header files (that is, #include directives) may be nested. (They may be nested up to a depth
of at least 8, although many compilers probably allow more.) Once funcs.h takes care of its own
needs, by including <stdio.h> itself, the eventual top-level file (that is, the one you compile, the one
that includes "funcs.h") won't get error messages about FILE being undefined, and won't have to
worry about whether it includes <stdio.h> or not.
Or will it? What if the top-level source file does include <stdio.h>? Now <stdio.h> will end up
being processed twice, once when the top-level source file asks for it, and once when funcs.h asks for
it. Will everything work correctly if <stdio.h> is included twice? Again, the answer is yes; the
Standard requires that the standard header files protect themselves against multiple inclusion.
It's good that the standard header files are protected in this way. But how do they protect themselves?
Suppose that we'd like to protect our own header files (such as funcs.h) in the same sort of way. How
would we do it?
Here's the usual trick. We rewrite funcs.h like this:
#ifndef FUNCS_H
#define FUNCS_H
#include <stdio.h>
extern int f1(int);
extern double f2(int, double);
extern int f3(int, FILE *);
#endif
All we've done is added the #ifndef and #define lines at the top, and the #endif line at the
bottom. (The macro name FUNCS_H doesn't really mean anything, it's just one we don't and won't use
anywhere else, so we use the convention of having its name mimic the name of the header file we're
protecting.) Now, here's what happens: the first time the compiler processes funcs.h, it comes across
the line
#ifndef FUNCS_H
and FUNCS_H is not defined, so it proceeds. The very next thing it does is #defines the macro
FUNCS_H (with a replacement text of nothing, but that's okay, because we're never going to expand
FUNCS_H, just test whether it's defined or not). Then it processes the rest of funcs.h, as usual. But, if
that same run of the compiler ever comes across funcs.h for a second time, when it comes to the first
#ifndef FUNCS_H line again, FUNCS_H will at that point be defined, so the preprocessor will skip
down to the #endif line, which will skip the whole header file. Nothing in the file will be processed a
second time.
(You might wonder what would tend to go wrong if a header file were processed multiple times. It's okay
to issue multiple external declarations for the same function or global variable, as long as they're all
consistent, so those wouldn't cause any problems. And the preprocessor also isn't supposed to complain if
you #define a macro which is already defined, as long as it has the same value, that is, the same
replacement text. But the compiler will complain if you try to define a structure type you've already
defined, or a typedef you've already defined (see section 18.1.6), etc. So the protection against multiple
inclusion is important in the general case.)
When header files are protected against multiple inclusion by the #ifndef trick, then header files can
include other files to get the declarations and definitions they need, and no errors will arise because one
file forgot to (or didn't know that it had to) include one header before another, and no multiple-definition
errors will arise because of multiple inclusion. I recommend this technique.
In closing, though, I might mention that this technique is somewhat controversial. When header files
include other header files, it can be hard to track down the chain of who includes what and who defines
what, if for some reason you need to know. Therefore, some style guides disallow nested header files. (I
don't know how these style guides recommend that you address the issue of having to require that certain
files be included before others.)


Pointers are viewed by many as the bane of C programming, because out-of-control pointers can do a lot of damage,
and can be hard to track down. But real programs tend to make heavy use of pointers. How can we keep pointers
under control?
The big problem with pointers, of course, is that they can point anywhere, including to places they're not supposed to.
When a pointer points to the wrong place (perhaps because it was never initialized properly, such that it essentially
points to a random place), a fetch of the data it ``points'' to will result in garbage (or may cause the program to crash
with a memory access violation), and a write of some new data to the location it ``points'' to will damage some other
part of your program, or of some other program, or of the operating system (or may cause the program to crash).
Crashes, in fact, though they're frustrating and annoying, may be preferable to the alternatives, namely performing
quiet but meaningless computations or damaging other code, both of which can be even more annoying and even
harder to track down.
Our goal, then, is to make sure that our pointers are always valid, or when they are not, to make sure that we can
know that they are not. First, then, let's discuss what we mean by a ``valid pointer.''
A valid pointer (more precisely, a valid pointer value) is one that does in fact point to an object of the type that the
pointer is declared to point to. Furthermore, if the pointer will be used to store new values, the old value must be
sitting in writable memory (that is, it must not be a variable that was declared const, or a string that results from a
string literal). In contrast to valid pointers, we may distinguish among several kinds of invalid pointers: null pointers,
uninitialized pointers, pointers to memory that used to exist but has disappeared, pointers to memory that once came
from malloc but has since been freed.
The tricky thing about valid and invalid pointers is that there's no simple way in C to ask ``is this pointer valid?'' or
``is this pointer invalid?''. The only questions we can ask about pointers are ``is this pointer equal to this other
pointer?'', ``is this pointer unequal to this other pointer?'', and, for pointers into the same array, ``is this pointer greater
or less than this other pointer?''.
Part one of our strategy for managing pointers, then, will be to arrange that all or most invalid pointers are null
pointers. Whenever we do anything which would cause a pointer to be invalid, that is, whenever we declare one (such
that it would otherwise have a garbage initial value), or whenever we do something that causes the memory which one
of our pointers used to point to to disappear, we'll set the pointer to NULL. Having done so, we can test whether the
pointer is currently valid by checking if it's not equal to the null pointer, or contrariwise, we can test whether it's
invalid by checking if it's equal to the null pointer.
Remember that C doesn't generally do any of this automatically. It does not guarantee that all newly-allocated
pointers are initialized to null pointers, and it does not insert automatic validity checks before you try to use a pointer.
If you want to be sure that a pointer is initialized to a null pointer, you must generally set it to NULL. If you have a
pointer which you're thinking of using but which might or might not be valid (and if it's a pointer which you believe
you'd have set to NULL if it was invalid), you must precede your use of the pointer with a test of the form
if(p != NULL)
Furthermore, if you write the test if(p != NULL), it does not in the general case mean ``is p valid?''. The test
if(p != NULL) can only be used to mean ``is p valid?'' if you have taken care to make sure that all non-valid
pointers have been set to null.
(There is one condition under which C does guarantee that a pointer variable will be initialized to a null pointer, and
that is when the pointer variable is a global variable or a member of a global structure, or more precisely, when it is
part of a variable, array, or structure which has static duration.)
Remember, too, that the shorthand form
if(p)
is precisely equivalent to if(p != NULL). So you may be able to read if(p) as ``if p is valid'', but again, only if
you've ensured that whenever p is not valid, it is set to null.
The degree of care with which you have to implement a pointer management strategy may be different for different
pointer variables you use. If a pointer variable is immediately set to a valid pointer value, and if nothing ever happens
which could make it become invalid, then there's no need to check it before each time you use it. Similarly, if a
pointer is set to point to different locations from time to time, but it can be shown that it will always be valid, there's
again no reason to test it all the time. However, if a particular pointer is valid some of the time and invalid at other
times, or in particular, if it records some optional data which might or might not be present, then you'll want to be very
careful to set the pointer to NULL whenever it's not valid (or whenever the optional data is not present), and to test the
pointer before using it (that is, before fetching or writing to the location that it points to).
Everything we've just said about ``pointer variables'' is equally true, and perhaps more important, for pointer fields
within structures. When you define a structure, you will typically be allocating many instances of that structure, so
you will have many instances of that pointer. You will typically have central pieces of code which operate on
instances of that structure, meaning that each time the piece of code runs, it may be operating on a different instance
of the structure, so if the pointer field is one that isn't always valid (that is, isn't valid in all instances of the structure),
the code had better test it before using it. Similarly, the code had better set the pointer field to NULL if it ever
invalidates it.
For example, one of the first features we added to the adventure game was a long description for objects and rooms.
But the long description is optional; not all objects and rooms have one. Suppose we chose to use a char * within
struct object and struct room to point at a dynamically-allocated string containing the long description.
(This choice would be preferable to a fixed-size array of char because it may be the case that some long descriptions
will be elaborately long, and we'd neither want to limit the potential length of descriptions by having a too-small array
nor waste space for objects with short or empty descriptions by always using a too-large array.) For each instance of
an object or room structure, we'd initialize the description field to contain a null pointer. For each room or object with
a long description, we'd set the description field to contain a pointer to the appropriate (and appropriately-allocated)
string. Finally, when it came time to print the description, we'd use code like
if(objp->desc != NULL)
    printf("%s\n", objp->desc);
else
    printf("You see nothing special about the %s.\n", objp->name);
Particular care is needed when pointers point to dynamically-allocated memory, managed with the standard library
functions malloc, free, and realloc. Somehow, it's easier to make mistakes here, and their consequences tend
to be more damaging and harder to track down.

First of all, of course, you must always ensure that the allocation functions malloc and realloc succeed. These
functions return null pointers when they are unable to allocate the requested memory, so you must always check the
return value to see that it is not a null pointer, before using it. (If the return value is a null pointer, you will generally
print some kind of error message and abort at least the particular function that needed the allocated memory, or
perhaps abort the entire program.)
Don't get in the habit of assuming that a single, simple call to malloc will ``always'' succeed. Don't make excuses
like ``this program doesn't use much memory to begin with, and I'm only allocating 10 bytes here, so how can it
possibly fail?'' For one thing, there are more reasons for malloc to fail--and return a null pointer--than that there was
no more memory. Some implementations of malloc will also return a null pointer if they are able to detect that you have misused some
of the memory that you have previously allocated, perhaps by writing to more of it than you asked for. In this case,
malloc is trying to tell you something, something you need to know, and although its voice is small (and although
tracking down the problem that it's complaining about may be difficult), you will only have more problems, and more
difficult to track down, if malloc returns a null pointer but you then use that pointer as if it were valid. (As an
example of how it can be alarmingly easy to misuse the memory that malloc gives you, consider this hypothetical
scrap of code for making a dynamically-allocated copy of a string:
char *copystring = malloc(strlen(originalstring));
if(copystring != NULL)
    strcpy(copystring, originalstring);
/* Beware... */
Hint: what about the \0 that terminates the string?)

In a program that allocates a lot of different pieces of memory for a lot of different things, it can be a real nuisance to
have to check each pointer returned from each call to malloc to make sure it's not null. One popular shortcut is to
define a ``wrapper'' function around malloc, which calls malloc and checks the return value in one central place. For
example, the adventure game uses the function
#include <stdio.h>
#include <stdlib.h>
#include "chkmalloc.h"

void *
chkmalloc(size_t sz)
{
    void *ret = malloc(sz);
    if(ret == NULL)
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return ret;
}
One way to think about chkmalloc is that it centralizes the test on malloc's return value. Another way of thinking
about it is that it is a special, alternate version of malloc that never returns NULL. (The fact that it never returns
NULL does not mean that it never fails, but just that if/when it does fail, it signifies this by calling exit instead of
returning NULL.) Aborting the entire program when a call to malloc fails may seem draconian, and there are
programs (e.g. text editors) for which it would be a completely unacceptable strategy, but it's fine for our purposes,
especially if it doesn't happen very often. (In any case, aborting the program cleanly with a message like ``Out of
memory'' is still vastly preferable to crashing horribly and mysteriously, which is what programs that don't check
malloc's return value eventually do.)
Another area of concern is that when you're calling free and realloc, there are more ways for pointers to become
invalid. For example, consider the code
/* p is known to have come from malloc() */
free(p);
After calling free, is p valid or invalid? C uses pass-by-value, so p's value hasn't changed. (The free function
couldn't change it if it tried.) But p is most definitely now invalid; it no longer points to memory which the program
can use. However, it does still point just where it used to, so if the program accidentally uses it, there will still seem to
be data there, except that the data will be sitting in memory which may now have been allocated to ``someone else''!
Therefore, if the variable p persists (that is, if it's something other than a local variable that's about to disappear when
its function returns, or a pointer field within a structure which is all about to disappear), it would probably be a good
idea to set p to NULL:
free(p);
p = NULL;
(Of course, setting p to NULL only accomplishes something if later uses of p check it before using it.)
Finally, let's think about realloc. realloc, remember, attempts to enlarge a chunk of memory which we
originally obtained from malloc. (It lets us change our mind about how much memory we had asked for.) But
realloc is not always able to enlarge a chunk of memory in-place; sometimes it must go elsewhere in memory to
find a contiguous piece of memory big enough to satisfy the enlargement request. So what about this code?
newp = realloc(oldp, newsize);
Is oldp valid or invalid after this call? It depends on whether realloc returned the old pointer value or not (that is,
on whether it was able to enlarge the memory block in-place or had to go elsewhere). Most of the time, you will use
realloc something like this:
newp = realloc(p, newsize);
if(newp != NULL)
{
    /* success; got newsize */
    p = newp;
}
else
{
    /* failure; p still points to block of old size */
}
With a setup like this, p remains valid, and newp is a temporary variable which we don't use further after testing it
and perhaps assigning it to p.
A final issue concerns pointer aliases. If several pointers point into the same block of memory, and if that block of
memory moves or disappears, all the old pointers become invalid. If you have a sequence of code which amounts to
p2 = p;
...
free(p);
p = NULL;
then setting p to NULL may not have been sufficient, because p2 just became invalid, too, and may also need setting
to NULL. The situation is particularly tricky with realloc: suppose that you have a pointer to a chunk of memory:
char *p = malloc(10);
and another pointer which points within that chunk:
char *p2 = p + 5;
Now, if you reallocate p, and if realloc has to go elsewhere and so returns a different pointer value which you
assign to p, you've also got to fix up p2, because it just had the rug yanked out from under it, and is now invalid. To
keep p2 up-to-date, you might use code like this:
int p2offset = p2 - p;
newp = realloc(p, newsize);
if(newp != NULL)
{
/* success; got newsize */
p = newp;
p2 = p + p2offset;
}
else
{
/* failure; p and p2 still point to block of old size */
}
Before calling realloc, we record (in the int variable p2offset) how far beyond p the secondary pointer p2
used to point, so that we can generate a corresponding new value of p2 if p moves.
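The offset technique can be exercised end to end in a small function. This is only a sketch under stated assumptions: the name realloc_alias_demo is invented for illustration, newsize is assumed to be at least 10, and the offset is stored in a ptrdiff_t (the type C defines for the difference of two pointers) rather than the int used above.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/*
 * Hypothetical demonstration: grow a 10-byte block to newsize bytes
 * (assumed >= 10) while keeping a secondary pointer valid.
 * Returns the character p2 points at afterwards, or -1 on failure.
 */
int realloc_alias_demo(size_t newsize)
{
    char *p, *p2, *newp;
    ptrdiff_t p2offset;
    int result;

    p = malloc(10);
    if(p == NULL)
        return -1;
    memcpy(p, "abcdefghi", 10);    /* 9 letters plus the \0 */
    p2 = p + 5;                    /* points at 'f' */

    p2offset = p2 - p;             /* record the offset BEFORE realloc */
    newp = realloc(p, newsize);
    if(newp == NULL)
    {
        free(p);                   /* p and p2 are still valid here */
        return -1;
    }
    p = newp;                      /* the block may have moved */
    p2 = p + p2offset;             /* rebuild p2 from the new base */

    result = *p2;
    free(p);
    return result;
}
```

Whether or not realloc moves the block, p2 is recomputed from p, so the function never has to ask which of the two cases occurred.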


Since we can have pointers to int, and pointers to char, and pointers to any structures we've defined,
and in fact pointers to any type in C, it shouldn't come as too much of a surprise that we can have
pointers to other pointers. If we're used to thinking about simple pointers, and to keeping clear in our
minds the distinction between the pointer itself and what it points to, we should be able to think about
pointers to pointers, too, although we'll now have to distinguish between the pointer, what it points to,
and what the pointer that it points to points to. (And, of course, we might also end up with pointers to
pointers to pointers, or pointers to pointers to pointers to pointers, although these rapidly become too
esoteric to have any practical use.)
The declaration of a pointer-to-pointer looks like
int **ipp;
where the two asterisks indicate that two levels of pointers are involved.
Starting off with the familiar, uninspiring, kindergarten-style examples, we can demonstrate the use of
ipp by declaring some pointers for it to point to and some ints for those pointers to point to:
int i = 5, j = 6, k = 7;
int *ip1 = &i, *ip2 = &j;
Now we can set
ipp = &ip1;
and ipp points to ip1 which points to i. *ipp is ip1, and **ipp is i, or 5. We can illustrate the
situation, with our familiar box-and-arrow notation, like this:
If we say
*ipp = ip2;
we've changed the pointer pointed to by ipp (that is, ip1) to contain a copy of ip2, so that it (ip1)
now points at j:
If we say
*ipp = &k;
we've changed the pointer pointed to by ipp (that is, ip1 again) to point to k:
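The three assignments above can also be traced in code. The following is a minimal sketch; the function name ipp_demo is invented for illustration, and it simply checks each state that the text describes.

```c
/*
 * Hypothetical check of the three steps described above.
 * Returns 1 if every intermediate state is as expected, 0 otherwise.
 */
int ipp_demo(void)
{
    int i = 5, j = 6, k = 7;
    int *ip1 = &i, *ip2 = &j;
    int **ipp;

    ipp = &ip1;          /* ipp -> ip1 -> i */
    if(*ipp != ip1 || **ipp != 5)
        return 0;

    *ipp = ip2;          /* ip1 now holds a copy of ip2, so it points at j */
    if(ip1 != &j || **ipp != 6)
        return 0;

    *ipp = &k;           /* ip1 now points at k; ip2 is unchanged */
    if(ip1 != &k || **ipp != 7 || ip2 != &j)
        return 0;

    return 1;
}
```

Note that ipp itself never changes after the first assignment; only the pointer it points to (ip1) is modified.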
What are pointers to pointers good for, in practice? One use is returning pointers from functions, via
pointer arguments rather than as the formal return value. To explain this, let's first step back and consider
the case of returning a simple type, such as int, from a function via a pointer argument. If we write the
function
void f(int *ip)
{
*ip = 5;
}
and then call it like this:
int i;
f(&i);
then f will ``return'' the value 5 by writing it to the location specified by the pointer passed by the caller;
in this case, to the caller's variable i. A function might ``return'' values in this way if it had multiple
things to return, since a function can only have one formal return value (that is, it can only return one
value via the return statement.) The important thing to notice is that for the function to return a value
of type int, it used a parameter of type pointer-to-int.
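As a sketch of the "multiple things to return" case, here is a hypothetical function (the name divide and its parameters are not from these notes) which hands back both a quotient and a remainder through pointer parameters, and uses its formal return value only to report success or failure.

```c
/*
 * Hypothetical example: compute quotient and remainder together.
 * Returns 0 ("false") if den is zero, 1 ("true") otherwise.
 */
int divide(int num, int den, int *quotp, int *remp)
{
    if(den == 0)
        return 0;          /* nothing is written on failure */
    *quotp = num / den;    /* "return" the quotient via quotp */
    *remp = num % den;     /* "return" the remainder via remp */
    return 1;
}
```

A caller would declare two ints, say q and r, and call divide(17, 5, &q, &r), after which q is 3 and r is 2.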
Now, suppose that a function wants to return a pointer in this way. The corresponding parameter will
then have to be a pointer to a pointer. For example, here is a little function which tries to allocate
memory for a string of length n, and which returns zero (``false'') if it fails and 1 (nonzero, or ``true'') if it
succeeds, returning the actual pointer to the allocated memory via a pointer:
#include <stdlib.h>
int allocstr(int len, char **retptr)
{
    char *p = malloc(len + 1);    /* +1 for \0 */
    if(p == NULL)
        return 0;
    *retptr = p;
    return 1;
}
The caller can then do something like

char *string = "Hello, world!";
char *copystr;
if(allocstr(strlen(string), &copystr))
    strcpy(copystr, string);
else
    fprintf(stderr, "out of memory\n");
(This is a fairly crude example; the allocstr function is not terribly useful. It would have been just
about as easy for the caller to call malloc directly. A different, and more useful, approach to writing a
``wrapper'' function around malloc is exemplified by the chkmalloc function we've been using.)
One side point about pointers to pointers and memory allocation: although the void * type, as returned
by malloc, is a ``generic pointer,'' suitable for assigning to or from pointers of any type, the
hypothetical type void ** is not a ``generic pointer to pointer.'' Our allocstr example can only be
used for allocating pointers to char. It would not be possible to use a function which returned generic
pointers indirectly via a void ** pointer, because when you tried to use it, for example by declaring
and calling
double *dptr;
if(!hypotheticalwrapperfunc(100, sizeof(double), &dptr))
    fprintf(stderr, "out of memory\n");
you would not be passing a void **, but rather a double **.
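One way around this, if such a wrapper is really wanted, is for the caller to pass the address of a genuine void * variable and convert afterwards. The sketch below assumes two invented names, allocmem and make_doubles; neither is part of these notes or of the standard library.

```c
#include <stdlib.h>

/*
 * Hypothetical wrapper: allocate n objects of the given size and
 * hand the pointer back through a void ** parameter.
 */
int allocmem(size_t n, size_t size, void **retptr)
{
    void *p = malloc(n * size);
    if(p == NULL)
        return 0;
    *retptr = p;
    return 1;
}

/* Hypothetical caller: go through a real void *, then convert. */
double *make_doubles(size_t n)
{
    void *vp;
    double *dptr;

    if(!allocmem(n, sizeof(double), &vp))    /* &vp really is a void ** */
        return NULL;
    dptr = vp;        /* ordinary void * to double * conversion */
    return dptr;
}
```

The extra variable vp is the price of the indirection, which is one more reason such wrappers usually return the pointer as their formal return value instead.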
Another good use for pointers to pointers is in dynamically allocated, simulated multidimensional arrays,
which we'll discuss in the next chapter.
As a final example, let's look at how pointers to pointers can be used to eliminate a nuisance we've had
when trying to insert and delete items in linked lists. For simplicity, we'll consider lists of integers, built
using this structure:
struct list
{
int item;
struct list *next;
};
Suppose we're trying to write some code to delete a given integer from a list. The straightforward
solution looks like this:
/* delete node containing i from the list pointed to by list */
struct list *lp, *prevlp;
for(lp = list; lp != NULL; lp = lp->next)
{
    if(lp->item == i)
    {
        if(lp == list)
            list = lp->next;
        else
            prevlp->next = lp->next;
        break;
    }
    prevlp = lp;
}
This code works, but it has two blemishes. One is that it has to use an extra variable to keep track of the
node one behind the one it's looking at, and the other is that it has to use an extra test to special-case the
situation in which the node being deleted is at the head of the list. Both of these problems arise because
the deletion of a node from the list involves modifying the previous pointer to point to the next node (that
is, the node before the deleted node to point to the one following). But, depending on whether the node
being deleted is the first node in the list or not, the pointer that needs modifying is either the pointer that
points to the head of the list, or the next pointer in the previous node.
To illustrate this, suppose that we have the list (1, 2, 3) and we're trying to delete the element 1. After
we've found the element 1, lp points to its node, which just happens to be the same node that the main
list pointer points to, as illustrated in (a) below:
To remove element 1 from the list, then, we must adjust the main list pointer so that it points to 2's
node, the new head of the list (as shown in (b)). If we were trying to delete node 2, on the other hand (as
illustrated in (c) above), we'd have to adjust node 1's next pointer to point to 3. The prevlp pointer
keeps track of the previous node we were looking at, since (at other than the first node in the list) that's
the node whose next pointer will need adjusting. (Notice that if we were to delete node 3, we would
copy its next pointer over to 2, but since 3's next pointer is the null pointer, copying it to node 2
would make node 2 the end of the list, as desired.)
We can write another version of the list-deletion code, which is (in some ways, at least) much cleaner, by
using a pointer to a pointer to a struct list. This pointer will point at the pointer which points at
the node we're looking at; it will either point at the head pointer or at the next pointer of the node we
looked at last time. Since this pointer points at the pointer that points at the node we're looking at (got
that?), it points at the pointer which we need to modify if the node we're looking at is the node we're
deleting. Let's see how the code looks:
struct list **lpp;
for(lpp = &list; *lpp != NULL; lpp = &(*lpp)->next)
{
    if((*lpp)->item == i)
    {
        *lpp = (*lpp)->next;
        break;
    }
}
That single line
*lpp = (*lpp)->next;
updates the correct pointer, to splice the node it refers to out of the list, regardless of whether the pointer
being updated is the head pointer or one of the next pointers. (Of course, the payoff is not absolute,
because the use of a pointer to a pointer to a struct list leads to an algorithm which might not be
nearly as obvious at first glance.)

To illustrate the use of the pointer-to-pointer lpp graphically, here are two more figures illustrating the
situation just before deleting node 1 (on the left) or node 2 (on the right).
In both cases, lpp points at a struct list pointer which points at the node to be deleted. In both
cases, the pointer pointed to by lpp (that is, the pointer *lpp) is the pointer that needs to be updated. In
both cases, the new pointer (the pointer that *lpp is to be updated to) is the next pointer of the node
being deleted, which is always (*lpp)->next.
One other aspect of the code deserves mention. The expression
(*lpp)->next
describes the next pointer of the struct list which is pointed to by *lpp, that is, which is pointed
to by the pointer which is pointed to by lpp. The expression
lpp = &(*lpp)->next
sets lpp to point to the next field of the struct list pointed to by *lpp. In both cases, the
parentheses around *lpp are needed because the precedence of * is lower than ->.
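Packaged as a function, the pointer-to-pointer deletion might look like the following sketch. The name listdelete and the choice to return the unlinked node (so that the caller can free it if it came from malloc) are assumptions made for illustration.

```c
#include <stdlib.h>

struct list
{
    int item;
    struct list *next;
};

/*
 * Hypothetical helper: unlink the first node containing i from the
 * list whose head pointer is *listp.  Returns the unlinked node, or
 * NULL if no node matched.
 */
struct list *listdelete(struct list **listp, int i)
{
    struct list **lpp;
    for(lpp = listp; *lpp != NULL; lpp = &(*lpp)->next)
    {
        if((*lpp)->item == i)
        {
            struct list *lp = *lpp;
            *lpp = lp->next;    /* head pointer or a next field: same code */
            lp->next = NULL;
            return lp;
        }
    }
    return NULL;
}
```

Because the function receives the address of the head pointer, the caller's head pointer is updated automatically when the first node is the one removed.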
As a second, related example, here is a piece of code for inserting a new node into a list, in its proper
order. This code uses a pointer-to-pointer-to-struct list for the same reason, namely, so that it
doesn't have to worry about treating the beginning of the list specially.
/* insert node newlp into list, keeping the items in ascending order */
struct list **lpp;
for(lpp = &list; *lpp != NULL; lpp = &(*lpp)->next)
{
    if(newlp->item < (*lpp)->item)
        break;
}
/* *lpp is now the first larger node, or NULL at the end of the list */
newlp->next = *lpp;
*lpp = newlp;
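A self-contained version of the insertion is sketched below. The function name listinsert is invented for illustration. The splice is performed after the loop rather than inside it, so that a node which belongs at the very end of the list (or in an empty list) is linked in as well.

```c
#include <stddef.h>

struct list
{
    int item;
    struct list *next;
};

/*
 * Hypothetical helper: insert newlp into the list whose head pointer
 * is *listp, keeping the items in ascending order.
 */
void listinsert(struct list **listp, struct list *newlp)
{
    struct list **lpp;
    for(lpp = listp; *lpp != NULL; lpp = &(*lpp)->next)
    {
        if(newlp->item < (*lpp)->item)
            break;            /* found the first larger node */
    }
    newlp->next = *lpp;       /* NULL if we ran off the end */
    *lpp = newlp;
}
```

As with deletion, no special case is needed for the head of the list: lpp starts out pointing at the head pointer itself.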

Chapter 23: Two-Dimensional (and Multidimensional) Arrays
C does not have true multidimensional arrays. However, because of the generality of C's type system,
you can have arrays of arrays, which are almost as good. (This should not surprise you; you can have
arrays of anything, so why not arrays of arrays?)
We'll use two-dimensional arrays as examples in this section, although the techniques can be extended to
three or more dimensions. Here is a two-dimensional array:
int a2[5][7];
Formally, a2 is an array of 5 elements, each of which is an array of 7 ints. (We usually think of it as
having 5 rows and 7 columns, although this interpretation is not mandatory.)
Just as we declared a two-dimensional array using two pairs of brackets, we access its individual
elements using two pairs of brackets:
int i, j;
for(i = 0; i < 5; i++)
{
for(j = 0; j < 7; j++)
a2[i][j] = 0;
}
Make sure that you remember to put each subscript in its own, correct pair of brackets. Neither
int a2[5, 7];    /* XXX WRONG */
nor
a2[i, j] = 0;    /* XXX WRONG */
nor
a2[j][i] = 0;    /* XXX WRONG */
would do anything remotely like what you wanted.
23.1: Multidimensional Arrays and Functions

23.2: Dynamically Allocating Multidimensional Arrays


23.1: Multidimensional Arrays and Functions
The most straightforward way of passing a multidimensional array to a function is to declare it in exactly
the same way in the function as it was declared in the caller. If we were to call
func(a2);
then we might declare
func(int a[5][7])
{
...
}
and it's clear that the array type which the caller passes is the same as the type which the function func
accepts.
If we remember what we learned about simple arrays and functions, however, two questions arise. First,
in our earlier function definitions, we were able to leave out the (single) array dimension, with the
understanding that since the array was really defined in the caller, we didn't have to say (or know) how
big it is. The situation is the same for multidimensional arrays, although it may not seem so at first. The
hypothetical function func above accepts a parameter a, where a is an array of 5 things, where each of
the 5 things is itself an array. By the same argument that applies in the single-dimension case, the
function does not have to know how big the array a is, overall. However, it certainly does need to know
what a is an array of. It is not enough to know that a is an array of ``other arrays''; the function must
know that a is an array of arrays of 7 ints. The upshot is that although it does not need to know how
many ``rows'' the array has, it does need to know the number of columns. That is, if we want to leave out
any dimensions, we can only leave out the first one:
func(int a[][7])
{
...
}
The second dimension is still required. (For a three- or more dimensional array, all but the first
dimension are required; again, only the first dimension may be omitted.)
The second question we might ask concerns the equivalence between pointers and arrays. We know that
when we pass an array to a function, what really gets passed is a pointer to the array's first element. We
know that when we declare a function that seems to accept an array as a parameter, the compiler quietly
compiles the function as if that parameter were a pointer, since a pointer is what it will actually receive.
What about multidimensional arrays? What kind of pointer is passed down to the function?
The answer is, a pointer to the array's first element. And, since the first element of a multidimensional
array is another array, what gets passed to the function is a pointer to an array. If you want to declare the
function func in a way that explicitly shows the type which it receives, the declaration would be
func(int (*a)[7])
{
...
}
The declaration int (*a)[7] says that a is a pointer to an array of 7 ints. Since declarations like
this are hard to write and hard to understand, and since pointers to arrays are generally confusing, I
recommend that when you write functions which accept multidimensional arrays, you declare the
parameters using array notation, not pointer notation.
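Here is a small sketch of a function written in the recommended array notation. The names sum2d and NCOLUMNS are invented for illustration. The row count is passed separately, since the function cannot discover it from the parameter.

```c
#define NCOLUMNS 7

/*
 * Hypothetical example: add up every element of an array having
 * nrows rows of NCOLUMNS ints each.  Only the first dimension is
 * left out of the parameter declaration.
 */
int sum2d(int a[][NCOLUMNS], int nrows)
{
    int i, j;
    int sum = 0;
    for(i = 0; i < nrows; i++)
    {
        for(j = 0; j < NCOLUMNS; j++)
            sum += a[i][j];
    }
    return sum;
}
```

The same function works for an int [5][7] array or an int [2][7] array, but not for one with a different number of columns.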
What if you don't know what the dimensions of the array will be? What if you want to be able to call a
function with arrays of different sizes and shapes? Can you say something like
func(int x, int y, int a[x][y])
{
...
}
where the array dimensions are specified by other parameters to the function? In the original C Standard
(C89/C90), you cannot. (You can do so in FORTRAN, and in the extended language implemented by gcc.
The 1999 revision of the C Standard, C99, added variable-length array parameters for exactly this purpose,
but C11 later made them an optional feature, so you still cannot count on them in fully portable C.)
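For reference, compilers that implement the variable-length array feature of the 1999 C Standard (C99) do accept a declaration of this shape. The sketch below assumes such a compiler; the function name fill and its value parameter are invented for illustration.

```c
/*
 * Hypothetical example; requires C99 variable-length array support.
 * The dimension parameters x and y must be declared before the
 * array parameter that uses them.
 */
void fill(int x, int y, int a[x][y], int value)
{
    int i, j;
    for(i = 0; i < x; i++)
    {
        for(j = 0; j < y; j++)
            a[i][j] = value;
    }
}
```

With this declaration, both fill(5, 7, a2, 0) for an int a2[5][7] and fill(2, 3, b, 0) for an int b[2][3] are valid calls to the same function.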
Finally, we might explicitly note that if we pass a multidimensional array to a function:
int a2[5][7];
func(a2);
we can not declare that function as accepting a pointer-to-pointer:
func(int **a)    /* WRONG */
{
...
}
As we said above, the function ends up receiving a pointer to an array, not a pointer to a pointer.

23.2: Dynamically Allocating Multidimensional Arrays
We've seen that it's straightforward to call malloc to allocate a block of memory which can simulate an
array, but with a size which we get to pick at run-time. Can we do the same sort of thing to simulate
multidimensional arrays? We can, but we'll end up using pointers to pointers.
If we don't know how many columns the array will have, we'll clearly allocate memory for each row (as
many columns wide as we like) by calling malloc, and each row will therefore be represented by a
pointer. How will we keep track of those pointers? There are, after all, many of them, one for each row.
So we want to simulate an array of pointers, but we don't know how many rows there will be, either, so
we'll have to simulate that array (of pointers) with another pointer, and this will be a pointer to a pointer.
This is best illustrated with an example:
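#include <stdlib.h>
int **array;
int i;
array = malloc(nrows * sizeof(int *));
if(array == NULL)
{
    exit(EXIT_FAILURE);    /* or return an error to the caller */
}
for(i = 0; i < nrows; i++)
{
    array[i] = malloc(ncolumns * sizeof(int));
    if(array[i] == NULL)
    {
        exit(EXIT_FAILURE);    /* or return an error to the caller */
    }
}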
array is a pointer-to-pointer-to-int: at the first level, it points to a block of pointers, one for each row.
That first-level pointer is the first one we allocate; it has nrows elements, with each element big enough
to hold a pointer-to-int, or int *. If we successfully allocate it, we then fill in the pointers (all nrows
of them) with a pointer (also obtained from malloc) to ncolumns number of ints, the storage for
that row of the array. If this isn't quite making sense, a picture should make everything clear:
Once we've done this, we can (just as for the one-dimensional case) use array-like syntax to access our
simulated multidimensional array. If we write
array[i][j]
we're asking for the i'th pointer pointed to by array, and then for the j'th int pointed to by that inner
pointer. (This is a pretty nice result: although some completely different machinery, involving two levels
of pointer dereferencing, is going on behind the scenes, the simulated, dynamically-allocated
two-dimensional ``array'' can still be accessed just as if it were an array of arrays, i.e. with the same pair of
bracketed subscripts.)
If a program uses simulated, dynamically allocated multidimensional arrays, it becomes possible to write
``heterogeneous'' functions which don't have to know (at compile time) how big the ``arrays'' are. In
other words, one function can operate on ``arrays'' of various sizes and shapes. The function will look
something like
func2(int **array, int nrows, int ncolumns)
{
}
This function does accept a pointer-to-pointer-to-int, on the assumption that we'll only be calling it with
simulated, dynamically allocated multidimensional arrays. (We must not call this function on arrays like
the ``true'' multidimensional array a2 of the previous sections). The function also accepts the dimensions
of the arrays as parameters, so that it will know how many ``rows'' and ``columns'' there are, so that it can
iterate over them correctly. Here is a function which zeros out a pointer-to-pointer, two-dimensional
``array'':
void zeroit(int **array, int nrows, int ncolumns)
{
    int i, j;
    for(i = 0; i < nrows; i++)
    {
        for(j = 0; j < ncolumns; j++)
            array[i][j] = 0;
    }
}
Finally, when it comes time to free one of these dynamically allocated multidimensional ``arrays,'' we
must remember to free each of the chunks of memory that we've allocated. (Just freeing the top-level
pointer, array, wouldn't cut it; if we did, all the second-level pointers would be lost but not freed, and
would waste memory.) Here's what the code might look like:
for(i = 0; i < nrows; i++)
    free(array[i]);
free(array);
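The allocation and deallocation steps fit naturally into a pair of helper functions. What follows is a sketch only: the names alloc2d and free2d are invented, both dimensions are assumed to be positive, and on failure the allocator cleans up whatever rows it had already obtained instead of exiting.

```c
#include <stdlib.h>

/* Hypothetical helper: free an "array" built by alloc2d. */
void free2d(int **array, int nrows)
{
    int i;
    if(array == NULL)
        return;
    for(i = 0; i < nrows; i++)
        free(array[i]);        /* free(NULL) is harmless */
    free(array);
}

/*
 * Hypothetical helper: allocate nrows rows of ncolumns ints each.
 * Returns NULL, having freed any partial allocation, on failure.
 */
int **alloc2d(int nrows, int ncolumns)
{
    int i;
    int **array = malloc(nrows * sizeof(int *));
    if(array == NULL)
        return NULL;
    for(i = 0; i < nrows; i++)
        array[i] = NULL;       /* so free2d is safe after a failure */
    for(i = 0; i < nrows; i++)
    {
        array[i] = malloc(ncolumns * sizeof(int));
        if(array[i] == NULL)
        {
            free2d(array, nrows);
            return NULL;
        }
    }
    return array;
}
```

A caller can then write int **a = alloc2d(3, 4), use a[i][j] as usual, and finish with free2d(a, 3).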


The pointers we have looked at so far have all been pointers to various types of data objects, but it is also
possible to have pointers to functions. Pointers to functions are useful for approximately the same
reasons as pointers to data: when you want an extra level of indirection, when you'd like the same piece
of code to call different functions depending on circumstances.
24.1 Declaring, Assigning, and Using Function Pointers
24.2 What are Function Pointers Good For?
24.3 Function Pointers and Prototypes

24.1 Declaring, Assigning, and Using Function Pointers
Just as for data pointers, we can think of three steps involved in using function pointers. First, we must
declare a variable which can hold a pointer to a function, and this ends up being a somewhat complex
declaration. A simple function pointer declaration looks like this:
int (*pfi)();
This declares pfi as a pointer to a function which will return an int. As in other declarations, the *
indicates that a pointer is involved, and the parentheses () indicate that a function is involved. But what
about the extra parentheses around (*pfi)? They're needed because there are precedence relationships
in declarations just as there are in expressions, and when the default precedence doesn't give you what
you want, you have to override it with explicit parentheses. In declarations, the () indicating functions
and the [] indicating arrays ``bind'' more tightly than the *'s indicating pointers. Without the extra
parentheses, the declaration above would look like
int *pfi();    /* WRONG, for pointer-to-function */
and this would declare a function returning a pointer to int. With the explicit parentheses, however,
int (*pfi)() tells us that pfi is a pointer first, and that what it's a pointer to is a function, and
what that function returns is an int.
It's common to use typedefs (see section 18.1.6) with complicated types such as function pointers. For
example, after defining
typedef int (*funcptr)();
the identifier funcptr is now a synonym for the type ``pointer to function returning int''. This typedef
would make declaring pointers such as pfi considerably easier:
funcptr pfi;
(In section 18.1.6, we mentioned that typedefs were a little bit like preprocessor define directives, but
better. Here we see another reason: there's no way we could define a preprocessor macro which would
expand to a correct function pointer declaration, but the funcptr type we just defined using typedef
will work just fine.)
Once declared, a function pointer can of course be set to point to some function. If we declare some
functions:
extern int f1();
extern int f2();
extern int f3();
then we can set our pointer pfi to point to one of them:
pfi = &f1;
or to one or another of them depending on some condition:
if(condition)
pfi = &f2;
else
pfi = &f3;
(Of course, we're not restricted to these two forms; we can assign function pointers under any
circumstances we wish. The second example could be rendered more compactly using the conditional
operator: pfi = condition ? &f2 : &f3 .)
In these examples, we've used the & operator as we always have, to generate a pointer. However, when
generating pointers to functions, the & is optional, because when you mention the name of a function but
are not calling it, there's nothing else you could possibly be trying to do except generate a pointer to it.
So, most programmers write
pfi = f1;
or
if(condition)
pfi = f2;
else
pfi = f3;
(or, equivalently, using the conditional operator, pfi = condition ? f2 : f3 ).
(The fact that a function pointer is generated automatically when a function appears in an expression but
is not being called is very similar to, and in fact related to, the fact that a pointer to the first element of an
array is generated automatically when an array appears in an expression.)
Finally, once we have a function pointer variable which does point to a function, we can call the function
that it points to. Broken down to a near-microscopic level, this, too, is a three-step procedure. First, we
write the name of the function pointer variable:
pfi
This is a pointer to a function. Then, we put the * operator in front, to ``take the contents of the pointer'':
*pfi
Now we have a function. Finally, we append an argument list in parentheses, along with an extra set of
parentheses to get the precedence right, and we have a function call:
(*pfi)(arg1, arg2)
The extra parentheses are needed here for almost exactly the same reason as they were in the declaration
of pfi. Without them, we'd have
*pfi(arg1, arg2)    /* WRONG, for pointer-to-function */
and this would say, ``call the function pfi (which had better return a pointer), passing it the arguments
arg1 and arg2, and take the contents of the pointer it returns.'' However, what we want to do is take
the contents of pfi (which is a pointer to a function) and call the pointed-to function, passing it the
arguments arg1 and arg2. Again, the explicit parentheses override the default precedence, arranging
that we apply the * operator to pfi and then do the function call.
Just to confuse things, though, parts of the syntax are optional here as well. There's nothing you can do
with a function pointer except assign it to another function pointer, compare it to another function
pointer, or call the function that it points to. If you write
pfi(arg1, arg2)
it's obvious, based on the parenthesized argument list, that you're trying to call a function, so the
compiler goes ahead and calls the pointed to function, just as if you'd written
(*pfi)(arg1, arg2)
When calling the function pointed to by a function pointer, the * operator (and hence the extra set of
parentheses) is optional. I prefer to use the explicit *, because that's the way I learned it and it makes a
bit more sense to me that way, but you'll also see code which leaves it out.
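The three steps (declare, assign, call) can be seen together in a short sketch. The functions add, mul, and apply are invented for illustration, and the pointer is declared with its parameter types spelled out, which is the subject of section 24.3, rather than with the empty parentheses used above.

```c
/* Two hypothetical functions with the same signature. */
int add(int a, int b)
{
    return a + b;
}

int mul(int a, int b)
{
    return a * b;
}

/* Choose a function at run time, then call it through the pointer. */
int apply(int usemul, int a, int b)
{
    int (*pfi)(int, int);    /* pointer to function returning int */

    if(usemul)
        pfi = &mul;          /* explicit & */
    else
        pfi = add;           /* & omitted; same effect */

    return (*pfi)(a, b);     /* could also be written pfi(a, b) */
}
```

Here apply(0, 3, 4) calls add and yields 7, while apply(1, 3, 4) calls mul and yields 12.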

24.2 What are Function Pointers Good For?
In this section we'll list several situations where function pointers can be useful.
A common problem is sorting, that is, rearranging a set of items into a desired sequence. It's especially
common for the set of items to be contained in an array. Suppose you were writing a function to sort an
array of strings. Roughly speaking, the algorithm for sorting the elements of an array looks like this:
for all pairs of elements in the array
if the pair is out of order
swap them
In C, the outline of the function might therefore look like this:
stringsort(char *strings[], int nstrings)
{
for( all pairs of elements i, j in strings )
{
if(strcmp(strings[i], strings[j]) > 0)
{
char *tmp = strings[i];
strings[i] = strings[j];
strings[j] = tmp;
}
}
}
Remember, the library function strcmp compares two strings and returns either a negative number, zero,
or a positive number depending on whether the first string was ``less than,'' the same as, or ``greater than''
the second string. If you haven't seen it before, the series of three assignments involving the temporary
variable tmp is a standard idiom for swapping two items. (The more obvious pair of assignments
strings[i] = strings[j];
strings[j] = strings[i];
would just about as obviously not work, because the first line would obliterate strings[i] before the
second line had a chance to assign it to strings[j].)
Naturally, a real sort function implementing a real sorting algorithm will be a bit more elaborate; in
particular, the details of how we choose which pairs of elements to compare (and in what order) can make
a huge difference in how efficiently the function completes its job. We'll show a complete example in a
minute, but for now, our more important focus is on the comparison step. The final ordering of the strings
will depend on the strcmp function's definition of what it means for one string to be ``less than'' or
``greater than'' another. How strcmp compares strings (as we saw in chapter 8) is to look at them a
character at a time, based on their values in the machine's character set (which is how C always treats
characters). In ASCII, the character set that most computers use, the codes representing the letters are in
order, so strcmp gives you something pretty close to alphabetical order, with the significant difference
that all the upper-case letters come before the lower-case letters. So a string-sorting function built around
strcmp would sort the words ``Zeppelin,'' ``able,'' ``baker,'' and ``Charlie'' into the order
Charlie
Zeppelin
able
baker
strcmp is quite useless for sorting strings which consist entirely of digits, because it goes from left to
right, stopping when it finds the first difference, without regard to whether it was comparing the ten's digit
of one number to the hundred's of another. For example, a strcmp-based sort would sort the strings "1",
"2", "3", "4", "12", "24", and "234" into the order
1
12
2
234
24
3
4
Depending on circumstances, we might want our string sorting function to sort into the order that strcmp
uses, or into ``dictionary'' order (that is, with all the a's together, both upper-case and lower-case), or into
numeric order. We could pass our stringsort function a flag or code telling it which comparison
strategy to use, although that would mean that whenever we invented a new comparison strategy, we
would have to define a new code or flag value and rewrite stringsort. But, if we observe that the final
ordering depends entirely on the behavior of the comparison function, a neater implementation is if we
write our stringsort function to accept a pointer to the function which we want it to use to compare
each pair of strings. It will go through its usual routine of comparing and exchanging, but whenever it
makes the comparisons, the actual function it calls will be the function pointed to by the function pointer
we hand it. Making it sort in a different order, according to a different comparison strategy, will then not
require rewriting stringsort at all, but instead will just involve passing it a pointer to a different
comparison function (after perhaps writing that function).
Here is a complete implementation of stringsort, which also accepts a pointer to the string
comparison function:
void stringsort(char *strings[], int nstrings, int (*cmpfunc)())
{
    int i, j;
    int didswap;
    do
    {
        didswap = 0;
        for(i = 0; i < nstrings - 1; i++)
        {
            j = i + 1;
            if((*cmpfunc)(strings[i], strings[j]) > 0)
            {
                char *tmp = strings[i];
                strings[i] = strings[j];
                strings[j] = tmp;
                didswap = 1;
            }
        }
    } while(didswap);
}
(This code uses a fairly simpleminded sorting algorithm. It repeatedly compares adjacent pairs of
elements, keeping track of whether it had to exchange any. If it makes it all the way through the array
without exchanging any, it's done; otherwise, it has at least made progress, and it goes back for another
pass. This is not the world's best algorithm; in fact it's not far from the infamous ``bubble sort.'' Although
our focus here is on function pointers, not sorting, in a minute we'll take time out and look at a simple
improvement to this algorithm which does make it a decent one.)
Now, if we have an array of strings
char *array1[] = {"Zeppelin", "able", "baker", "Charlie"};
we can sort it into strcmp order by calling
stringsort(array1, 4, strcmp);
Notice that in this call, we are not calling strcmp immediately; we are generating a pointer to strcmp,
and passing that pointer as the third argument in our call to stringsort.
If we wanted to sort these words in ``dictionary'' order, we could write a version of strcmp which
ignores case when comparing letters:
#include <ctype.h>
int dictstrcmp(char *str1, char *str2)
{
while(1)
{
char c1 = *str1++;
char c2 = *str2++;
if(isupper(c1))
c1 = tolower(c1);
if(isupper(c2))
c2 = tolower(c2);
if(c1 != c2)
return c1 - c2;
if(c1 == '\0')
return 0;
}
}
(The functions isupper and tolower are both from the standard library and are declared in
<ctype.h>. isupper returns ``true'' if its argument is an upper-case letter, and tolower converts its
argument to a lower-case letter.)
With dictstrcmp in hand, we can sort our array a different way:
stringsort(array1, 4, dictstrcmp);
(Some C libraries contain case-independent versions of strcmp called stricmp or strcasecmp.
Both of these would do the same thing as our dictstrcmp, although neither of them is standard.)
We can also write another string-comparison function which treats the strings as numbers, and compares
them numerically:
int numstrcmp(char *str1, char *str2)
{
    int n1 = atoi(str1);
    int n2 = atoi(str2);
    if(n1 < n2)
        return -1;
    else if(n1 == n2)
        return 0;
    else
        return 1;
}
Then, we can sort an array of numeric strings correctly:

char *array2[] = {"1", "234", "12", "3", "4", "24", "2"};
stringsort(array2, 7, numstrcmp);
(As an aside, you will occasionally see code which is supposed to compare two integers and return a
negative, zero, or positive result--i.e. just like the numstrcmp function above--but which does so by the
seemingly simpler technique of just saying
return n1 - n2;
It turns out that this trick has a serious drawback. Suppose that n1 is 32000, and n2 is -32000. Then
n1 - n2 is 64000, which is not guaranteed to fit in an int, and will overflow on some machines, producing an
incorrect result. So the explicit comparison code such as numstrcmp used is considerably safer.)
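If the three-way if/else feels too long, there is a compact idiom which still avoids the subtraction. It is shown here as a sketch, with the invented name intcmp. It relies on the fact that a relational expression in C has the value 1 when true and 0 when false.

```c
/*
 * Hypothetical helper: compare two ints without subtracting them.
 * Returns -1, 0, or 1, and cannot overflow.
 */
int intcmp(int n1, int n2)
{
    return (n1 > n2) - (n1 < n2);
}
```

For n1 equal to 32000 and n2 equal to -32000, the two relational results are 1 and 0, so the function returns 1 without ever forming the difference of the two values.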
Finally, since we've started looking at sorting functions, let's look at a simple improvement to the string
sorting function we've just been using. It has been plodding along comparing adjacent elements, so when
an element is far out of place, it takes many passes to percolate it to the correct position (one cell at a
time). The improvement, based on the work of Donald L. Shell, is to compare pairs of more widely-separated
elements at first, then proceed to compare more and more closely-situated elements, until on the
last pass (or passes) we compare adjacent elements, as before. Since the earlier passes will have done
more of the work (and more quickly), the later passes will just have to do the ``final cleanup.'' Here is the
improvement:
void stringsort(char *strings[], int nstrings, int (*cmpfunc)())
{
int i, j, gap;
int didswap;
for(gap = nstrings / 2; gap > 0; gap /= 2)
{
do
{
didswap = 0;
for(i = 0; i < nstrings - gap; i++)
{
j = i + gap;
if((*cmpfunc)(strings[i], strings[j]) > 0)
{
char *tmp = strings[i];
strings[i] = strings[j];
strings[j] = tmp;
didswap = 1;
}
}
} while(didswap);
}
}
The inner loops are the same as before, except that where we had before always been computing j = i
+ 1, now we compute j = i + gap, where gap is a new variable (controlled by a third, outer loop)
which starts out large (nstrings / 2), and then diminishes until it's 1. Although this code contains
three nested loops instead of two, it will end up making far fewer trips through the inner loop, and so will
execute faster.
The Standard C library contains a general-purpose sort function, qsort (declared in <stdlib.h>),
which can sort any type of data (not just strings). It's a bit trickier to use (due to this generality), and you
almost always have to write an auxiliary comparison function. For example, due to the generic way in
which qsort calls your comparison function, you can't use strcmp directly even when you're sorting
strings and would be satisfied with strcmp's ordering. Here is an auxiliary comparison function and the
corresponding call to qsort which would behave like our earlier call to stringsort(array1, 4,
strcmp) :
/* compare strings via pointers */
int pstrcmp(const void *p1, const void *p2)
{
return strcmp(*(char * const *)p1, *(char * const *)p2);
}
...
qsort(array1, 4, sizeof(char *), pstrcmp);
When you call qsort, it calls your comparison function with pointers to the elements of your array.
Since array1 is an array of pointers, the comparison function ends up receiving pointers to pointers. But
qsort doesn't know that the array is an array of pointers; it's written so that it can sort arrays of anything.
(That's why qsort's third argument is the size of the array element.) Since qsort doesn't know what the
type of the elements being sorted is, it uses void pointers to those elements when it calls the comparison
function. (The use of void pointers here recalls their use with malloc, where the situation is similar:
malloc returns pointers to void because it doesn't know what type we'll use the pointers to point to.) In
the ``wrapper'' function pstrcmp, we use the explicit cast (char * const *) to convert the void
pointers which the function receives into pointers to (pointers to char) so that when we use one *
operator to find out what they point to, we get one of the character pointers (char *) which we know our
array1 array actually contains. We pass the resulting two char *'s to strcmp to do most of the work,
but we have to do a bit of work first to recover the correct pointers. (The extra const in the cast (char
* const *) keeps the compiler from complaining, since the pointers being passed in to the
comparison function are actually const void *, meaning that although it may not be clear what they
point to, it's guaranteed that we won't be using them to modify whatever it is they point to. We need to
keep a level of const-ness in the converted pointers so that the compiler doesn't worry that we're going
to accidentally use the converted pointers to modify what we shouldn't.)
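The same wrapper idea carries over to other orderings. As a sketch, here is how the numeric-string ordering from earlier might be handed to qsort. The names pnumstrcmp and numsort are invented for this example, and the comparison is written out inline rather than calling numstrcmp.

```c
#include <stdlib.h>

/* compare numeric strings via pointers (hypothetical wrapper) */
int pnumstrcmp(const void *p1, const void *p2)
{
    int n1 = atoi(*(char * const *)p1);
    int n2 = atoi(*(char * const *)p2);
    /* yields -1, 0, or 1 without subtracting n2 from n1 */
    return (n1 > n2) - (n1 < n2);
}

/* sort an array of nstrings numeric strings in place */
void numsort(char *strings[], int nstrings)
{
    qsort(strings, nstrings, sizeof(char *), pnumstrcmp);
}
```

Calling numsort(array2, 7) on the array2 shown earlier should leave it in the order 1, 2, 3, 4, 12, 24, 234.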
That was a rather elaborate first example of what function pointers can be used for! Let's move on to some
others.
Suppose you wanted a program to plot equations. You would give it a range of x and y values (perhaps -10
to 10 along each axis), and ask it to plot y = f(x) over that range (e.g. for -10 <= x <= 10). How would you
tell it what function to plot? By a function pointer, of course! The plot function could then step its x
variable over the range you specified, calling the function pointed to by your pointer each time it needed
to compute y. You might call
plot(-10, 10, -10, 10, sin)
to plot a sine wave, and
plot(-10, 10, -10, 10, sqrt)
to plot the square root function.
(If you were to try to write this plot function, your first question would be how to draw lines or
otherwise do graphics in C at all, and unfortunately this is one of the questions which the C language
doesn't answer. There are potentially different ways of doing graphics, with different system functions to
call, for every different kind of computer, operating system, and graphics device.)
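Although we can't draw anything portably, the computational half of such a plot function is easy to sketch. The function below is an assumption made for illustration (the names sample and square are not from these notes): it steps x over the requested range and calls the pointed-to function at each step, storing the results for some later drawing code to use.

```c
/* sample: hypothetical stand-in for the computational half of plot().
 * Stores y = f(x) for nsteps+1 evenly spaced x values in ys[],
 * which must have room for nsteps+1 elements. */
void sample(double xmin, double xmax, int nsteps,
            double (*f)(double), double ys[])
{
    int i;
    for(i = 0; i <= nsteps; i++)
        {
        double x = xmin + (xmax - xmin) * i / nsteps;
        ys[i] = (*f)(x);
        }
}

/* an example function to plot; sin or sqrt from <math.h> would do as well */
double square(double x)
{
    return x * x;
}
```

A call such as sample(-10, 10, 4, square, ys) evaluates the function at x = -10, -5, 0, 5, and 10.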
One of the simplest (and allegedly least user-friendly) styles of user interfaces is the command line
interface, or CLI. The user types a command, hits the RETURN key, and the system interprets the
command. Often, the first ``word'' on the line is the command name, and any remaining words are
``arguments'' or ``option flags.'' The various shells under Unix, COMMAND.COM under MS-DOS, and
the adventure game we've been writing are all examples of CLI's. If you sit down to write some code
implementing a CLI, it's simple enough to read a line of text typed by the user, and simple enough to
break it up into words. But how do you map the first word on the line to the code which implements that
command? A straightforward, rather simpleminded way is to use a giant if/else chain:
if(strcmp(command, "agitate") == 0)
{ code for ``agitate'' command }
else if(strcmp(command, "blend") == 0)
{ code for ``blend'' command }
else if(strcmp(command, "churn") == 0)
{ code for ``churn'' command }
...
else
fprintf(stderr, "command not found\n");
This works, but can become unwieldy. Another technique is to write several separate functions, each
implementing one command, and then to build a table (typically an array of structures) associating the
command name as typed by the user with the function implementing that command. In the table, the
command name is a string and the function is represented by a function pointer. With this table in hand,
processing the user's command becomes a relatively simple matter of searching the table for the matching
command string and then calling the corresponding pointed-to function. (We'll see an example of this
technique in this week's assignment.)
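In outline, such a table might look like the following sketch. The structure name, the three do_ functions, and their return values are all invented here for illustration; the assignment's version may well differ.

```c
#include <string.h>

/* hypothetical command functions; real ones would do actual work */
int do_agitate(void) { return 1; }
int do_blend(void)   { return 2; }
int do_churn(void)   { return 3; }

struct command
    {
    const char *name;       /* command name as typed by the user */
    int (*func)(void);      /* function implementing the command */
    };

struct command commands[] =
    {
    {"agitate", do_agitate},
    {"blend", do_blend},
    {"churn", do_churn},
    };

/* look up and run a command; returns -1 if it isn't in the table */
int docommand(const char *name)
{
    int i;
    int ncommands = sizeof(commands) / sizeof(commands[0]);
    for(i = 0; i < ncommands; i++)
        {
        if(strcmp(name, commands[i].name) == 0)
            return (*commands[i].func)();
        }
    return -1;
}
```

Adding a new command then means writing one function and adding one line to the table; the lookup code doesn't change.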
Our last example concerns larger, more elaborate systems which manipulate many types of data. (Here, by
``types of data,'' I am referring to data structures used by the program, presumably implemented as
structs, but in any case more elaborate than C's basic data types.) It is often the case that similar
operations must be performed on dissimilar data types. The conventional way of implementing such
operations is to have the code for each operation look at the data type it's operating on and adjust its
behavior accordingly; in the worst case, each piece of code (for each operation) will have a long switch
statement or if/else chain switching among n separate, different ways of performing the operation on n
different data types. If a new data type is ever added, new cases must be added to all segments of the code
implementing all of the operations.
Another way of organizing such code is to place one or more function pointers right there in the data
structures describing each data type. These pointers point to functions which are streamlined and
optimized for performing the operations on just one data type. (Each piece of data would obviously have
its pointer(s) set to point to the function(s) for its own data type.) This idea is one of the cornerstones of
Object-Oriented Programming. We could use a version of it in our adventure game, too: rather than
writing new, global pieces of code implementing each new command verb we want to let the user type,
and then making each of those pieces of code check all sorts of attributes to ensure that the command can't
be used on inappropriate objects (``break water with cup'', ``light candle with bucket,'' etc.) we could
attach special-purpose pieces of code to the individual objects themselves (via function pointers, of
course) and arrange that the code would only fire up if the player were trying to use the relevant object.
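To make the idea concrete, here is a small sketch using geometric shapes rather than the adventure game's objects. Everything in it (struct shape, rectarea, triarea, totalarea) is hypothetical and chosen only to keep the example short.

```c
struct shape
    {
    double a, b;                        /* dimensions */
    double (*area)(struct shape *);     /* operation for this data type */
    };

double rectarea(struct shape *sp) { return sp->a * sp->b; }
double triarea(struct shape *sp)  { return sp->a * sp->b / 2; }

/* works for any mix of shapes, with no switch on the type */
double totalarea(struct shape *shapes[], int n)
{
    double total = 0;
    int i;
    for(i = 0; i < n; i++)
        total += (*shapes[i]->area)(shapes[i]);
    return total;
}
```

If a new kind of shape is added later, totalarea needs no new case; the new shape simply carries a pointer to its own area function.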


It's generally a good idea to have a function prototype in scope whenever you call a function. Function
prototypes allow the compiler to generate correct code for function calls, and to verify that you've called
a function with the correct number and type of arguments. Standard library functions are prototyped by
including the relevant standard header files, and it's generally recommended that you write prototype
declarations for your own functions, too, usually placing those declarations in your own header files. But
what prototype can be used for an indirect function call using a function pointer, such as
(*pfi)(arg1, arg2)
In general, it won't be known until run time what function pfi actually points to, so there's no way for
the compiler to check the call against the prototype of the actually-called function. (We may know that
pfi points to f1, f2, or f3, and we may have supplied prototypes for those functions, but that's
immaterial.)
We've seen that when you declare a function pointer, you must declare what the return value of the
pointed-to function will be. It's also possible to specify what the prototype of the pointed-to function will
be. Here's our earlier declaration of pfi, beefed up with a prototype for the arguments:
int (*pfi)(int, int);
Now we know that pfi is a pointer to a function, that the function (whatever it is) accepts two int
arguments, and that the function returns an int. Having specified this, the compiler will now be able to
do some more checking for us. If we call
(*pfi)(1, 2, 3)
the compiler will complain, because it knows that the function pointed to by pfi is supposed to receive
two arguments, but we've passed three. The compiler is also in a position to verify that we actually set
pfi to point to functions that accept two int arguments. Our examples so far were in terms of functions
which we declared as
extern int f1();
extern int f2();
extern int f3();
that is, as functions taking unspecified arguments and returning int. (Remember, empty parentheses in
an external function declaration indicate that the function accepts unspecified arguments, while empty
parentheses in a function definition indicate that the function accepts no arguments.) So the compiler
won't be able to check the assignments unless we also provide prototypes:
extern int f1(int, int);
extern int f2(int, int);
extern int f3(int, int);
Now, when we assign
pfi = f1;
the compiler can verify that the function pointer being assigned is to a function which accepts two int
arguments, as pfi expects. If, on the other hand, we declared and assigned
extern int x(int);
pfi = x;
the compiler would complain, because pfi is supposed to point to a function which accepts two int
arguments, and the eventual call to (*pfi)() is going to be verified to be a call passing two
arguments, so assigning pfi to point to x (which accepts a single argument) is incorrect.
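Putting the pieces together, here is a minimal sketch of a fully prototyped function pointer in use. The functions add, sub, and apply are invented for this example.

```c
int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

/* choose a function at run time and call it through a prototyped pointer */
int apply(int subtract, int x, int y)
{
    int (*pfi)(int, int);
    pfi = subtract ? sub : add;     /* both match pfi's prototype */
    return (*pfi)(x, y);            /* argument count and types are checked */
}
```

Because add and sub are both defined with prototypes matching pfi's, the compiler accepts either assignment, and it can check the call (*pfi)(x, y) even though it can't know which function will actually run.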

Chapter 25: Variable-Length Argument Lists
All of the functions we've written so far have taken a fixed number of arguments. We've been careful
always to call them with the correct number of arguments, and we've tended to use
explicit function prototype declarations so that the compiler can verify that we call functions with the
correct number of arguments. But what about printf? Sometimes we call it with one argument, but
often we pass it additional arguments. Why doesn't the compiler complain? How can printf access the
extra arguments we pass to it, when it can't know how many of them there will be or what types they will
have, such that it can't possibly declare conventional function parameters for them?
In this chapter we'll discuss the variable-length argument lists (often discussed under the shorthand term
``varargs'') which allow functions such as printf to be written.
25.1 Declaring ``varargs'' Functions
25.2 Writing a ``varargs'' Function
25.3 Special Issues with Varargs Functions


The ANSI/ISO C Standard requires that all functions which accept a variable number of arguments be
declared explicitly to do so, and also that a function prototype be ``in scope'' (that is, available) whenever
a varargs function is called. You may remember that, under certain circumstances, unknown functions
can be called at will, with the compiler assuming that they take ``normal'' arguments and return int.
However, that exception does not apply here. (In other words, for varargs functions, function prototypes
aren't just a good idea, they're the law.) The reason that prototypes are strictly required is that varargs
functions may use special calling sequences; that is, the compiler may have to generate special code for
these calls.
The presence of a variable-length argument list is indicated by an ellipsis in the prototype. For example,
the prototype for printf, as found in <stdio.h>, looks something like this:
extern int printf(const char *, ...);
Those three dots ... don't mean that I left something out; they are the ellipsis notation; this is the syntax
that C uses to indicate the presence of a variable-length argument list. This prototype says that printf's
first argument is of type const char *, and that it takes a variable (and hence unspecified) number of
additional arguments.
The ellipsis notation must follow the ordinary, fixed arguments, and there must be at least one fixed
argument. It is impossible in Standard C to define a function which accepts only variable arguments. (In
general, this is not too much of a restriction, because the function must always be able to determine, for
itself, how many arguments there are in the list, invariably by inspecting the first argument(s) in the list.
In other words, the function will generally need at least one well-defined, fixed argument anyway, to get
a toehold on the problem of figuring out what the other arguments are.)


In ANSI C, the head of a varargs function looks just like its prototype. We will illustrate by writing our
own, stripped-down version of printf. The bare outline of the function definition will look like this:
void myprintf(const char *fmt, ...)
{
}
printf's job, of course, is to print its format string while looking for % characters and treating them
specially. So the main loop of printf will look like this:
#include <stdio.h>
void myprintf(const char *fmt, ...)
{
const char *p;
for(p = fmt; *p != '\0'; p++)
{
if(*p != '%')
putchar(*p);
else
{
handle it specially
}
}
}
In this stripped-down version, we won't worry about width and precision specifiers and other modifiers;
we'll always look at the very next character after the % and assume that it's the primary format character.
Continuing to flesh out our outline, we get this:
#include <stdio.h>
void myprintf(const char *fmt, ...)
{
const char *p;
for(p = fmt; *p != '\0'; p++)
{
if(*p != '%')
{
putchar(*p);
continue;
}
switch(*++p)
{
case 'c':
fetch and print a character
break;
case 'd':
fetch and print an integer
break;
case 's':
fetch and print a string
break;
case 'x':
print an integer, in hexadecimal
break;
case '%':
print a single %
break;
}
}
}
(For clarity, we've rearranged the former if/else statement slightly. If the character we're looking at is
not %, we print it out and continue immediately with the next iteration of the for loop. This is a good
example of the use of the continue statement. Everything else in the body of the loop then takes care
of the case where we are looking at a %.)
Printing these various argument types out will be relatively straightforward. The $64,000 question, of
course, is how to fetch the actual arguments. The answer involves some specialized macros defined for
us by the standard header <stdarg.h>. We will use the type va_list and the macros va_start(),
va_arg(), and va_end(). va_list is a special ``pointer'' type which allows us to manipulate a
variable-length argument list. va_start() begins the processing of an argument list, va_arg()
fetches arguments from it, and va_end() finishes processing. (Therefore, va_list is a little bit like
the stdio FILE * type, and va_start is a bit like fopen.)
Here is the final version of our myprintf function, illustrating the fetching, formatting, and printing of
the various argument types. (For simplicity--of presentation, if nothing else--the formatting step is
deferred to a version of the nonstandard but popular itoa function.)

#include <stdio.h>
#include <stdarg.h>
extern char *itoa(int, char *, int);
void myprintf(const char *fmt, ...)
{
const char *p;
va_list argp;
int i;
char *s;
char fmtbuf[256];
va_start(argp, fmt);
for(p = fmt; *p != '\0'; p++)
{
if(*p != '%')
{
putchar(*p);
continue;
}
switch(*++p)
{
case 'c':
i = va_arg(argp, int);
putchar(i);
break;
case 'd':
i = va_arg(argp, int);
s = itoa(i, fmtbuf, 10);
fputs(s, stdout);
break;
case 's':
s = va_arg(argp, char *);
fputs(s, stdout);
break;
case 'x':
i = va_arg(argp, int);
s = itoa(i, fmtbuf, 16);
fputs(s, stdout);
break;
case '%':
putchar('%');
break;
}
}
va_end(argp);
}
Looking at the new lines, we have:
#include <stdarg.h>
This header file is required in any file which uses the variable argument list (va_) macros.
va_list argp;
This line declares a variable, argp, which we use while manipulating the variable-length argument list.
The type of the variable is va_list, a special type defined for us by <stdarg.h>.
va_start(argp, fmt);
This line initializes argp and initiates the processing of the argument list. The second argument to
va_start() is simply the name of the function's last fixed argument. va_start() uses this to
figure out where the variable arguments begin.
i = va_arg(argp, int);
And here's the heart of the matter. va_arg() fetches the next argument from the argument list. The
second argument to va_arg() is the type of the argument we expect. Notice carefully that we must
supply this argument, which implies that we must somehow know what type of argument to expect next.
The variable-length argument list machinery does not know. In this case, we know what the type of the
next argument should be because it's supposed to match the format character we're processing. We can
see, then, why such havoc results when printf's arguments do not match its format string: printf
tells the va_arg machinery to grab an argument of one type, with the type determined by one of the
format specifiers, but since the va_arg machinery doesn't know what the actual argument type is,
there's no way for it to do any automatic conversion. If the actual argument has the right type for the
va_arg call which grabs it (as of course it's supposed to), it works, otherwise it doesn't.
(You may have noticed that we fetched the character to print for %c as an int, not a char. That's
deliberate, and is explained in the next section.)
s = va_arg(argp, char *);
Here's another invocation of va_arg(), this time fetching a string, represented as a character pointer,
or char *.
va_end(argp);
Finally, when we're all finished processing the argument list, we call va_end(), which performs any
necessary cleanup.
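Since itoa is not a standard function, the listing above will not link on every system. A minimal stand-in might look like the following sketch. The name myitoa is made up here (the calls in myprintf would have to be pointed at it), and its treatment of negative numbers in bases other than 10 is an assumption modeled on what printf's %x does.

```c
#include <limits.h>

/* myitoa: hypothetical stand-in for the nonstandard itoa.
 * Converts n to a string in the given base (2 to 16) in buf,
 * which the caller must make large enough; returns buf. */
char *myitoa(int n, char *buf, int base)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[sizeof(int) * CHAR_BIT];   /* enough digits even for base 2 */
    unsigned int u;
    int i = 0;
    char *p = buf;

    if(base < 2 || base > 16)
        {
        *p = '\0';                      /* unsupported base: empty string */
        return buf;
        }

    if(n < 0 && base == 10)
        {
        *p++ = '-';
        u = 0u - (unsigned int)n;       /* well defined even for INT_MIN */
        }
    else
        u = (unsigned int)n;            /* other bases: treat as unsigned */

    do  {
        tmp[i++] = digits[u % base];    /* digits come out low-order first */
        u /= base;
        } while(u != 0);

    while(i > 0)
        *p++ = tmp[--i];                /* copy them back in the right order */
    *p = '\0';
    return buf;
}
```

The 256-character fmtbuf used by myprintf is far larger than this function needs.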


When a function with a variable-length argument list is called, the variable arguments are passed using C's
old ``default argument promotions.'' These say that types char and short int are automatically
promoted to int, and type float is automatically promoted to double. Therefore, varargs functions will
never receive arguments of type char, short int, or float. Furthermore, it's an error to ``pass'' the
type names char, short int, or float as the second argument to the va_arg() macro. Finally, for
vaguely related reasons, the last fixed argument (the one whose name is passed as the second argument to the
va_start() macro) should not be of type char, short int, or float, either.
A frequently-asked question is, ``How can I determine how many arguments my function was actually called
with?'' The answer, as discussed above, is that you (or your code) must figure it out somehow, generally by
looking at the arguments themselves (or, in the case of printf, by using clues which are designed into the
first, fixed arguments). There is no Standard way of asking the compiler or run-time system how many
arguments were actually passed, or what their types are.
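As a small sketch of both points, here is a function whose first, fixed argument says how many variable arguments follow, and which fetches them as double (never float) because of the default argument promotions. The name average is invented for this example.

```c
#include <stdarg.h>

/* average: hypothetical example; returns the mean of its n
 * variable arguments, all of which must arrive as doubles */
double average(int n, ...)
{
    va_list argp;
    double total = 0;
    int i;

    if(n <= 0)
        return 0;
    va_start(argp, n);
    for(i = 0; i < n; i++)
        total += va_arg(argp, double);  /* a float argument is promoted to double */
    va_end(argp);
    return total / n;
}
```

A call such as average(3, 1.0, 2.0, 6.0) yields 3.0. Note that average(2, 1, 2) would be wrong: the ints 1 and 2 are not converted to double, because the ellipsis gives the compiler no type information for those arguments.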
The macros va_start() and va_arg() are referred to as macros because they can't possibly be
functions. va_start() initializes (that is, sets the value of) its first argument, and it uses its second
argument not as a value but as a location. Even more unusually, va_arg() accepts a type name as its
second argument, which no function in C, indeed no anything in C other than sizeof, can ever do. Finally,
va_arg() has no one return type (as it would have to if it were a function); the type it ``returns'' is
determined by its second argument.
The va_list type, whatever it is, is a mostly normal type. In particular, you can pass it on to other
functions, and it is frequently quite useful to do so. For example, suppose that you want to write an error
function, which will print nicely annotated error messages complete with filenames, line numbers, severity
indicators, and the like. Furthermore, suppose that the rest of your program will find it useful, when calling
this error function, to embed % sequences in the string to be printed, requesting that extra arguments be
interpolated, just like printf. The obvious question is, will the error function have to duplicate all of
printf's code for parsing format strings and formatting variable arguments (which isn't impossible, as
we've seen), or can it somehow call on printf or a related function to do most of the work?
To be precise, here's the outline of the error function we'd like to write:
extern char *filename;	/* current input file name */
extern int lineno;	/* current line number */

void error(char *msg, ...)
{
fprintf(stderr, "%s, line %d: error:", filename, lineno);
fprintf(stderr, msg, what goes here? );
fprintf(stderr, "\n");
}
The tricky line is the second call to fprintf. We have the string, msg, we want to print, possibly
containing % characters. How do we pass down to fprintf the extra arguments which our caller passed to
us?
The answer is that we don't; there's no way to say ``call this function with the same arguments I got, however
many of them there are.'' (The run-time system simply doesn't have enough information to do this sort of
thing, which is why it can't tell you how many arguments you got called with, either.) However, there's a
variant version of fprintf which is designed for just this sort of situation. The variant is called
vfprintf (where the v stands for ``varargs''), and a call to it looks something like this:
void error(char *msg, ...)
{
va_list argp;
fprintf(stderr, "%s, line %d: error:", filename, lineno);
va_start(argp, msg);
vfprintf(stderr, msg, argp);
va_end(argp);
fprintf(stderr, "\n");
}
We declare a local variable of type va_list and call va_start(), as before. However, all we do with
our argp variable is pass it to vfprintf as its third argument. vfprintf then does all the work--if we
could look inside it, it would look a lot like our version of printf above, except that vfprintf does not
call va_start(), because its caller already has. Notice that vfprintf does not accept a variable number
of arguments; it accepts exactly three arguments, the third of which is essentially a ``pointer'' to the extra
arguments it will need.
There are also ``varargs'' versions of printf and sprintf, namely vprintf and vsprintf. These
follow the same pattern, accepting a single last argument of type va_list in lieu of an actual variable-length argument list.
(Notice that the error function above also called va_end(). This makes sense, since error was the one
who called va_start(). The above pattern works, but more complicated ones may not. For example, it's
not guaranteed that you can pick a few arguments off of a va_list, pass the va_list to a subfunction to
pick a few more off, and then pick the last ones off yourself. Also, there's no direct way to ``rewind'' a
va_list, although it's permissible to call va_end() and then va_start() again, to start over again.)
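The same pattern works when the message should end up in a string rather than on stderr. The sketch below uses vsnprintf, a bounded sibling of vsprintf that was added in C99 (so it postdates the ANSI C these notes describe). The function name errmsg, its parameters, and its 0 or -1 return convention are all assumptions made for this example.

```c
#include <stdio.h>
#include <stdarg.h>

/* errmsg: hypothetical helper; formats "file, line N: error: ..." into buf.
 * Returns 0 on success, -1 if the message did not fit in size bytes. */
int errmsg(char *buf, size_t size, const char *file, int line,
           const char *msg, ...)
{
    va_list argp;
    int n1, n2;

    n1 = snprintf(buf, size, "%s, line %d: error: ", file, line);
    if(n1 < 0 || (size_t)n1 >= size)
        return -1;                      /* the prefix alone didn't fit */

    va_start(argp, msg);
    n2 = vsnprintf(buf + n1, size - n1, msg, argp);
    va_end(argp);                       /* we called va_start, so we call va_end */

    if(n2 < 0 || (size_t)n2 >= size - n1)
        return -1;                      /* the caller's part was truncated */
    return 0;
}
```

As with vfprintf, the va_list is handed over once and not touched again by the caller except to call va_end.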

Copyright
This collection of hypertext pages is Copyright 1996 by Steve Summit. This material may be freely
C Programming Notes
A Short Introduction to Programming
Steve Summit
At its most basic level, programming a computer simply means telling it what to do, and this vapid-sounding definition is not even a joke. There are no other truly fundamental aspects of computer
programming; everything else we talk about will simply be the details of a particular, usually artificial,
mechanism for telling a computer what to do. Sometimes these mechanisms are chosen because they
have been found to be convenient for programmers (people) to use; other times they have been chosen
because they're easy for the computer to understand. The first hard thing about programming is to learn,
become comfortable with, and accept these artificial mechanisms, whether they make ``sense'' to you or
not.
In fact, you shouldn't worry if some (or even many) of the mechanisms used for programming a
computer don't make sense. It doesn't make sense that the cold water faucet has to be on the right side
and the hot one has to be on the left; that's just the convention we've settled on. Similarly, many
computer programming mechanisms are quite arbitrary, and were chosen not because of any theoretical
motivation but simply because we needed an unambiguous way to say something to a computer.
In this introduction to programming, we'll talk about several things: skills needed in programming, a
simplified programming model, elements of real programming languages, computer representation of
numbers, characters and strings, and compiler terminology.
[A German translation of this essay is also available.]
Skills Needed in Programming
Simplified Programming Model
Real Programming Model
Elements of Real Programming Languages
Computer Representation of Numbers
Characters, Strings, and Numbers
Compiler Terminology

I'm not going to claim that programming is easy, but I am going to say that it is not hard for the reasons
people usually assume it is. Programming is not a deeply theoretical subject like Chemistry or Physics;
you don't need an advanced degree to do well at it. (There are important principles of Computer Science,
but it's possible to get a degree after studying them and to have only vague ideas of how to apply them to
practical programming. Contrariwise, we'll experience many important Computer Science lessons by the
seat of our pants, without bewildering ourselves with abstract notation; there are plenty of successful
programmers who don't have Computer Science degrees.)
Comparing programming to some physical tasks, programming does not require some innate talent or
skill, like gymnastics or painting or singing. You don't have to be strong or coordinated or graceful or
have perfect pitch. Programming does, however, require care and craftsmanship, like carpentry or
metalworking. If you've ever taken a shop class, you may remember that some students seemed to be
able to turn out beautiful projects effortlessly, while other students were all thumbs and made the exact
mistakes that the teacher told them not to make. What distinguished the successful students was not that
they were better or smarter, but just that they paid more attention to what was going on and were more
careful and deliberate about what they were doing. (Perhaps care and attention are innate skills too, like
gymnastic ability; I don't know.)
Some things you do need are (1) attention to detail, (2) stupidity, (3) good memory, and (4) an ability to
think abstractly, and on several levels. Let's look at these qualities in a bit more detail:
1. attention to detail
In programming, the details matter. Computers are incredibly stupid (more on this in a minute).
You can't be vague; you can't describe your program 3/4 of the way and then say ``Ya know what
I mean?'' and have the compiler figure out the rest. You have to dot your i's and cross your t's. If
the language says you have to declare variables before using them, you have to. If the language
says you have to use parentheses here and square brackets there and squiggly braces some third
place, you have to.
2. stupidity
Computers are incredibly stupid. They do exactly what you tell them to do: no more, no less. If
you gave a computer a bottle of shampoo and told it to read the directions and wash its hair, you'd
better be sure it was a big bottle of shampoo, because the computer is going to wet hair, lather,
rinse, repeat, wet hair, lather, rinse, repeat, wet hair, lather, rinse, repeat, wet hair, lather, rinse,
repeat, ...
I saw an ad by a microprocessor manufacturer suggesting the ``smart'' kinds of appliances we'd
have in the future and comparing them to ``dumb'' appliances like toasters. I believe they had it
backwards. A toaster (an old-fashioned one, anyway) has two controls, and one of them is
optional: if you don't set the darkness control, it'll do the best it can. You don't have to tell it how
many slices of bread you're toasting, or what kind. (``Modern'' toasters have begun to reverse this
trend...) Compare this user interface to most microwave ovens: they won't even let you enter the
cooking time until you've entered the power level.
When you're programming, it helps to be able to ``think'' as stupidly as the computer does, so that
you're in the right frame of mind for specifying everything in minute detail, and not assuming that
the right thing will happen unless you tell it to.
(This is not to say that you have to specify everything; the whole point of a high-level
programming language like C is to take some of the busywork burden off the programmer. A C
compiler is willing to intuit a few things--for example, if you assign an integer variable to a
floating-point variable, it will supply a conversion automatically. But you have to know the rules
for what the compiler will assume and what things you must specify explicitly.)
3. good memory
There are a lot of things to remember while programming: the syntax of the language, the set of
prewritten functions that are available for you to call and what parameters they take, what
variables and functions you've defined in your program and how you're using them, techniques
you've used or seen in the past which you can apply to new problems, bugs you've had in the past
which you can either try to avoid or at least recognize by their symptoms. The more of these
details you can keep in your head at one time (as opposed to looking them up all the time), the
more successful you'll be at programming.
4. ability to abstract, think on several levels
This is probably the most important skill in programming. Computers are some of the most
complex systems we've ever built, and if while programming you had to keep in mind every
aspect of the functioning of the computer at all levels, it would be a Herculean task to write even a
simple program.
One of the most powerful techniques for managing the complexity of a software system (or any
complex system) is to compartmentalize it into little ``black box'' processes which perform useful
tasks but which hide some details so you don't have to think about them all the time.
We compartmentalize tasks all the time, without even thinking about it. If I tell you to go to the
store and pick up some milk, I don't tell you to walk to the door, open the door, go outside, open
the car door, get in the car, drive to the store, get out of the car, walk into the store, etc. I
especially don't tell you, and you don't even think about, lifting each leg as you walk, grasping
door handles as you open them, etc. You never (unless perhaps if you're gravely ill) have to worry
about breathing and pumping your blood to enable you to perform all of these tasks and subtasks.
We can carry this little example in the other direction, as well. If I ask you to make some ice
cream, you might realize that we're out of milk and go and get some without my asking you to. If
I ask you to help put on a party for our friends, you might decide to make ice cream as part of that
larger task. And so on.

Compartmentalization, or abstraction, is a vital skill in programming, or in managing any
complex system. Despite what I said in point 3 above, we can only keep a small number of things
in our head at one time. A large program might have 100,000 or 1,000,000 or 10,000,000 lines of
code. If it were necessary to understand all of the lines together and at once to understand the
program, the program would be impossible to write or understand. Only if it is possible to think
about small pieces in isolation will it ever be possible to work with a large program.
Compartmentalization, powerful though it is, is not automatic, and not necessarily an instant cure
for all of our organizational problems. We carry a lot of assumptions around about how various
things work, and things work well only as long as these assumptions hold. To return to the
previous example, if I ask you to go to the store and get some milk, I'm assuming that you know
which kind to get, where the store is, how to get there, how to drive if you need to, etc. If some of
these assumptions weren't valid, or if there were several options for any of them, we might have to
modify the way I gave you instructions. I might have to tell you to drive to the store, or to go to
Safeway, or to get some two percent milk.
Therefore, we can't simply compartmentalize all of our processes and subprocesses and forget
about complexity problems forever. We have to remember at least some of the assumptions
surrounding the compartmentalization scheme. We have to remember what we can and can't
expect from the processes (people, computer programs, etc.) which we call on to do tasks for us.
We have to make sure that we keep our end of the bargain and don't fall down on any of the
commitments and promises we've made on the tasks we've been asked to do and which others are
assuming we'll keep.
Thinking about the mechanics of a design hierarchy, while also using that hierarchy to avoid
having to think about every detail of it at every level all of the time, is one of the things I mean by
``thinking on several levels.'' It's tricky to do (obviously, it's tricky even to describe), but it's the
only way to cut through large, complex problems.
What's hard about programming (besides maybe having trouble with the four traits above) is mostly
picky little detail and organizational problems, and people problems. A large program is a terribly
complex system; a large programming project worked on by many people has to work very hard at
peripheral, picayune tasks like documentation and communication if the project is to avoid drowning in a
flood of little details and bugs.


Imagine an ordinary pocket calculator which can add, subtract, multiply, and divide, and which has a few
memory registers which you can store numbers in. At a grossly oversimplified level (so simple that we'll
abandon it in just a minute), we can think of a computer as a calculator which is able to push its own
buttons. A computer program is simply the list of instructions that tells the computer which buttons to
push. (Actually, there are ``keystroke programmable'' calculators which you program in just about this
way.)
Imagine using such a calculator to perform the following task:
Given a list of numbers, compute the average of all the numbers, and also find the largest
number in the list.
You can imagine giving the calculator a list of instructions for performing this task, or you can imagine
giving a list of instructions to a very stupid but very patient person who is able to follow instructions
blindly but accurately, as long as the instructions consist of pushing buttons on the calculator and making
simple yes/no decisions. (For our purposes just now, either imaginary model will work.)
Your instructions might look something like this:
``We're going to use memory register 1 to store the running total of all the numbers,
memory register 2 to store how many numbers we've seen, and register 3 to store the
largest number we've seen. For each number in the input list, add it to register 1. Add 1 to
register 2. If the number you just read is larger than the number in register 3, store it in
register 3. When you've read all the numbers, divide register 1 by register 2 to compute the
average, and also retrieve the largest number from register 3.''
There are several things to notice about the above list of instructions:
1. The first sentence, which explains what the registers are used for, is more for our benefit than the
entity who will be pushing the buttons on the calculator. The entity pushing the buttons doesn't care what
the numbers mean, it just manipulates them as directed. Similarly, the words ``to compute the average''
and ``largest'' in the last sentence are extraneous; they don't tell the entity pushing the button anything it
needs to know (or that it can even understand).
2. The instructions use the word ``it'' several times. Even in English, where we're used to a certain
amount of ambiguity which we can usually work out from the context, pronouns like ``it'' can cause
problems in sentences, because sometimes it isn't obvious what they mean. (For example, in the
preceding sentence, does ``they'' refer to ``pronouns,'' ``problems,'' or ``sentences?'') In programming,
you can never get away with ambiguity; you have to be quite precise about which ``it'' you're referring to.
3. The instructions are pretty vague about the details of reading the next number in the input list and
detecting the end of the list.
4. The ``program'' contains several bugs! It uses registers 1, 2, and 3, but we never say what to store in
them in the first place. Unless they all happen to start out containing zero, the average or maximum value
computed by the ``program'' will be incorrect. (Actually, if all of the numbers in the list are negative,
having register 3 start out as 0 won't work, either.)
Here is a somewhat more detailed version of the ``program,'' which removes some of the extraneous
information and ambiguity, makes the input list handling a bit more precise, and fixes at least some of the
bugs. (To make the concept of ``the number just read from the list'' unambiguous, this ``program'' stores
it in register 4, rather than referring to it by ``it.'' Also, for now, we're going to assume that the numbers
in the input list are non-negative.)
``Store 0 in registers 1, 2, and 3. Read the next number from the list. If you're at the end of
the list, you're done. Otherwise, store the number in register 4. Add register 4 to register 1.
Add 1 to register 2. If register 4 is greater than register 3, store register 4 in register 3.
When you're done, divide register 1 by register 2 and print the result, and print the contents
of register 3.''
When we add the initialization step (storing 0 in the registers), we realize that it's not quite obvious
which steps happen once only and which steps happen once for each number in the input list (that is,
each time through the processing loop). Also, we've assumed that the calculator can do arithmetic
operations directly into memory registers. To make the loop boundaries explicit, and the calculations
even simpler (assuming that all the calculator can do is store or recall memory registers from or to the
display, and do calculations in the display), the instructions would get more elaborate still:
``Store 0 in register 1. Store 0 in register 2. Store 0 in register 3. Here is the start of the
loop: read the next number from the list. If you're at the end of the list, you're done.
Otherwise, store the number in register 4. Recall from register 1, recall from register 4, add
them, store in register 1. Recall from register 2, add 1, store in register 2. Recall from
register 3, recall from register 4, if greater store in register 3. Go back to the beginning of
the loop. When you're done: recall from register 1, recall from register 2, divide them,
print; recall from register 3, print.''
We could continue to ``dumb down'' this list of instructions even further, but hopefully you're getting the
point: the instructions we use when programming computers have to be very precise, and at a level of
pickiness and detail which we don't usually use with each other. (Actually, things aren't quite as bad as
these examples might suggest. The ``dumbing down'' we've been doing has been somewhat in the
direction of assembly language, which wise programmers don't use much any more. In a higher-level
language such as C, you don't have to worry so much about register assignment and individual arithmetic
operators.)
Real computers can do quite a bit more than 4-function pocket calculators can; for one thing, they can
manipulate strings of text and other kinds of data besides numbers. Let's leave pocket calculators behind,
and start looking at what real computers (at least under the control of programming languages like C) can
do.


A computer program consists of two parts: code and data. The code is the set of instructions for
performing a task, and the data is the set of ``registers'' or ``memory locations'' which contain the
intermediate results which are used as the program performs its calculations.
Note that the code is relatively static while the data is dynamic. Once you've gotten a program working,
its code won't change, but every time you run it, it will typically be working with different data, so the
memory locations will take on different values.
Once you've written a program, you've defined a new thing that your computer can do. The applications
(text and graphic editors, spreadsheets, games, etc.) which your computer may already have are ``just''
programs, written by programmers using programming languages such as the one you're about to learn.

Elements of Real Programming Languages
There are several elements which programming languages, and programs written in them, typically
contain. These elements are found in all languages, not just C. If you understand these elements and what
they're for, not only will you understand C better, but you'll also find learning other programming
languages, and moving between different programming languages, much easier.
1. There are variables or objects, in which you can store the pieces of data that a program is working on.
Variables are the way we talk about memory locations (data), and are analogous to the ``registers'' in our
pocket calculator example. Variables may be global (that is, accessible anywhere in a program) or local
(that is, private to certain parts of a program).
2. There are expressions, which compute new values from old ones.
3. There are assignments which store values (of expressions, or other variables) into variables. In many
languages, assignment is indicated by an equals sign; thus, we might have
b = 3
or
c = d + e + 1
The first sets the variable b to 3; the second sets the variable c to the sum of the variables d plus e plus
1.
The use of an equals sign can be mildly confusing at first. In mathematics, an equals sign indicates
equality: two things are stated to be inherently equal, for all time. In programming, there's a time
element, and a notion of cause-and-effect: after the assignment, the thing on the left-hand side of the
assignment statement is equal to what the stuff on the right-hand side was before. To remind yourself of
this meaning, you might want to read the equals sign in an assignment as ``gets'' or ``receives'': a = 3
means ``a gets 3'' or ``a receives 3.''
(A few programming languages use a left arrow for assignment
a <-- 3
to make the ``receives'' relation obvious, but this notation is not too popular, if for no other reason than
that few character sets have left arrows in them, and the left arrow key on the keyboard usually moves
the cursor rather than typing a left arrow.)
If assignment seems natural and unconfusing so far, consider the line
i = i + 1
What can this mean? In algebra, we'd subtract i from both sides and end up with
0 = 1
which doesn't make much sense. In programming, however, lines like
i = i + 1
are extremely common, and as long as we remember how assignment works, they're not too hard to
understand: the variable i receives (its new value is), as always, what we get when we evaluate the
expression on the right-hand side. The expression says to fetch i's (old) value, and add 1 to it, and this
new value is what will get stored into i. So i = i + 1 adds 1 to i; we say that it increments i.
(We'll eventually see that, in C, assignments are just another kind of expression.)
4. There are conditionals which can be used to determine whether some condition is true, such as
whether one number is greater than another. (In some languages, including C, conditionals are actually
expressions which compare two values and compute a ``true'' or ``false'' value.)
5. Variables and expressions may have types, indicating the nature of the expected values. For instance,
you might declare that one variable is expected to hold a number, and that another is expected to hold a
piece of text. In many languages (including C), your declarations of the names of the variables you plan
to use and what types you expect them to hold must be explicit.
There are all sorts of data types handled by various computer languages. There are single characters,
integers, and ``real'' (floating point) numbers. There are text strings (i.e. strings of several characters),
and there are arrays of integers, reals, or other types. There are types which reference (point at) values of
other types. Finally, there may be user-defined data types, such as structures or records, which allow the
programmer to build a more complicated data structure, describing a more complicated object, by
accreting together several simpler types (or even other user-defined types).
6. There are statements which contain instructions describing what a program actually does. Statements
may compute expressions, perform assignments, or call functions (see below).
7. There are control flow constructs which determine what order statements are performed in. A certain
statement might be performed only if a condition is true. A sequence of several statements might be
repeated over and over, until some condition is met; this is called a loop.
8. An entire set of statements, declarations, and control flow constructs can be lumped together into a
function (also called routine, subroutine, or procedure) which another piece of code can then call as a
unit.
When you call a function, you transfer control to it and wait for it to do its job, after which it returns to
you; it may also return a value as a result of what it has done. You may also pass values to the function
on which it will operate or which otherwise direct its work.
Placing code into functions not only avoids repetition if the same sequence of actions must be performed
at several places within a program, but it also makes programs easy to understand, because you can see
that some function is being called, and performing some (presumably) well-defined subtask, without
always concerning yourself with the details of how that function does its job. (If you've ever done any
knitting, you know that knitting instructions are often written with little sub-instructions or patterns
which describe a sequence of stitches which is to be performed multiple times during the course of the
main piece. These sub-instructions are very much like function calls in programming.)
9. A set of functions, global variables, and other elements makes up a program. An additional wrinkle is
that the source code for a program may be distributed among one or more source files. (In the other
direction, it is also common for a suite of related programs to work closely together to perform some
larger task, but we'll not worry about that ``large scale integration'' for now.)
10. In the process of specifying a program in a form suitable for a compiler, there are usually a few
logistical details to keep track of. These details may involve the specification of compiler parameters or
interdependencies between different functions and other parts of the program. Specifying these details
often involves miscellaneous syntax which doesn't fall into any of the other categories listed here, and
which we might lump together as ``boilerplate.''
Many of these elements exist in a hierarchy. A program typically consists of functions and global
variables; a function is made up of statements; statements usually contain expressions; expressions
operate on objects. (It is also possible to extend the hierarchy in the other direction; for instance,
sometimes several interrelated but distinct programs are assembled into a suite, and used in concert to
perform complex tasks. The various ``office'' packages--integrated word processor, spreadsheet, etc.--are
an example.)
As we mentioned, many of the concepts in programming are somewhat arbitrary. This is particularly so
for the terms expression, statement, and function. All of these could be defined as ``an element of a
program that actually does something.'' The differences are mainly in the level at which the ``something''
is done, and it's not necessary at this point to define those ``levels.'' We'll come to understand them as we
begin to write programs.

An analogy may help: Just as a book is composed of chapters which are composed of sections which are
composed of paragraphs which are composed of sentences which are composed of words (which are
composed of letters), so is a program composed of functions which are composed of statements which
are composed of expressions (which are in fact composed of smaller elements which we won't bother to
define). Analogies are never perfect, though, and this one is weaker than most; it still doesn't tell us
anything about what expressions, statements, and functions really are. If ``expression'' and ``statement''
and ``function'' seem like totally arbitrary words to you, use the analogy to understand that what they are
is arbitrary words describing arbitrary levels in the hierarchical composition of a program, just as
``sentence,'' ``paragraph,'' and ``chapter'' are different levels of structure within a book.
The preceding discussion has been in very general terms, describing features common to most
``conventional'' computer languages. If you understand these elements at a relatively abstract level, then
learning a new computer language becomes a relatively simple matter of finding out how that language
implements each of the elements. (Of course, you can't understand these abstract elements in isolation; it
helps to have concrete examples to map them to. If you've never programmed before, most of this section
has probably seemed like words without meaning. Don't spend too much time trying to glean all the
meaning, but do come back and reread this handout after you've started to learn the details of a particular
programming language such as C.)
Finally, there's no need to overdo the abstraction. For the simple programs we'll be writing, in a language
like C, the series of calculations and other operations that actually takes place as our program runs is a
simpleminded translation (into terms the computer can understand) of the expressions, statements,
functions, and other elements of the program. Expressions are evaluated and their results assigned to
variables. Statements are executed one after the other, except when the control flow is modified by
if/then conditionals and loops. Functions are called to perform subtasks, and return values to their callers,
which have been waiting for them.


Most computers represent integers as binary numbers (see the ``math refresher'' handout) with a certain
number of bits. A computer with 16-bit integers can represent integers from 0 to 65,535 (that is, from 0
to 2^16-1), or if it chooses to make half of them negative, from -32,767 to 32,767. (We
won't get into the details of how computers handle negative numbers right now.) A 32-bit integer can
represent values from 0 to 4,294,967,295, or +-2,147,483,647.
Most of today's computers represent real (i.e. fractional) numbers using exponential notation. (Again, see
``math refresher'' handout. Actually, deep down inside, computers usually use powers of 2 instead of
powers of 10, but the difference isn't important to us right now.) The advantage of using exponential
notation for real numbers is that it lets you trade off the range and precision of values in a useful way.
Since there's an infinitely large number of real numbers (and in three directions: very large, very small,
and very negative), it will never be possible to represent all of them (without using potentially infinite
amounts of space).
Suppose you decide to give yourself six decimal digits' worth of storage (that is, you decide to devote an
amount of memory capable of holding six digits) for each value. If you put three digits to the left and
three to the right of the decimal point, you could represent numbers from 999.999 to -999.999, and as
small as 0.001. (Furthermore, you'd have a resolution of 0.001 everywhere: you could represent 0.001
and 0.002, as well as 999.998 and 999.999.) This would be a workable scheme, although 0.001 isn't a
very small number and 999.999 isn't a very big one.
If, on the other hand, you used exponential notation, with four digits for the base number and two digits
for the exponent, you could represent numbers from 9.999 x 10^99 to -9.999 x 10^99, and as small as
1 x 10^-99 (or, if you cheat, 0.001 x 10^-99). You can now represent both much larger numbers and
much smaller; the tradeoff is that the absolute resolution is no longer constant, and gets coarser as
the absolute value of the numbers gets larger. The number 123.456 can only be represented as 123.4,
and the number 123,456 can only be represented as 123,400. You can't represent 999.999 any more; you
have to settle for 999.9 (9.999 x 10^2) or 1000 (1.000 x 10^3). You can't distinguish between 999.998
and 999.999 any more.
Since superscripts are difficult to type, computer programming languages usually use a slightly different
notation. For example, the number 1.234 x 10^5 might be indicated by 1.234e5, where
the letter e replaces the ``times ten to the'' part.
You will often hear real, exponential numbers referred to on computers as ``floating point
numbers'' or simply ``floats,'' and you will also hear the term ``double,'' which is short for
``double-precision floating point number.'' Some computers also use ``fixed point'' real numbers (which work
along the lines of our ``three to the left, three to the right'' example of a few paragraphs back), but those
are comparatively rare and we won't need to discuss them.

It's important to remember that the precision of floating-point numbers is usually limited, and this can
lead to surprising results. The result of a division like 1/3 cannot be represented exactly (it's an infinitely
repeating fraction, 0.333333...), so the computation (1 / 3) x 3 tends to yield a result like 0.999999...
instead of 1.0. Furthermore, in base 2, the fraction 1/10, or 0.1 in decimal, is also an infinitely repeating
fraction, and cannot be represented exactly, either, so (1 / 10) x 10 may also yield 0.999999.... For these
reasons and others, floating-point calculations are rarely exact. When working with computer floating
point, you have to be careful not to compare two numbers for exact equality, and you have to ensure that
``round off error'' doesn't accumulate until it seriously degrades the results of your calculations.


The earliest computers were number crunchers only, but almost all more recent computers have the
ability to manipulate alphanumeric data as well. The computer, and our programming languages, tend to
maintain a strict distinction between numbers on the one hand and alphanumeric data on the other, so we
have to maintain that distinction in our own minds as well.
One fundamental component of a computer's handling of alphanumeric data is its character set. A
character set is, not surprisingly, the set of all the characters that the computer can process and display.
(Each character generally has a key on the keyboard to enter it and a bitmap on the screen which displays
it.) A character set consists of letters, numbers, punctuation, etc., but the point of this discussion is not so
much what the characters are but that we have to be careful to distinguish between characters, strings,
and numbers.
A character is, well, a single character. If we have a variable which contains a character value, it might
contain the letter `A', or the digit `2', or the symbol `&'.
A string is a sequence of zero or more characters. For example, the string ``and'' consists of the characters `a',
`n', and `d'. The string ``K2!'' consists of the characters `K', `2', and `!'. The string ``.'' consists of the
single character `.', and the empty string ``'' consists of no characters at all. Not to belabor the point, but
the string ``123'' consists of the characters `1', `2', and `3', and the string ``4'' consists of the single
character `4'.
The last two examples illustrate some important and perhaps surprising or annoying distinctions. The
character `4' and the string ``4'' are conceptually different, and neither of them is quite the same as the
number 4. The string ``123'' consists of three characters, and it looks like the number 123 to us, but as far
as the computer is concerned it is just a string. The number 123 is, when used for ordinary numeric
purposes, not represented internally as a string of three characters (instead, it is typically represented as a
16- or 32-bit integer). When we have a string which contains a numeric value which we wish to
manipulate as a number, we must typically ask for the string to be explicitly converted to that number
somehow. Similarly, we may have reason to convert a number to a string of digits making up its decimal
representation.
We may also find ourselves needing to convert back and forth between characters and the numeric codes
which are assigned to each character in a character set. (For example, in the ASCII character set, the
character `A' is code 65, the character `.' is code 46, and the character `4' is, perhaps surprisingly, code
52.)

C is a compiled language. This means that the programs you write are translated, by a program called a
compiler, into executable machine-language programs which you can actually run. Executable machine-language
programs are self-contained and run very quickly. Since they are self-contained, you don't need
copies of the source code (the original programming-language text you composed) or the compiler in
order to run them; you can distribute copies of just the executable and that's all someone else needs to run
it. Since they run relatively quickly, they are appropriate for programs which will be written once and run
many times.
A compiler is a special kind of program: it is a program that builds other programs. What happens is that
you invoke the compiler (as a program), and it reads the programming language statements that you have
written and turns them into a new, executable program. When the compiler has finished its work, you
then invoke your program (the one the compiler just built) to see if it works.
The main alternative to a compiled computer language or program is an interpreted one, such as BASIC.
An interpreted language is interpreted (by, not surprisingly, a program called an interpreter) and its
actions performed immediately. If you gave a copy of an interpreted program to someone else, they
would also need a copy of the interpreter to run it. No standalone executable machine-language binary
program is produced.
In other words, for each statement that you write, a compiler translates it into a sequence of machine
language instructions which does the same thing, while an interpreter simply does it (where ``it'' is
whatever the statement that you wrote is supposed to do).
The big advantage of an interpreted language is that your program runs right away; you don't have to
perform--and wait for--the separate tasks of compiling and then running your program. (Actually, on a
modern computer, neither compiling nor interpreting takes much time, so some of these distinctions
become less important.)
Actually, whether a language is compiled or interpreted is not always an inherent part of the language.
There are interpreters for C, and there are compilers for BASIC. However, most languages were designed
with one or the other mechanism in mind, and there are usually a few difficulties when trying to compile
a language which is traditionally interpreted, or vice versa.
The distinction between compilation and interpretation, while it is very significant and can make a big
difference, is not one to get worked up over. Most of the time, once you get used to the details of how
you get your programs to run, you don't need to worry about the distinction too much. But it is a useful
distinction to have a basic understanding of, and to keep in the back of your mind, because it will help
you understand why certain aspects of computer programming (and particular languages) work the way
they do.
When you're working with a compiled language, there are several mechanical details which you'll want
to be aware of. You create one or more source files which are simple text files containing your program,
written in whatever language you're using. You typically use a text editor to work with source files
(typically you don't want to use a full-fledged word processor, since the compiler won't understand its
formatting codes). You supply each source file (you may have one, or more than one) to the compiler,
which creates an object file containing machine-language instructions corresponding to your program.
Your program is not ready to run yet, however: if you called any functions which you didn't write (such
as the standard library functions provided as part of a programming language environment), you must
arrange for them to be inserted into your program, too. The task of combining object files together, while
also locating and inserting any library functions, is the job of the linker. The linker puts together the
object files you give it, noticing if you call any functions which you haven't supplied and which must
therefore be library functions. It then searches one or more library files (a library file is simply a
collection of object files) looking for definitions of the still-unresolved functions, and pulls in any that it
finds. When it's done, it either builds the final, executable file, or, if there were any errors (such as a
function called but not defined anywhere) complains.
If you're using some kind of an integrated programming environment, many of these steps may be taken
care of for you so automatically and seamlessly that you're hardly aware of them.

C Programming Notes
A Brief Refresher on Some Math Often Used in Computing
Steve Summit
There are a few mathematical concepts which figure prominently in programming. None of these involve
higher math (or even algebra, and as we'll see, knowing algebra too well can make one particular
programming idiom rather confusing). You don't have to understand these with deep mathematical
sophistication, but keeping them in mind will help some things make more sense.
Integers vs. real numbers

Exponential Notation
Binary Numbers
Boolean Algebra

An integer is a number without a fractional part, a number you could use to count things (although
integers may also be negative). Mathematicians may distinguish between natural numbers and cardinal
numbers, and linguists may distinguish between cardinal numbers and ordinal numbers, but these
distinctions do not concern us here.
A real number is, for our purposes, simply a number with a fractional part. Since computers do not
typically implement real numbers exactly, it is not necessary or even meaningful to distinguish between
rational, irrational, and transcendental numbers.

Exponential or Scientific Notation is simply a method of writing a number as a base number times some
power of ten. For example, we could write the number 2,000,000 as 2 x 10<sup>6</sup>, the number
0.00023 as 2.3 x 10<sup>-4</sup>, and the number 123.456 as 1.23456 x 10<sup>2</sup>.

Binary Numbers
Our familiar decimal number system is based on powers of 10. The number 123 is actually 100 + 20 + 3
or 1 x 10<sup>2</sup> + 2 x 10<sup>1</sup> + 3 x 10<sup>0</sup>.
The binary number system is based on powers of 2. The number 100101<sub>2</sub> (that is,
``100101 base two'') is 1 x 2<sup>5</sup> + 0 x 2<sup>4</sup> + 0 x 2<sup>3</sup> + 1 x
2<sup>2</sup> + 0 x 2<sup>1</sup> + 1 x 2<sup>0</sup> or 32 + 4 + 1 or 37.
We usually speak of the individual numerals in a decimal number as digits, while the ``digits'' of a binary
number are usually called ``bits.''
Besides decimal and binary, we also occasionally speak of octal (base 8) and hexadecimal (base 16)
numbers. These work similarly: The number 45<sub>8</sub> is 4 x 8<sup>1</sup> + 5 x
8<sup>0</sup> or 32 + 5 or 37. The number 25<sub>16</sub> is 2 x 16<sup>1</sup> + 5 x
16<sup>0</sup> or 32 + 5 or 37. (So 37<sub>10</sub>, 100101<sub>2</sub>,
45<sub>8</sub>, and 25<sub>16</sub> are all the same number.)

Boolean Algebra
Boolean algebra is a system of algebra (named after the mathematician who studied it, George Boole)
based on only two numbers, 0 and 1, commonly thought of as ``false'' and ``true.'' Binary numbers and
Boolean algebra are natural to use with modern digital computers, which deal with switches and
electrical currents which are either on or off. (In fact, binary numbers and Boolean algebra aren't just
natural to use with modern digital computers, they are the fundamental basis of modern digital
computers.)
There are four arithmetic operators in Boolean algebra: NOT, AND, OR, and EXCLUSIVE OR.
NOT takes one operand (that is, applies to a single value) and negates it: NOT 0 is 1, and NOT 1 is 0.
AND takes two operands, and yields a true value only if both of its operands are true: 1 AND 1 is 1, but 0
AND 1 is 0, and 0 AND 0 is 0.
OR takes two operands, and yields a true value if either of its operands (or both) is true: 0 OR 0 is 0, but
0 OR 1 is 1, and 1 OR 1 is 1.
EXCLUSIVE OR, or XOR, takes two operands, and yields a true value if one of its operands, but not
both, is true: 0 XOR 0 is 0, 0 XOR 1 is 1, and 1 XOR 1 is 0.
It is also possible to take strings of 0/1 values and apply Boolean operators to all of them in parallel;
these are sometimes called ``bitwise'' operations. For example, the bitwise OR of 0011 and 0101 is 0111.
(If it isn't obvious, what happens here is that each bit in the answer is the result of applying the
corresponding operation to the two corresponding bits in the input numbers.)

Handouts

Available Compiler Options
Compiling C Programs: Step by Step
Common Compiler Error Messages and Other Problems
Scientific Programming (in C)
World Wide Web CGI (Common Gateway Interface) Programming in C
Class notes:
Introductory C Programming Class Notes
Intermediate C Programming Class Notes
Notes to Accompany The C Programming Language, by Kernighan and Ritchie (``K&R'')
Assignments:
introductory class
intermediate class

Steve Summit
In order to learn C you must write and run C programs, and in order to run C programs you will generally
need a ``C compiler.'' As explained elsewhere, a compiler is a special program which builds other
programs. Specifically, a compiler takes a program which you have written in a higher-level language
such as C, and translates or compiles that program into an equivalent machine-language program which
is suitable for execution on a computer's CPU. There are many different kinds of compilers, depending
on the input language to be compiled and the type of CPU for which code is to be generated.
The easiest compiler to use is one that is already set up and running somewhere. If you have access to a
timesharing system (such as an ISP with a ``shell prompt,'' or a multiuser system at a university), there is
a good chance that that system already has a C compiler. Similarly, if you have access (at work or in a
lab) to an engineering workstation, there is a good chance that it has a C compiler. Most Unix systems
have C compilers, and the freeware Linux operating system (a Unix clone which runs on, among other
machines, PC compatibles) comes with an excellent free C compiler, gcc (which we'll have more to say
about later).
Assuming that you don't have access to an existing C compiler but that you do have access to a personal
computer under your control, you can obtain and install your own C compiler for that machine. You have
several options, ranging from expensive to cheap to free, from simple to sophisticated, and from easy to
difficult. (Of course, since nothing in life is truly free, you probably won't be able to find an option which
is simultaneously free, easy to use, and perfectly matched to your own preferred level of simplicity or
sophistication.)
Commercial Compilers
Inexpensive Compilers
Shareware and Freeware Compilers
Important Notes
The most ``popular'' (whatever that means) compilers for MS-DOS and Windows systems are
manufactured by Microsoft and Inprise (formerly Borland). Both of these companies have several
compiler products to offer, including some cheap ones (under $100) targeted at individuals, and some
expensive ones ($600 to $1000 or more) targeted at professional software developers. There are
compilers for MS-DOS or Windows (though the former are becoming obsolete) and for C, C++, or both.
An inexpensive ($20 or $30) commercial option is Power C from Mix Software, 1132 Commerce Drive,
Richardson, TX 75801, 214-783-6001.
For the Macintosh, the two commercial compilers I know of are Think C by Symantec, and CodeWarrior
by Metrowerks.

If you don't need the latest-and-greatest compiler (and, as we'll see, you don't), there are a number of
``budget'' alternatives. One is to find a used compiler--several computer stores and bookstores (including
Half-Price Books in the Seattle area) carry used software at a fraction of its original price. The software
is always a few years old, but it's fully functional and comes with complete documentation at a price
that's hard to beat. (Whenever I visit Half-Price Books, I just about always see copies of Microsoft,
Borland/Inprise, and Think C for $30-50, which when new would have cost $100-600.)
Another option (also to be found in bookstores) is to obtain a book with a title like ``Teach Yourself C in
N days--disk included!''. I've seen two or three titles along these lines, for between 30 and 50 dollars. The
disk contains (among other things) a stripped-down version of some popular, commercial compiler.
Since the book sells for less than the full-fledged compiler would by itself, the version of the compiler
included with the book is typically crippled in some way, designed to be useful for students learning C
but not for professionals doing large-scale software development. But since we're learning C, a
``learning'' compiler is fine, and even if the book is no good, the cover price may be worth it for the
compiler alone.


I've heard of a couple of shareware C compilers for MS-DOS, but I don't know much about them. One is
or was available in the file PCC12C.ZIP and might have been written by somebody named Mark De
Smet. Another, ``Pacific C'', is or was available at ftp.hitech.com.au/hitech/pacific.
A research group at Princeton generated a ``Retargetable C Compiler'' and made the results freely
available. Supposedly they have a compiler called lcc, although I've never tried it. You can find details
on their web page, http://www.cs.princeton.edu/software/lcc .
Finally, we come to the GNU C compiler, or gcc. This is an extremely high-quality compiler put out by
the folks at the Free Software Foundation. It was originally developed as a Unix compiler, and there are
plenty of professional programmers doing serious, commercial development using it. There are also ports
to other operating systems, including VMS and--though it seems impossible--MS-DOS. gcc is a huge
and powerful compiler and MS-DOS is a cramped, restricted quasi operating system, but somehow D J
Delorie got gcc working under it, by dint of a lot of hard work and a 32-bit DOS extender or two. The
result is called ``djgpp,'' and it's described on its own web page at http://www.delorie.com/djgpp/. (The
main, non-djgpp distribution of gcc is on the FSF's ftp server at prep.ai.mit.edu; see also the FSF home
page at http://www.fsf.org/.)
The downside of gcc (and, by extension, djgpp) is that it is huge, complex, and difficult to install. It is
explicitly ``not recommended for use in learning C.'' I have had students use it successfully, but
downloading and installing it (it's between 5 and 10 meg to download) is a daunting task.
I've heard that there's another, more recent port of gcc that's tailored for use with Windows, but I don't
know any details yet.
Alas, there are currently no shareware or freeware compilers that I know of for the Macintosh. (There
were two, Sesame C and Harvest C, but the former never worked and the latter only barely worked and
has been abandoned by its author. I've heard that there is a version of gcc which works on the
Macintosh, but it requires the Macintosh Programmer's Workbench which, though it's available free from
Apple, is nearly impossible to use.)

Important Notes
Whatever compiler you choose, make sure it's for the operating system you have on your computer. An
older MS-DOS compiler probably won't run well under Windows (though it may be possible to run it in
a ``DOS box''). A newer Windows compiler certainly won't run under plain MS-DOS.
Also, it's preferable to have a C compiler, or a C++ compiler that also compiles C, as opposed to a
compiler that only compiles C++. As far as I know, most C++ compilers do have an optional mode in
which they act just like C compilers, and when you're writing C programs, this is obviously the mode
you want to use. (C++ compilers that also compile C are often called ``C/C++ compilers.'') If you find
yourself stuck with a compiler that compiles only C++, you may be able to use it (since C++ is mostly a
superset of C, and most C programs--especially simple ones--are also valid C++ programs), but it won't
be ideal.
Finally, as mentioned above, you don't need the latest-and-greatest of the brand-new compilers out there,
and in fact you may not even want a brand-new compiler. C is a stable language, and especially for the
simple programs we'll be writing, a compiler from a few years ago will be perfectly adequate. Some of
the fancier features that newer or more powerful compilers contain would only get in our way. Do make
sure that any compiler you get is new enough to run under your computer's operating system (or old
enough, if you're using an older operating system such as MS-DOS or Windows 3.1). Do make sure that
the compiler is ``ANSI compatible'' (although you'd be hard-pressed to find one that was not, since the
Standard was ratified almost 10 years ago). Beyond that, it hardly matters which compiler you get, so
stick with something simple and inexpensive, which will be bound to serve you well while you're
learning.


Steve Summit
This handout contains reasonably detailed, step-by-step instructions for entering, compiling, and running
C programs, in a number of environments. Much of this information appears in other handouts as well, so
there is a certain amount of overlap here. If you've got a good handle on the compilation process, there's
no need to read this handout, but if you've been having difficulty, or prefer explicit instructions, read on.
Since everyone's situation is different (different environments, different compilers, different amounts of
experience, different learning styles), this handout will run through the process several times, with
increasing levels of detail. After the first two overviews, the discussion splits into threads for several
environments: Unix, DOS/Windows, and the Macintosh.
1. First Overview
2. Second Overview
3. Third Pass: Details
1. First Overview
In broad outline, there are three steps:
1. Type in or edit the C source code. It will go into an ordinary text file on your computer, with a
name of your own choosing, although the name must end in .c to indicate that it is a file full of C
source code.
2. Compile it. (Remember that ``compiling'' is the process performed by the compiler, the translation
of your program from the high-level C language which we're supposed to be able to understand,
into the low-level machine language which the processor can understand.)
3. Run it.
If the compiler gives you error messages during step 2, you'll have to return to step 1, discover and
correct the errors, and then try the compilation (step 2) again. If the program doesn't work as you expect
(or doesn't work at all), you'll have to return to steps 1 and 2 (in that order, of course).

2. Second Overview
The edit, compile, and run steps take different forms depending on whether you are working in a
traditional command line environment (such as Unix or MS-DOS) or a Graphical User Interface
(Microsoft Windows or the Macintosh).
2.1. Command Line Environment
2.2. Graphical User Interface


In a command line environment, steps 1, 2, and 3 all involve separate commands you type at your prompt
(often $ or % under Unix, C:\> under MS-DOS). To type your program's source code in for the first
time, or make changes to it, you'll invoke the text editor of your choice. (If possible, you'll want to use a
plain text editor in preference to a fancier word processor; a word processor's insistence on formatting
your text as paragraphs for you, and on worrying about fonts and styles and the like, will only get in your
way.) To perform step 2, you'll invoke your compiler. And to perform step 3, you'll invoke your
program; if the compilation in step 2 is successful, the compiler will build your program as an executable
file (typically in the current directory), which you can then invoke and run just like any other command.


In a visual, graphical user interface environment, you will typically be able to perform all three steps
from within your compiler's user interface. To type in your program, you'll open a new or existing source
code window, which will be a lot like a text editor, but tailored for the typing in and editing of C code.
(For example, the editor may attempt to automatically indent your if statements and loops for you.) To
compile your program, you'll press a button or select a menu entry saying ``Compile'' (or perhaps
``Run''). Finally, to run your program, you may be able to use a ``Run'' button or menu entry, in which
case the compiler will open another new window in which your program will run, where its output will
appear, and where you will be able to type any input which your program might request. Alternatively,
your compiler may be able to generate standalone executables which you then invoke either by double-clicking on their icons, or by typing their names in some command-line window (e.g. a ``DOS box'').


Now, we get down to the details. There are four main sections, for Unix, MS-DOS, Windows, and
Macintosh, with an extra section in the middle concentrating specifically on the University of
Washington.
3.1. Unix
3.2. University of Washington
3.3. MS-DOS
3.4. Microsoft Windows
3.5. Macintosh

3.1. Unix
Since C is the programming language of Unix, it used to be the case that virtually all Unix machines had
C compilers. That's less true today (alas!), but C compilers are still very common. One extremely popular
Unix compiler, which happens to be of extremely high quality, and also happens to be free, is the Free
Software Foundation's gcc, or GNU C Compiler.
Step 1: entering or editing your source code
The most popular Unix text editors are vi and emacs. I'm not going to provide tutorials on them here; if
you've been using Unix for a while, there's a good chance that you've become familiar with one or the
other of them. As with most text editors, you can invoke them in one of two slightly different ways: either
with no command-line arguments, in which case you are given a blank screen in which to begin typing
(following which you must save your work to a file, after selecting a filename); or with a filename as a
command-line argument, in which case that file is read in so that you can make incremental changes to it
(following which you must of course again save your work, which by definition will go back to the same
file as you initially read in). Another popular text editor is pico, which is easy for beginners to use,
particularly if you use the PINE e-mail handling program. (pico is PINE's ``composer'', made available
as a standalone program.) See the ``University of Washington'' section below for more information about
pico.
For example, for the first ``Hello, world!'' program, you might type one of the following four commands:
$ vi
$ emacs
$ vi hello.c
$ emacs hello.c
(In these and the later examples, I'm using $ as a generic Unix shell prompt.) In the first two cases, you'd
be given a blank slate, and you'd save your work (with :w in vi or ^X^W in emacs), specifying a
filename of hello.c. In the last two cases, you'd be presented with hello.c (if it existed), and
when you were finished, you'd save it back to that file (with :w again under vi, or ^X^S under emacs).
In any case, whichever text editor you're using and however you choose to invoke it, make sure to type in
the program (if it's an initial example from the class notes or from a book) exactly as shown, including all
punctuation, and with the lines laid out just as they are in the code you're typing in. (Later, we'll learn
what does and doesn't matter in terms of program layout: where your degrees of freedom are, what the
compiler does and doesn't care so much about, which syntax you have to get exactly right and which
parts allow you to express some creativity. For now, though, just type everything verbatim.)
Step 2: compiling the program
Most Unix C compilers use about the same command-line syntax. To compile our ``Hello, world!''
program, you'd type
$ cc -o hello hello.c
The -o hello part says that the output, the executable program which the compiler will build, should
be named hello (that is, placed in a file of that name in the current directory). The C source file to be
read and compiled is obviously hello.c.
If you're using gcc, the command line is almost identical:
$ gcc -o hello hello.c
If cc gives you strange syntax errors, it may be that it's an old-fashioned, pre-ANSI compiler, in which
case it won't be able to compile most of the sample programs we'll be working with (unless you were to
edit them slightly). For many systems where plain cc is obsolete, there is an ANSI C compiler available,
often called acc. Again, the invocation is probably the same:
$ acc -o hello hello.c
For any of these invocations, if you leave out the -o hello part (i.e. if you just type cc hello.c),
the default is usually to leave your executable program in a file named a.out, which is cryptic and
somewhat less than perfectly useful.
If you're compiling a program which uses any of the math functions declared in the header file
<math.h>, such as sqrt, sin, cos, or pow, you'll typically have to request explicitly that the
compiler include the math library:
$ cc -o myprogram myprogram.c -lm
Notice that, unlike most command-line options, the -lm option which requests the math library must be
placed at the end of the command line.
It's also possible to use cc (or gcc, or acc, or any of the others) to compile multiple C source files,
either at once or separately, and to link them all together into one program, but we'll cover those details
later, when we start talking about separate compilation at all.
Step 3: running the program
Assuming you did use the -o option to specify the output filename hello when compiling, you can
now attempt to run your program by just typing its name:
$ hello
If you get a message like ``not found'' or ``no such file or directory'', it probably indicates that your shell
is not searching the current directory for commands that you type. You can either adjust your search path
(the PATH environment variable), or else invoke
$ ./hello
to explicitly indicate that the hello you're trying to run is in the current directory, known as dot (.).
If the program prints its output to the screen (as most programs do), and if you'd like to save it (for
posterity, or to print out or mail to me for review), you can type something like
$ hello > hello.out
On the Unix command line, a greater-than symbol > requests that the program's output be redirected to
the named file, instead of going to the screen. (Later, when you begin writing programs that read input
from the keyboard, you may be interested to know that it's possible to redirect a program's input as well,
using the less-than symbol, <. An invocation like program < inputfile requests that the program
read its input from the file named inputfile instead of from the keyboard.)
Finally, one last word on those text editors. I glossed over one detail, which may be troubling you: If you
invoke a text editor with a filename as a command line argument, but the file does not exist, most editors
assume that you're about to create it. You start out with a blank screen, almost as if you'd invoked the
editor with no filename argument at all, but when it comes time to save or quit, the editor remembers the
filename, so you don't have to type it in again. That's all. (In other words, you get to do a ``Save'' instead
of ``Save As''.)


If you're a student or staff member at the University of Washington, you have access to the Uniform
Access computers, several of which give you command lines at which you can do programming and C
compilers that you can invoke in the process. (There may also be Macs or PC's in some of the computing
centers with C compilers installed on them. If you're using one of those compilers, see the appropriate
section below. But if you're using a Mac or PC to log in to one of the Uniform Access computers, using a
program such as telnet, then this section applies.)
Not all of the Uniform Access computers have shell prompts and C compilers available. If your primary
login is on a machine such as homer or milton, that machine is intended primarily for reading e-mail and
the like, and uses a dedicated menu user interface. (It may be possible to escape from those menus to a
shell prompt, but I'm not sure how, or if there are C compilers included in the selection of programs
underneath.) Therefore, you will need to have an account on one of the ``shell prompt'' machines, such as
saul or mead. (You can request an account on one of these shell prompt machines, in addition to your
existing account(s), at one of the support desks. Your login name and password will be the same; you'll
just have another choice of machine to connect to.)
A popular text editor is pico, which is the ``composer'' from the PINE e-mail program, broken out as a
separate program for use in editing arbitrary text files. If you use PINE for e-mail, you'll find pico easy
to use as well; all of the commands you normally use when editing messages as you compose them
(moving the cursor, cutting and pasting, etc.) will apply, except that you'll be editing a text file
containing your C code instead of an e-mail message.
If you're doing this for the first time, just type
% pico
at the shell prompt. (In this example, I'm using % as the shell prompt; yours may be different.) pico will
clear the screen and give you a PINE-like menu, with some control-key commands listed at the bottom.
(If you get error messages about your terminal type, or if the screen or the text you try to type doesn't
appear correctly, you'll have to get this problem resolved somehow, but if you're logging in in the same
way that you do when you read your e-mail, the terminal type and emulations are presumably already set
up correctly, and pico should work just as well as PINE.)
Type in your program. If you're typing in an initial example from the class notes or from a book, type it
in exactly as shown, including all punctuation, and with the lines laid out just as they are in the code
you're typing in. (Later, we'll learn what does and doesn't matter in terms of program layout: where your
degrees of freedom are, what the compiler does and doesn't care so much about, which syntax you have
to get exactly right and which parts allow you to express some creativity. For now, though, just type
everything verbatim.)
When you're done typing, type control-O to write the source code that you've just typed in out to a text
file. pico will prompt you for a file name; use hello.c (assuming you're typing in the first ``Hello,
world!'' example). Exit pico by typing control-X.
If you get compilation errors in step 2, or if the program doesn't work in step 3, such that you have to
come back here to step 1, you will edit (modify) your program by typing
% pico hello.c
(or whatever the filename is), making your changes, writing the file out again with control-O, and then
quitting again with control-X.
Type
% gcc -o hello hello.c
(again, assuming you're working with a source file named hello.c). The -o hello part says that the
output, the executable program which the compiler will build, should be named hello (that is, placed in
a file of that name in the current directory). The C source file to be read and compiled is obviously
hello.c.
(If you leave out the -o hello part, the default is usually to leave your executable program in a file
named a.out, which is cryptic and somewhat less than perfectly useful.)
Later in the class, when it comes time to compile a program which uses any of the math functions
declared in the header file <math.h>, such as sqrt, sin, cos, or pow, you'll have to request
explicitly that gcc include the math library:
% gcc -o myprogram myprogram.c -lm
Notice that, unlike most command-line options, the -lm option which requests the math library must be
placed at the end of the command line.
It's also possible to compile multiple C source files, either at once or separately, and to link them all
together into one program, but we'll cover those details later, when we start talking about separate
compilation at all.
When I tested gcc on saul during the process of writing this handout, it gave me an unusual warning
message of the form ``as0: Warning: extraneous values on version stamp ignored''. I have a pretty good
idea of what this message means, and it indicates a slight problem in the installation of one of the parts of
the compiler, not a problem with my (or your) C program. You can ignore this message; the programs
produced by the compiler seem to be just fine in spite of it.
Assuming you did use the -o option to specify the output filename hello when compiling, you can
now attempt to run your program by just typing its name:
% hello
If you get a message like ``not found'' or ``no such file or directory'', it probably indicates that the shell is
not searching the current directory for commands that you type. You can either adjust your search path
(the PATH environment variable), or else invoke
% ./hello
to explicitly indicate that the hello you're trying to run is in the current directory, known as dot (.).
As mentioned in the Unix section above, you can capture your program's output in a file (instead of
having it go to the screen) by typing
% hello > hello.out
The output appears in the file hello.out, which will be a plain text file.
Finally, one footnote about pico: If you invoke it with the name of a file that does not exist, e.g. by
typing pico hello.c the very first time, pico assumes that you're about to create that file (as, in
fact, you are). It gives you the same initially empty editing screen as if you'd invoked it with no filename
argument at all, but when you type ^O to write the file out, it remembers the filename from the command
line, so you don't have to type it in again. That's all. (In other words, you get to do a ``Save'' instead of
``Save As''.)

3.3. MS-DOS
3.3. MS-DOS
If you are using an older MS-DOS compiler, it may be invoked from the command line, in which case it
will act somewhat similarly to the Unix compilers discussed above. On the other hand, there are several
MS-DOS compilers which ``roll their own'' graphical user interfaces, and these act more like the GUI
compilation environments described below.
The MS-DOS compilers I am familiar with are Microsoft's, Borland's, and DJ Delorie's (aka djgpp).
Other MS-DOS compilers are Watcom's and Mix Software's. Microsoft and Borland, at least, make both
command line compilers and ``visual'' compilers with a more graphical user interface. The rest of this
section assumes that you're not using much of a graphical interface, that you're instead simply typing
commands at the DOS (C:\>) prompt. (Theoretically, it's even possible to use these command-line
compilers in a ``DOS box'' under Microsoft Windows, but they tend not to work too well there.)
Most MS-DOS compilers probably come with some kind of a source code editor for typing in your
programs, but if not, you'll have to use an ordinary text editor. (If you have a text editor you prefer, you
can probably use it instead of the compiler's editor in any case.)
If at all possible, do not try to use a ``word processor'' to enter or edit C source code. The compiler
expects the source files you'll be giving it to be plain text files, not the document files (incorporating font
and size changes, pagination, and all the other formatting goodies) that word processors create by default.
Even if you remember to ask your word processor to save files as ``plain text'' or ``MS-DOS text files'',
the word processor will still be eager to rearrange your source code into what it thinks of as
nicely-formatted ``paragraphs'', which is not what you want your C code to look like.
In any case, whichever editor you use, make sure to type in the program (if it's an initial example from
the class notes or from a book) exactly as shown, including all punctuation, and with the lines laid out
just as they are in the code you're typing in. (Later, we'll learn what does and doesn't matter in terms of
program layout: where your degrees of freedom are, what the compiler does and doesn't care so much
about, which syntax you have to get exactly right and which parts allow you to express some creativity.
For now, though, just type everything verbatim.) When you're finished, be sure to save your work in a
filename ending in .c . (Also, once again, make sure it's a plain, MS-DOS text file, if you're trying to
use a word processor to do the entering.)
The older versions of the Microsoft compilers (e.g. MSC V6.00), which are the only ones I'm familiar
with, use a command-line syntax like
C:\> cl hello.c
(``cl'' stands for ``compile and link''. If you're using a really ancient version, such as 3.00, the command
may be msc, but I doubt there are too many of those in circulation any more.)
Borland's syntax (at least the versions I've used) is similar:
C:\> bcc hello.c
(``bcc'' obviously stands for ``Borland C compiler''.)
Both of these commands request that the file named HELLO.C be compiled. Ideally (i.e. if there are no
errors), an executable program will be created, named HELLO.EXE.
It's also possible to do a two-step compilation, requesting the compiler to build only an ``object'' (.OBJ)
file corresponding to the .C file, and then separately invoking the linker to actually create the executable.
We'll defer discussion of the linker to the later topic of separate compilation, but I mention it here
because I can't remember if bcc invokes the linker automatically at all. If, when you attempt to compile
HELLO.C, the bcc command (or any MS-DOS compiler) creates HELLO.OBJ but not HELLO.EXE, it
means that you'll have to invoke the linker yourself. For simple, one-source-file programs like the ones
we'll be writing at first, the linker invocations will be equally simple:
C:\> link hello;
or
C:\> tlink hello;
(The Microsoft linker is usually named link; the Borland linker is usually named tlink.) The linker
assumes a .obj extension on the input file names you give it (although you can type link
hello.obj; if you prefer), and creates an executable named HELLO.EXE (or whatever.EXE,
depending on the name of the first object file you give it). The semicolon at the end of the line keeps the
linker from prompting you for additional information which would be unnecessary for these simple
examples. (The extra information, which you can also supply in advance on the command line if you
wish, includes such things as an explicitly-selected name for the executable file and a list of extra
libraries to load.)
I haven't said much about djgpp, which is DJ Delorie's monumental port of the entire GNU C
compilation suite to MS-DOS. That topic deserves a section of its own, which I haven't written yet.
djgpp is a very mixed bag: it's an extremely high-quality compiler, better in many ways than the
commercial alternatives, but it is not easy to install or set up. Its opening documentation says something
like ``if you're just learning C, this is not the compiler you want.'' Nevertheless, the price is right, and I
have had students successfully use it.

Once you've successfully created an executable file (which, for the sake of this example, we're still
assuming is named HELLO.EXE), you can now attempt to run it by just typing its name:
C:\> hello
In other words, your program is now a command which you can type at the MS-DOS prompt, on the
same footing as dir, type, and the compiler itself!
If your program prints its output to the screen (as most programs do), and if you'd like to save it (for
posterity, or to print out or mail to me for review), you can type something like
C:\> hello > hello.out
On the MS-DOS command line, a greater-than symbol > requests that the command's output be
redirected to the named file, instead of going to the screen. (Later, when you begin writing programs that
read input from the keyboard, you may be interested to know that it's possible to redirect a program's
input as well, using the less-than symbol, <. An invocation like program < inputfile requests
that the program read its input from the file named inputfile instead of from the keyboard.)


If you're using a more recent compiler, with a ``visual'' or graphical user interface, it will probably run in
its own window (or set of windows) under Microsoft Windows. (As mentioned in the MS-DOS section
above, there are also a few MS-DOS command-line compilers which bring up their own full-screen
interfaces, and much of the discussion in this section will apply to them as well.) You'll typically invoke
or open the compilation environment by double clicking on its icon, or perhaps by typing its name at a
DOS prompt.
A visual compiler revolves around the concept of a project or program. The project or program is built
from one or more modules or source files, including the ones you'll create and type in. Once you've got
everything set up correctly, you'll be able to click on a ``Build'' or ``Run'' button, or make one of these
selections from a menu, to have all aspects of compiling (and perhaps running) your program taken care
of automatically.
The question is, of course, how do you ``get everything set up correctly''? You will probably have to
open a New Project or New Program, perhaps from a Project or Program menu. You will probably have
to pick a name for the project; for the first ``Hello, world!'' example, a good name would be simply
``hello''.
Once you've opened a new project, it will be empty, without any modules or source files to build it from.
So your next step will be to add one. You'll probably request a New source file (either from the Project,
Source, or File menus), and doing so will bring up a source file editing window, in which you can begin
typing your source code. At some point, you'll be asked to select a name for the source file, which will be
something like hello.c.
The source code editing window will work much like a text editor or word processor, except that it will
be streamlined and tailored for the typing and editing of C source code. You may find that the cursor
automatically indents itself on each new line, as the editor guesses the syntax and structure of the C
program you're typing. It may even display different bits of the program (keywords, identifiers,
comments, etc.) in different colors; perhaps this is thought to make a program easier to read.
Once you've successfully opened a source code editing window, type in the program (if it's an initial
example from the class notes or from a book) exactly as shown, including all punctuation, and with the
lines laid out just as they are in the code you're typing in. (Later, we'll learn what does and doesn't matter
in terms of program layout: where your degrees of freedom are, what the compiler does and doesn't care
so much about, which syntax you have to get exactly right and which parts allow you to express some
creativity. For now, though, just type everything verbatim.)
Whether or not the compiler explicitly prompts you to, you will eventually want to do a Save or Save As,
to save your work to disk. Again, you will always use a name ending in .c for source files. If you've just
invoked the compiler for the first time, the default directory (or folder) where you'll first end up trying to
store your own source files may end up being the compiler's installation directory. This isn't ideal, and
although it would probably work, you'll be much better off creating your own directory or folder,
somewhere else, where you'll keep all your own source files and programs that you've written or typed
in, and where they can stay separate from all the compiler's supporting files.
The compiler may automatically add new source files (once you've typed them in, and given them a
name) to the current project or program, or you may have to explicitly use an Add selection (perhaps
from a source or project menu) to add them to the current project. Somewhere, there should be a window
or other display which lists the ``modules'' that make up the current program. Make sure that, eventually,
a name incorporating the word ``hello'' shows up on this list (assuming you've named your source file
hello.c).
Later, when you have occasion to come back to an existing source file to edit it, you'll be able to open it
using Open (instead of New) from a File menu, or perhaps by selecting its name from within the
compiler's list of files or modules which make up the current project.
Modern compilers typically include several different compilation modes, and although some of them
might not end up mattering very much for the simple programs we'll be writing, it's worth paying a bit of
attention to their selection, for best results. Somewhere (perhaps in the Project menu) there will probably
be a menu or dialog of ``Compiler Settings''. One choice may be the ``Input Language'', and it may have
options like ``ANSI C'', ``ANSI C with Extensions'', ``C++'', and ``pre-ANSI'' or ``K&R'' or ``Classic'' or
``Traditional C''. For this class, you definitely want the ``ANSI C'' option, or as close as you can get to it.
(Any extensions, no matter how seductive they sound, are unnecessary and would only get in the way,
besides intruding on our goal of learning portable C.)
Another selection may concern the type of program or executable or application to create--there may be
selections with names like ``Windows Application'' or ``Console Application'' or ``stand-alone
executable'' or ``run within compiler''. Definitely don't request a Windows Application; that would
require that you learn how to open windows (from the program side) and learn the rest of the Windows
API, and it would prohibit you from using our most basic library function, printf. A ``console
application'', if you see a choice like that, is much more like what we want: this arranges that a window
be opened automatically, in which our printf output will appear, and into which we can type any input
which our program expects, all without our having to worry about the gory details of Windows. The
result is a ``console'' or ``virtual terminal'' environment just like the old-fashioned ones which the C
textbooks and tutorials (mine included) all still assume you're using.
If you have a choice between running your program within the compilation environment versus creating
a standalone executable, you may do either, but you should understand the distinction. If you run within
the compilation environment, it means that you'll ask the compilation environment to run your program,
and it will open a new window to do so. Creating a standalone MS-DOS executable, on the other hand,
means that the compiler will write an executable file to disk (perhaps automatically picking a name like
HELLO.EXE, or perhaps prompting you for one). You'll then invoke your program yourself, by typing
its name at a DOS prompt. (If you were creating a standalone Windows application, you'd invoke it by
double-clicking on it, but as mentioned, we're probably not ready to create standalone Windows
applications.)
Once you've made your selections of input language and output format (and any other choices which
look interesting, such as how verbose you want the compiler to be about warning you of possible errors,
and how hard it should work at ``optimizing'' your program--you'll probably want to choose ``very'' and
``not very'', respectively), you're ready to actually do the compiling. There may be a Compile or Build or
Make button or menu selection somewhere (again, perhaps on the Project menu). Or, especially if you've
selected ``run within compiler'', there may simply be a Run button (or a top-level Run menu) which first
automatically compiles and builds your program and then, if everything is successful, takes you directly
to step 3.
As helpful and integrated and hand-holding as these visual compilation environments try to be, they tend
to let you down and leave you out in the cold in one respect, namely the automatic selection and
incorporation of the standard C libraries. If, when you first try to build or run your program, you get error
messages telling you that functions (or ``external references'') with names like printf are ``undefined'',
this means that the compiler wants you to explicitly tell it that yes, you do want to make use of those
standard library functions which your textbook and instructor assured you would be provided by the
compiler vendor. (It may even want you to tell it where they are.) If so, you will have to go back to the
Add Module (or perhaps Add Library or even Add Source File) menu, and tell the compiler that your
project is made up of more than just the one source file you gave it. You will have to add (as a module or
something) the ``Standard'' or ``ANSI'' C libraries. To do so, you may end up going through a standard
file-opening dialog, and you may have to wander over to the compiler's installation directory to find the
library (it may be in a subdirectory or subfolder called ``LIB''). If you find several versions of the
libraries, having to do with different ``memory models'' (or beginning with different cryptic letters like S,
M, C, or L), then you'll have to pick the right one somehow, although I hope you don't have to, because
picking the right one depends on which memory model the compiler is compiling for, which would be
another compiler setting which I didn't talk about.
(I realize that this subsection, in fact all of this section, has been pretty generic, and not very ``step by
step''. Eventually, I may be able to learn enough about the popular Windows C compilation environments
to provide specific instructions for each of them. Until then, if you're having trouble, you may be able to
glean some hints from reading through the detailed instructions in the Macintosh section below, although
the discussion there is of course of a compiler other than the one you're using.)
Depending on the style of program or executable you've built, there will be a couple different ways of
running your program. If you're running it within the compilation environment, it will be particularly
easy; you'll just select something like Run Program from the Project or Run menu. (In fact, it may be so
easy that steps 2 and 3 meld together.) If you've created a standalone executable, you'll have to return to
the DOS prompt (i.e. switch to a ``DOS box'') and type the program's name there, perhaps after changing
to the correct directory. If you're invoking a standalone executable from the DOS prompt, you can read
some notes about input and output redirection at the end of the MS-DOS section above.

3.5. Macintosh
I've saved the best for last! :-) Although not the total programmer's environment that Unix is, there are
some excellent C compilers available for the Mac. In fact, all of the examples and problem set solutions
for this class have been written and tested on my Macintosh, and all of the software I use to format and
typeset these course notes (including the handout you're reading now) was written by me, much of it on
the Mac. The compiler I use is Think C version 5.1, by Symantec. The rest of this section discusses that
compiler, but as far as I know, other Mac compilers (including Codewarrior) are quite similar.
In Think C, at any one time, you're always working with exactly one project, which is Think C's
metaphor for the program you're writing. The project is built from one or more sources, including the C
source files you'll create and type in. Once you've got things set up correctly, you simply select Run from
the Project menu to have all aspects of compiling and running your program taken care of automatically.
(You can also ``make'' your program without running it.)
When you first start up Think C, you're presented with a New Project dialog. You can name your project
anything you like, but don't add a .c to the end of the name, because as we'll see, we'll be creating
individual .c files later which are distinct from the ``project file''. (Later, once you've created a project,
you can start up the compiler ``on'' that project by double-clicking on the project file's icon, or you can
select Open Project from the Project menu.)
If you've just invoked the compiler for the first time, the default folder where you'll first end up trying to
store your own project and source files will probably end up being the compiler's installation directory.
This is not ideal, and Symantec recommends against it. Instead, you'll be much better off creating your
own folder somewhere else, where you'll keep all your own source files and projects that you've written
or typed in, and where they can stay separate from the compiler and all its supporting files.
Once you've opened a project, select New from the File menu. This opens a new source file editing
window, in which you can begin typing your source code. The source code editing window works like
any text editing window, except that it is streamlined and tailored for the typing and editing of C source
code. (For example, you may notice that the cursor automatically indents itself on each new line, as the
editor guesses the syntax and structure of the C program you're typing.)
Once you've successfully opened a source code editing window, type in the program (if it's an initial
example from the class notes or from a book) exactly as shown, including all punctuation, and with the
lines laid out just as they are in the code you're typing in. (Later, we'll learn what does and doesn't matter
in terms of program layout: where your degrees of freedom are, what the compiler does and doesn't care
so much about, which syntax you have to get exactly right and which parts allow you to express some
creativity. For now, though, just type everything verbatim.)
At some point, but definitely by the time you're finished typing, you'll need to select a name for the
source file, which will be something like hello.c. Select Save As from the File menu, and save the file
as normal. You'll want to place it in the same folder as the project file, which ought to be where it ends
up by default, but due to the way the Mac's standard open file dialog always seems to remember the last
folder you were in for any reason, you'll want to double check that you're still where you want to be
before saving files in Think C.
Later, when you have occasion to come back to an existing source file to edit it, you can either open it as
usual using Open from the File menu, or else double-click on its name in Think C's ``project window'',
which is a list of all the sources that make up the current project. (Think C always displays this window
whenever a project is open; it's usually pretty small and narrow, and located in the top center or left of
the screen.)
First, pull down the Edit menu and go to the Options dialog, and run through the various choices. For the
simple programs we'll be writing, most of the defaults will be adequate. But do turn off any extensions,
``Think Object'' or C++ or otherwise. (Compiler extensions, no matter how seductive they sound, are
unnecessary and would only get in the way, besides intruding on our goal of learning portable C.)
Once you've got a project set up, the easiest way of compiling it is by simply selecting Run from the
Project menu, which compiles the program (if necessary) and then runs it, taking you directly to step 3.
But if you try that right away, you'll get an error saying that main is undefined, because even though
you've typed in a .c file containing your main function, Think C hasn't realized it's an official part of
your project yet.
There are at least three ways of adding a source file to the current project:
1. Pull down Add... from the Source menu. Double-click on the desired file in the upper half of the
dialog (or select it and click Add) to place it in the lower list of files to be added. Then click
Done.
2. If the desired source file is the topmost (current) window, select Add (the one without the three
dots) from the Source menu.
3. If the desired source file is the topmost (current) window, select Compile from the Source menu.
If the compilation is successful, the source will be automatically added to the project menu.
You can use any one of these methods to add hello.c to the current project. (Generally, I find number
3 most convenient, because it's got a command-key shortcut.)
You're now ready to hit an annoying little feature of Think C, which is that it does not assume any
default, initial set of function libraries, ANSI standard or otherwise. This means that the second time you
attempt to build or run your program (after fixing the ``main undefined'' problem from when you tried to
run a project with no sources at all), you're likely to get error messages telling you that functions with
names like printf are undefined, in spite of the fact that your textbook and instructor assured you they
would be provided by the compiler vendor. You have to pull down Add... from the Source menu, go to
the folder (wherever it is) where you installed Think C, go to the C Libraries subfolder, and select the
ANSI library. (In other words, you add it as a second Source to your project. Here, ``source'' just means
``source of information for the project''; it doesn't mean ``C source code''.) (Also, once you've gone over
to the C Libraries folder to get the ANSI library, remember to go back to your own C programming
folder next time you have occasion to save a file, otherwise it might get lost over in the Think C folder.)
When you select Run from the Project menu, Think C (obviously) runs your program. If your program
calls printf to generate any output (as all of our programs will do), a ``console window'' will
automatically be opened to display the output. If the program tries to read any input, you can type that
input into the same console window.
You can also pull down Build Application from the Project menu. This allows you to turn your program
into an actual Mac application. You'll be asked to pick a file name for the application, which will have to
be different from both the project file name and any source file (.c) names. (For this reason, it's common
to name project files ending in .prj or project or, to be cute, the pi symbol.)
When you create a standalone Mac application, you'll be able to invoke it by double-clicking on its icon,
as usual. (However, it probably won't look much like a Mac application once it's running, because
assuming it just calls printf, it will still open up the same console window, and it won't have any
fancy menus or any dialogs at all.)
Later, when it comes time to write some sample programs which try to read their command line
arguments, you will obviously have a problem typing these arguments, because the Mac of course has no
command line. Think C does incorporate a mechanism for passing command lines to a program (because
even though doing so is entirely un-Mac-like, many C programs, especially those written during
introductory classes, still expect them). To get an ``application'' built by Think C to prompt you for a
command line, you have to do two things:
1. Place the line
#include <console.h>
at the top of the source file containing your main function, along with the other #include
lines.
2. Define your main function like this:
main(int argc, char *argv[])
{
    ... any local variable declarations go here ...
    argc = ccommand(&argv);
    ... rest of program ...
}
In other words, the first executable statement of your main function must be the assignment argc =
ccommand(&argv); . The call to the ccommand function causes a dialog box to pop up in which
you can type a command line, the contents of which will then be available to the rest of your code
through the argc and argv variables, as usual. Also, the dialog gives you a way to redirect the input
and output of your program (see the Unix and MS-DOS sections above for a bit more information on
input/output redirection). But don't worry about any of this until we get to the topic of command line
arguments, towards the end of the class.

http://www.eskimo.com/~scs/cclass/handouts/errmsgs/index.html

Steve Summit
Since C was originally a language designed for programmers who considered themselves experts, to this
day a C compiler can be a somewhat cryptic beast to install and configure correctly, and to understand
when it begins to complain about problems (it thinks) it has detected in your programs. The error
messages that a C compiler prints usually have a direct relationship to some distinct problem in your
program, but until you learn what they mean, it's not always immediately obvious what the messages are
trying to tell you. This handout summarizes some of the error and warning messages commonly printed
by C compilers, and gives you a slightly more comprehensive explanation of what they're really saying.
(As we'll see, sometimes what they're saying is not that there's a problem with your program, but rather
with the compiler itself.)
If your compiler prints some cryptic message which you can't find in this list, please let me know, so that
I can add it.
One side note: sometimes, a compiler gets so badly confused by something weird which you've done that
it goes off the deep end, spewing messages which have nothing to do with actual problems in your
program (or which may have nothing to do with your program at all). If your compiler has printed scores
of error messages, and you've dealt with the first few (that is, corrected the problems) but can't figure out
what the rest are referring to, try compiling again anyway. It may be that the first problem(s) you fixed
will make all (or most) of the other messages go away.
Error and Warning Messages

Compiler Installation Questions
``syntax error''
This is the granddaddy of all compiler error messages, although it can be less than useful. It
means that there's something wrong with the syntax of your program (yes, I know, you figured
that much out already). Roughly speaking, ``syntax'' refers to the typographical characters you've
used to express your program and the order you've arranged them in. If parts of your program are
in the wrong order, or if some required punctuation (such as a comma, semicolon, brace, or
parenthesis) is missing, you're liable to get this error. The actual error is not always on the line the
compiler indicates; sometimes, a problem on one line isn't noticed right away but percolates down
and triggers an error message a few lines later.
You may also get this error if you're trying to use an old, pre-ANSI compiler (such as cc on many
Unix systems) to compile a newer, ANSI-compatible program. (Try using acc or gcc instead.)
``undefined symbol x''
``undeclared variable x''
``x undefined''
All of these mean that you have apparently tried to use a variable named x (or, of course,
whatever name the compiler is complaining about) without declaring it. C requires that all
variables be declared before use; the declaration informs the compiler in advance of the name and
type of each variable that you will be using. (If the compiler is complaining about a variable
which you did think you'd declared, it may be that you misspelled it either where you declared it
or where you used it, or that you declared it in a different part of your program than the part where
you're trying to use it.)
(See also discussion about ``undefined external'' message below.)

``newline in string or character constant''
This simply means that you've got an unbalanced single quote ' or double quote ". Here is an
example:
printf("hello, world!\n);
You might wonder why it's illegal to have a newline in a string constant (and why it is that you
have to use the \n sequence instead). The reason is precisely so that the compiler can give you
this error message when you accidentally leave out a closing quote. Otherwise, the compiler
might try to interpret the whole rest of your source file as one very long string constant.
``bad lvalue'' (or ``missing lvalue'')
lvalue is a piece of compiler writer's jargon which refers to an object with a location. (Actually, its
original derivation had to do with the fact that lvalues are what appear on the left hand side of an
assignment operator.) What this message means to you and me is that the compiler was trying to
assign to or modify some variable, but there was no variable there for it to modify (or, perhaps,
for some reason the particular variable could not be modified). For example, if you accidentally
wrote
1 = a + b
(that's a number 1 there, not a letter l), the compiler would complain, because the number 1 is not
a variable which it's possible for you to assign a new value to.
``can't find header file xxx.h''
This message means that one of your source files contains a line like
#include <xxx.h>
or
#include "xxx.h"
but the compiler (specifically, the preprocessor) could not find the header file xxx.h.
If the header file is one of your own (usually one that you've used the #include "" form with),
make sure that you have actually composed (written) the header file, and that it is in the correct
directory--usually, the same directory as the rest of your source files.
If the header file is a Standard one (usually one that you've used the #include <> form with),
such as stdio.h, then there's something wrong with the way the compiler has been installed or
configured. You may have to go into a configuration screen and set the directory name where the
standard header files are kept. (Of course, to do this, you will have to know where that directory
is! It usually has a name like ``include'', and is located somewhere within the compiler's
installation tree.)
(Finally, if the preprocessor is complaining about a misspelled header name, simply fix the
misspelling.)
``undefined external: xxx''
This message means that during the linking stage, a function (or, occasionally, a global variable)
by the name of xxx could not be found. There can be several causes, depending on whether the
function is one of your own, or was one which you had expected would be provided for you.
If the function name is one of your own, you'll have to figure out why the linker couldn't find it.
Did you provide a defining instance (not just an external declaration) for the function? Did you
enter its source code into the same file(s) as the rest of the program? If not, did you tell the linker
where to find it? You'll either have to write the function if you forgot to, or adjust the program's
build procedure (perhaps by editing the ``Makefile'' or adding a ``module'' to the ``project list'').
If the function name is one from the Standard C library, such as printf, which was supposed to
be provided for you, you'll have to figure out why the linker couldn't find it. Occasionally, the
problem is that the compiler doesn't know where the Standard library is. You may have to go into
a configuration screen and set the path or directory name of the library. (Of course, to do this, you
will have to know where the library is! It usually has a name containing the string ``libc'', such as
libc.a or SLIBC.LIB. The libraries--there may be several--are often located in a subdirectory
named ``lib'', somewhere within the compiler's installation tree.)
If the undefined function's name is close but not exactly the same as one of your own functions or
one of the Standard library functions, it obviously means that you've mistyped its name in a
function call somewhere. All you have to do is find that typo in whichever source file it's in, fix it,
and recompile.
If the function is a math-related one, such as sin, cos, or sqrt, and you're using a Unix system,
make sure that you've used the -lm option, at the end of the command line, when
compiling/linking.
If the function name printed in the error message includes a leading underscore, don't worry about
the underscore. (Under some compilers, the names manipulated by the linker have this underscore
added by the compiler, although you continue to refer to the function by the name you thought
you had given it.) In most cases, if the linker complains about ``_myfunc'' being undefined, it's
``myfunc'' you've got to worry about.
If the name is ``_end'', you'll rarely have to worry about it; this is a symbol used internally by the
traditional Unix linkers, and shows up as ``undefined'' only when there are also other symbols
undefined. If you fix all of the others, the mysterious undefined ``_end'' will also disappear.
(See also the discussion of the ``undeclared variable'' message above.)

``call of function with no prototype in scope''
This message means that the compiler has noticed you trying to call a function which the compiler
has no (or only minimal) information about. Under most compilers, this is a warning message, not
an error, because the compiler is willing to assume certain things about unknown functions, and
will let you try to call them. However, the assumptions it makes might not be right, and the
compiler won't be able to check that you've called the function with the correct number and type
of arguments, which is why it issues the warning.
If you'd like the compiler to keep checking function calls carefully for you, you must arrange to
supply prototypes for the functions you call. For functions from the Standard C library, you
almost always supply prototypes by using the #include directive. For example, in a source file
where you call printf, you should always have the line
#include <stdio.h>
near the top of the file, because <stdio.h> contains the prototype for printf (and many other
functions). For your own functions, you will have to supply your own prototypes. (You may wish
to place them in your own header files, and use #include "" to bring them in wherever they're
needed. Using header files in this way is generally more reliable than manually typing copies of
the prototype into each source file where a particular function of yours is called.)
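As a minimal sketch of the arrangement just described, here is a prototype placed ahead of a call. The names average3 and print_average3 are made up for illustration; in a larger program the prototype line would normally sit in a header file of your own.

```c
#include <stdio.h>

/* Prototype: tells the compiler the argument and return types
   before any call is seen. */
double average3(double a, double b, double c);

/* A caller: because the prototype is in scope, the compiler can check
   the number and types of the arguments, and knows that the result
   is a double rather than an int. */
void print_average3(void)
{
    printf("%f\n", average3(1.0, 2.0, 6.0));
}

/* The defining instance of the function. */
double average3(double a, double b, double c)
{
    return (a + b + c) / 3.0;
}
```

If the prototype line is deleted, many compilers will issue the warning discussed here at the call inside print_average3, and may then misinterpret the function's return value.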
If, for whatever reason, you'd rather that the compiler not check function calls for you (that is, if
you think that you can get them right yourself), you can usually configure a compiler not to
complain about function calls with no prototype in scope.

When you install a C compiler on your computer for the first time, it may ask you some mysterious
questions. (Again, these questions tend to assume that you're already enough of an expert that you ought
to know what they mean; they are not always self-explanatory.) Most of the time, the default answers
should be adequate. Just in case, here are a few common varieties of installation questions, along with
suggestions for answering them. (Don't worry a bit if your compiler's installation procedure does not ask
you all of these questions. I've included some obscure ones which I or my students have seen in the past,
but if your compiler's installation procedure is smart enough not to require so much coddling,
consolation, and confirmation, so much the better.)
``Use floating-point emulation?''
We won't be doing much floating-point work, but it doesn't hurt to say ``yes''. (The worst that can
happen is that if your machine already has a floating-point coprocessor, the programs that you
build will be slightly bigger--by a few K--if they include a software floating-point emulation
library which you don't really need. But it certainly won't hurt.)
``Which memory models?''
Under MS-DOS, for arcane reasons which I'll avoid the temptation to editorialize about, you have
your choice among programming in at least four different ``memory models'': ``small,''
``compact,'' ``medium,'' and ``large'' (and perhaps also ``tiny'' and ``huge''). For our purposes, the
programs we'll be writing are so small and simple that it absolutely won't make a difference which
model we use. If disk space is not a problem and there's a chance you'll be working on larger
programs later, pick both ``small'' and ``large'' models, and ignore the rest. If all you'll be writing
in the foreseeable future is the small programs for this class, just pick ``small'' model and be done
with it. (You can always go back and re-install the support for the other memory models later, if
for some reason you ever need them.)
``Include xxx library?''
(``xxx'' might be ``graphics,'' ``OWL,'' ``networking,'' or some other keyword which you do or
don't recognize.) It's probably safe to say ``no'' to any of these special libraries; I'm not aware of
any special libraries which would be needed by any compilers to deal with the simple programs
we'll be writing.
``Delete component libraries after build?''
It's safe to say ``yes,'' or just use the default. (Saying ``yes'' conserves some disk space; you're not
going to need the ``component libraries'' later if the installation procedure has built you some
``combined libraries.'')
http://www.eskimo.com/~scs/cclass/handouts/sciprog.html
Scientific Programming (in C)

Steve Summit
C was not used very much for floating-point work at first (i.e. back when it was still being developed and refined),
so its support for some aspects of floating-point and scientific work is a little weak. (On the other hand, C's support
for string processing and other data structures can make up the difference in a program that does some of each.)
There are two significant areas of concern. One is specific to C, and concerns the handling of multidimensional
arrays. The other is endemic in floating-point work done on computers, and concerns accuracy, precision, and
round-off error.
Functions and Multi-dimensional Arrays
We've seen that when a function receives a one-dimensional array as a parameter:
f(double a[])
{
...
}
the size of the array need not be specified, because the function is not allocating the array and does not need to
know how big it is. The array is actually allocated in the caller (and in fact, only a pointer to the beginning of the
array is passed to a function like f). If, in the function, we need to know the number of elements in the array, we
have to pass it as a second parameter:
f(double a[], int n)
{ ... }
If we wish to emphasize that it is really a pointer which is being passed, we can write it like this:
f(double *a, int n)
{ ... }
(In fact, if we declare the parameter as an array, the compiler pretends we declared it as a pointer, as in this last
form.)
In the one-dimensional case, it is actually a bit of an advantage that we do not need to specify the size of the array.
If f does something generic like ``find the average of the numbers in the array,'' it will clearly be useful to be able
to call it with arrays of any size. (If we had to write things like
f(double a[10])
{ ... }
then we might find it a real nuisance that f could only operate on arrays of size 10; we'd hate to have to write
several different versions of f, one for each size.)
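Here is a sketch of the kind of generic function just mentioned. The name average is an illustrative choice, and returning 0 for an empty array is an arbitrary decision, not something the discussion above prescribes.

```c
/* Return the average of the n values in a[].
   The caller passes n because the function cannot discover
   the array's size from the pointer it actually receives. */
double average(double a[], int n)
{
    double sum = 0.0;
    int i;

    if (n <= 0)
        return 0.0;     /* arbitrary choice for an empty array */
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum / n;
}
```

The same function can be called as average(x, 3) for a three-element array x and as average(y, 100) for a hundred-element array y.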
Unfortunately, the situation is not nearly so straightforward for multi-dimensional arrays. If you had an array
double a2[10][20];
which you wanted to pass to a function g, what would g receive? We could declare g as
g(double a[10][20])
{ ... }
and let the compiler worry about it--that is, let it ignore the dimensions it doesn't need, and pretend that g really
receives a pointer. But what pointer does it receive? It's easy to imagine that g could also be declared as
g(double **a) /* WRONG */
{ ... }
but this is incorrect. Notice that the array a2 that we're trying to pass to g is not really a multidimensional array; it
is an array of arrays (more precisely, a one-dimensional array of one-dimensional arrays). So in fact, what g
receives is a pointer to the first element of the array, but the first element of the array is itself an array! Therefore,
g receives a pointer to an array. Its actual declaration looks like this:
g(double (*a)[20])
{ ... }
(and this is what the compiler pretends we'd written if we instead write a[10][20] or a[][20]). This says that
the parameter a is a pointer to an array of 20 doubles. We can still use normal-looking subscripts: a[i][j] is
the j'th element in the i'th array pointed to by a. The problem is that no matter how we declare g and its array
parameter a, we have to specify the second dimension. If we write
g(double a[][])
the compiler complains that a needed dimension is missing. If we write
g(double (*a)[])
the compiler complains that a needed dimension is missing. We have to specify the ``width'' of the array, and if
we've got various sizes of arrays running around in our program, we're back with the problem of potentially
needing different functions to deal with different sizes of arrays.
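To make the pointer-to-array form concrete, here is a small sketch. The names sum2d and NCOLS are invented for this example. Notice that the number of rows can be passed in at run time, but the width is fixed when the function is compiled.

```c
#define NCOLS 20   /* the "width" every caller must agree on */

/* Sum all elements of an array of rows, each row being an array
   of NCOLS doubles.  The number of rows can vary from call to call;
   the number of columns cannot. */
double sum2d(double (*a)[NCOLS], int nrows)
{
    double sum = 0.0;
    int i, j;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < NCOLS; j++)
            sum += a[i][j];
    return sum;
}
```

Given double a2[10][20], the call would be sum2d(a2, 10). An array declared as double a3[10][30] could not be passed to this function at all.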
Why wouldn't
g(double **a) /* WRONG */
work? That says that a is a pointer to a pointer, but since we're passing an array of arrays, the function will receive
a pointer to an array. A pointer to an array is not compatible with a pointer to a pointer. (More on double **a
below.)
A second problem arises if you need any temporary arrays inside a function. Often, a temporary array needs to be
the same size as the passed-in arrays being worked on. Unfortunately, there's no way to declare an array and say
``just make it as big as the passed-in array.''
It's easy to imagine a solution to all of the problems we've been discussing. To write a function accepting a
multidimensional array with the size specified by the caller, you might want to write
g(double a[m][n], int m, int n) /* won't work */
where the array dimensions are taken from other function parameters. (This is, of course, just how you do it in
FORTRAN.) To declare a temporary array, you might try things like
f(double a[n], int n) /* won't work */
{
    double tmparray[n];
    ...
}
or
g(double a[m][n], int m, int n) /* won't work */
{
    double tmparray[m][n];
    ...
}
But you can't do any of these things in traditional ANSI (C89/C90) C. There, every array dimension must be a
compile-time constant. Array parameters can't take their dimensions from other parameters, and local arrays can't
take their dimensions from parameters.
(It is worth noting that at least one popular compiler, namely gcc, does let you do these things as an extension, and
that the later C99 standard added ``variable-length arrays'' along the same lines, provided the dimension
parameters are declared before the array parameter that uses them. If you make use of this feature, just be aware
that your program won't work under compilers that lack it.)
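For the curious, here is a sketch of what such code looks like with a compiler that supports run-time array dimensions (gcc does, and the C99 standard calls the feature variable-length arrays). One difference from the hypothetical declarations above is that the dimension parameters come first, so that they are already declared when the array parameter mentions them. The function name scale2d is made up for illustration, and this will not compile under a strict C89/C90 compiler.

```c
/* Multiply every element of an m-by-n array by factor, using a local
   scratch array the same size as the argument.
   Requires variable-length array support (gcc, or a C99 compiler). */
void scale2d(int m, int n, double a[m][n], double factor)
{
    double tmp[m][n];   /* sized at run time from the parameters */
    int i, j;

    for (i = 0; i < m; i++)
        for (j = 0; j < n; j++)
            tmp[i][j] = a[i][j] * factor;
    for (i = 0; i < m; i++)
        for (j = 0; j < n; j++)
            a[i][j] = tmp[i][j];
}
```

The scratch array is not really needed for this particular job; it is there only to show a local array taking its size from the parameters.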
How can we work around these limitations? The most powerful and flexible solution involves abandoning
multidimensional arrays and using pointers to pointers instead. This approach has the advantage that all
dimensions may be specified at run time, though it also has a few disadvantages. (Namely: you may have to
remember to free the arrays, especially if they're local; the intermediate pointers take up more space, and accessing
``array'' elements via pointers-to-pointers may be slightly less efficient than true subscript references. On the other
hand, on some machines, pointer accesses are more efficient. Another disadvantage is that our treatise on scientific
programming in C is about to digress into a treatise on memory allocation.)
To illustrate, here is a pair of functions for allocating simulated one- and two-dimensional arrays of double. (By
``simulated'' I mean that these are not true arrays as the compiler knows them, but because of the way arrays and
pointers work in C, we are going to be able to treat these pointers as if they were arrays, by using [] array
subscript notation.)
#include <stdio.h>
#include <stdlib.h>
double *alloc1d(int n)
{
    double *dp = (double *)malloc(n * sizeof(double));
    if(dp == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return dp;
}
double **alloc2d(int m, int n)
{
    int i;
    double **dpp = (double **)malloc(m * sizeof(double *));
    if(dpp == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    for(i = 0; i < m; i++) {
        if((dpp[i] = (double *)malloc(n * sizeof(double))) == NULL) {
            fprintf(stderr, "out of memory\n");
            exit(1);
        }
    }
    return dpp;
}
These routines print an error message and exit if they can't allocate the requested memory. Another approach
would be to have them return NULL, and have the caller check the return value and take appropriate action on
failure. (In this case, alloc2d would want to ``back out'' by freeing any intermediate pointers it had successfully
allocated.)
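A sketch of that alternative follows. The name alloc2d_or_null is invented here to keep it distinct from alloc2d, and the casts are omitted, which assumes an ANSI compiler.

```c
#include <stdlib.h>

/* Like alloc2d, but returns NULL on failure instead of exiting.
   If a row allocation fails, the rows already allocated (and the
   array of row pointers itself) are freed before returning. */
double **alloc2d_or_null(int m, int n)
{
    int i;
    double **dpp = malloc(m * sizeof(double *));

    if (dpp == NULL)
        return NULL;
    for (i = 0; i < m; i++) {
        dpp[i] = malloc(n * sizeof(double));
        if (dpp[i] == NULL) {
            while (--i >= 0)      /* back out: free the rows before row i */
                free(dpp[i]);
            free(dpp);
            return NULL;
        }
    }
    return dpp;
}
```

The caller is then responsible for testing the return value before using it, just as with malloc itself.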
Note that these routines allocate arrays of type double. They could be trivially rewritten to allocate arrays of
other types. (Note, however, that it is impossible in portable, standard C to write a routine which generically
allocates a multidimensional array of arbitrary type, or with an arbitrary number of dimensions.)
When a one-dimensional array allocated by alloc1d is no longer needed, it can be freed by passing the pointer to
the standard free routine. Freeing two-dimensional arrays allocated by alloc2d is a bit trickier, because all the
intermediate pointers need to be freed, too. Here is a function to do it:
void free2d(double **dpp, int m)
{
    int i;
    for(i = 0; i < m; i++)
        free((void *)dpp[i]);
    free((void *)dpp);
}
(The casts in the malloc and free calls in these routines are not necessary under ANSI C, but are included to
make it easier to port this code to a pre-ANSI compiler, if necessary.)
With these routines in place, we can dynamically allocate one- and two-dimensional arrays of double to our
heart's content. Here is an (artificial) example:
double *a1 = alloc1d(10);
double **a2 = alloc2d(10, 20);
a1[0] = 0;
a1[9] = 9;
a2[0][0] = 0.0;
a2[0][19] = 0.19;
a2[9][0] = 9.0;
a2[9][19] = 9.19;
Now, when we pass a two-dimensional array allocated by alloc2d to a function, that function will be declared as
accepting a pointer-to-a-pointer:
g(double **a) /* okay, for alloc2d "arrays" */
When we need a temporary array within a function, we can allocate one with alloc1d or alloc2d; however,
we must remember to free it before the function returns. (Remember that malloc'ed memory persists until
explicitly freed, or until the program exits. Local variables disappear when the function they're local to returns,
but in the case of pointers to malloc'ed memory, that just means that we lose the pointer, and not that the pointed-to memory has been freed. We have to explicitly free it, before we lose the pointer.)
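Here is a small sketch of the temporary-array pattern. The function reverse is an invented example which calls malloc directly so that it stands on its own, and it assumes n is positive.

```c
#include <stdlib.h>

/* Reverse the n elements of a[] in place, using a temporary array
   of the same size.  Returns 0 on success, -1 if memory runs out. */
int reverse(double *a, int n)
{
    int i;
    double *tmp = malloc(n * sizeof(double));

    if (tmp == NULL)
        return -1;
    for (i = 0; i < n; i++)
        tmp[i] = a[i];
    for (i = 0; i < n; i++)
        a[i] = tmp[n - 1 - i];
    free(tmp);          /* must happen before the pointer is lost */
    return 0;
}
```

If the function had several return statements, each path that leaves after the allocation would need its own call to free.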
One last note: since
g1(double a[10][20])
is not compatible with
g2(double **a)
we will have to maintain a segregation between arrays-of-arrays and pointers-to-pointers. We can't write a single
function that operates on either a true multidimensional array or one of our pointer-to-pointer, simulated
multidimensional arrays. (Actually, there is one way, but it's more elaborate, and not strictly portable.)
That's about enough (for now, at least) about multidimensional arrays. We now turn to the subject of...
Floating-Point Precision
It's the essence of real numbers that there are infinitely many of them: not only infinitely large (in both the positive
and negative direction), but infinitely small and infinitely fine. Between any two real numbers like 1.1 and 1.2
there are infinitely many more numbers: 1.11, 1.12, etc., and also 1.101, 1.102, etc., and also 1.1001, 1.1002, etc.;
and etc. There is no way in a finite amount of space to represent the infinite granularity of real numbers. Most
computers use an approximation called floating point. In a floating point representation, we keep track of a base
number or ``mantissa'' with some degree of precision, and a magnitude or exponent. The parallel here with
scientific notation is close: The number actually represented is something like mantissa x 10^exponent.
(Actually, most computers use mantissa x 2^exponent, and we'll soon see how this difference can be
important.)
The advantage of a floating-point representation is that it lets us devote a finite amount of space to each value but
still do a decent job of approximating the real numbers we're interested in. If we have six decimal digits of
precision, we can represent numbers like 123.456 or 123456 or 0.123456 or 0.000000123456 or 123456000000.
However, we can not accurately represent numbers which require more than six significant digits: we can't talk
about numbers like 123456.123456 or 0.1234567 or even 1234567. If we try to represent numbers like these (i.e. if
they occur in our calculations), they'll be ``rounded off'' somehow. (``Rounded off'' is in quotes because we don't
always know whether the unrepresentable digits will be properly rounded up or down, or randomly rounded up or
down, or simply discarded. High-quality floating-point processors and subroutine libraries will try to do a good job
of rounding when a result cannot be exactly represented, but unfortunately there are some low-quality ones out
there.)
In simple circumstances, the finite precision of floating-point numbers is not a problem. Real-world inputs (i.e.
anything you measure) never have infinite precision; in fact they often have only two or three significant digits of
precision. Almost everyone has a vague understanding that floating-point representations have finite precision, so
it's easy to shrug off slight inaccuracies as ``roundoff error.'' If these errors always stayed confined to the least-significant digit, they wouldn't be much of a problem.
The problem is that error can accumulate. In the worst case, you can lose one (or more than one) significant digit in
each step of a calculation. If a calculation involves six steps, and you only have six significant digits to play with,
by the end of the calculation, you've got nothing of significance left. As a wise programmer once said, ``Floating
point numbers are like piles of sand. Every time you move them around, you lose a little sand.''
How can an error grow to involve more than the least-significant digit? Here is one simple example. Suppose we
are computing the sum 1+1+1+1+1+1+1+1+1+1+1000000, and we have only six significant digits to play with.
Clearly the answer is 1000010, which has six significant digits, so we're okay. But suppose we compute it in the
other order, as 1000000+1+1+1+1+1+1+1+1+1+1. In the first step, 1000000+1, the intermediate result is 1000001,
which has seven significant digits and so cannot be represented. It will probably be rounded back down to
1000000. So 1000000+1+1+1+1+1+1+1+1+1+1 = 1000000, and 1+1+1+1+1+1+1+1+1+1+1000000 !=
1000000+1+1+1+1+1+1+1+1+1+1. In other words, the associative law does not hold for floating-point addition.
Elementary school students often wonder whether it makes a difference which order you add the column of
numbers up in (and sometimes notice that it does, i.e. they sometimes make mistakes when they try it). But here
we see that, on modern digital computers, the folklore and superstition is true: it really can make a difference
which order you add the numbers up in.

If you don't believe the above, I invite you to try this little program:
#include <stdio.h>
#define BIG 1e8
int main(void)
{
    int i;
    float f = 0;
    for(i = 0; i < 20; i++)
        f += 1;
    f += BIG;
    printf("%f\n", f);
    f = BIG;
    for(i = 0; i < 20; i++)
        f += 1;
    printf("%f\n", f);
    return 0;
}
You may have to play with the value of BIG (other values to try are 1e6, 1e7, or 1e9), but you should probably
be able to get a surprising result.
Because of the finite precision of floating point numbers on computers, you also have to be careful when
subtracting large numbers, as the result may not be significant at all. If we try to compute the difference
100000789 - 100000123 (again using only six significant digits), we are not likely to get 666, because neither
100000789 nor 100000123 is representable in six digits in the first place. This may not seem surprising, but what
if the difference we were really interested in, and thought was significant, was 789 - 123, with that bias of
100000000 being an artifact of the algorithm which we assumed would cancel out? In this case, it does not cancel
out, and in general we cannot assume that
(a + b) - (c + b) = a - c
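You can watch this happen with type float, which on most machines carries roughly seven decimal digits. The function below is a sketch written for this purpose (the name biased_difference is made up); the volatile temporaries are an assumption-laden precaution, there to force each intermediate sum to be stored as a float even on machines that do arithmetic in wider registers.

```c
/* Compute (a + b) - (c + b) in single precision.  The volatile
   temporaries force each intermediate sum to be rounded to float,
   so that extra precision in registers cannot hide the effect. */
float biased_difference(float a, float c, float b)
{
    volatile float ab = a + b;
    volatile float cb = c + b;

    return ab - cb;
}
```

With a bias of 0, biased_difference(789, 123, 0) gives the expected 666. With a bias of 100000000, on a machine using IEEE 754 single precision, the result can be expected to differ from 666, because neither intermediate sum fits in a float.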
Another thing to know about floating point numbers on computers is that, as mentioned above, they're usually
represented internally in base 2. It's well known that in base 10, fractions like 1/3 have infinitely-repeating decimal
representations. (In fact, if you truncate them, you can easily find anomalies like the one we discussed in the
previous paragraphs, such as that 0.333+0.333+0.333 is 0.999, not 1.) What's not so well known is that in base 2,
the fraction one tenth is an infinitely-repeating binary number: it's 0.00011001100110011... (base 2), where
the 0011 part keeps repeating. This means that almost any number that you thought was a ``clean'' decimal fraction
like 0.1 or 2.3 is probably represented internally as a messy infinite binary fraction which has been truncated or
rounded and hence is not quite the number you thought it was. (In fact, depending on how carefully routines like
atof and printf have been written, it's possible to have something like printf("%f", atof("2.3"))
print an ugly number like 2.299999.) Where you have to watch out for this is that if you multiply a number by 0.1,
it is not quite the same as dividing it by 10. Furthermore, if you multiply a number by 0.1 three times, it may be
different than if you'd multiplied it by 0.001 and different than if you'd divided it by 1000.
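A quick way to see that 0.1 is not what it seems is to add it up ten times. The function below is a sketch (sum_of_tenths is a made-up name); the volatile qualifier is there only to make sure each partial sum is rounded to double.

```c
/* Add 0.1 to itself ten times.  Because 0.1 has no exact binary
   representation, the total is usually not exactly 1.0. */
double sum_of_tenths(void)
{
    volatile double sum = 0.0;   /* volatile: round each partial sum to double */
    int i;

    for (i = 0; i < 10; i++)
        sum += 0.1;
    return sum;
}
```

On machines with the usual binary double format, comparing the result to 1.0 with == reports that they are unequal, and printing it with a format like %.17f shows the discrepancy in the last places.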
My purpose in bringing up these anomalies is not to scare you off of doing floating point work forever, but to drive
home the point that floating point inaccuracies can be significant, and cannot be brushed off as simple uncertainty
about the least-significant digit. With care, the inaccuracies can be confined to the least-significant digit, and the
care involves making sure that errors do not proliferate and compound themselves. (Also, I should again point out
that none of these problems are specific to C; they arise any time floating-point work is done with finite precision.)
In practice, anomalies crop up when both large and small numbers participate in the same calculation (as in the
1+1+1+1+1+1+1+1+1+1+1000000 example), and when large numbers are subtracted, and when calculations are
iterated many times. One thing to beware of is that many common algorithms magnify the effects of these
problems. For example, the definition of the standard deviation is the square root of the variance (that is, of the mean squared deviation from the mean), namely
sqrt(sum(sqr(x[i] - mean)) / n)
where sum() and sqr() are summing and squaring operators (neither of which are standard C library functions,
of course) and where mean is
sum(x[i]) / n
(The statisticians out there know that the denominator in the first expression is not necessarily n but should usually
be n-1.)
Computing standard deviations from the definition can be a bit of a nuisance because you have to walk over the
numbers twice: once to compute the mean, and a second time to subtract it from each input number, square the
difference, and sum the squares. So it's quite popular to rearrange the expression to
sqrt((sum(sqr(x[i])) - sqr(sum(x[i]))/n) / n)
which involves only the sum of the x's and the sum of the squares of the x's, and hence can be completed in one
pass. However, it turns out that this method can in practice be significantly less accurate. By squaring each
individual x and accumulating sum(sqr(x[i])), the big numbers get bigger and the small numbers get
smaller, so the small numbers are more likely to get lost in the underflow. When you use the definition, it's only
the differences that get squared, so less of significance gets lost. (Worse, with the one-pass formula, if the true
variance is near zero, the computed difference can sometimes come out slightly negative, due to accumulated
error, which is a problem when you try to take the square root.)
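Here are the two approaches side by side, as a sketch. Both functions (the names are invented) return the variance, using n as the denominator; the standard deviation is the square root of the returned value, which you would take with sqrt from <math.h>. Clamping a negative result to zero in the one-pass version is one possible policy, not the only one.

```c
/* Variance computed from the definition: two passes over the data.
   The standard deviation is the square root of the returned value. */
double variance_two_pass(const double x[], int n)
{
    double sum = 0.0, mean, sumsq = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += x[i];
    mean = sum / n;
    for (i = 0; i < n; i++)
        sumsq += (x[i] - mean) * (x[i] - mean);
    return sumsq / n;
}

/* Variance from the rearranged one-pass formula.  Cheaper, but the
   final subtraction can cancel most of the significant digits. */
double variance_one_pass(const double x[], int n)
{
    double sum = 0.0, sumsq = 0.0, v;
    int i;

    for (i = 0; i < n; i++) {
        sum += x[i];
        sumsq += x[i] * x[i];
    }
    v = (sumsq - sum * sum / n) / n;
    return v < 0.0 ? 0.0 : v;   /* guard: rounding can push v below zero */
}
```

For small, well-behaved data the two agree. Try them on values that share a large common offset (say, four numbers near 1000000000 that differ only in the units place) to see them part company.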
Other standard mathematical manipulations to watch out for are:
The quadratic formula:
If b^2 and 4ac are both large, there may be very few significant digits left in the discriminant
b^2 - 4ac.
Solving systems of equations, inverting matrices, computing the determinant of matrices:
The standard mathematical techniques involve large numbers of compounded operations which can lose a lot of
accuracy.
Iterated simulations:
Obviously, any iteration is prone to compounded errors.
Trigonometric reductions:
The only sane ways of computing trig functions numerically (i.e. the ways likely to be used by library functions
such as sin, cos, and tan) work only for the one period of the function near 0. These functions will of course
work on arguments outside the interval (-pi, pi), but the function must first reduce the arguments by subtracting
some multiple of 2*pi, and as we've seen, taking the difference of two large numbers (whether you do it or the
library function does it) can lose all significance.
Compounded interest:
The standard formula is something like
principal x (1 + rate)^n
but raising a number near 1 to a large power can be inaccurate.
Converting strings to numbers:
[okay, so this is a computer science problem, not a mathematical problem] If you handle digits to the right of the
decimal by repeatedly multiplying by 0.1, you'll lose accuracy.
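To illustrate the last item in the list above, here is one way to handle the digits to the right of the decimal point without repeatedly multiplying by 0.1. This is only a sketch: the function name fraction_value is made up, it looks at the fractional digits only, and it assumes there are few enough of them (about fifteen) that the running totals stay exact.

```c
/* Convert the digits after a decimal point (e.g. "125" for .125)
   to a double.  The digits are accumulated as a whole number and
   divided by a power of ten once at the end, rather than being
   scaled by 0.1 at every step. */
double fraction_value(const char *digits)
{
    double num = 0.0;     /* digits read so far, as a whole number */
    double scale = 1.0;   /* 10 raised to the number of digits read */

    while (*digits >= '0' && *digits <= '9') {
        num = num * 10.0 + (*digits - '0');
        scale *= 10.0;
        digits++;
    }
    return num / scale;
}
```

The point of the design is that whole numbers and powers of ten of modest size are represented exactly, so only the one final division introduces any rounding.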
For some of these problems, using double instead of single precision makes the difference, but for others, great
care and rearrangement of the algorithms may be required for best accuracy.
Returning to C, you might like to know that type float is guaranteed to give you the equivalent of at least 6
decimal digits of significance, and double (which is essentially C's version of the ``double precision'' mentioned
in the preceding paragraph) is guaranteed to give you at least 10. Both float and double are guaranteed to
handle exponents in at least the range -37 to +37.
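These are only the guaranteed minimums. To find out what a particular compiler actually provides, you can consult the macros in the standard header <float.h>, as in this sketch (the function name print_float_limits is invented):

```c
#include <float.h>
#include <stdio.h>

/* Print what this particular compiler actually provides. */
void print_float_limits(void)
{
    printf("float:  %d digits, decimal exponents %d to %d\n",
           FLT_DIG, FLT_MIN_10_EXP, FLT_MAX_10_EXP);
    printf("double: %d digits, decimal exponents %d to %d\n",
           DBL_DIG, DBL_MIN_10_EXP, DBL_MAX_10_EXP);
}
```

On many machines the numbers printed for double are considerably more generous than the minimums quoted above.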
One other point relating to numerical work and specific to C is that C does not have a built-in exponentiation
operator. It does have the library function pow, which is the right thing to use for general exponents, but don't use
it for simple squares or cubes. (Use simple multiplication instead.) pow is often implemented using the identity
x^y = e^(y ln(x))
and the inevitable slight inaccuracies in the underlying natural log and e^x routines can foil you.
For example, if you compute (int)pow(2., 3.), on some machines the answer comes out 7, because pow
returns a number like 6.99999, and C's floating-to-int conversion simply discards all of the fractional part.
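The multiplication alternatives are short enough to write on the spot. The following is a sketch; square, cube, and ipow are names chosen for this example, not Standard library functions, and ipow assumes a small non-negative exponent.

```c
/* Square and cube by plain multiplication. */
double square(double x) { return x * x; }
double cube(double x)   { return x * x * x; }

/* Raise x to a small non-negative integer power n by repeated
   multiplication.  (ipow is a made-up name, not a library function.) */
double ipow(double x, int n)
{
    double result = 1.0;

    while (n-- > 0)
        result *= x;
    return result;
}
```

Since small whole numbers multiply exactly, (int)ipow(2., 3) is 8, with no stray 6.99999 to worry about.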
The fact that a floating-to-int conversion truncates the fractional part has nothing to do, by the way, with the
problem mentioned earlier of a poor-quality floating-point processor not rounding well. (Back there we were
talking about rounding an intermediate result which was still trying to be represented as a floating-point number.)
When you do want to round a number off to the nearest integer, the usual trick is
(int)(x + 0.5)
By adding 0.5 before truncating the fractional part, you arrange that numbers with fractions of .5 or above get
rounded up. (This trick obviously doesn't handle negative numbers correctly, nor does it do even/odd rounding.)
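A variation which also copes with negative numbers is sketched below. The name round_nearest is made up, halves are rounded away from zero (so this is still not even/odd rounding), and the result is assumed to fit in an int.

```c
/* Round x to the nearest int, rounding halves away from zero.
   Assumes the result fits in an int. */
int round_nearest(double x)
{
    if (x >= 0.0)
        return (int)(x + 0.5);
    else
        return (int)(x - 0.5);
}
```

So round_nearest(2.5) is 3 and round_nearest(-2.5) is -3, whereas the simple trick would turn -2.5 into -2.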
***
I am not a numerical analyst, so this presentation has been somewhat superficial. On the other hand, I've tried to
make it practical and accessible and to highlight the real-world problems which do come up in practice but which
too few programmers are aware of. You can learn much more (including the traditional notations for describing
and analyzing floating point accuracy, and some clever algorithms for doing floating point arithmetic without
losing so much accuracy) in the following references.
David Goldberg, ``What Every Computer Scientist Should Know about Floating-Point Arithmetic,'' ACM
Computing Surveys, Vol. 23 #1, March, 1991, pp. 5-48.
Brian W. Kernighan and P.J. Plauger, The Elements of Programming Style, Second Edition, McGraw-Hill, 1978,
ISBN 0-07-034207-5, pp. 115-118.
Donald E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, second edition,
Addison-Wesley, 1981, ISBN 0-201-03822-6, chapter 4.
William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery, Numerical Recipes in C, 2nd
edition, Cambridge University Press, 1992, ISBN 0-521-43108-5.
When the contents of a web page are not static, but rather are generated on-the-fly as the page is fetched, a program runs
to generate the contents. When that program requires input from the client who is actually fetching the page (input such as
the selections made when filling out a form) that input is propagated to the program via the Common Gateway Interface,
or CGI.
Programs which generate web pages on-the-fly (commonly called ``CGI programs'') can be written in any language. Perl,
an interpreted or scripting language, is currently the language of choice for writing CGI programs (also called ``CGI
scripts'' if they're written in a scripting language), but it's eminently feasible, not difficult, and possibly advantageous to
write them in C, which the rest of this handout will describe how to do.
The basic operation of a CGI program is quite simple. Since its job is to generate a web page on the fly, that's just what it
must do, by ``printing'' text to its standard output. When we're writing a CGI program in C, therefore, we'll be calling
printf a lot, to generate the text we want the virtual page (that is, the one we're building) to contain. Some of the
printf calls will print constant or ``boilerplate'' text; for example, the first few lines of almost any C-written CGI
program will be along the lines of
printf("Content-Type: text/html\n\n");
printf("<html>\n");
printf("<head>\n");
The <html> and <head> tags are the ones that appear at the beginning of any HTML page. The first line informs the
receiving browser that the page is encoded in HTML. For static pages, that line is automatically added by the web server
(often as a function of the file name, i.e. if it ends in .html or perhaps .htm), which is why you never see it in static
HTML files. But since CGI programs are generating pages on the fly, they must include that line explicitly. (The content
type is almost invariably text/html, unless you're getting really fancy and generating images on the fly. Notice that the
first, Content-Type: line is followed by two newlines, creating a blank line which separates this ``header'' from the
rest of the HTML document.)
Of course, not all of the printf calls will print constant text (or else the resulting page will essentially be static, and
needn't have been generated by a CGI program at all). Therefore, many of the printf calls will typically insert some
variable text into the output. For example, if title is a string variable containing a title which has been generated for the
generated page, one of the next lines in the CGI program might be
printf("<title>%s</title>\n", title);
Many of the decisions controlling the output of a particular run of a CGI program will of course be made based on
selections made by the requesting user (perhaps by having filled out a form). The most important thing to understand
about CGI programming (in fact, the very aspect of CGI programming which gives it its name, that is, the aspect which
the CGI specification specifies) is the set of mechanisms by which the user's choices and other information are made
available to a CGI program.
One mechanism is by means of the environment. The environment is a set of variables maintained, not by any one
program, but by the operating system itself, on behalf of a user or process. For example, on MS-DOS and Unix systems,
the PATH variable contains a list of directories to be searched for executable programs. On Unix systems, other common
environment variables are HOME (the user's home directory), TERM (the user's terminal type) and MAIL (the user's
electronic mailbox).
Environment variables are completely separate from the variables used in programs written in typical programming
languages, such as C. When a C program wishes to access an environment variable, it does not simply declare a C variable
of the same name, and hope that the value will somehow be magically linked into the program. Instead, a C program must
call the library function getenv to fetch environment variables. The getenv function accepts a string which is the name
of an environment variable to be fetched, and returns a string which is the contents of the variable. (Environment variables
are always strings.) If the requested environment variable does not exist, getenv returns a null pointer, instead.
For example, here is a scrap of code which would print out a user's terminal type (presumably on a Unix system):
char *termtype;
termtype = getenv("TERM");
if(termtype != NULL)
	printf("your terminal type is \"%s\"\n", termtype);
else
	printf("no terminal type defined\n");
Two things to be careful of when calling getenv are
1. As already mentioned, it returns a null pointer if the requested variable doesn't exist, and this null pointer return
must always be checked for; and
2. You shouldn't modify (scribble on) the string it returns, that is, you should treat it as a constant string.
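If a program really does need to alter an environment string, one way to respect the second caution is to copy the string first and work on the copy. The following is only a sketch: envdup is a name made up for this illustration, not a standard library function, and the caller is assumed to free the copy when done with it.

```c
#include <stdlib.h>
#include <string.h>

/* envdup is a hypothetical helper, not part of the standard library. */
/* It returns a malloc'ed copy of an environment variable's value, */
/* or NULL if the variable is not set or no memory is available. */
char *envdup(const char *name)
{
	char *val = getenv(name);
	char *copy;

	if(val == NULL)
		return NULL;
	copy = malloc(strlen(val) + 1);	/* +1 for the terminating \0 */
	if(copy != NULL)
		strcpy(copy, val);
	return copy;
}
```

A caller might write char *term = envdup("TERM); check the result for NULL as usual, modify the copy freely, and call free(term) afterwards.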
A web daemon (a.k.a. HTTP server or httpd) which supports CGI passes a host of environment variables to a CGI
program. Many of them are somewhat esoteric, and this handout will not describe them all. (See the References at the
end.) The ones which are most likely to be of use to ordinary CGI programs are:
REQUEST_METHOD The specific request that the client used to fetch the page (that is, to invoke the CGI
program): usually GET or POST.
PATH_INFO Any extra path information that appeared after the program name in the URL that invoked this CGI
program.
PATH_TRANSLATED The path information from PATH_INFO, with any ``virtual-to-physical mapping''
performed on it.
SCRIPT_NAME The actual name of the CGI program that is running, as a URL, in case the generated page needs
to reference itself.
QUERY_STRING The query being performed by the user. In the case of form processing, the QUERY_STRING
variable contains encoded information about each field filled in by the user. (We'll have much more to say about
QUERY_STRING in a bit.)
REMOTE_HOST The hostname of the client (if known).
CONTENT_TYPE The HTTP/MIME content type of the attached information, for queries (such as POST) with
attached information.
CONTENT_LENGTH The size of the attached information, if any.
HTTP_USER_AGENT An indication of the name and version of the client browser in use.
HTTP_ACCEPT The specification by the client browser of which document types it will accept.
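While experimenting, it can be handy to have a CGI program display what the server actually handed it. Here is a rough sketch along those lines. The function names envordefault and showcgivars are invented for this example, the choice of variables is arbitrary, and the values are printed without any HTML escaping, so this is for debugging only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the value of an environment variable, or dflt if it is not set. */
/* (hypothetical helper) */
const char *envordefault(const char *name, const char *dflt)
{
	const char *val = getenv(name);
	return val != NULL ? val : dflt;
}

/* Print a few of the CGI variables as an HTML list. */
/* (hypothetical helper; values are not HTML-escaped) */
void showcgivars(FILE *fp)
{
	static const char *names[] = {
		"REQUEST_METHOD", "PATH_INFO", "SCRIPT_NAME",
		"QUERY_STRING", "CONTENT_LENGTH"
	};
	size_t i;

	fprintf(fp, "<ul>\n");
	for(i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		fprintf(fp, "<li>%s = %s\n", names[i],
			envordefault(names[i], "(not set)"));
	fprintf(fp, "</ul>\n");
}
```

Calling showcgivars(stdout) somewhere inside the body of the generated page would list the five variables, showing ``(not set)'' for any the server did not supply.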
For simple uses of the above-listed variables, it suffices to call getenv and then inspect the returned string (if any). For
example, you could generate different output depending on which browser a user is using with code like this:
char *browser = getenv("HTTP_USER_AGENT");
if(browser != NULL && strstr(browser, "Mozilla") != NULL)
	printf("I see you're using Netscape, just like everyone else.\n");
else
	printf("Congratulations on your daring choice of browser!\n");
(However, this is not a terribly good example. Building knowledge of specific browsers into web pages and servers is a
risky proposition, as there will always be more browsers out there than you've heard of. Doing so also flies in the face of
the notion of interoperability, which is one of the very foundations of the Internet. If the protocols are well designed, and
if all clients and servers implement them properly, no specific server should ever need to know what kind of client it's
talking to, or vice versa.)
Environment variables are not the only input mechanism available in CGI. It's also possible for input to arrive via the
standard input, to be read with getchar or the like. In fact, in production CGI programming, the standard input is
heavily used, because it's where the user's query information arrives when the retrieval method (that is,
REQUEST_METHOD) is POST. Because it allows essentially unlimited-size requests, POST is strongly recommended as
the request method for HTML forms. It will be a bit easier, however, to describe and demonstrate the processing of query
information if we restrict ourselves to the environment variables (specifically, QUERY_STRING), so our examples will
use the request method of GET (which causes request information to be sent as a query string rather than on standard
input).
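For the curious, here is roughly what fetching POST data might look like: the CONTENT_LENGTH variable says how many bytes to expect, and that many bytes are read from the standard input. This is only a sketch under several assumptions. The name readbody and the MAXBODY limit are made up for this illustration, and a real program would want more thorough error handling.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXBODY 65536L	/* arbitrary safety limit chosen for this sketch */

/* readbody is a hypothetical helper.  lenstr is the value of */
/* CONTENT_LENGTH (possibly NULL); fp is normally stdin. */
/* Returns a malloc'ed, \0-terminated string, or NULL on failure. */
char *readbody(FILE *fp, const char *lenstr)
{
	long len;
	size_t nread;
	char *buf;

	if(lenstr == NULL)
		return NULL;
	len = strtol(lenstr, NULL, 10);
	if(len < 0 || len > MAXBODY)
		return NULL;
	buf = malloc((size_t)len + 1);	/* +1 for the terminating \0 */
	if(buf == NULL)
		return NULL;
	nread = fread(buf, 1, (size_t)len, fp);
	buf[nread] = '\0';	/* a short read just gives a shorter string */
	return buf;
}
```

A program would call it as readbody(stdin, getenv("CONTENT_LENGTH")). For form submissions, the string that comes back has the same name=value&name=value shape as a query string.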
For simple queries (such as <ISINDEX> searches), the QUERY_STRING environment variable simply contains the query
string (although it will have been encoded; see below). For more complicated queries, however, and in particular for
queries resulting from filled-out and submitted HTML forms, the value of the QUERY_STRING variable is a complicated
string, with substructure, containing potentially many pieces of information. The basic syntax of the QUERY_STRING
string in this case is
name=value&name=value&name=value
The string is a series of name=value pairs, separated by ampersand characters. Each name is the name attribute from
one of the input elements in the form; the value is either the text typed by the user for an element with a type attribute
of text, password, or textarea, or one of the <option> values from a <select> element, or the value
attribute of a selected element of type checkbox or radio (or ``on'' for selected checkbox elements without value
attributes).
What if some text typed by the user (that is, one of the values) happened to contain an = or & character? That would
certainly screw up the syntax of the QUERY_STRING string. To guard against this problem, and to keep the query string a
single string, it is encoded in two ways:
1. All spaces are replaced by + signs.
2. All = and & characters, and any other special characters (such as + and % characters) are replaced by a percent sign
(%) followed by a two-digit hexadecimal number representing the character's value in ASCII. For example, = is
represented by %3d, & is represented by %26, + is represented by %2b, % is represented by %25, and ~ is often
represented by %7e.
A CGI program must arrange to decode (that is, undo) these encodings before attempting to make further use of the
information in the QUERY_STRING variable.
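To make the two encoding rules concrete, here is a sketch of the encoding direction, that is, roughly what a browser does before it sends the query. The name encstring is invented for this example. The text above does not spell out exactly which characters count as ``special,'' so this sketch assumes that everything other than letters, digits, and the characters - _ . gets the % treatment.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* encstring is a hypothetical helper: it returns a malloc'ed, */
/* encoded copy of src, or NULL if no memory is available. */
char *encstring(const char *src)
{
	static const char hexdigits[] = "0123456789abcdef";
	char *dest = malloc(3 * strlen(src) + 1);	/* worst case: all %xx */
	char *destp = dest;
	const unsigned char *p;

	if(dest == NULL)
		return NULL;
	for(p = (const unsigned char *)src; *p != '\0'; p++)
	{
		if(*p == ' ')
			*destp++ = '+';			/* rule 1 */
		else if(isalnum(*p) || *p == '-' || *p == '_' || *p == '.')
			*destp++ = *p;			/* assumed safe; passed through */
		else
		{
			*destp++ = '%';			/* rule 2 */
			*destp++ = hexdigits[*p >> 4];
			*destp++ = hexdigits[*p & 0x0f];
		}
	}
	*destp = '\0';
	return dest;
}
```

Under these assumptions, encstring("a b=c&d") yields a+b%3dc%26d. Decoding has to undo the same two steps.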
We're going to write a function to centralize the parsing of the QUERY_STRING variable, for three reasons:
1. Splitting the string into name/value pairs is a bit of a nuisance, and is best isolated from the calling code;
2. Decoding the + and % encodings is a bit of a nuisance, and is best isolated from the calling code; and
3. If we do it right, we can arrange that improving our programs later to use POST instead of GET (that is, to retrieve
query information from the standard input rather than the QUERY_STRING variable) will require rewriting only
this one function, not the rest of the program.
The function we'll write is very simple; its prototype is

char *cgigetval(char *);
You hand it the name of the value you're looking for, and it either returns the (properly decoded) value, or a null pointer if
no value by that name could be found. (This might happen if the program or the calling form were improperly written, or
if the user hadn't typed anything or selected anything in the requested input element.) The strings returned by
cgigetval are in dynamically-allocated memory, that is, cgigetval calls malloc to get storage for each string it
returns, and returns a pointer. This means that
1. the caller doesn't have to worry about providing a buffer or anything;
2. the caller can modify the string (as long as no attempt is made to extend it); but
3. the caller is responsible for freeing the memory, if necessary.
(In some programs, the last requirement could be a significant problem; it might be a real nuisance for the caller to have to
remember to free the returned pointers, and the program might run out of memory if callers forgot. In this case, however,
it's not likely that we'll run out of memory in any case--CGI programs are and should be short-lived--and the advantages of
having cgigetval return dynamically-allocated memory will be real conveniences.)
Before presenting the innards of the cgigetval function, let's look at how we'll use it in a (contrived, but working and
illustrative) real example.
Our example CGI program will be called by this web page:
<html>
<head>
<title>CGI test page</title>
</head>
<body>
<form action="/cgi-bin/sillycgi" method="get">
<p>
This page demonstrates simple form and CGI processing.
<p>
Enter a line of text here:
<input type="text" name="textfield">
<p>
Select a transformation to apply to the text:
<br>
<input type="radio" name="edittype" value="reverse">
Reverse
<br>
<input type="radio" name="edittype" value="upper">
Upper-case
<br>
<input type="radio" name="edittype" value="lower">
Lower-case
<p>
Press this button to see the result:
<input type="submit">
<p>
(Or press this button to clear this form:
<input type="reset">)
</form>
</body>
</html>
The bulk of this page is a form, with one text input area, one set of three radio buttons, a submit button, and a clear button.
The user is supposed to type some text, select a radio button representing one of three transformations to be applied to the
text, and finally press the ``Submit'' button. Notice that the first four <input> tags all have name attributes; these are the
names we'll retrieve the values by. (The names of the three radio button elements are the same; this is how the browser
knows to implement the one-of-n functionality on the group of three.)
The form header is
<form action="/cgi-bin/sillycgi" method="get">
If you try this form out, you may have to modify the URL in the action attribute, depending on how CGI programs are
accessed under your server. The method attribute is specified as ``get'', which means that our CGI program will receive
the user's form inputs in the QUERY_STRING variable.
Here is the CGI program itself. It's rather small and simple, because cgigetval does most of the hard work, which is as
it should be.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern char *cgigetval(char *);
extern void reverse(char *);	/* from assignment 4; match your own definition */
extern void upperstring(char *);
extern void lowerstring(char *);
int main()
{
	char *browser;
	char *edittype;
	char *text;
	printf("Content-Type: text/html\n\n");
	printf("<html>\n");
	printf("<head>\n");
	printf("<title>CGI test result</title>\n");
	printf("</head>\n");
	printf("<body>\n");
	browser = getenv("HTTP_USER_AGENT");
	printf("<p>\n");
	printf("Hello, ");
	if(browser != NULL && strstr(browser, "Lynx") != NULL)
		printf("Lynx");
	else if(browser != NULL && strstr(browser, "Mosaic") != NULL)
		printf("Mosaic");
	else if(browser != NULL && strstr(browser, "Mozilla") != NULL)
		printf("Netscape");
	else
		printf("unknown browser");
	printf(" user!\n");
	printf("<p>\n");
	printf("Here is your modified text:\n");
	printf("<br>\n");
	text = cgigetval("textfield");
	edittype = cgigetval("edittype");
	if(text == NULL)
	{
		printf("You didn't enter any text!\n");
	}
	else if(edittype != NULL && strcmp(edittype, "reverse") == 0)
	{
		reverse(text);
		printf("%s\n", text);
	}
	else if(edittype != NULL && strcmp(edittype, "upper") == 0)
	{
		upperstring(text);
		printf("%s\n", text);
	}
	else if(edittype != NULL && strcmp(edittype, "lower") == 0)
	{
		lowerstring(text);
		printf("%s\n", text);
	}
	else
	{
		printf("You didn't select a transformation!\n");
	}
	printf("</body>\n");
	printf("</html>\n");
	return 0;
}
Just for fun, we inspect HTTP_USER_AGENT and print a slightly different greeting depending on which browser the user
seems to be using. Then, we call cgigetval twice, to retrieve the user's text and radio button selections. Depending on
the radio button selection, we call one of three different functions to transform the text. (As we mentioned, we're allowed
to modify the strings returned by cgigetval, in this case, text.) We also check for either the textfield or
edittype values coming back as null pointers, indicating that the user neglected to enter them.
Reversing a string in-place simply involves calling the reverse function from assignment 4. Converting strings to
upper- or lower-case is also easy; here are functions to do it:
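#include <ctype.h>
void upperstring(char *str)
{
	char *p;
	for(p = str; *p != '\0'; p++)
	{
		/* the casts keep negative char values away from the ctype functions */
		if(islower((unsigned char)*p))
			*p = toupper((unsigned char)*p);
	}
}
void lowerstring(char *str)
{
	char *p;
	for(p = str; *p != '\0'; p++)
	{
		if(isupper((unsigned char)*p))
			*p = tolower((unsigned char)*p);
	}
}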
The isupper, islower, toupper, and tolower functions are all in the standard C library, and are declared in the
header file <ctype.h>. They operate on characters, with the obvious results. (Actually, on a modern, ANSI-compatible
system, the calls to isupper and islower are not required, but they don't hurt much. Older systems required them.)
Finally, here is the code for cgigetval. It is somewhat long and involved, partly because it has a significant amount of
work to do, partly because of some stubborn details of the sort that sometimes crop up in real-world programs.
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
char *unescstring(char *, int, char *, int);
char *cgigetval(char *fieldname)
{
	int fnamelen;
	char *p, *p2, *p3;
	int len1, len2;
	static char *querystring = NULL;
	if(querystring == NULL)
	{
		querystring = getenv("QUERY_STRING");
		if(querystring == NULL)
			return NULL;
	}
	if(fieldname == NULL)
		return NULL;
	fnamelen = strlen(fieldname);
	for(p = querystring; *p != '\0';)
	{
		p2 = strchr(p, '=');
		p3 = strchr(p, '&');
		if(p3 != NULL)
			len2 = p3 - p;
		else
			len2 = strlen(p);
		if(p2 == NULL || (p3 != NULL && p2 > p3))
		{
			/* no = present in this field; skip it and its & */
			p += len2;
			if(*p == '&')
				p++;
			continue;
		}
		len1 = p2 - p;
		if(len1 == fnamelen && strncmp(fieldname, p, len1) == 0)
		{
			/* found it */
			int retlen = len2 - len1 - 1;
			char *retbuf = malloc(retlen + 1);
			if(retbuf == NULL)
				return NULL;
			unescstring(p2 + 1, retlen, retbuf, retlen+1);
			return retbuf;
		}
		p += len2;
		if(*p == '&')
			p++;
	}
	/* never found it */
	return NULL;
}
cgigetval presents a perfect example of a topic which we mentioned in passing but never gave any realistic examples
of: local static variables. cgigetval's job is to parse the value of the QUERY_STRING environment variable, but the
value of that variable will remain constant over one run of a program calling cgigetval. Therefore, we only need to
call getenv once, the first time cgigetval is called. Thereafter, we can continue to use the previous value.
cgigetval maintains the value of QUERY_STRING in a local static variable, querystring. This variable is
initialized to NULL, so the first time cgigetval is called, it notices that querystring is null, and calls getenv to
properly initialize it. The whole point of a local static variable is that it retains its value from call to call, so on subsequent
calls to cgigetval, querystring will not be null, and there will be no need to call getenv again.
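Stripped of everything else, the fetch-once pattern can be tried out in isolation. The following is a small sketch, not part of the CGI program: the function cachedhome and the counter nfetches are made up for this illustration, and the counter exists only so that the effect can be observed.

```c
#include <stdlib.h>

int nfetches = 0;	/* counts calls to getenv, for demonstration only */

/* cachedhome is a hypothetical function: it returns the value of HOME, */
/* calling getenv only the first time it is invoked. */
const char *cachedhome(void)
{
	static const char *home = NULL;	/* retained from call to call */
	static int fetched = 0;		/* likewise */

	if(!fetched)
	{
		home = getenv("HOME");
		nfetches++;
		fetched = 1;
	}
	return home;
}
```

However many times cachedhome is called, nfetches stays at 1. One difference from cgigetval is worth noticing: because this sketch keeps a separate flag, it does not call getenv again even when the variable turns out not to be set, whereas cgigetval, which tests the pointer itself, will retry on every call in that case.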
The main loop in cgigetval loops over each name/value pair in the query string, one pair at a time. The standard
library strchr function (declared in <string.h>) is used to locate the next = and & characters in the string. It would
be easier to overwrite the & and/or = characters with '\0', to isolate the name and value substrings as true 0-terminated
strings, but we can't modify querystring (which was obtained from getenv, and which we'll be using again next
time we're called). Instead, we compute the lengths of the entire name/value pair we're looking at and the name within that
pair (in len2 and len1 respectively), and use strncmp to compare each name against the name we're looking for.
(strncmp is a standard library function like strcmp, except that it only compares the first n characters.)
When (if) we find a matching name, it's time to decode and return the value. First, we call malloc to allocate the return
buffer, remembering to add 1 to the length, to leave room for the terminating \0. We conservatively estimate the size of
the buffer we'll need by using the length of the encoded string. If there are any % encodings, this means we'll allocate more
than we need (the decoded string will end up shorter than the encoded string), but it's too hard to figure out in advance
exactly how big a buffer we'd need, and in this case, the slight amount of wasted memory should be inconsequential.
If we don't find a matching string, we take another trip through the main loop. Each trip through the loop, p points at the
remainder of the string we're to examine. After discarding a name/value pair, therefore, we add len2 to p, which either
moves it to the & or to the \0 at the very end of querystring. If we're now pointing at an &, we increment p by one
more so that it points to the first character of the next name/value pair.
Here is the unescstring function. It accepts a pointer to the encoded string, the length of the encoded string, a pointer
to the location to store the decoded result, and the maximum size of the decoded result (so that it can double-check it's not
overflowing anything). We pass the encoded string in as a pointer plus a length, rather than as a 0-terminated string,
because in most cases it will actually be an unterminated substring, and the length we receive will be such that we stop
decoding just before the & that separates one name/value pair from the next.
static int xctod(int);
char *unescstring(char *src, int srclen, char *dest, int destsize)
{
	char *endp = src + srclen;
	char *srcp;
	char *destp = dest;
	int nwrote = 0;
	if(destsize <= 0)
		return NULL;
	for(srcp = src; srcp < endp; srcp++)
	{
		/* always leave room for the terminating \0 */
		if(nwrote + 1 >= destsize)
			return NULL;
		if(*srcp == '+')
			*destp++ = ' ';
		else if(*srcp == '%' && srcp + 2 < endp)
		{
			/* both hex digits lie within the source substring */
			*destp++ = 16 * xctod((unsigned char)srcp[1])
				+ xctod((unsigned char)srcp[2]);
			srcp += 2;
		}
		else
			*destp++ = *srcp;
		nwrote++;
	}
	*destp = '\0';
	return dest;
}
static int xctod(int c)
{
	if(!isxdigit(c))
		return 0;	/* not a hexadecimal digit */
	else if(isdigit(c))
		return c - '0';
	else if(isupper(c))
		return c - 'A' + 10;
	else
		return c - 'a' + 10;
}
The basic operation is simple: copy characters one by one from the source to the destination, converting + signs to spaces,
converting % sequences, and passing other characters through unchanged. It's mildly tricky to convert a % sequence to the
equivalent character value. Here, we use an auxiliary function xctod which converts one hexadecimal digit to its decimal
equivalent. (xctod is declared static because it's private to unescstring and to this source file.) xctod converts
the digit characters 0-9 to the values 0-9, and a-f or A-F to the values 10-15. (It does so by subtracting character
constants which provide the correct offsets, without our having to know the particular character set values.)
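For comparison, here is a variant sketch that makes no assumption at all about how the letters are arranged in the character set: it looks the digit up in a string of the sixteen hexadecimal digits, and the position at which it is found is the value. The name hexval is invented for this illustration, and unlike xctod it returns -1 for a character that is not a hexadecimal digit.

```c
#include <ctype.h>
#include <string.h>

/* hexval is a hypothetical alternative to xctod. */
/* Returns 0-15 for a hexadecimal digit, or -1 for anything else. */
int hexval(int c)
{
	static const char digits[] = "0123456789abcdef";
	const char *p;

	if(c == '\0')
		return -1;	/* strchr would otherwise match the terminator */
	p = strchr(digits, tolower((unsigned char)c));
	if(p == NULL)
		return -1;
	return (int)(p - digits);
}
```

The explicit test for '\0' is needed because strchr treats the terminating \0 as part of the string being searched.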
It would also be possible to convert the hexadecimal digits by calling sscanf, and thus dispense with xctod. The code
would look like this:
else if(*srcp == '%')
{
	unsigned int x = 0;	/* %x requires a pointer to unsigned int */
	sscanf(srcp+1, "%2x", &x);
	*destp++ = x;
	srcp += 2;
}
The %2x in the sscanf format string means to convert a hexadecimal integer that's at most two characters long.
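The behavior of the field width is easy to check with a small wrapper. This is a sketch only; hexpair is a name made up for the experiment. It also tests sscanf's return value, which is a good habit whenever the input might not be what we expect.

```c
#include <stdio.h>

/* hexpair is a hypothetical helper: it converts up to two hexadecimal */
/* digits at the start of s, returning the value, or -1 if there are none. */
int hexpair(const char *s)
{
	unsigned int x;

	if(sscanf(s, "%2x", &x) != 1)
		return -1;
	return (int)x;
}
```

Given "3dxyz" it returns 61 (hexadecimal 3d), and given "2bcd" it returns 43, having stopped after two digits. Given "7&" it returns 7, because a field width in a scanf format is a maximum, not a requirement; and given "zz" it returns -1.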
And that's our introduction to writing CGI programs in C! I encourage you to type in the example and play with it
(assuming you have access to a web server where you can install CGI programs at all, of course). If you would like more
information on CGI programming, you can read the page http://hoohoo.ncsa.uiuc.edu/cgi/overview.html (which is where I
learned everything I've presented here).
Introductory C Programming -- Assignments

Here are the problem sets I hand out during each of the eight weeks this class runs when I teach it in
person. If you're out there on the net somewhere, unable to attend the class in person, feel free to follow
along here!
Week 1
Topics: basics -- simple example programs
Assignment
Answers
Week 2
Topics: data types and constants; variables, identifiers, and declarations; operators and expressions
Assignment
Answers
Week 3
Topics: arrays, more about declarations (incl. initialization), functions
Assignment
Answers
Week 4
Topics: assignment, increment, and decrement operators; character I/O; strings
Assignment
Answers
Week 5
Topics: C preprocessor
Assignment
Answers
Week 6
Topics: pointers, strings
Assignment
Answers
Week 7
Topics: array/pointer equivalence, memory allocation, more about strings
Assignment
Answers
Week 8
Topics: command line, file I/O
One last handout: notes, Chapter 14
http://www.eskimo.com/~scs/cclass/asgn.beg/index.html (2 of 2) [22/07/2003 5:37:10 PM]
Assignment #1
Introductory C Programming
UW Experimental College
Assignment #1
Handouts:
Syllabus
Assignment #1
Class Notes, Chapter 1
Reading Assignment:
Class Notes, Chapter 2 (optional)
Review Questions:
1. Approximately what is the line #include <stdio.h> at the top of a C source file for?
2. What are some uses for comments?
3. Why is indentation important? How carefully does the compiler pay attention to it?
4. What are the largest and smallest values that can be reliably stored in a variable of type int?
5. What is the difference between the constants 7, '7', and "7"?
6. What is the difference between the constants 123 and "123"?
7. What is the function of the semicolon in a C statement?
Exercises: (easy to harder; do as many as you like, but at least two)

1. Get the ``Hello, world!'' program to work on your computer. If you're using a Unix machine, the
instructions in the notes should get you started. If you're using a commercial compiler on a home
computer, the compiler's instruction manuals should (really, must) tell you how to enter, compile, and run
a program. (In either case, the ``Compiling C Programs'' handout should help, too.)
Figure out how to print both the program's source code and its output, or how to send me both the source
code and the output by e-mail.
2. What do these loops print?
for(i = 0; i < 10; i = i + 2)
	printf("%d\n", i);
for(i = 100; i >= 0; i = i - 7)
	printf("%d\n", i);
for(i = 1; i <= 10; i = i + 1)
	printf("%d\n", i);
for(i = 2; i < 100; i = i * 2)
	printf("%d\n", i);
3. Write a program to print the numbers from 1 to 10 and their squares:
1 1
2 4
3 9
...
10 100
4. Write a program to print this triangle:

*
**
***
****
*****
******
*******
********
*********
**********
Don't use ten printf statements; use two nested loops instead. You'll have to use braces around the
body of the outer loop if it contains multiple statements:
for(i = 1; i <= 10; i = i + 1)
{
	/* multiple statements */
	/* can go in here */
}
(Hint: a string you hand to printf does not have to contain the newline character \n.)
Copyright
http://www.eskimo.com/~scs/cclass/asgn.beg/copyright.html [22/07/2003 5:37:17 PM]
Assignment #1 Answers
Assignment #1 ANSWERS
Question 1. Approximately what is the line #include <stdio.h> at the top of a C source file for?
In the case of our first few programs, it lets the compiler know some important information about the
library function, printf. It also lets the compiler know similar information about other functions in the
``Standard I/O library,'' some of which we'll be learning about later. (It also provides a few I/O-related
definitions which we'll be using later.) We'll learn more about the #include directive when we cover
the C Preprocessor in a few weeks.
Question 2. What are some uses for comments?
Describing what a particular variable is for, describing what a function is for and how it works,
documenting the name, version number, purpose, and programmer of an entire program, explaining any
tricky or hard-to-understand part about a program, ...
Question 3. Why is indentation important? How carefully does the compiler pay attention to it?
Indentation is important to show the logical structure of source code, in particular, to show which
parts--the ``bodies'' of loops, if/else statements, and other control flow constructs--are subsidiary to which
other parts.
Indentation is not specially observed by the compiler; it treats all ``white space'' the same. Code which is
improperly indented will give a human reader a mistaken impression of the structure of the code, an
impression potentially completely different from the compiler's. Therefore, it's important that the
punctuation which describes the block structure of code to the compiler matches the indentation.
Question 4. What are the largest and smallest values that can be reliably stored in a variable of type
int?
32,767 and -32,767.
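These are the limits that any C compiler is guaranteed to provide; a particular machine may well offer a larger range. For those who are curious, the standard header <limits.h> (which we haven't covered) defines INT_MAX and INT_MIN for the compiler at hand, and a sketch like the following would display them. The function name printintrange is made up for this illustration.

```c
#include <limits.h>
#include <stdio.h>

/* printintrange is a hypothetical helper: it prints the actual range */
/* of type int for whatever compiler this is built with. */
void printintrange(void)
{
	printf("int ranges from %d to %d on this machine\n", INT_MIN, INT_MAX);
}
```

Whatever it prints, the range is guaranteed to be at least -32,767 to 32,767, which is why a program that sticks to that range is portable.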
Question 5. What is the difference between the constants 7, '7', and "7"?
The constant 7 is the integer 7 (the number you get when you add 3 and 4). The constant '7' is a
character constant consisting of the character `7' (the key between `6' and `8' on the keyboard, which also
has a `&' on it on mine). The constant "7" is a string constant consisting of one character, the character
`7'.
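The three constants can also be compared by experiment. This sketch uses a few things we haven't covered yet (strlen, sizeof, and character arithmetic), and the function name showsevens is made up for this illustration; treat it as something to come back to later.

```c
#include <stdio.h>
#include <string.h>

/* showsevens is a hypothetical demonstration function. */
void showsevens(void)
{
	printf("%d\n", 7);		/* the integer itself */
	printf("%d\n", '7');		/* the character's code in the machine's character set */
	printf("%d\n", '7' - '0');	/* the digit's value, recovered from the character */
	printf("%lu\n", (unsigned long)strlen("7"));	/* characters in the string */
	printf("%lu\n", (unsigned long)sizeof("7"));	/* bytes, counting the final \0 */
}
```

The second line of output depends on the character set (it is 55 in ASCII), but the third is always 7, the fourth always 1, and the fifth always 2.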
Question 6. What is the difference between the constants 123 and "123"?
The constant 123 is the integer 123. The constant "123" is a string constant containing the characters
1, 2, and 3.
Question 7. What is the function of the semicolon in a C statement?
It is the statement terminator.
Exercise 3. Write a program to print the numbers from 1 to 10 and their squares.
#include <stdio.h>
int main()
{
	int i;
	for(i = 1; i <= 10; i = i + 1)
		printf("%d %d\n", i, i * i);
	return 0;
}
Exercise 4. Write a program to print a simple triangle of asterisks.
#include <stdio.h>
int main()
{
	int i, j;
	for(i = 1; i <= 10; i = i + 1)
	{
		for(j = 1; j <= i; j = j + 1)
			printf("*");
		printf("\n");
	}
	return 0;
}
Assignment #2
Handouts:
Assignment #2
Reading Assignment:
Class Notes, Chapter 3, Secs. 3.1-3.2
Class Notes, Chapter 3, Secs. 3.3-3.6 (optional)
Review Questions:
1. What are the two different kinds of division that the / operator can do? Under what circumstances does
it perform each?
2. What are the definitions of the ``Boolean'' values true and false in C?
3. Name three uses for the semicolon in C.
4. What would the equivalent code, using a while loop, be for the example
for(i = 0; i < 10; i = i + 1)
?
5. What is the numeric value of the expression 3 < 4 ?
6. Under what conditions will this code print ``water''?
if(T < 32)
	printf("ice\n");
else if(T < 212)
	printf("water\n");
else
	printf("steam\n");
7. What would this code print?
int x = 3;
if(x)
	printf("yes\n");
else
	printf("no\n");
8. (trick question) What would this code print?
int i;
for(i = 0; i < 3; i = i + 1)
printf("a\n");
printf("b\n");
printf("c\n");
Tutorial Section
(This section presents a few small but complete programs and asks you to work with them. If you're
comfortable doing at least some of the exercises later in this assignment, there's no reason for you to work
through this ``tutorial section.'' But if you're not ready to do the exercises yet, this section should give you some
practice and get you started.)
1. Type in and run this program, and compare its output to that of the original ``Hello, world!'' program
(exercise 1 in assignment 1).
#include <stdio.h>
int main()
{
printf("Hello, ");
printf("world!\n");
return 0;
}
You should notice that the output is identical to that of the original ``Hello, world!'' program. This shows
that you can build up output using multiple calls to printf, if you like. You mark the end of a line of
output (that is, you arrange that further output will begin on a new line) by printing the ``newline''
character, \n.
2. Type in and run this program:
#include <stdio.h>
int main()
{
	int i;
	printf("statement 1\n");
	printf("statement 2\n");
	for(i = 0; i < 10; i = i + 1)
	{
		printf("statement 3\n");
		printf("statement 4\n");
	}
	printf("statement 5\n");
	return 0;
}
This program doesn't do anything useful; it's just supposed to show you how control flow works--how
statements are executed one after the other, except when a construction such as the for loop alters the
flow by arranging that certain statements get executed over and over. In this program, each simple
statement is just a call to the printf function.
Now delete the braces {} around statements 3 and 4, and re-run the program. How does the output
change? (See also question 8 above.)
3. Type in and run this program:
#include <stdio.h>
int main()
{
	int i, j;
	printf("start of program\n");
	for(i = 0; i < 3; i = i + 1)
	{
		for(j = 0; j < 5; j = j + 1)
			printf("i is %d, j is %d\n", i, j);
		printf("end of i = %d loop\n", i);
	}
	printf("end of program\n");
	return 0;
}
This program doesn't do much useful, either; it's just supposed to show you how loops work, and how
loops can be nested. The outer loop runs i through the values 0, 1, and 2, and for each of these three
values of i (that is, during each of the three trips through the outer loop) the inner loop runs, stepping
the variable j through 5 values, 0 to 4. Experiment with changing the limits (initially 3 and 5) on the
two loops. Experiment with interchanging the two loops, that is, by having the outer loop manipulate the
variable j and the inner loop manipulate the variable i. Finally, compare this program to the
triangle-printing program of Assignment 1 (exercise 4) and its answer as handed out this week.
4. Type in and run this program:
#include <stdio.h>
int main()
{
	int day, i;
	for(day = 1; day <= 3; day = day + 1)
	{
		printf("On the %d day of Christmas, ", day);
		printf("my true love gave to me\n");
		for(i = day; i > 0; i = i - 1)
		{
			if(i == 1)
			{
				if(day == 1)
					printf("A ");
				else
					printf("And a ");
				printf("partridge in a pear tree.\n");
			}
			else if(i == 2)
				printf("Two turtledoves,\n");
			else if(i == 3)
				printf("Three French hens,\n");
		}
		printf("\n");
	}
	return 0;
}
The result (as you might guess) should be an approximation of the first three verses of a popular (if
occasionally tiresome) song.
a. Add a few more verses (calling birds, golden rings, geese a-laying, etc.).
b. Here is a scrap of code to print words for the days instead of digits:
if(day == 1)
printf("first");
else if(day == 2)
printf("second");
else if(day == 3)
printf("third");
Incorporate this code into the program.
c. Here is another way of writing the inner part of the program:
if(day >= 3)
printf("Three French hens,\n");
if(day >= 2)
printf("Two turtledoves,\n");
if(day >= 1)
{
if(day == 1) printf("A ");
else
printf("And a ");
printf("partridge in a pear tree.\n");
}
Study this alternate method and figure out how it works. Notice that there are no else's between the
if's.
Exercises:
(As before, these range from easy to harder. Do as many as you like, but at least two or three.)
1. Write a program to find out (the hard way, by counting them) how many of the numbers from 1 to 10 are
greater than 3. (The answer, of course, should be 7.) Your program should have a loop which steps a
variable (probably named i) over the 10 numbers. Inside the loop, if i is greater than 3, add one to a
second variable which keeps track of the count. At the end of the program, after the loop has finished,
print out the count.
2. Write a program to compute the average of the ten numbers 1, 4, 9, ..., 81, 100, that is, the average of the
squares of the numbers from 1 to 10. (This will be a simple modification of Exercise 3 from last week:
instead of printing each square as it is computed, add it in to a variable sum which keeps track of the
sum of all the squares, and then at the end, divide the sum variable by the number of numbers summed.)
If you keep track of the sum in a variable of type int, and divide by the integer 10, you'll get an
integer-only approximation of the average (the answer should be 38). If you keep track of the sum in a variable
of type float or double, on the other hand, you'll get the answer as a floating-point number, which
you should print out using %f in the printf format string, not %d. (In a printf format string, %d
prints only integers, and %f is one way to print floating-point numbers. In this case, the answer should
be 38.5.)
3. Write a program to print the numbers between 1 and 10, along with an indication of whether each is
even or odd, like this:
1 is odd
2 is even
3 is odd
...
(Hint: use the % operator.)
4. On page 3 of chapter 3 of the notes is a scrap of code to print the words ``Northeast'', ``Southwest'', etc.
corresponding to the x and y components of a direction of travel. Modify the code to also print ``North'',
``East'', etc. if x or y is 0.
5. Write a program to print the first 7 positive integers and their factorials. (The factorial of 1 is 1, the
factorial of 2 is 1 * 2 = 2, the factorial of 3 is 1 * 2 * 3 = 6, the factorial of 4 is 1 * 2 * 3 * 4 = 24, etc.)
[Extra credit: why did I only ask for the first 7?]
6. Write a program to print the first 10 Fibonacci numbers. Each Fibonacci number is the sum of the two
preceding ones. The sequence starts out 0, 1, 1, 2, 3, 5, 8, ...
7. Make some improvements to the prime number printing program in section 3.6 of the notes. Other than
2, no even numbers are prime, so have the program test only the odd numbers between 3 and 100.
Rather than recomputing sqrt(i) each time through the inner loop, compute it just once for each new
value of i (that is, compute it at the beginning of the body of the outer loop, just before the beginning of
the inner loop) and assign its value to another variable.
Question 1. What are the two different kinds of division that the / operator can do? Under what
circumstances does it perform each?
The two kinds of division are integer division, which discards any remainder, and floating-point division,
which yields a floating-point result (with a fractional part, if necessary). Floating-point division is
performed if either operand (or both) is floating-point (that is, if either number being operated on is
floating-point); integer division is performed if both operands are integers.
Question 2. What are the definitions of the ``Boolean'' values true and false in C?
False is always represented by a zero value. Any nonzero value is considered true. The built-in operators
<, <=, >, >=, ==, !=, &&, ||, and ! always generate 0 for false and 1 for true.
Question 3. Name three uses for the semicolon in C.
Terminating declarations, terminating statements, and separating the three control expressions in a for
loop.
Question 4. What would the equivalent code, using a while loop, be for the example
for(i = 0; i < 10; i = i + 1)
?
Just move the first (initialization) expression to before the loop, and the third (increment) expression to
the body of the loop:
i = 0;
while(i < 10)
{
i = i + 1;
}
Question 5. What is the numeric value of the expression 3 < 4 ?
1 (or ``true''), because 3 is in fact less than 4.
Question 6. Under what conditions will this code print ``water''?
if(T < 32)
printf("ice\n");
else if(T < 212)
printf("water\n");
else
printf("steam\n");
If T is greater than or equal to 32 and less than 212. (If you said ``greater than 32 and less than 212'', you
weren't quite right, and this kind of distinction--paying attention to the difference between ``greater than''
and ``greater than or equal''--is often extremely important in programming.)
Question 7. What would this code print?
int x = 3;
if(x)
printf("yes\n");
else
printf("no\n");
It would print ``yes'', since x is nonzero.
Question 8. What would this code print?
int i;
for(i = 0; i < 3; i = i + 1)
printf("a\n");
printf("b\n");
printf("c\n");
It would print
a
a
a
b
c
The indentation of the statement printf("b\n"); is (deliberately, for the sake of the question)
misleading. It looks like it's part of the body of the for statement, but the body of the for statement is
always a single statement, or a list of statements enclosed in braces {}. In this case, the body of the loop
is the call printf("a\n");. The two statements printf("b\n"); and printf("c\n"); are
normal statements following the loop.
The code would be much clearer if the printf("b\n"); line were indented to line up with the
printf("c\n"); line. If the intent was that ``b'' be printed 3 times, along with ``a'', it would be
necessary to put braces {} around the pair of lines printf("a\n"); and printf("b\n"); .
Exercise 1. Write a program to find out how many of the numbers from 1 to 10 are greater than 3.
#include <stdio.h>
int main()
{
int i;
int count = 0;
for(i = 1; i <= 10; i = i + 1)
{
if(i > 3)
count = count + 1;
}
printf("%d numbers were greater than 3\n", count);
return 0;
}
Tutorial 4b. Print ``On the first day'' instead of ``On the 1 day'' in the days-of-Christmas program.
Simply replace the line
printf("On the %d day of Christmas, ", day);
with
printf("On the ");
if(day == 1) printf("first");
else if(day == 2) printf("second");
else if(day == 3) printf("third");
else if(day == 4) printf("fourth");
else if(day == 5) printf("fifth");
else if(day == 6) printf("sixth");
else
printf("%d", day);
printf(" day of Christmas, ");
The final
else printf("%d", day);
is an example of ``defensive programming.'' If the variable day ever ends up having a value outside the
range 1-6 (perhaps because we later added more verses to the song, but forgot to update the list of
words), the program will at least print something.
Exercise 2. Write a program to compute the average of the ten numbers 1, 4, 9, ..., 81, 100.
#include <stdio.h>
int main()
{
int i;
float sum = 0;
int n = 0;
for(i = 1; i <= 10; i = i + 1)
{
sum = sum + i * i;
n = n + 1;
}
printf("the average is %f\n", sum / n);
return 0;
}
Exercise 3. Write a program to print the numbers between 1 and 10, along with an indication of whether
each is even or odd.
#include <stdio.h>
int main()
{
int i;
for(i = 1; i <= 10; i = i + 1)
{
if(i % 2 == 0)
printf("%d is even\n", i);
else
printf("%d is odd\n", i);
}
return 0;
}
Exercise 4. Print out the 8 points of the compass, based on the x and y components of the direction of
travel.
if(x > 0)
{
if(y > 0)
printf("Northeast.\n");
else if(y < 0)
printf("Southeast.\n");
else
printf("East.\n");
}
else if(x < 0)
{
if(y > 0)
printf("Northwest.\n");
else if(y < 0)
printf("Southwest.\n");
else
printf("West.\n");
}
else
{
if(y > 0)
printf("North.\n");
else if(y < 0)
printf("South.\n");
else
printf("Nowhere.\n");
}
(You will notice that this code also prints ``Nowhere'' in the ninth case, namely when x == 0 and y
== 0.)
Exercise 5. Write a program to print the first 7 positive integers and their factorials.
#include <stdio.h>
int main()
{
int i;
int factorial = 1;
for(i = 1; i <= 7; i = i + 1)
{
factorial = factorial * i;
printf("%d %d\n", i, factorial);
}
return 0;
}
The answer to the extra credit question is that 8!, which is how mathematicians write ``eight factorial,'' is
40320, which is not guaranteed to fit into a variable of type int. To write a portable program to compute
factorials higher than 7, we would have to use long int.
Exercise 6. Write a program to print the first 10 Fibonacci numbers.
#include <stdio.h>
int main()
{
int i;
int fibonacci = 1;
int prevfib = 0;
int tmp;
for(i = 1; i <= 10; i = i + 1)
{
printf("%d %d\n", i, fibonacci);
tmp = fibonacci;
fibonacci = fibonacci + prevfib;
prevfib = tmp;
}
return 0;
}
(There are many ways of writing this program. As long as the code you wrote printed 1, 1, 2, 3, 5, 8, 13,
21, 34, 55, it's probably correct. If the code you wrote is clearer than my solution above, so much the
better!)
Exercise 7. Make some improvements to the prime number printing program.
Here is a version incorporating both of the suggested improvements:
#include <stdio.h>
#include <math.h>
int main()
{
int i, j;
double sqrti;
printf("%d\n", 2);
for(i = 3; i <= 100; i = i + 2)
{
sqrti = sqrt(i);
for(j = 2; j < i; j = j + 1)
{
if(i % j == 0)
break; /* not prime */
if(j > sqrti)
{ /* prime */
printf("%d\n", i);
break;
}
}
}
return 0;
}
Assignment #3
Handouts:
Assignment #3
Answers to Assignment #2
Reading Assignment:
Class Notes, Chapter 4, Secs. 4.1-4.1.1, 4.3
(optional) Class Notes, Secs. 3.6, 4.1.2-4.2, 4.4, 5.4
Review Questions:
1. How many elements does the array
int a[5];
contain? Which is the first element? The last?
2. What's wrong with this scrap of code?
int a[5];
for(i = 1; i <= 5; i = i + 1)
a[i] = 0;
3. How might you rewrite the dice-rolling program (from the notes, chapter 4, p. 2) without arrays?
4. What is the difference between a defining instance and an external declaration?
5. What are the four important parts of a function? Which three does a caller need to know?
Tutorial Section
1. Here is another nested-loop example, similar to exercise 4 of assignment 1, and to tutorial 3 of
assignment 2. This one prints an addition table for sums from 1+1 to 10+10.
/* print an addition table for 1+1 up to 10+10 */
#include <stdio.h>
int main()
{
int i, j;
/* print header line: */
printf(" ");
for(j = 1; j <= 10; j = j + 1)
printf(" %3d", j);
printf("\n");
/* print table: */
for(i = 1; i <= 10; i = i + 1)
{
printf("%2d", i);
for(j = 1; j <= 10; j = j + 1)
printf(" %3d", i + j);
printf("\n");
}
return 0;
}
The first j loop prints the top, header row of the table. (The initial printf(" "); is to make it
line up with the rows beneath, which will all begin with a value of i.) Then, the i loop prints the
rest of the table, one row per value of i. For each value of i, we print that value (on the left edge
of the table), and then print the sums resulting from adding that value of i to ten different values
of j (using a second, inner loop on j).
Make a simple modification to the program to print a multiplication table, or a subtraction table.
2. Here is yet another nested-loop example. It is very similar to the one above, except that rather
than printing the sum i+j, it determines whether the sum is even or odd, using the expression
(i+j) % 2 . The % operator, remember, gives the remainder when dividing, and an even
number gives a remainder of 0 when dividing by 2. Depending on whether the sum is even or odd,
the program prints an asterisk or a space. Type in and run the program, and look at the pattern that
results.
#include <stdio.h>
int main()
{
int i, j;
for(i = 1; i <= 10; i = i + 1)
{
for(j = 1; j <= 10; j = j + 1)
{
if((i + j) % 2 == 0)
printf("* ");
else
printf("  ");
}
printf("\n");
}
return 0;
}
(Why are there parentheses around i + j in the expression (i + j) % 2 ? What if they
were left out?)
Modify the program to print a pattern of . and # characters, or X's and O's. If you wish,
experiment by taking the remainder when dividing by 3 or 4, instead. Since the remainder when
dividing by 3 can be 0, 1, or 2, you could use a cascaded if/else statement to print one of three
characters for each sum (or one of four if taking the remainder when dividing by 4).
3. Later in this assignment (exercise 3), you're asked to create a simple function to compute the
squares of numbers. Sometimes, rather than writing a function which will compute a value each
time it's called, it's useful to build an array containing all the values we might need. Here is a
program which declares an array, then fills it with the squares of the numbers from 1 to 10:
#include <stdio.h>
int main()
{
int i;
int squares[11];
/* [0..10]; [0] ignored */
/* fill array: */
for(i = 1; i <= 10; i = i + 1)
squares[i] = i * i;
/* print table: */
printf("n\tsquare\n");
for(i = 1; i <= 10; i = i + 1)
printf("%d\t%d\n", i, squares[i]);
return 0;
}
There's one slight trick in the declaration of the squares array. Remember that arrays in C are
based at 0. So if we wanted an array to hold the squares of 10 numbers, and if we declared it as
int squares[10]; the array's 10 elements would range from squares[0] to
squares[9]. This program wants to use elements from squares[1] to squares[10], so
it simply declares the array as having size 11, and wastes the 0th element.
The program also uses the tab character, \t, in its printouts, to make the columns line up.
Modify the program so that it also declares and fills in a cubes array containing the cubes (third
powers) of the numbers 1-10, and prints them out in a third column.
4. We're going to write a simple (very simple!) graphics program. We'll write a function
printsquare which prints a square (made out of asterisks) of a certain size. Then we'll use our
favorite for-i-equals-1-to-10 loop to call the function 10 times, printing squares of size 1 to 10.
#include <stdio.h>
int printsquare(int);
int main()
{
int i;
for(i = 1; i <= 10; i = i + 1)
{
printsquare(i);
printf("\n");
}
return 0;
}
int printsquare(int n)
{
int i, j;
for(i = 0; i < n; i = i + 1)
{
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
}
return 0;
}
Type the program in and run it. Then see if you can modify it to print ``open'' squares, like this:
****
*  *
*  *
****
instead of filled squares. You'll have to print the box in three parts: first the top row, then a
number of ``*  *'' rows, and finally the bottom row. There's obviously no way to print ``open''
boxes of size 1 or 2, so don't worry about those cases. (That is, you can change the loop in the top-level
main function to for(i = 3; i <= 10; i = i + 1) .)
Exercises (pick three or four):

1. Write code to sum the elements of an array of int. (Write it as a function, if you like.) Use it to
sum the array
int a[] = {1, 2, 3, 4, 5, 6};
(The answer, of course, should be 21).
2. Write a loop to call the multbytwo() function (chapter 5, section 5.1, p. 1) on the numbers
1-10. For extra credit, compile your main function and the multbytwo() function as two source
files, one function per file.
3. Write a square() function and use it to print the squares of the numbers 1-10:
1 1
2 4
3 9
4 16
...
9 81
10 100
4. Write the function
void printnchars(int ch, int n)
which is supposed to print the character ch, n times. (Remember that %c is the printf format
to use for printing characters.) For example, the call printnchars('x', 5) would print 5
x's. Use this function to rewrite the triangle-printing program of assignment 1 (exercise 4).
5. Write a function to compute the factorial of a number, and use it to print the factorials of the
numbers 1-7. (Extra credit: print the factorials of the numbers 1-10.)
6. (Kernighan and Ritchie) Write a function celsius() to convert degrees Fahrenheit to degrees
Celsius. (The conversion formula is C = 5/9 * (F - 32).) Use it to print a
Fahrenheit-to-Centigrade table for -40 to 220 degrees Fahrenheit, in increments of 10 degrees. (Remember that
%f is the printf format to use for printing floating-point numbers. Also, remember that the
integer expression 5/9 gives 0, so you won't want to use integer division.)
7. Modify the dice-rolling program (Sec. 4.1) so that it computes the average of all the rolls of the
pair of dice. Remember that integer division truncates, so you'll have to declare some of your
variables as float or double.
For extra credit, also compute the standard deviation, which can be expressed as
sqrt((sum(x*x) - sum(x)*sum(x)/n) / (n - 1))
(where the notation sum(x) indicates summing all values of x, as usually expressed with the
Greek letter sigma; there is no such sum() function in C!) If you put the line
#include <math.h>
at the top of the file, you'll be able to call sqrt().
(See the note in Assignment #2, Exercise 5, about compiling with the math library under Unix.
There are better ways of computing the standard deviation, but the above expression will suffice
for our purposes.)
8. Write either the function
int randrange(int n)
which returns random integers from 1 to n, or the function
int randrange2(int m, int n)
which returns random integers in the range m to n. Use your function to streamline the
dice-rolling program a bit.
The header file <stdlib.h> defines a constant, RAND_MAX, which is the maximum number
returned by the rand() function. A better way of reducing the range of the rand() function is
like this:
rand() / (RAND_MAX / N + 1)
(where N is the range of numbers you want).

Extra credit: investigate the behavior of randrange(2), both using the ``obvious''
range-reduction technique (the modulus operator %, as shown in the notes) and this improved method. If
your library's implementation of rand() is a good one, you won't see a difference. But if you're
not so lucky, you may see something very surprising.
9. Rewrite the dice-rolling program to also print a histogram. For example, if there are 21 rolls of 7,
the output line ``7: 21'' should also contain a horizontal row of 21 asterisks. (Use the
printnchars function from exercise 4, if you wish.) The output might look like this:
2: 2     **
3: 5     *****
4: 4     ****
5: 10    **********
6: 15    ***************
7: 28    ****************************
8: 12    ************
9: 9     *********
10: 7    *******
11: 5    *****
12: 3    ***
Question 1. How many elements does the array int a[5] contain? Which is the first element? The
last?
The array has 5 elements. The first is a[0]; the last is a[4].
Question 2. What's wrong with the scrap of code in the question?
The array is of size 5, but the loop is from 1 to 5, so an attempt will be made to access the nonexistent
element a[5]. A correct loop over this array would run from 0 to 4.
Question 3. How might you rewrite the dice-rolling program without arrays?
About all I can think of would be to declare 11 different variables:
int roll2, roll3, roll4, roll5, roll6, roll7, roll8, roll9,
roll10, roll11, roll12;
sum = d1 + d2; /* sum two die rolls */
if(sum == 2)
roll2 = roll2 + 1;
else if(sum == 3)
roll3 = roll3 + 1;
else if(sum == 4)
roll4 = roll4 + 1;
else if(sum == 5)
roll5 = roll5 + 1;
...
What a nuisance! (Fortunately, we never have to write code like this; we just use arrays, since this sort of
situation is exactly what arrays are for.)
Question 4. What is the difference between a defining instance and an external declaration?
A defining instance is a declaration of a variable or function that actually defines and allocates space for
that variable or function. In the case of a variable, the defining instance may also supply an initial value,
using an initializer in the declaration. In the case of a function, the defining instance supplies the body of
the function.
An external declaration is a declaration which mentions the name and type of a variable or function
which is defined elsewhere. An external declaration does not allocate space; it cannot supply the initial
value of a variable; it does not need to supply the size of an array; it does not supply the body of a
function. (In the case of functions, however, an external declaration may include argument type
information; in this case it is an external prototype declaration.)
Question 5. What are the four important parts of a function? Which three does a caller need to know?
The name, the number and type of the arguments, the return type, and the body. The caller needs to know
the first three.
Tutorial 3. Modify the array-of-squares program to also print cubes.
#include <stdio.h>
int main()
{
int i;
int squares[11];
/* [0..10]; [0] ignored */
int cubes[11];
/* fill arrays: */
for(i = 1; i <= 10; i = i + 1)
{
squares[i] = i * i;
cubes[i] = i * i * i;
}
/* print table: */
printf("n\tsquare\tcube\n");
for(i = 1; i <= 10; i = i + 1)
printf("%d\t%d\t%d\n", i, squares[i], cubes[i]);
return 0;
}
Tutorial 4. Rewrite the simple graphics program to print ``open'' boxes.
I made a new version of the original printsquare function called printbox, like this:
int printbox(int n)
{
int i, j;
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
for(i = 0; i < n-2; i = i + 1)
{
printf("*");
for(j = 0; j < n-2; j = j + 1)
printf(" ");
printf("*\n");
}
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
return 0;
}
A box of size 1 or 2 doesn't have all three parts. If you want the function to handle those sizes more
appropriately, here are the necessary modifications:
int printbox(int n)
{
int i, j;
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
for(i = 0; i < n-2; i = i + 1)
{
printf("*");
for(j = 0; j < n-2; j = j + 1)
printf(" ");
if(n > 1)
printf("*\n");
}
if(n > 1)
{
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
}
return 0;
}
Exercise 1. Write code to sum the elements of an array of int.
Here is a little array-summing function:
int sumnum(int a[], int n)
{
int i;
int sum = 0;
for(i = 0; i < n; i = i + 1)
sum = sum + a[i];
return sum;
}
Here is a test program to call it:
#include <stdio.h>
int a[] = {1, 2, 3, 4, 5, 6};
int sumnum(int [], int);
int main()
{
printf("%d\n", sumnum(a, 6));
return 0;
}
Exercise 2. Write a loop to call the multbytwo function on the numbers 1-10.
#include <stdio.h>
int multbytwo(int);
int main()
{
int i;
for(i = 1; i <= 10; i++)
printf("%d %d\n", i, multbytwo(i));
return 0;
}
Exercise 3. Write a square() function and use it to print the squares of the numbers 1-10.
The squaring function is quite simple:
int square(int x)
{
return x * x;
}
Here is a loop and main program to call it:
#include <stdio.h>
int square(int);
int main()
{
int i;
for(i = 1; i <= 10; i = i + 1)
printf("%d %d\n", i, square(i));
return 0;
}
Exercise 4. Write a printnchars function, and use it to rewrite the triangle-printing program.
Here is the function:
void printnchars(int ch, int n)
{
int i;
for(i = 0; i < n; i++)
printf("%c", ch);
}
Here is the rewritten triangle-printing program:
#include <stdio.h>
void printnchars(int, int);
int main()
{
int i;
for(i = 1; i <= 10; i = i + 1)
{
printnchars('*', i);
printf("\n");
}
return 0;
}
Exercise 5. Write a function to compute the factorial of a number, and use it to print the factorials of the
numbers 1-7.
int factorial(int x)
{
int i;
int fact = 1;
for(i = 2; i <= x; i = i + 1)
fact = fact * i;
return fact;
}
Here is a driver program:
#include <stdio.h>
int factorial(int);
int main()
{
int i;
for(i = 1; i <= 7; i = i + 1)
printf("%d %d\n", i, factorial(i));
return 0;
}
The answer to the ``extra credit'' problem is that to portably compute factorials beyond
factorial(7), it would be necessary to declare the factorial() function as returning long
int (and to declare its local fact variable as long int as well, and to use %ld in the call to
printf). 8! (``eight factorial'') is 40320, but remember, type int is only guaranteed to hold integers up
to 32767.
(Some machines, but not all, have ints that can hold more than 32767, so computing larger factorials on
those machines would happen to work, but not portably. Some textbooks would tell you to ``use long
int if your machine has 16-bit ints,'' but why write code two different ways depending on what kind
of machine you happen to be using today? I prefer to say, ``Use long int if you would like results
greater than 32767.'')
Exercise 6. Write a function celsius() to convert degrees Fahrenheit to degrees Celsius. Use it to
print a Fahrenheit-to-Centigrade table for -40 to 220 degrees Fahrenheit, in increments of 10 degrees.
double celsius(double f)
{
return 5. / 9. * (f - 32);
}
Here is the driver program:
#include <stdio.h>
double celsius(double);
int main()
{
double c, f;
for(f = -40; f <= 220; f = f + 10)
printf("%f\t%f\n", f, celsius(f));
return 0;
}
Exercise 7. Modify the dice-rolling program so that it computes the average (and, optionally, the
standard deviation) of all the rolls of the pair of dice.
The new code involves declaring new variables sum, n, and mean (and, for the extra credit problem,
sumsq and stdev), adding code in the main dice-rolling loop to update sum and n (and maybe also
sumsq), and finally adding code at the end to compute the mean (and standard deviation) and print them
out.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
int i;
int d1, d2;
int a[13];
/* uses [2..12] */
double sum = 0;
double sumsq = 0;
int n = 0;
double mean;
double stdev;
for(i = 2; i <= 12; i = i + 1)
a[i] = 0;
for(i = 0; i < 100; i = i + 1)
{
d1 = rand() % 6 + 1;
d2 = rand() % 6 + 1;
a[d1 + d2] = a[d1 + d2] + 1;
sum = sum + d1 + d2;
sumsq = sumsq + (d1 + d2) * (d1 + d2);
n = n + 1;
}
for(i = 2; i <= 12; i = i + 1)
printf("%d: %d\n", i, a[i]);
printf("\n");
mean = sum / n;
stdev = sqrt((sumsq - sum * sum / n) / (n - 1));
printf("average: %f\n", mean);
printf("std. dev.: %f\n", stdev);
return 0;
}
Exercise 8. Write a randrange function.
Here is a straightforward implementation of randrange2:

#include <stdlib.h>
int randrange2(int m, int n)
{
return rand() % (n - m + 1) + m;
}
Here is one using the suggested ``better way of reducing the range of the rand function'':
#include <stdlib.h>
int randrange2(int m, int n)
{
return rand() / (RAND_MAX / (n - m + 1) + 1) + m;
}
Notice that I've replaced N in the suggested general form with the expression n - m + 1.
You could implement the simpler randrange function either as
int randrange(int n)
{
return rand() % n + 1;
}
or, using the improvement,
int randrange(int n)
{
return rand() / (RAND_MAX / n + 1) + 1;
}
or, by writing it ``on top of'' the more general randrange2 you already wrote,
int randrange(int n)
{
return randrange2(1, n);
}
The various ``fudge factors'' in these expressions deserve some explanation. The first one is
straightforward: The + 1 in (n - m + 1) simply gives us the number of numbers in the range m to n,
including m and n. (Leaving out the + 1 in this case is the classic example of a fencepost error, named
after the old puzzle, ``How many pickets are there in a picket fence ten feet long, with the pickets one
foot apart?'')
The other +1 is a bit trickier. First let's consider the second implementation of randrange. We want to
divide rand's output by some number so that the results will come out in the range 0 to n - 1. (Then
we'll add in 1 to get numbers in the range 1 through n.) Left to its own devices, rand will return
numbers in the range 0 to RAND_MAX (where RAND_MAX is a constant defined for us in <stdlib.h>).
The division, remember, is going to be integer division, which will truncate. So numbers which would
have come out in the range 0.0 to 0.99999... (if the division were exact) will all truncate to 0, numbers
which would have come out in the range 1.0 to 1.99999... will all truncate to 1, 2.0 to 2.99999... will all
truncate to 2, etc. If we were to divide rand's output by the quantity
RAND_MAX / n
that is, if we were to write
rand() / (RAND_MAX / n)
then when rand returned RAND_MAX, the division could yield exactly n, which is one too many. (This
wouldn't happen too often--only when rand returned that one value, its maximum value--but it would be
a bug, and a hard one to find, because it wouldn't show up very often.) So if we add one to the
denominator, that is, divide by the quantity
RAND_MAX / n + 1
then when rand returns RAND_MAX, the division will yield a number just shy of n, which will then be
truncated to n - 1, which is just what we want. We add in 1, and we're done.
In the case of the more general randrange2, everything we've said applies, with n replaced by n - m
+ 1. Dividing by
RAND_MAX / (n - m + 1)
would occasionally give us a number one too big (n + 1, after adding in m), so we divide by
RAND_MAX / (n - m + 1) + 1
instead.
Finally, just two lines in the dice-rolling program would need to be changed to make use of the new
function:
d1 = randrange(6);
d2 = randrange(6);
or
d1 = randrange2(1, 6);
d2 = randrange2(1, 6);
The answer to the extra-credit portion of the exercise is that under some compilers, the output of the
rand function is not quite as random as you might wish. In particular, it's not uncommon for the rand
function to produce alternately even and odd numbers, such that if you repeatedly compute rand % 2,
you'll get the decidedly non-random sequence 0, 1, 0, 1, 0, 1, 0, 1... . It's for this reason that the slightly
more elaborate range-reduction techniques involving the constant RAND_MAX are recommended.
Exercise 9. Rewrite the dice-rolling program to also print a histogram.
#include <stdio.h>
#include <stdlib.h>
void printnchars(int, int);
int main()
{
int i;
int d1, d2;
int a[13];
/* uses [2..12] */
for(i = 2; i <= 12; i = i + 1)
a[i] = 0;
for(i = 0; i < 100; i = i + 1)
{
d1 = rand() % 6 + 1;
d2 = rand() % 6 + 1;
a[d1 + d2] = a[d1 + d2] + 1;
}
for(i = 2; i <= 12; i = i + 1)
{
printf("%d: %d\t", i, a[i]);
printnchars('*', a[i]);
printf("\n");
}
return 0;
}
The \t in the second-to-last call to printf prints a tab character, so that the histogram bars will all be
lined up, regardless of the number of digits in the particular values of i or a[i]. Another possibility
would be to use printf's width specifier (which we haven't really covered yet) to keep the digits lined
up. That approach might look like this:
printf("%2d: %3d  ", i, a[i]);
Assignment #4
Handouts:
Assignment #4
Class Notes, Chapters 6, 7, and 8
``Expressions and Statements'' handout
Reading Assignment:
Class Notes, Chapter 7 (Sec. 7.3 is a bit advanced, but do at least skim it)
Review Questions:
1. What would the expression
c = getchar() != EOF
do?
2. Why must the variable used to hold getchar's return value be type int?
3. What is the difference between the prefix and postfix forms of the ++ operator?
4. (trick question) What would the expression
i = i++
do?
5. What is the definition of a string in C?
6. (Advanced) What will the getline function from section 6.3 of the notes do if successive calls
to getchar return the four values 'a', 'b', 'c', and EOF? Is getline's behavior
reasonable? Is it possible for this situation to occur, that is, for a ``line'' somehow not to be
terminated by \n?
Tutorial Section:
1. Here is a toy program for computing your car's gas mileage. It asks you for the distance driven on
the last tank of gas, and the amount of gas used (i.e. the amount of the most recent fill-up). Then it
simply divides the two numbers.
#include <stdio.h>
#include <stdlib.h>    /* for atoi() */
int getline(char [], int);

int main()
{
    char inputline[100];
    float miles;
    float gallons;
    float mpg;

    printf("enter miles driven:\n");
    getline(inputline, 100);
    miles = atoi(inputline);
    printf("enter gallons used:\n");
    getline(inputline, 100);
    gallons = atoi(inputline);
    mpg = miles / gallons;
    printf("You got %.2f mpg\n", mpg);
    return 0;
}

int getline(char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;    /* leave room for '\0' */

    while((c = getchar()) != EOF)
    {
        if(c == '\n')
            break;
        if(nch < max)
        {
            line[nch] = c;
            nch = nch + 1;
        }
    }
    if(c == EOF && nch == 0)
        return EOF;
    line[nch] = '\0';
    return nch;
}
For each of the two pieces of information requested (mileage and gallons) the code uses a little
three-step procedure: (1) print a prompt, (2) read the user's response as a string, and (3) convert
that string to a number. The same character array variable, inputline, can be used to hold the
string both times, because we don't care about keeping the string around once we've converted it to
a number. The function for converting a string to an integer is atoi. (The compiler then
automatically converts the integer returned by atoi into a floating-point number to be stored in
miles or gallons. There's another function we could have used, atof, which converts a
string--possibly including a decimal point--directly into a floating-point number.)
You might wonder why we're reading the user's response as a string, only to turn around and
convert it to a number. Isn't there a way to read a numeric response directly? There is (one way is
with a function called scanf), but we're going to do it this way because it will give us more
flexibility later. (Also, it's much harder to react gracefully to any errors by the user if you're using
scanf for input.)
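To make the difference between atoi and atof concrete, here is a minimal sketch. The helper name mpg_from_strings and its guard against a zero gallons value are my own assumptions for illustration; they are not part of the program above.

```c
#include <stdlib.h>

/* Hypothetical helper: converts two input strings and divides them.
   Returns 0.0 if the gallons string converts to zero, to avoid
   dividing by zero. */
double mpg_from_strings(const char miles_str[], const char gallons_str[])
{
    double miles = atof(miles_str);      /* keeps any fractional part */
    double gallons = atof(gallons_str);

    if(gallons == 0.0)
        return 0.0;
    return miles / gallons;
}
```

With atoi, the string "12.5" would be converted to 12, because conversion stops at the decimal point; atof converts the same string to 12.5.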
Obviously, this program would be a nuisance to use if you had several pairs of mileages and
gallonages to compute. You'd probably want the program to prompt you repetitively for additional
pairs, so that you wouldn't have to re-invoke the program each time. Here is a revised version
which has that functionality. It keeps asking you for more distances and more gallon amounts,
until you enter 0 for the mileage.
#include <stdio.h>
#include <stdlib.h>    /* for atoi() */
int getline(char [], int);

int main()
{
    char inputline[100];
    float miles;
    float gallons;
    float mpg;

    for(;;)
    {
        printf("enter miles driven (0 to end):\n");
        getline(inputline, 100);
        miles = atoi(inputline);
        if(miles == 0)
            break;
        printf("enter gallons used:\n");
        getline(inputline, 100);
        gallons = atoi(inputline);
        mpg = miles / gallons;
        printf("You got %.2f mpg\n\n", mpg);
    }
    return 0;
}
The (slightly) tricky part about writing a program like this is: what kind of a loop should you use?
Conceptually, it seems you want some sort of a while loop along the lines of ``while(miles
!= 0)''. But that won't work, because we don't have the value of miles until after we've
prompted the user for it and read it in and converted it.
Therefore, the code above uses what at first looks like an infinite loop: ``for(;;)''. A for loop
with no condition tries to run forever. But, down in the middle of the loop, if we've obtained a
value of 0 for miles, we execute the break statement which forces the loop to terminate. This
sort of situation (when we need the body of the loop to run part way through before we can decide
whether we really wanted to make that trip or not) is precisely what the break statement is for.
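The same ``decide in the middle of the loop'' pattern can be sketched without any keyboard input. In this illustration the function name sum_until_zero and the idea of taking the ``input'' from an array of strings are assumptions of mine, chosen only so the loop is easy to try out.

```c
#include <stdlib.h>

/* Hypothetical example: adds up the values in inputs[] until a 0
   is seen or the array runs out.  The test for 0 can only be made
   after the value has been converted, partway through the loop. */
int sum_until_zero(const char *inputs[], int n)
{
    int i = 0;
    int sum = 0;
    int value;

    for(;;)
    {
        if(i >= n)
            break;              /* ran out of input */
        value = atoi(inputs[i]);
        i = i + 1;
        if(value == 0)
            break;              /* sentinel value: stop here */
        sum = sum + value;
    }
    return sum;
}
```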
2. Next, we're going to write our own version of the atoi function, so we can see how it works (or
might work) inside. (I say ``might work'' because there's no guarantee that your compiler's
particular implementation of atoi will work just like this one, but it's likely to be similar.)
int myatoi(char str[])
{
    int i;
    int digit;
    int retval = 0;

    for(i = 0; str[i] != '\0'; i = i + 1)
    {
        digit = str[i] - '0';
        retval = 10 * retval + digit;
    }
    return retval;
}
You can try this function out and work with it by rewriting the gas-mileage program slightly to
use it. (Just replace the two calls to atoi with myatoi.)
Remember, the definition of a string in C is that it is an array of characters, terminated by '\0'.
So the basic strategy of this function is to look at the input string, one character at a time, from
left to right, until it finds the terminating '\0'.
The characters in the string are assumed to be digits: '1', '5', '7', etc. Remember that in C, a
character's value is the numeric value of the corresponding character in the machine's character
set. It turns out that we can write this string-to-number conversion code without knowing what
character set our machine uses or what the values are. (If you're curious, in the ASCII character
set which most machines use, the character '0' has the value 48, '1' has the value 49, '2' has
the value 50, etc.)
Whatever value the character '0' has, '0' - '0' will be zero. (Anything minus itself is 0.) In
any realistic character set, '1' has a value 1 greater than '0', so '1' - '0' will be 1.
Similarly, '2' - '0' will be 2. So we can get the ``digit'' value of a character (in the range 0-9)
by subtracting the value of the character '0'. Now all we have to do is figure out how to
combine the digits together into the final value.
If you look at the number 123 from left to right, the first thing you see is the digit 1. You might
imagine that you'd seen the whole number, until you saw the digit 2 following it. So now you've
seen the digits 1 2, and how you can get the number 12 from the digits 1 and 2 is to multiply the 1
by 10 and add the 2. But wait! There's still a digit 3 to come, but if you multiply the 12 by 10 and
add 3, you get 123, which is the answer you wanted. So the algorithm for converting digits to their
decimal value is: look at the digits from left to right, and for each digit you find, multiply the
number you had before by 10 and add in the digit you just found. (When the algorithm begins,
``the number you had before'' starts out as 0.) When you run out of digits, you're done. (The code
above uses a variable named retval to keep track of ``the number you had before'', because it
ends up being the value to be returned.)
If you're not convinced that this algorithm will work, try adding the line
printf("digit = %d, retval = %d\n", digit, retval);
at the end of the loop, so you can watch it as it runs.
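If you would rather capture the intermediate values than print them, something along these lines might do. The function name trace_digits and the trace array are hypothetical; the sketch assumes the string contains only digit characters.

```c
/* Hypothetical variant of myatoi that records the running value
   after each digit.  Assumes str contains only digit characters. */
int trace_digits(const char str[], int trace[], int maxtrace)
{
    int i;
    int retval = 0;

    for(i = 0; str[i] != '\0'; i = i + 1)
    {
        retval = 10 * retval + (str[i] - '0');
        if(i < maxtrace)
            trace[i] = retval;  /* remember "the number so far" */
    }
    return retval;
}
```

For the string "123", the recorded values are 1, 12, and 123.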

The code above works correctly as long as the string contains digits and only digits. But if the
string contained, say, the letter 'Q', the code would compute the value 'Q' - '0', which
would be a meaningless number. So we'd like to have the code do something reasonable if it
should happen to encounter any non-digits (that is, if someone should happen to call it with a
string containing non-digits).
One way for the string to contain non-digits is if it contains any leading spaces. For example, the
string " 123" should clearly be converted to the value 123, but our code above wouldn't be able
to do so. One thing we need to do is skip leading spaces. And the other thing we'll do (which isn't
perfect, but is a reasonable compromise) is that if we hit a non-space, non-digit character, we'll
just stop (i.e. as if we'd reached the end of the string). This means that the call
myatoi("123abc") will return 123, and myatoi("xyz") will return 0. (As it happens, the
standard atoi function will behave exactly the same way.)
Classifying characters (space, digit, etc.) is easy if we include the header file <ctype.h>, which
gives us functions like isspace which returns true if a character is a space character, and
isdigit which returns true if a character is a digit character.
Putting this all together, we have:
#include <ctype.h>
int myatoi(char str[])
{
    int i;
    int retval = 0;

    for(i = 0; str[i] != '\0'; i = i + 1)
    {
        if(!isspace(str[i]))
            break;
    }
    for(; str[i] != '\0'; i = i + 1)
    {
        if(!isdigit(str[i]))
            break;
        retval = 10 * retval + (str[i] - '0');
    }
    return retval;
}
(You may notice that I've deleted the digit variable in this second example, because we didn't
really need it.)
There are now two loops. The first loop starts at 0 and looks for space characters; it stops (using a
break statement) when it finds the first non-space character. (There may not be any space
characters, so it may stop right away, after making zero full trips through the loop. And the first
loop doesn't do anything except look for non-space characters.) The second loop starts where the
first loop left off--that's why it's written as
for(; str[i] != '\0'; i = i + 1)
The second loop looks at digits; it stops early if it finds a non-digit character.
The remaining problem with the myatoi function is that it doesn't handle negative numbers. See
if you can add this functionality. The easiest way is to look for a '-' character up front,
remember whether you saw one, and then at the end, after reaching the end of the digits and just
before returning retval, negate retval if you saw the '-' character earlier.
3. Here is a silly little program which asks you to type a word, then does something unusual with it.
#include <stdio.h>
int main()
{
    char word[20];
    int len;
    int i, j;

    printf("type something: ");
    len = getline(word, 20);
    for(i = 0; i < 80 - len; i++)
    {
        for(j = 0; j < i; j++)
            printf(" ");
        printf("%s\r", word);
    }
    printf("\n");
    return 0;
}
(To understand how it works, you need to know that \r prints a carriage return without a
linefeed.) Type it in and see what it does. (You'll also need a copy of the getline function.) See
if you can modify the program to move the word from right to left instead of left to right.
Exercises:
1. Type in the character-copying program from section 6.2 of the notes, and run it to see how it
works. (When you run it, it will wait for you to type some input. Type a few characters, hit
RETURN; type a few more characters, hit RETURN again. Hit control-D or control-Z when
you're finished.)
2. Type in the getline() function and its test program from section 6.3 of the notes, and run it to
see how it works. You can either place getline() and its test program (main()) in one source
file or, for extra credit, place getline() in a file getline.c and the test program in a file
getlinetest.c (or perhaps gtlntst.c), for practice in compiling a program from two
separate source files.
3. Rewrite the getline test program from Exercise 2 to use the loop
while(getline(line, 256) != EOF)
    printf("%s\n", line);
That is, have it simply copy a line at a time, without printing ``You typed''. Compare the behavior
of the character-copying and line-copying programs. Do they behave differently?
Now rewrite the character-copying program from Exercise 1 to use the loop
while((c = getchar()) != EOF)
    printf("you typed '%c'\n", c);
and try running it. Now do things make more sense?
4. The standard library contains a function, atoi, which takes a string (presumably a string of
digits) and converts it to an integer. For example, atoi("123") would return the integer 123.
Write a program which reads lines (using getline), converts each line to an integer using
atoi, and computes the average of all the numbers read. (Like the example programs in the
notes, it should determine the end of ``all the numbers read'' by checking for EOF.) See how much
of the code from assignment 3, exercise 7 you can reuse (if you did that exercise). Remember that
integer division truncates, so you'll have to declare some of your variables as float or double.
For extra credit, also compute the standard deviation (see assignment 3, exercise 7).
5. Write a rudimentary checkbook balancing program. It will use getline to read a line, which
will contain either the word "check" or "deposit". The next line will contain the amount of
the check or deposit. After reading each pair of lines, the program should compute and print the
new balance. You can declare the variable to hold the running balance to be type float, and you
can use the function atof (also in the standard library) to convert an amount string read by
getline into a floating-point number. When the program reaches end-of-file while reading
"check" or "deposit", it should exit. (In outline, the program will be somewhat similar to
the average-finding program.)
For example, given the input
deposit
100
check
12.34
check
49.00
deposit
7.01
the program should print something like
balance: 100.00
balance: 87.66
balance: 38.66
balance: 45.67
Extra credit: Think about how you might have the program take the word "check" or
"deposit", and the amount, from a single line (separated by whitespace).
6. Rewrite the ``compass'' code from Assignment 2 (exercise 4) to use strcpy and strcat to
build the "northeast", "southwest", etc. strings. (Don't worry about capitalizing them
carefully.) You should be able to write it in a cleaner way, without so many if's and else's.
Remember to declare the array in which you build the string big enough to hold the largest string
you build (including the trailing \0).
7. Write a program to read its input, one character at a time, and print each character and its decimal
value.
8. Write a program to read its input, one line at a time, and print each line backwards. To do the
reversing, write a function
int reverse(char line[], int len)
{
    ...
}
to reverse a string, in place. (It doesn't have to return anything useful.)
Question 1. What would the expression

c = getchar() != EOF
do?
It would read one character and compare it to the constant EOF. If the value read was not equal to EOF (i.e. for any
real character), it would set c to 1; if it was equal to EOF, it would set c to 0. What it would not do is read a character, assign it to c,
and then test it against EOF, which is what you usually want to do, and which you need to write
(c = getchar()) != EOF
to do.
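Here is a small sketch of the two groupings. So that it can be tried without real input, both functions take as a parameter the value that getchar would have returned; the function names and the extra more parameter are hypothetical.

```c
#include <stdio.h>   /* for EOF */

/* Without parentheses: c receives the result of the comparison. */
int without_parens(int got)
{
    int c;
    c = got != EOF;             /* parsed as c = (got != EOF) */
    return c;
}

/* With parentheses: c receives the value itself, and the result of
   the comparison is stored separately in *more. */
int with_parens(int got, int *more)
{
    int c;
    *more = (c = got) != EOF;   /* assign first, then compare */
    return c;
}
```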
Question 2. Why must the variable used to hold getchar's return value be type int?
So that it can reliably store the value EOF.
Variables of type char are typically 8 bits large, which means that they can hold 2^8, or 256 different
character values. Furthermore, on an 8-bit system, getchar can theoretically return characters having any of these
256 character values. However, getchar can also return a 257th value, EOF, which is not a character value but rather
an indication that there are no more characters to get. You can no more reliably store getchar's 257 return values in a
variable of type char than you can store 13 eggs in a carton that holds a dozen. If you tried to assign getchar's
return value to a char, you could either mistake a real character value for EOF or EOF for a real character value,
resulting either in premature termination of input or an infinite loop.
An int, on the other hand, is on the vast majority of machines larger than a char, so it can comfortably hold all 256
character values, plus EOF.
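The following sketch tries to show the confusion directly. The function names are hypothetical, and the behavior of the char version is implementation-defined: on a machine where char is signed, the character value 255 is likely to be mistaken for EOF; where char is unsigned, EOF itself is likely to go unrecognized.

```c
#include <stdio.h>   /* for EOF */

/* Stores a getchar-style return value in a plain char, then asks
   whether the stored value compares equal to EOF.  The answer
   depends on whether char is signed on this machine. */
int char_copy_equals_eof(int got)
{
    char c = (char)got;
    return c == EOF;
}

/* The same test with an int does not confuse the two cases. */
int int_copy_equals_eof(int got)
{
    int c = got;
    return c == EOF;
}
```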
Question 3. What is the difference between the prefix and postfix forms of the ++ operator?
The prefix form increments first, and the incremented value goes on to participate in the surrounding expression (if
any). The postfix form increments later; the previous value goes on to participate in the surrounding expression.
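A minimal sketch of the difference (the two function names are made up for this illustration):

```c
/* Each function increments *p and returns the value that the
   increment expression itself yields. */
int prefix_value(int *p)
{
    return ++*p;        /* increments, then yields the new value */
}

int postfix_value(int *p)
{
    return (*p)++;      /* yields the old value; *p is still incremented */
}
```

Starting from 5 in both cases, the prefix form yields 6 and the postfix form yields 5, and either way the variable ends up as 6.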
Question 4. What would the expression
i = i++
do?
Nothing, or at least, nothing useful. Since it tries to modify i twice, it's undefined.
Question 5. What is the definition of a string in C?
An array of characters, terminated with the null character \0.
Question 6. What will the getline function do if successive calls to getchar return the four values 'a', 'b',
'c', and EOF?
The first three characters are placed in the line array, as usual, and when the EOF indicator is read, getline breaks
out of its loop, also as usual. Although c is now EOF, nch is 3, so the condition c == EOF && nch == 0 is false.
getline therefore does not return EOF, but rather terminates the line with \0 and returns its length, just as it does
for a normal line ending in \n.
Can this situation ever occur? One way to answer a question like this is not to try too hard to answer it, to err on the
side of conservatism, to assume that if we can't prove that the unusual situation won't arise, we might as well be safe
and arrange that our code can handle it if it somehow comes up. (There's obviously little or no harm in writing code to
handle a situation that never comes up, while the reverse--neglecting to write code to handle a situation that does come
up--can of course be very harmful.)
In any case, under Unix at least, this situation can in fact come up. Unix does not enforce any notion of a ``text file'';
the sequence a, b, c, EOF at the end of a file is no more or less favored (by the operating system, that is) than the
sequence a, b, c, \n, EOF. Furthermore, there are some programs (e.g. full-screen text editors such as EMACS) which
make it easy to create a text file without a final newline (if only by accident). (But there are also examples of programs
which inadvertently ignore the last line of a file whose last line does not end in \n, and this is a bug which can cause
data loss.) So writing getline to accept a ``line'' that ends in EOF (with no \n) is not only reasonable, but useful.
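One way to experiment with this case without creating a special file is to feed the same loop from a string. In the sketch below, next_char is a hypothetical stand-in for getchar that reports EOF when the string runs out, and get_line_from is an assumed name for a copy of the getline logic (renamed partly because some systems' <stdio.h> already declares a different function called getline).

```c
#include <stdio.h>   /* for EOF */

/* Hypothetical stand-in for getchar: reads from a string, and
   reports EOF when the string is exhausted. */
static int next_char(const char **src)
{
    if(**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

/* Same logic as getline, but reading via next_char. */
int get_line_from(const char **src, char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;      /* leave room for '\0' */

    while((c = next_char(src)) != EOF)
    {
        if(c == '\n')
            break;
        if(nch < max)
        {
            line[nch] = c;
            nch = nch + 1;
        }
    }
    if(c == EOF && nch == 0)
        return EOF;
    line[nch] = '\0';
    return nch;
}
```

Given the input "abc" with no trailing newline, the first call returns 3 and leaves "abc" in the line array; the second call returns EOF.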
Tutorial 2. Improve the myatoi function so that it can handle negative numbers.
[You've seen the answer already, because I accidentally left the sign-handling code--within #ifdef SIGN--turned on
in the copy of the code printed in the assignment. Oops.]
#include <ctype.h>
int myatoi(char str[])
{
    int i;
    int retval = 0;
    int negflag = 0;

    for(i = 0; str[i] != '\0'; i = i + 1)
    {
        if(!isspace(str[i]))
            break;
    }
    if(str[i] == '-')
    {
        negflag = 1;
        i = i + 1;
    }
    for(; str[i] != '\0'; i = i + 1)
    {
        if(!isdigit(str[i]))
            break;
        retval = 10 * retval + (str[i] - '0');
    }
    if(negflag)
        retval = -retval;
    return retval;
}
Tutorial 3. Modify the ``word zipping'' program to move the word from right to left instead of left to right.
#include <stdio.h>
int main()
{
    char word[20];
    int len;
    int i, j;

    printf("type something: ");
    len = getline(word, 20);
    for(i = 80 - len - 1; i >= 0; i--)
    {
        for(j = 0; j < i; j++)
            printf(" ");
        printf("%s \r", word);
    }
    printf("\n");
    return 0;
}
Exercise 4. Write a program which computes the average (and standard deviation) of a series of numbers.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
    char line[100];
    int x;
    double sum, sumsq;
    int n;
    double mean, stdev;

    sum = sumsq = 0.0;
    n = 0;
    while(getline(line, 100) != EOF)
    {
        x = atoi(line);
        sum = sum + x;
        sumsq = sumsq + x * x;
        n = n + 1;
    }
    mean = sum / n;
    stdev = sqrt((sumsq - sum * sum / n) / (n - 1));
    printf("mean: %f\n", mean);
    printf("std. dev.: %f\n", stdev);
    return 0;
}
Exercise 5. Write your own version of the atoi function.

[Obviously I should have removed this exercise when I added Tutorial 2. Apologies for the duplication.]
Here is a simple implementation:
int myatoi(char str[])
{
    int retval = 0;
    int i = 0;

    while(str[i] != '\0')
    {
        int digit = str[i] - '0';
        retval = 10 * retval + digit;
        i++;
    }
    return retval;
}
Remember that characters are represented by small integers representing their values in the machine's character set. We
shouldn't have to know what these values are, but if we assume that the characters '0', '1', '2', ... '9' have
consecutive values (which, as it happens, is a perfectly valid assumption) then subtracting the value of the character
'0' from any digit character will give us that digit's value. For example, '1' - '0' is 1, '2' - '0' is 2, and of
course '0' - '0' is 0. If str[i] is a digit character, then str[i] - '0' is its value. (In all of these
subtractions, we are using the constant '0' to mean ``the value of the character '0','' which is, in fact, exactly what it
means.)
The operation of the function is simple: it moves through the string from left to right, converting each digit character to
a digit value and building up the return value. Each time we find another digit character, it (obviously) indicates that the
number we're converting has another digit, so we multiply the number we've converted so far by 10, and add in the new
digit. (Walk through this algorithm to convince yourself that it works. For example, if the string to be converted is
"123", we'll make three trips through the loop, and retval will successively take on the values 1, 12, and finally
123.)
Here is a little test program to read strings from the user, convert them to numbers using myatoi, and print out the
converted numbers:
#include <stdio.h>
int main()
{
    char line[100];
    int n;

    while(1)
    {
        printf("type a number:\n");
        if(getline(line, 100) == EOF)
            break;
        n = myatoi(line);
        printf("you typed %d\n", n);
    }
    return 0;
}
This first implementation of myatoi works, but only if the string contains digits and only digits. If the string contains
any character other than a digit, the expression str[i] - '0' will result in a meaningless digit value. Therefore, a
better implementation of myatoi would ensure that it only attempted to convert digits.
There are in fact several situations under which the string might contain characters other than digits. It might contain
some spaces before the number to be converted (e.g. " 123"), or the number to be converted might begin with a
minus sign. In any case, the caller might carelessly pass a string which contained some non-digit characters (perhaps at
the end, following some digit characters). We will next look at an improved version of myatoi which allows for these
possibilities.
The standard library contains several functions which test for various classes of characters. The isspace function
returns nonzero (i.e. ``true'') for a whitespace character (e.g. space or tab). Similarly, the isdigit function returns
nonzero (i.e. ``true'') for a digit character. These functions are declared in the header <ctype.h>. Here is an improved
version of myatoi which uses isspace to skip leading whitespace and isdigit to ensure that it attempts to
convert only digits. It also checks for a leading minus sign (but after checking for leading whitespace).
#include <ctype.h>
int myatoi(char str[])
{
    int retval = 0;
    int i = 0;
    int negflag = 0;

    while(isspace(str[i]))    /* skip leading whitespace */
        i++;

    if(str[i] == '-')         /* look for - sign */
    {
        negflag = 1;
        i++;
    }

    while(str[i] != '\0' && isdigit(str[i]))
    {
        int digit = str[i] - '0';
        retval = 10 * retval + digit;
        i++;
    }

    if(negflag)
        retval = -retval;
    return retval;
}
Notice that negflag is an example of a ``Boolean'' variable--we store a 0 or 1 in it to represent a ``false'' or ``true''
condition which we then test for later. (If we find a minus sign, we find it before we scan and convert the digits, but it's
easier to make use of the fact that we saw a minus sign--to make use of it, that is, by actually negating the number we
converted--after we convert it.)
Since the improved function scans and converts digits only as long as they are digits (as confirmed by isdigit), this
version will quietly ignore any trailing non-digits. That is, myatoi("123xyz") will simply return 123. (This is exactly
the way the standard atoi function behaves.) Furthermore, calling myatoi (or atoi) with a string containing no
digits at all, such as "abcxyz", will quietly return 0.
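If a caller needs to tell these quiet cases apart from a clean conversion, the standard library function strtol reports where conversion stopped. The wrapper below is only a sketch: the name parse_long_strict and its return convention are my assumptions, and it makes no attempt to detect overflow.

```c
#include <stdlib.h>

/* Hypothetical wrapper: converts str to a long with strtol and
   reports whether any digits were found and whether anything was
   left over afterwards.  Returns 1 for a clean conversion, else 0. */
int parse_long_strict(const char str[], long *result)
{
    char *end;

    *result = strtol(str, &end, 10);
    if(end == str)
        return 0;       /* no digits at all, e.g. "abcxyz" */
    if(*end != '\0')
        return 0;       /* trailing non-digits, e.g. "123xyz" */
    return 1;
}
```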
Exercise 6. Write a rudimentary checkbook balancing program.
#include <stdio.h>
#include <stdlib.h>    /* for atof() */
#include <string.h>    /* for strcmp() */
#define MAXLINE 100

int main()
{
    double balance = 0.0;
    char line1[MAXLINE], line2[MAXLINE];

    while(getline(line1, MAXLINE) > 0)
    {
        getline(line2, MAXLINE);
        if(strcmp(line1, "deposit") == 0)
            balance += atof(line2);
        else if(strcmp(line1, "check") == 0)
            balance -= atof(line2);
        else
        {
            printf("bad data line: not \"check\" or \"deposit\"\n");
            continue;
        }
        printf("balance: %.2f\n", balance);
    }
    return 0;
}
Reading the key word and the amount from the same line would be surprisingly difficult, using only the tools we have
in hand so far. We'll see a clean way of doing it in a week or two.
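For the curious, here is one possible sketch using the library function sscanf, which these notes have not covered yet. It is not necessarily the approach the later notes take, and the helper name parse_transaction is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: splits a line such as "check 12.34" into a
   word (at most 19 characters) and an amount.  Returns 1 if both
   parts were found, 0 otherwise. */
int parse_transaction(const char line[], char word[20], double *amount)
{
    return sscanf(line, "%19s %lf", word, amount) == 2;
}
```

The caller could then compare word against "check" and "deposit" with strcmp, just as the two-line version does.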
Exercise 7. Rewrite the ``compass'' code to use strcpy and strcat.
char word[20];

if(y > 0)
    strcpy(word, "north");
else if(y < 0)
    strcpy(word, "south");
else
    strcpy(word, "");    /* empty string */

if(x > 0)
    strcat(word, "east");
else if(x < 0)
    strcat(word, "west");
else
    strcat(word, "");    /* empty string */

printf("%s\n", word);
Exercise 8. Write a program to read its input, one character at a time, and print each character and its decimal value.
#include <stdio.h>
int main()
{
    int c;

    while((c = getchar()) != EOF)
        printf("character %c has value %d\n", c, c);
    return 0;
}
You will notice that this program prints a funny line or two for each new line ('\n') in the input, because when the %c
in the printf call finds itself printing a \n character that we've just read, it naturally prints a newline at that point.
Exercise 9. Write a program to read its input, one line at a time, and print each line backwards.
Here is one way of doing it, using only what we've seen so far:
#include <stdio.h>
extern int reverse(char [], int);
int main()
{
    char line[100];
    int len;

    while((len = getline(line, 100)) != EOF)
    {
        reverse(line, len);
        printf("%s\n", line);
    }
    return 0;
}

int reverse(char string[], int len)
{
    int i;
    char tmp;

    for(i = 0; i < len / 2; i = i + 1)
    {
        tmp = string[i];
        string[i] = string[len - i - 1];
        string[len - i - 1] = tmp;
    }
    return 0;
}
In practice, it would be a nuisance to have to pass the length of the string to the reverse function. Strings in C are
always terminated by the ``zero'' or ``nul'' character, represented by \0. Therefore, reverse (or any piece of code)
can always compute the length of a string, either by searching for the \0, or by calling the library function strlen,
which computes the length of a string by searching for the \0. Here is how the program might look if the reverse
function did not require that the length of the string be passed in:
#include <stdio.h>
#include <string.h>
extern int reverse(char []);
int main()
{
    char line[100];

    while(getline(line, 100) != EOF)
    {
        reverse(line);
        printf("%s\n", line);
    }
    return 0;
}

int reverse(char string[])
{
    int len = strlen(string);
    int i;
    char tmp;

    for(i = 0; i < len / 2; i = i + 1)
    {
        tmp = string[i];
        string[i] = string[len - i - 1];
        string[len - i - 1] = tmp;
    }
    return 0;
}
Assignment #5
Handouts:
Assignment #5
Reading Assignment:
Review Questions:
1. What's wrong with this #define line?
#define N 10;
2. Suppose you defined the macro
#define SIX 2*3
Then, suppose you used it in another expression:
int x = 12 / SIX;
What value would x be set to?
3. If the header file x.h contains an external prototype declaration for a function q(), where should
x.h be included?
4. (harder) How many differences can you think of between i and J as defined by these two lines?
int i = 10;
#define J 10
(Hint: think about trying to write J = 5, or int a[i].)
Exercises:
1. Pick one of the programs we've worked with so far that uses the getline function, such as the
getline test program (assignment 4, exercise 2) or the ``word zipping'' program (assignment 4,
tutorial 3) or the checkbook balancing program (assignment 4, exercise 6). Arrange the program
in two source files, one containing main(), and one containing getline(). If possible, set
your compiler to warn you when functions are called or defined without a prototype in scope.
Create a header file getline.h containing the external function prototype for getline(),
and #include it as you see fit. Also use a preprocessor macro such as MAXLINE for the
maximum line length (i.e. in declarations of line arrays, and calls to getline).
2. Write a program to read its input and write it out, double-spaced (that is, with an extra blank line
between each line). Process the input a character at a time; do not read entire lines at a time using
getline.
3. Write a function
int countchars(char string[], int ch);
which returns the number of times the character ch appears in the string. For example, the call
countchars("Hello, world!", 'o')
would return 2.
4. Write a short program to read two lines of text, and concatenate them using strcat. Since
strcat concatenates in-place, you'll have to make sure you have enough memory to hold the
concatenated copy. For now, use a char array which is twice as big as either of the arrays you
use for reading the two lines. Use strcpy to copy the first string to the destination array, and
strcat to append the second one.
5. Write a function
replace(char string[], char from[], char to[])
which finds the string from in the string string and replaces it with the string to. You may
assume that from and to are the same length. For example, the code
char string[] = "recieve";
replace(string, "ie", "ei");
should change string to "receive".

(Extra credit: Think about what replace() should do if the from string appears multiple times
in the input string.)
Question 1. What's wrong with #define N 10; ?

The semicolon at the end of the line will become part of N's definition, which is hardly ever what you
want.
Question 2. Suppose you had the definition #define SIX 2*3 . What value would the declaration
int x = 12 / SIX; initialize x to?
18. (18? How could it be 18? It sure looks like it should be 2, doesn't it?)
The preprocessor performs a simple textual substitution; it knows nothing about operator precedence, or
even much about the syntax of C. After the preprocessor substitutes the value of the macro SIX, the
declaration looks like
int x = 12 / 2*3;
Since multiplication and division ``group'' (or ``associate'') from left to right, this is interpreted as
int x = (12 / 2) * 3;
To guard against these ``surprises,'' it's a very good idea to parenthesize the values of macros which are
not simple constants. In this case, a safer definition of the macro would have been
#define SIX (2*3)
This way, when the preprocessor performs its simple textual substitution, the resulting expression is
automatically parenthesized so that the compiler gives you the result you expect. (Of course, this
hypothetical SIX macro is useless in any case, but the point is that whenever a macro's value is an
expression of any kind, it needs extra parentheses to avoid surprises.)
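The two definitions can be compared side by side. In this sketch the macro names SIX_UNSAFE and SIX_SAFE and the two little functions are made up for the illustration.

```c
#define SIX_UNSAFE 2*3      /* expands to the bare tokens 2*3 */
#define SIX_SAFE   (2*3)    /* expands to a parenthesized expression */

int divide_unsafe(void)
{
    return 12 / SIX_UNSAFE; /* becomes 12 / 2*3, i.e. (12 / 2) * 3 */
}

int divide_safe(void)
{
    return 12 / SIX_SAFE;   /* becomes 12 / (2*3) */
}
```

The first function returns 18 and the second returns 2.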
Question 3. If the header file x.h contains an external prototype declaration for a function q(), where
should x.h be included?
It should be included in each source file where q() is called, so that the compiler will see the prototype
declaration and be able to generate correct code. It should also be included in the source file where q()
is defined, so that the compiler will be able to notice, and complain about, any mismatches between the
prototype declaration and the actual definition. (It's vital that the prototype which will be used where a
function is called be accurate; an incorrect prototype is worse than useless.)
Question 4. How many differences can you think of between i and J?
The line
int i = 10;
declares a conventional run-time variable, named i, initially containing the value 10. It will be possible
to change i's value at run time. i may appear in expressions (i.e., its value may be fetched), but since it
is not constant, it could not be used where C requires a constant, such as in the dimension of an array
declaration.
The line
#define J 10
on the other hand, defines a preprocessor macro named J having the value 10. For the rest of the current
source file, anywhere you write a single J, the preprocessor will replace it with 10. An array declaration
such as
int a[J];
will be fine; it will be just as if you had written
int a[10];
However, J exists only at compile time. It is not a run-time variable; if you tried to ``change its value'' at
run time by writing
J = 20;
it would be just as if you had written
10 = 20;
and the compiler would complain.

(One more little difference is that the line int i = 10; ends in a semicolon, while the line
#define J 10 does not.)
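A short sketch of the two main differences follows. The function names count_elements and change_i are hypothetical, and i is declared at file scope here only to keep the example small.

```c
#include <stddef.h>

#define J 10

int i = 10;

/* J is replaced by 10 before compilation, so it can size an array. */
int a[J];

size_t count_elements(void)
{
    return sizeof a / sizeof a[0];
}

/* i is a real variable: it can be assigned at run time. */
int change_i(int newvalue)
{
    i = newvalue;
    return i;
}
```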
Exercise 2. Write a program to read its input and write it out, double-spaced.
This is easy if we realize that all we have to do is read the input a character at a time, copying each input
character through to the output, except that whenever we see a '\n' character, write a second one out,
too.
#include <stdio.h>
int main()
{
    int c;

    while((c = getchar()) != EOF)
    {
        putchar(c);
        if(c == '\n')
            putchar('\n');
    }
    return 0;
}
The program won't be too interesting if you type text at it interactively. If you're using a Unix or
MS-DOS system, you can run it on a file by typing
doublespace < filename
The < mechanism indicates that the program should be run with its ``standard input'' (i.e. the stream of
characters read by getchar) connected to the file with the given filename, rather than to the
keyboard.
Exercise 3. Write a function which counts the number of times a character appears in a string.
Here is the function. Notice that it is very similar to the mystrlen function in the notes, except that
rather than counting all characters in the string, it only counts those matching the argument ch.
int countnchars(char string[], int ch)
{
int i;
int count = 0;
for(i = 0; string[i] != '\0'; i++)
{
if(string[i] == ch)
count++;
}
return count;
}
Here is a tiny little main program, to test it out:
#include <stdio.h>
extern int countnchars(char string[], int ch);
int main()
{
    char string[] = "Hello, world!";    /* sample string to search */
    char c = 'o';
    printf("The letter %c appears in \"%s\" %d times.\n",
        c, string, countnchars(string, c));
    return 0;
}
Exercise 4. Write a short program to read two lines of text, and concatenate them using strcat.
#include <stdio.h>
#include <stdlib.h>    /* for exit */
#include <string.h>    /* for strcpy and strcat */
#define MAXLINE 100
extern int getline(char [], int);    /* the getline from the notes */
int main()
{
    char string1[MAXLINE], string2[MAXLINE];
    int len1, len2;
    char newstring[MAXLINE*2];
    printf("enter first string:\n");
    len1 = getline(string1, MAXLINE);
    printf("enter second string:\n");
    len2 = getline(string2, MAXLINE);
    if(len1 == EOF || len2 == EOF)
        exit(1);
    strcpy(newstring, string1);
    strcat(newstring, string2);
    printf("%s\n", newstring);
    return 0;
}
Exercise 5. Write a function to find a substring in a larger string and replace it with a different substring.
Here is one way. (Since the function doesn't return anything, I've defined it with a return type of void.)
void replace(char string[], char from[], char to[])
{
int start, i1, i2;
for(start = 0; string[start] != '\0'; start++)
{
i1 = 0;
i2 = start;
while(from[i1] != '\0')
{
if(from[i1] != string[i2])
break;
i1++;
i2++;
}
if(from[i1] == '\0')
{
for(i1 = 0; to[i1] != '\0'; i1++)
string[start++] = to[i1];
return;
}
}
}
This code is very similar to the mystrstr function in the notes, chapter 10, section 10.4, p. 8. (Since
strstr's job is to find one string within another, it's a natural for the first half of replace.)
Extra credit: Think about what replace() should do if the from string appears multiple times in the
input string.
Our first implementation replaced only the first occurrence (if any) of the from string. It happens,
though, that it's easy to rewrite our first version to make it replace all occurrences: omit the
return after the first string has been replaced, and back start up by one, so that the loop's own
start++ leaves it just past the replacement instead of skipping a character:
void replace(char string[], char from[], char to[])
{
    int start, i1, i2;
    for(start = 0; string[start] != '\0'; start++)
    {
        i1 = 0;
        i2 = start;
        while(from[i1] != '\0')
        {
            if(from[i1] != string[i2])
                break;
            i1++;
            i2++;
        }
        if(from[i1] == '\0')
        {
            for(i1 = 0; to[i1] != '\0'; i1++)
                string[start++] = to[i1];
            if(i1 > 0)
                start--;    /* the loop's start++ steps just past the replacement */
        }
    }
}
Assignment #6
Handouts:
Assignment #6
Class Notes, Chapters 12 and 13
Reading Assignment:
Review Questions:
1. If we say
int i = 5;
int *ip = &i;
then what is ip? What is its value?
2. If ip is a pointer to an integer, what does ip++ mean? What does
*ip++ = 0;
do?
3. How much memory does the call malloc(10) allocate? What if you want enough memory for
10 ints?
4. The assignment in
char c;
int *ip = &c; /* WRONG */
is in error; you can't mix char pointers and int pointers like this. How, then, is it possible to
write
char *cp = malloc(10);
int *ip = malloc(sizeof(int));
without error on either line?
Exercises:
1. Write a program to read lines and print only those containing a certain word. (For now, the word
can be a constant string in the program.) The basic pattern (which I have to confess I have
parroted exactly from K&R Sec. 4.1) is
while(there's another line)
{
if(line contains word)
print the line;
}
Use the strstr function (mentioned in the notes) to look for the word. Be sure to include the
line
#include <string.h>
at the top of the source file where you call strstr.
2. Rewrite the checkbook-balancing program from assignment 4 (exercise 6) to use the getwords
function (from the notes) to make it easy to take the word ``check'' or ``deposit'', and the amount,
from a single line.
3. Rewrite the line-reversing function from assignment 4 (exercise 9) to use pointers.
4. Rewrite the character-counting function from assignment 5 (exercise 3) to use pointers.
5. Rewrite the string-concatenation program from assignment 5 (exercise 4) to call malloc to
allocate a new piece of memory just big enough for the concatenated result. Don't forget to leave
room for the \0!
6. Rewrite the string-replacing function from assignment 5 (exercise 5) to use pointers.
7. (harder) Write a program to read lines of text up to EOF, and then print them out in reverse order.
You can use getline to read each line into a fixed-size array (as we have been doing all along),
but you will have to call malloc and make a copy of each line before you read the next one.
Also, you will have to use malloc and realloc to maintain the ``array'' of character pointers
which holds all of the lines. (Your code will be similar to that in section 11.3 of the notes, p. 5)
Extra credit: remove the restriction imposed by the fixed-size array into which each line is
originally read; allow the program to accept arbitrarily many arbitrarily-long lines. (You'll have to
replace getline with a dynamically-allocating line-getting function which calls malloc and
realloc.)
Question 1. If we say int i = 5; int *ip = &i; then what is ip? What is its value?
ip is a variable which can point to an int (that is, its value will be a pointer to an int; or informally,
we say that ip is ``a pointer to an int''). Its value is a pointer which points to the variable i.
Question 2. If ip is a pointer to an int, what does ip++ mean? What does *ip++ = 0 do?
ip++ means about the same as it does for any other variable: increment it by 1, that is, as if we had
written ip = ip + 1. In the case of a pointer, this means to make it point to the object (the int) one
past the one it used to. *ip++ = 0 sets the int variable pointed to by ip to 0, and then increments ip
to point to the next int.
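The *ip++ = 0 idiom is most often seen inside a loop. Here is a minimal sketch; the function name zero_ints is invented for this example.

```c
/* Hypothetical helper: set the n ints starting at ip to zero. */
void zero_ints(int *ip, int n)
{
	while(n-- > 0)
		*ip++ = 0;	/* store 0 where ip points, then advance ip to the next int */
}
```

If a is an array of four ints, the call zero_ints(a, 3) clears a[0], a[1], and a[2] and leaves a[3] alone.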
Question 3. How much memory does the call malloc(10) allocate? What if you want enough memory
for 10 ints?
malloc(10) allocates 10 bytes, which is enough space for 10 chars. To allocate space for 10 ints,
you could call malloc(10 * sizeof(int)) .
Question 4. If char and int pointers are different, how is it possible to write
char *cp = malloc(10);
int *ip = malloc(sizeof(int));
without error on either line?
malloc is declared as returning the special, ``generic'' pointer type void *, which can be (and is)
automatically converted to different pointer types, as needed.
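Putting the last two answers together, here is a short sketch of allocating room for several ints. The function name make_ints is an assumption made for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an array of n ints, each set to its index.
   Returns NULL if malloc fails; the caller must free the result. */
int *make_ints(int n)
{
	int *ip = malloc(n * sizeof(int));	/* void * converts to int * automatically */
	int i;

	if(ip == NULL)
		return NULL;
	for(i = 0; i < n; i++)
		ip[i] = i;
	return ip;
}
```

Notice that no cast is needed on the call to malloc, and that the size is computed with sizeof(int) rather than assumed.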
Exercise 1. Write a program to read lines and print only those containing a certain word.
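#include <stdio.h>
#include <string.h>
extern int getline(char [], int);    /* the getline from the notes */
int main()
{
    char line[100];
    char *pat = "hello";
    while(getline(line, 100) != EOF)
    {
        if(strstr(line, pat) != NULL)
            printf("%s\n", line);
    }
    return 0;
}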
Exercise 2. Rewrite the checkbook-balancing program to use the getwords function to make it easy to
take the word ``check'' or ``deposit'', and the amount, from a single line.
#include <stdio.h>
#include <stdlib.h> /* for atof() */
#include <string.h> /* for strcmp() */
#define MAXLINE 100
#define MAXWORDS 10
extern int getline(char [], int);
extern int getwords(char *, char *[], int);
int main()
{
double balance = 0.0;
char line[MAXLINE];
char *words[MAXWORDS];
int nwords;
while (getline(line, MAXLINE) > 0)
{
nwords = getwords(line, words, MAXWORDS);
if(nwords == 0) /* blank line */
continue;
if(strcmp(words[0], "deposit") == 0)
{
if(nwords < 2)
{
printf("missing amount\n");
continue;
}
balance += atof(words[1]);
}
else if(strcmp(words[0], "check") == 0)
{
if(nwords < 2)
{
printf("missing amount\n");
continue;
}
balance -= atof(words[1]);
}
else
{
printf("bad data line: \"%s\"\n", words[0]);
continue;
}
printf("balance: %.2f\n", balance);
}
return 0;
}
Exercise 3. Rewrite the line-reversing function to use pointers.
Here is one way:
#include <string.h>    /* for strlen */
int reverse(char *string)
{
    char *lp = string;                       /* left pointer */
    char *rp = &string[strlen(string)-1];    /* right pointer */
    char tmp;
    while(lp < rp)
    {
        tmp = *lp;
        *lp = *rp;
        *rp = tmp;
        lp++;
        rp--;
    }
    return 0;
}
Exercise 4. Rewrite the character-counting function to use pointers.
int countnchars(char *string, int ch)

{
char *p;
int count = 0;
for(p = string; *p != '\0'; p++)
{
if(*p == ch)
count++;
}
return count;
}
Exercise 5. Rewrite the string concatenation program to call malloc to allocate memory for the
concatenated result.
#include <stdio.h>
#include <stdlib.h>    /* for malloc */
#include <string.h>    /* for strcpy and strcat */
#define MAXLINE 100
extern int getline(char [], int);    /* the getline from the notes */
int main()
{
    char string1[MAXLINE], string2[MAXLINE];
    int len1, len2;
    char *newstring;
    printf("enter first string:\n");
    len1 = getline(string1, MAXLINE);
    printf("enter second string:\n");
    len2 = getline(string2, MAXLINE);
    if(len1 == EOF || len2 == EOF)
        exit(1);
    newstring = malloc(len1 + len2 + 1);    /* +1 for \0 */
    if(newstring == NULL)
    {
        printf("out of memory\n");
        exit(1);
    }
    strcpy(newstring, string1);
    strcat(newstring, string2);
    printf("%s\n", newstring);
    return 0;
}
Exercise 6. Rewrite the string-replacing function to use pointers.
Here is one way:
void replace(char string[], char *from, char *to)
{
char *start, *p1, *p2;
for(start = string; *start != '\0'; start++)
{
p1 = from;
p2 = start;
while(*p1 != '\0')
{
if(*p1 != *p2)
break;
p1++;
p2++;
}
if(*p1 == '\0')
{
for(p1 = to; *p1 != '\0'; p1++)
*start++ = *p1;
return;
}
}
}
The bulk of this code is a copy of the mystrstr function in the notes, chapter 10, section 10.4, p. 8.
(Since strstr's job is to find one string within another, it's a natural for the first half of replace.) We
could also call strstr directly, simplifying replace:
#include <string.h>
void replace(char string[], char *from, char *to)
{
char *p1;
char *start = strstr(string, from);
if(start != NULL)
{
for(p1 = to; *p1 != '\0'; p1++)
*start++ = *p1;
return;
}
}
Again, we might wonder about the case when the string to be edited contains multiple occurrences of the
from string, and ask whether replace should replace the first one, or all of them. The problem
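statement didn't make this detail clear. Our first two implementations have replaced only the first
occurrence. It happens, though, that it's easy to rewrite our first version to make it replace all
occurrences: omit the return after the first string has been replaced, and back start up by one, so
that the loop's own start++ leaves it just past the replacement instead of skipping a character:
void replace(char string[], char *from, char *to)
{
    char *start, *p1, *p2;
    for(start = string; *start != '\0'; start++)
    {
        p1 = from;
        p2 = start;
        while(*p1 != '\0')
        {
            if(*p1 != *p2)
                break;
            p1++;
            p2++;
        }
        if(*p1 == '\0')
        {
            for(p1 = to; *p1 != '\0'; p1++)
                *start++ = *p1;
            if(p1 != to)
                start--;    /* the loop's start++ steps just past the replacement */
        }
    }
}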
Rewriting the second version wouldn't be much harder.
Exercise 7. Write a program to read lines of text up to EOF, and then print them out in reverse order.
One of the reasons this program is harder is that it uses pointers to pointers. When we used pointers to
simulate an array of integers, we used pointers to integers. In this program, we want to simulate an array
of strings, or an array of pointers to characters. Therefore, we use pointers to pointers to characters, or
char **.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLINE 100
extern int getline(char [], int);
int main()
{
int i;
char line[MAXLINE];
char **lines;
int nalloc, nitems;
nalloc = 10;
lines = malloc(nalloc * sizeof(char *));
if(lines == NULL)
{
exit(1);
}
nitems = 0;
while(getline(line, MAXLINE) != EOF)
{
if(nitems >= nalloc)
{
char **newp;
nalloc += 10;
newp = realloc(lines, nalloc * sizeof(char *));

if(newp == NULL)
{
exit(1);
}
lines = newp;
}
lines[nitems] = malloc(strlen(line) + 1);
if(lines[nitems] == NULL)
{
printf("out of memory\n");
exit(1);
}
strcpy(lines[nitems], line);
nitems++;
}
for(i = nitems - 1; i >= 0; i--)
printf("%s\n", lines[i]);
return 0;
}
Notice that for each line we read, we call malloc to allocate space for a copy of it, and then use
strcpy to make the copy. We could not simply set
lines[nitems++] = line;
each time, because line is a single array, which can only hold one line, and it gets overwritten with the
contents of each input line. (In other words, if we didn't call malloc and make a copy, we'd end up at
the end with only the last line. You might try it to see what happens.)
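The difference between saving a pointer to the one line array and saving a copy can be shown in a few lines. This is only a sketch; the helper name copy_string is invented here, and it does exactly what the malloc and strcpy calls in the program above do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of s,
   or NULL if no memory is available.  The caller frees the copy. */
char *copy_string(const char *s)
{
	char *p = malloc(strlen(s) + 1);	/* +1 for the \0 */
	if(p != NULL)
		strcpy(p, s);
	return p;
}
```

Suppose line holds "first", and we save both alias = line and copy = copy_string(line). After the next read overwrites line with "second", alias also shows "second" (it is the same memory), while copy still holds "first".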
Extra credit: remove the restriction imposed by the fixed-size array; allow the program to accept
arbitrarily-long lines.
First we will write another version of getline, called mgetline. mgetline calls malloc and
realloc to get enough memory for the line it's currently reading, regardless of how long that line is.
mgetline returns a pointer to the allocated memory; therefore, the caller does not have to pass an array
for mgetline to read into. (It will be the caller's responsibility to free the memory when it doesn't
need the line of text any more.)
#include <stdio.h>
#include <stdlib.h>
char *mgetline()
{
char *line;
int nalloc = 10;
int nch = 0;
int c;
line = malloc(nalloc + 1);
if(line == NULL)
{
exit(1);
}
while((c = getchar()) != EOF)
{
if(c == '\n')
break;
if(nch >= nalloc)
{
char *newp;
nalloc += 10;
newp = realloc(line, nalloc + 1);
if(newp == NULL)
{
exit(1);
}
line = newp;
}
line[nch++] = c;
}
if(c == EOF && nch == 0)
{
free(line);
return NULL;
}
line[nch] = '\0';
return line;
}
Now we can rewrite the line-reversing program to call mgetline. Note that we no longer need the local
line array. Instead, we use a pointer, linep, which holds the return value from mgetline. Since
mgetline returns a new pointer to a new block of memory each time we call it, in this program (unlike
the previous one) we can set
lines[nitems++] = linep;
without overwriting anything.
#include <stdio.h>
#include <stdlib.h>
extern char *mgetline();
int main()
{
int i;
char *linep;
char **lines;
int nalloc, nitems;
nalloc = 10;
lines = malloc(nalloc * sizeof(char *));
if(lines == NULL)
{
exit(1);
}
nitems = 0;
while((linep = mgetline()) != NULL)
{
if(nitems >= nalloc)
{
char **newp;
nalloc += 10;
newp = realloc(lines, nalloc * sizeof(char *));
if(newp == NULL)
{
exit(1);
}
lines = newp;
}
lines[nitems++] = linep;
}
for(i = nitems - 1; i >= 0; i--)
printf("%s\n", lines[i]);
return 0;
}
Assignment #7
Handouts:
Assignment #7
Reading Assignment:
Review Questions:
1. What do we mean by the ``equivalence between arrays and pointers'' in C?
2. If p is a pointer, what does p[i] mean?
3. Can you think of a few reasons why an I/O scheme which did not make you open a file and keep
track of a ``file pointer,'' but instead let you just mention the name of the file as you were reading
or writing it, might not work as well?
4. If argc is 2, what is argv[1]? What if argc is 1?
Exercises:
1. Rewrite the replace function from last week to remove the restriction that the replacement
substring have the same size. Be sure to test it on replacement strings which are larger, smaller,
and the same size. Should it work if the replacement string is the empty string?
2. Rewrite the pattern matching program (Assignment 6, Exercise 1) to prompt the user for the name
of the file to search in and a pattern to search for.
3. Rewrite the pattern matching program (Assignment 6, Exercise 1) to accept the pattern and file
name from the command line (like the Unix grep or MS-DOS find command).
Question 1. What do we mean by the ``equivalence between pointers and arrays'' in C?

The cornerstone of the equivalence is that whenever we mention an array in an expression where it might seem that
the array's ``value'' is needed, the compiler automatically generates a pointer to the array's first element. Other parts
of the equivalence are that the array subscript notation [] works on both pointers and arrays, that an array can
seemingly be assigned to a pointer, that when an array is seemingly passed to a function, a pointer to its first
element is passed instead, and that when a function parameter is seemingly declared as an array, the compiler
quietly declares it as a pointer, instead.
Question 2. If p is a pointer, what does p[i] mean?
It means the same as *(p+i), or, in other words, the contents of the object i locations past the one pointed to by
p.
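Both answers can be seen at work in one small function. This is a sketch, and the name sum_ints is made up for the purpose.

```c
/* Hypothetical helper: add up n ints, using subscript notation on a pointer. */
int sum_ints(int *p, int n)
{
	int i, sum = 0;
	for(i = 0; i < n; i++)
		sum += p[i];	/* identical to sum += *(p + i); */
	return sum;
}
```

Given int a[] = {1, 2, 3}; the call sum_ints(a, 3) passes a pointer to a[0], because mentioning the array generates a pointer to its first element. The call sum_ints(&a[1], 2) starts one element further along and adds up only 2 and 3.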
Question 3. Can you think of a few reasons why an I/O scheme which did not make you open a file and keep track
of a ``file pointer,'' but instead let you just mention the name of the file as you were reading it, might not work as
well?
There are two main reasons.
The first is that an operating system typically has to do a fair amount of work when you access a file: parse the
filename, determine the actual location of the file within the storage system (disk, etc.), determine if you have
permission to access the file, etc. If the filename were re-specified with each read or write call, the operating
system might have to re-do that work each time. If, instead, the file is ``opened'' just once, the operating system can
easily remember where the file is and everything it needs to know about it. It can arrange that the file descriptor or
file pointer (which it returns to you from the open call and which it asks you to supply along with each read or
write call) will serve as a reference to its memory of the necessary information.
The more important reason is that an open file needs some context: in particular, the location within the file where
the next reading or writing will take place. If each time you read from a file, you just specified the file name, how
would the system know not to start reading from the beginning of the file each time?
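Here is a small sketch of that context at work, using the standard tmpfile function so that no file name is needed. The function name reads_advance is invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: write "abc" to a temporary file, rewind it, and read
   two characters back.  Returns 1 if they come back as 'a' and then 'b'. */
int reads_advance(void)
{
	FILE *fp = tmpfile();
	int c1, c2;

	if(fp == NULL)
		return 0;
	fputs("abc", fp);
	rewind(fp);		/* back to the beginning before reading */
	c1 = getc(fp);		/* reads 'a'; the stream remembers its position */
	c2 = getc(fp);		/* so this reads 'b', not 'a' again */
	fclose(fp);
	return c1 == 'a' && c2 == 'b';
}
```

The two getc calls are identical, yet they return different characters, because the file pointer carries the current position along with it.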
Question 4. If argc is 2, what is argv[1]? What if argc is 1?
If argc is 2, argv[1] is the first (and only) command line argument. If argc is 1, there are no command line
arguments, so argv[1] doesn't point anywhere: it is a null pointer, since argv[argc] is always null.
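A program should therefore check argc before it looks at argv[1]. A minimal sketch, with first_arg being a name invented for this example:

```c
#include <stddef.h>

/* Hypothetical helper: return the first command-line argument,
   or NULL if only the program name was given. */
char *first_arg(int argc, char *argv[])
{
	if(argc < 2)
		return NULL;
	return argv[1];
}
```

A main function would call it as first_arg(argc, argv) and print a usage message if the result is NULL.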
Exercise 1. Rewrite the replace function from last week to remove the restriction that the replacement substring
have the same size.
If the replacement substring is a different size, we'll have to move the rest of the string right or left, either to make
room or to close up the gap. Here's code to do that, by shifting the characters, one at a time. (Notice that when
we're shifting to the right, we have to work our way through the characters we're shifting from right to left, to
avoid stepping on characters we haven't shifted yet. Notice that we also have to worry about the new position of
the '\0' string terminator in each case.)
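#include <string.h>
void replace(char string[], char *from, char *to)
{
int fromlen = strlen(from);
int tolen = strlen(to);
char *start, *p1, *p2;
for(start = string; *start != '\0'; start++)
{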
p1 = from;
p2 = start;
while(*p1 != '\0')
{
if(*p1 != *p2)
break;
p1++;
p2++;
}
if(*p1 == '\0')
{
if(fromlen > tolen)
{
/* move rest of string left */
p2 = start + tolen;
for(p1 = start + fromlen; *p1 != '\0'; p1++)
*p2++ = *p1;
*p2 = '\0';
}
else if(fromlen < tolen)
{
/* move rest of string right */
int leftover = strlen(start);
p2 = start + leftover + (tolen - fromlen);
*p2-- = '\0';
for(p1 = start + leftover - 1;
p1 >= start + fromlen; p1--)
*p2-- = *p1;
}
for(p1 = to; *p1 != '\0'; p1++)
*start++ = *p1;
return;
}
}
}
As it happens, there's a function in the standard library, memmove, whose job it is to move characters (which
might be strings) from point a to point b, taking care to make the copy in the right order if the source and
destination strings overlap. memmove is declared in <string.h>, and its prototype is
void *memmove(void *dest, const void *src, size_t n);
(the void * parameters accept pointers of any type, including char *, and n is a count of bytes).
memmove is therefore a lot like strcpy (the dest and src arguments are in that order so that a call mimics an
assignment). We can simplify our new version of replace by calling memmove instead:
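#include <string.h>
void replace(char string[], char *from, char *to)
{
int fromlen = strlen(from);
int tolen = strlen(to);
char *start, *p1, *p2;
for(start = string; *start != '\0'; start++)
{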
p1 = from;
p2 = start;
while(*p1 != '\0')
{
if(*p1 != *p2)
break;
p1++;
p2++;
}
if(*p1 == '\0')
{
if(fromlen != tolen)
{
memmove(start + tolen, start + fromlen,
strlen(start + fromlen) + 1);
/* + 1 for \0 */
}
for(p1 = to; *p1 != '\0'; p1++)
*start++ = *p1;
return;
}
}
}
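To see memmove coping with overlap in isolation, here is a tiny sketch. The function name open_gap is made up, and it assumes the caller's array has room for the extra characters.

```c
#include <string.h>

/* Hypothetical helper: open up a gap of n characters at the front of s by
   sliding the whole string (including its \0) to the right.  The caller
   must make sure the array holding s has room for n more characters. */
void open_gap(char *s, int n)
{
	memmove(s + n, s, strlen(s) + 1);	/* source and destination overlap */
}
```

If buf is a 20-character array holding "world", then open_gap(buf, 2) followed by copying the two characters "XY" into buf[0] and buf[1] leaves "XYworld".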
Exercise 2. Rewrite the pattern matching program to prompt the user.
We'll need to use fgetline (see the notes) instead of getline, so that we can read lines from the file pointer
connected to the file we open. With fgetline in hand, we rewrite the pattern-matching program to prompt for
the file name and pattern, open the file, and finally call fgetline instead of getline:
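#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLINE 100
extern int getline(char [], int);
extern int fgetline(FILE *, char [], int);
int main()
{
    char line[MAXLINE];
    char filename[MAXLINE];
    char pat[MAXLINE];
    FILE *ifp;
    printf("file name? ");
    fflush(stdout);
    getline(filename, MAXLINE);
    printf("pattern? ");
    fflush(stdout);
    getline(pat, MAXLINE);
    ifp = fopen(filename, "r");
    if(ifp == NULL)
    {
        fprintf(stderr, "can't open %s\n", filename);
        exit(1);
    }
    while(fgetline(ifp, line, MAXLINE) != EOF)
    {
        if(strstr(line, pat) != NULL)
            printf("%s\n", line);
    }
    fclose(ifp);
    return 0;
}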
The calls to fflush(stdout) ensure that the prompts show on the screen before we start reading the reply.
This program is imperfect in that it does not check the return value from the two getline calls. If the user
doesn't type a file name or pattern, the program will misbehave.
Exercise 3. Rewrite the pattern matching program to accept arguments from the command line.
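#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLINE 100
extern int fgetline(FILE *, char [], int);
int main(int argc, char *argv[])
{
    char line[MAXLINE];
    char *filename;
    char *pat;
    FILE *ifp;
    if(argc != 3)
    {
        fprintf(stderr, "missing search pattern or file name\n");
        exit(1);
    }
    pat = argv[1];
    filename = argv[2];
    ifp = fopen(filename, "r");
    if(ifp == NULL)
    {
        fprintf(stderr, "can't open %s\n", filename);
        exit(1);
    }
    while(fgetline(ifp, line, MAXLINE) != EOF)
    {
        if(strstr(line, pat) != NULL)
            printf("%s\n", line);
    }
    fclose(ifp);
    return 0;
}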
It is useful to allow the program to search for the pattern in multiple files, or, if no file names are specified on the
command line, to read from standard input. Here's how we might implement that. We break the
read-lines-and-search loop out into a separate function, fsearchpat. We either call fsearchpat once, handing it stdin as
a file pointer, or else we loop over all of the file names on the command line, opening each one and calling
fsearchpat with that file pointer.

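#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLINE 100
extern int fgetline(FILE *, char [], int);
int fsearchpat(FILE *, char *);
int main(int argc, char *argv[])
{
    char *filename;
    char *pat;
    FILE *ifp;
    if(argc < 2)
    {
        fprintf(stderr, "missing search pattern\n");
        exit(1);
    }
    pat = argv[1];
    if(argc == 2)
        fsearchpat(stdin, pat);
    else
    {
        int i;
        for(i = 2; i < argc; i++)
        {
            filename = argv[i];
            ifp = fopen(filename, "r");
            if(ifp == NULL)
            {
                fprintf(stderr, "can't open %s\n", filename);
                exit(1);
            }
            fsearchpat(ifp, pat);
            fclose(ifp);
        }
    }
    return 0;
}
int fsearchpat(FILE *ifp, char *pat)
{
    char line[MAXLINE];
    while(fgetline(ifp, line, MAXLINE) != EOF)
    {
        if(strstr(line, pat) != NULL)
            printf("%s\n", line);
    }
    return 0;
}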
Intermediate C Programming -- Assignments

Here are the problem sets I hand out during each of the eight weeks this class runs when I teach it in
person. If you're out there on the net somewhere, unable to attend the class in person, feel free to follow
along here!
Week 1
Topics: program and data structure design; structures; linked lists
Assignment
Answers
Week 2
Topics: input/output; data files; stdio
Assignment
Answers
Week 3
Topics: miscellaneous C features (void and unsigned types; type qualifiers; storage classes and
typedef; ?:, cast, and comma operators; default promotions; switch, do/while, and goto
statements); returning aggregates
Assignment
Answers
Week 4
Topics: bitwise operators; function-like preprocessor macros
Assignment
Answers
Week 5
Topics: dynamic memory allocation; void * type; malloc strategies

Assignment
Answers
Week 6
Topics: pointers to pointers; pointers to functions; multidimensional arrays
Assignment
Answers
Week 7
Topics: Object-Oriented Programming methodology, C++ preview; variable-length argument lists
Assignment
Answers
Week 8
One last handout
Intermediate C Programming
Assignment #1
Handouts:
Syllabus
Assignment #1
Game source code
Notes about ``The Game''
Review Questions:
1. What are the four parts of a structure definition?
2. What is p->m equivalent to?
3. What's wrong with this code?
struct x *p = malloc(sizeof(struct x *));
4. The last code example in chapter 15 of the notes reads lines from stdin and prints them back
out in reverse order. What would happen if the prepend function did not call malloc and
strcpy to allocate space for a new copy of each listnode's item, but instead simply set
newnode->item = newword?
Exercises:
1. Get the initial version of the game (as handed out in class) to compile and run. It should print a ``?''
prompt, and you should be able to type the ``n'', ``s'', ``e'', and ``w'' commands to attempt to move
around, and the ``take'' and ``drop'' commands to pick up and drop objects. (There are one or two other
commands, which you can learn by studying the file commands.c.)
Study and understand as much of the source code of the game as you can.
2. Add a ``help'' command which prints a message listing the available verbs (or at least the most
common ones).
3. Add a ``long description'' field to the object and room structures. (It can either be an array of char
or a pointer to char.) Modify the initializations of the objects and rooms arrays in main.c to
initialize the description field for some or all objects and rooms. Arrange that the ``look'' command print
the long description of the current room (if it has one), and that the description is also printed when the
actor enters a new room. Add an ``examine'' command which prints the long description of an object, if it
has one. (Otherwise, have it print ``You see nothing special about the %s.'')
Game source code
You have four options in downloading the source code package for the game:
As a compressed tar (.tar.Z) file: download here

As a gzipped tar (.tar.gz) file: download here
As a zip file: download here
As individual files from the directory hierarchy referred to in the assignments: start here
An Introduction to ``The Game''
Our ongoing example during this entire class will be a relatively simple but fully functional text-based
adventure game, patterned roughly after the classic games ADVENT and Zork. If you haven't played one
of these games before, the basic idea is that you are an ``actor'' in a playing field (often referred to as a
``dungeon'') consisting of a number of interconnected rooms. You can move from room to room, pick up
objects you find lying around (such as keys or swords or C compilers), and use these objects to perform
various tasks, such as unlocking doors or slaying dragons or writing programs.
You interact with the game by typing commands which are simple English sentences. (In our first
attempts, the sentences will be very simple indeed, but in sophisticated games of this sort the parsers are
quite elaborate and can ``understand'' rather complicated sentences.) The game responds to each
command by telling you what you see--a new room you've entered, or the result of some task you've
managed to perform.
Part of the ``fun'' (or, at least, the challenge) of these games is just figuring out what commands are
acceptable. To get you started, you move from room to room with the commands ``north'', ``east'',
``west'', and ``south'' (which can be abbreviated to ``n'', ``e'', ``w'', or ``s''). You can pick up objects with
the ``take'' command and put them down by saying ``drop''. You can request a description of the room
you're in by typing ``look'', and request a list of your possessions by typing ``inventory'' (or just ``i'').
(Of course, since you have the source code of this particular game, there's an easy way out of the
challenge of discovering which commands it accepts, once you discover the C function which is in
charge of matching the commands you type against the list of verbs which the game understands.)
It might seem as if such a game would be difficult to write, but as we'll see, it can actually be quite easy.
Study the code, starting with main.c, followed by commands.c, followed by object.c and
room.c. (All along you'll want to refer to game.h.) At first it will seem that the high-level
functions are all shirking their responsibilities--it will seem as if they're doing no real work, but rather
doing only the easy part and then calling some other function to do the rest. At first, it will seem as if
these sub-functions are going to get stuck with all the hard work, but when we get to them, we'll find that
they're nicely simple, too. In fact, the complexity which we might have imagined the program would
contain may seem to melt away before our eyes, as we discover that the program is composed of many
functions each of which is quite easy to understand. If this result surprises you, study it well, because
attacking a problem in this way--making a hard problem tractable by breaking it up into pieces each of
which seems simple but which in concert perform a task which seems harder than the sum of the parts--
this process is the essence of good programming.
The game revolves around three data structures, one describing the player (or ``actor''), one describing a
room, and one describing an object. Since the dungeon will typically consist of many rooms, and since
there will typically be many objects scattered among the rooms, we'll typically have many instances of
the room and object structures. The object structure will contain a ``next'' field, capable of pointing at
another object, so that several objects can be strung together as a singly-linked list. We'll use these lists
of objects to represent the set of objects sitting in a room, or in an actor's possession.
Rooms will be linked together, too, but in a more complex way: each room will have up to four pointers
to other rooms. These pointers can be thought of as doorways or corridors: the pointed-to room is the
room that will be gone to if an actor travels in one of the four compass directions out of the original
room. (A room's exits will be stored in an array of four pointers, where element [0] is the north exit,
[1] is south, [2] is east, and [3] is west. Also, since pointers are inherently directional, it will be
necessary for the linked-to room to have its own pointer back to the original room if the actor is to be
able to return. This means that we can create deliberately confusing situations if we wish, such as one-way
corridors, or rooms with exits that don't take you back to where you just came from.)
When linking objects and rooms together, we'll make heavy use of null pointers. The end of a linked list
is indicated by a null pointer--that is, the last object in the list has a next pointer of null. A room with
no objects, or an actor with no possessions, will have a null pointer in its contents field. When a room
has a wall instead of a door in one of the four compass directions (that is, when it's impossible to leave a
room in a particular direction) that will be indicated by a null pointer in the corresponding cell of the
exits array.
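The description above might translate into declarations along these lines. This is only a sketch to fix ideas: the member names and the NEXITS macro are my guesses for illustration, and the real definitions are the ones in game.h.

```c
#include <stddef.h>

#define NEXITS 4	/* assumed name: [0] north, [1] south, [2] east, [3] west */

struct object
{
	char *name;
	struct object *next;	/* next object in the same list, or NULL at the end */
};

struct room
{
	char *name;
	struct object *contents;	/* first object in the room, or NULL if empty */
	struct room *exits[NEXITS];	/* NULL means a wall in that direction */
};

/* Hypothetical helper: count the objects on a list by following next pointers. */
int count_objects(struct object *list)
{
	int n = 0;
	struct object *op;
	for(op = list; op != NULL; op = op->next)
		n++;
	return n;
}
```

The loop in count_objects is the standard way of walking a singly-linked list: start at the head, follow next pointers, and stop at the null pointer. An empty room simply passes a null contents pointer and gets 0 back.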
Question 1. What are the four parts of a structure definition?

The keyword struct, the structure tag (optional), the brace-enclosed list of member declarations (optional), and a list of
variables of this structure type to be declared (optional).
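For instance, here is one declaration containing all four parts. The tag point and the variable origin are names chosen just for this illustration.

```c
struct point		/* the keyword struct, then the tag "point" */
{
	int x;		/* the brace-enclosed list of member declarations */
	int y;
} origin = {0, 0};	/* a variable of this structure type, declared on the spot */
```

Because the structure has a tag, later code can declare more variables of the same type by writing struct point p; without repeating the member list.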
Exercise 2. Add a ``help'' command.
Here is the code I added to the big if/else chain in commands.c:
else if(strcmp(verb, "help") == 0)
{
printf("Here are some useful commands:\n");
printf("n, north\tgo north\n");
printf("s, south\tgo south\n");
printf("e, east\t\tgo east\n");
printf("w, west\t\tgo west\n");
printf("look\t\tdescribe current room\n");
printf("examine obj\tdescribe an object\n");
printf("take obj\tpick up an object\n");
printf("drop obj\tdrop an object\n");
printf("i, inventory\tlist your current possessions\n");
printf("There may be other commands, too!\n");
}
(This help text also contains a description of the new ``examine'' command, below.)
Exercise 3. Add a ``long description'' field to the object and room structures.
I added the line
char *desc;
at the end of the definition of struct object in game.h, and also at the end of struct room. I modified the
initializations of the objects and rooms arrays in main.c to give most of the objects and rooms descriptions:
static struct object objects[] =
{
/* [0] */ {"bed", NULL, "It is an old, four-poster bed with a lace coverlet."},
/* [1] */ {"diamond", NULL,
"The diamond twinkles majestically as you hold it in the light."},
/* [2] */ {"kettle", NULL,
"It is a dented old kettle. You're not sure it would even hold water."},
/* [3] */ {"hammer", NULL},
/* [4] */ {"pliers", &objects[3]},
/* [5] */ {"doormat", NULL, "It says, \"Welcome\", in big, friendly letters."},
/* [6] */ {"pick", NULL, "It is an old miner's pick."},
/* [7] */ {"shovel", &objects[6]},
};
static struct room rooms[] =
{
/* [0] */ {"field", NULL, {&rooms[1], NULL, NULL, NULL},
"You are in an open field, with a house to the north."},
/* [1] */ {"house", NULL, {&rooms[2], &rooms[0], NULL, NULL},
"You are standing just to the south of a plain, white house."},
/* [2] */ {"entry", NULL, {&rooms[3], &rooms[1], &rooms[5], NULL},
"You are in the entryway of the house.\n\
A hall leads north, and there are doors to the south and east."},
/* [3] */ {"hall", NULL, {&rooms[6], &rooms[2], NULL, &rooms[4]},
"You are in a north-south hallway, with a door to the west."},
/* [4] */ {"bedroom", &objects[0], {NULL, NULL, &rooms[3], NULL},
"You are in what looks like a bedroom. The exit is to the east."},
/* [5] */ {"closet", &objects[1], {NULL, NULL, NULL, &rooms[2]}},
/* [6] */ {"kitchen", &objects[2], {&rooms[9], &rooms[3], &rooms[7], NULL},
"You are in the kitchen of the house.\n\
Doorways lead north and south, and there is a stairwell to the east."},
/* [7] */ {"stairway", NULL, {NULL, NULL, &rooms[6], &rooms[8]},
"You are in a stairway twisting down beneath the kitchen."},
/* [8] */ {"basement", &objects[4], {NULL, NULL, &rooms[7], &rooms[10]},
"You are in a dank basement. There is a stairway to the east.\n\
It looks like someone has been digging at the west end!"},
/* [9] */ {"porch", &objects[5], {NULL, &rooms[6], NULL, NULL},
"You are on the back porch of the house.\n"
"There is a doorway to the south."},
/* [10] */ {"tunnel", &objects[7], {NULL, NULL, &rooms[8], NULL},
"You are in a low east-west tunnel, recently dug.\n"
"The earth seems soft; you wonder how safe the ceiling is."},
};
(Initializing long descriptions in this way is not convenient, because the strings are, well, too long. I even embedded \n
within some of the strings, to make them print on several lines, and I continued some of them over two lines in the
source code, just to make them fit on the page. You will notice that there are two ways of doing this: The descriptions of the
entry, kitchen, and basement are extended onto the next line using \ as a continuation character. The descriptions of the
porch and tunnel are split into two strings, on two separate lines but with no comma or other punctuation between them; this
means that the strings are automatically spliced together at compile time, to form the equivalent of a single string. But if
you don't use either of these methods, if you break a string constant across two lines without using \ or splitting it into two
strings to be concatenated, the compiler complains.)
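Here is a small sketch showing that the two methods give identical results. The variable names desc1 and desc2 and the function samedesc are invented for this example.

```c
#include <string.h>

/* one string, continued with a backslash; the continuation line must
   start in column 1, or its leading whitespace joins the string */
const char *desc1 = "You are in the entryway.\n\
A hall leads north.";

/* two adjacent string literals, spliced together by the compiler */
const char *desc2 = "You are in the entryway.\n"
                    "A hall leads north.";

/* returns nonzero if both forms produced the same string */
int
samedesc(void)
{
    return strcmp(desc1, desc2) == 0;
}
```

The second form is usually easier to live with, because the continuation lines can be indented to match the surrounding code.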
I added this code to the big if/else chain in commands.c, to implement an ``examine'' command:
else if(strcmp(verb, "examine") == 0)

{
if(object == NULL)
{
printf("You must tell me what to examine.\n");
return FALSE;
}
objp = findobject(player, object);
if(objp == NULL)
{
printf("I see no %s here.\n", object);
return FALSE;
}
if(objp->desc == NULL || *objp->desc == '\0')
printf("You see nothing special about the %s.\n", object);
else
printf("%s\n", objp->desc);
}
Finally, I rewrote the listroom function in rooms.c to look like this:
void
listroom(struct actor *actor)
{
struct room *roomp = actor->location;
if(roomp == NULL)
{
printf("Where are you?\n");
return;
}
printf("%s\n", roomp->name);
if(roomp->desc != NULL && *roomp->desc != '\0')
printf("%s\n", roomp->desc);
if(roomp->contents != NULL)
{
printf("room contains:\n");
listobjects(roomp->contents);
}
}
Assignment #2
Handouts:
Assignment #2
Review Questions:
1. List some differences between text and binary data files.
Exercises:
1. We're going to fix one of the more glaring deficiencies of the initial implementation of the game, namely that the
description of the dungeon takes the form of hard-compiled structure arrays in main.c, such that the source code
must be edited (and in a tedious way, at that) and the game recompiled whenever we want to modify the dungeon.
(We're used to recompiling when we make changes to our programs, but we shouldn't have to recompile when we
make expected changes to our data. Imagine if a spreadsheet program required you to enter your worksheet
expressions or data with a C compiler and recompile each time!)
We're going to add code which can read the dungeon description dynamically, at run time, from a simple, textual data
file. The file will describe the rooms, the objects and which rooms they're initially in, and the exits which interconnect
the rooms. As the file is read, we'll dynamically assign new instances of struct room and struct object to
define the rooms and objects which the data file describes. We won't know how many room and object structures
we'll need, and we certainly won't know what descriptions they will contain, so we'll abandon the rooms and
objects arrays from main.c, along with their cumbersome initializations. We'll add (in object.c) an
allocation function, newobject(), which will return a pointer to a brand-new object structure which we can use to
contain some new object, usually one we've just read from the data file. Similarly, we'll add a newroom() function
to rooms.c which will return a pointer to a brand-new room structure. (We're not quite ready to do full-blown
dynamic memory allocation yet, so for now, the room and object allocation functions will dole structures out of fixed-size arrays which we'll arbitrarily declare as having size 100. Naturally we'll keep track of how many structures are
actually in use, and complain if an attempt is made to use more than 100. Since the allocation code is isolated down
within rooms.c and object.c, only those modules will need rewriting when we move to a more flexible
allocation scheme; the code which calls newroom() and newobject() won't need to change.)
Appended to this assignment is a listing of a new file, io.c, for doing the data file reading. (This code is also on the
disk or at the ftp site, in the week2 subdirectory.)
Rip out the definitions and initializations of the objects and rooms arrays at the top of main.c; also remove the
initialization of actor (but leave the definition). At the top of main(), before the initial call to listroom, add the
lines
if(!readdatafile())
    exit(1);
gotoroom(&actor, getentryroom());    /* put actor in initial room */
Add the following code to object.c (it can go near the top):
#define MAXOBJECTS 100
static struct object objects[MAXOBJECTS];
static int nobjects = 0;
struct object *
newobject(char *name)
{
struct object *objp;
if(nobjects >= MAXOBJECTS)
{
fprintf(stderr, "too many objects\n");
exit(1);
}
objp = &objects[nobjects++];
strcpy(objp->name, name);
objp->lnext = NULL;
return objp;
}
(This code is in the file object.xc in the week2 subdirectory.)
Add the following code to rooms.c (it can go near the top):
#define MAXROOMS 100
static struct room rooms[MAXROOMS];
static int nrooms = 0;
struct room *
newroom(char *name)
{
struct room *roomp;
int i;
if(nrooms >= MAXROOMS)
{
fprintf(stderr, "too many rooms\n");
exit(1);
}
roomp = &rooms[nrooms++];
strcpy(roomp->name, name);
roomp->contents = NULL;
for(i = 0; i < NEXITS; i++)
roomp->exits[i] = NULL;
return roomp;
}
struct room *
findroom(char *name)
{
int i;
for(i = 0; i < nrooms; i++)
{
if(strcmp(rooms[i].name, name) == 0)
return &rooms[i];
}
return NULL;
}
struct room *
getentryroom(void)
{
if(nrooms == 0)
return NULL;
return &rooms[0]; /* temporary */
}
(This code is in the file rooms.xc in the week2 subdirectory.)

You'll also need the file fgetline.c, which is appended behind the new io.c, and also in the week2
subdirectory.
After adding all the new code, and any necessary prototype declarations to game.h, the program should again
compile and run, but it will expect to find a data file named dungeon.dat in the current directory. A sample
dungeon.dat file (corresponding to the old, hard-compiled game) is included here (and in the week2
subdirectory). Once you see how it works, you can start extending the dungeon simply by modifying the
dungeon.dat file.
2. (tricky) If you added long descriptions to rooms and objects last week, devise and implement a way for those long
descriptions to be read from the data file.
3. (easier) Expand the set of allowable exits from rooms. Add the possibility of northeast, southeast, northwest, and
southwest exits, or ``up'' and ``down'' exits, or all six. Add commands to go in these new directions. Modify the data
file reading code in io.c to handle setting up the new exits.
Assignment #2
Question 1. List some differences between text and binary data files.
Text files store lines of characters, intended to be human-readable if printed, displayed to the screen, read into a text editor, etc.
Binary files contain arbitrary byte values which make little or no sense if interpreted as characters.
Binary files may be smaller, and if read and written in a simpleminded way, they may be more efficient to read and write, and
the code to do so may be easier to write. Text files tend to be larger, to take longer (than binary files) to read and write, and to
require more elaborate code. (However, if binary files are read in safer ways, or written in more portable ways, the code to do
so is as complicated as for text files, and will probably be intermediate in efficiency between text files and simpleminded
binary files.)
Text files are quite portable. Binary files will not necessarily work if they are moved between machines having different word
sizes or data formats, or between compilers which lay out data structures differently in memory.
It's easy to create, inspect, and modify text files using standard tools. Binary file maintenance generally requires specially-written tools.
Binary files must be written using fopen mode "wb" and read using mode "rb".
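To make the contrast concrete, here is a minimal sketch which writes and reads a single int both ways. The four function names are invented for this example, and the binary pair is the ``simpleminded'' kind mentioned above: it copies the bytes exactly as they sit in memory, so the resulting file is tied to this machine's int size and byte order.

```c
#include <stdio.h>

/* text: the value becomes a sequence of digit characters and a newline */
int
writetext(FILE *fp, int n)
{
    return fprintf(fp, "%d\n", n) > 0;
}

int
readtext(FILE *fp, int *np)
{
    return fscanf(fp, "%d", np) == 1;
}

/* binary: the bytes of the int are copied exactly as stored in memory */
int
writebinary(FILE *fp, int n)
{
    return fwrite(&n, sizeof n, 1, fp) == 1;
}

int
readbinary(FILE *fp, int *np)
{
    return fread(np, sizeof *np, 1, fp) == 1;
}
```

The text version occupies one byte per digit (plus the newline), while the binary version always occupies sizeof(int) bytes, whatever the value.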
Exercise 2. Devise and implement a way for long descriptions to be read from the data file.
There are many ways of doing this. All are a bit tricky, given certain constraints imposed by the existing code.
The most obvious way is to use lines of the form
desc You are in a dark and gloomy cave.
in the data file. (Such lines could of course be used for objects as well.) The only problem with this approach is that in the
existing data file reading code in io.c, each line read from the data file is immediately broken up into words by calling
getwords, and this break-up happens before it's decided what keyword began the line and what's to be done with the rest of
the information on the line. In the case of long descriptions, we want the rest of the text on the line (after the keyword desc)
to be treated as a single string, not a series of individual words.
One way around this problem is to make a copy of the line read from the data file before breaking it up with getwords. I
declared a second array
char line2[MAXLINE];
at the top of parsedatafile in io.c, and changed the old line
ac = getwords(line, av, MAXARGS);

to
strcpy(line2, line);
ac = getwords(line2, av, MAXARGS);
Now line2 is broken up by getwords (and av ends up containing pointers into line2), but the contents of line remain
intact.
Then I added this case to the if/else chain that makes up the bulk of parsedatafile:
else if(strcmp(av[0], "desc") == 0)
{
char *p;
for(p = line; *p != '\0' && !isspace(*p); p++) /* skip "desc" */
;
for(; isspace(*p); p++)
/* skip whitespace */
;
if(currentobject != NULL)
currentobject->desc = chkstrdup(p);
else if(currentroom != NULL)
currentroom->desc = chkstrdup(p);
else
fprintf(stderr, "desc not in object or room\n");
}
The code has two more troublesome spots: first, since the line array hasn't been broken into words (which was the point, so
as not to break up the descriptive text), we have to manually skip over the keyword ``desc'' and the whitespace to find the
beginning of the description itself. Second, since I've declared the long descriptions in the object and room structures as char
*, I have to allocate memory to hold a copy of the description. I do that by calling the function chkstrdup, which makes a
duplicate copy of a string in newly-allocated memory, checking the return value of the memory allocation for me so that I don't
have to. (On another sheet you'll find the code for chkstrdup and another function chkmalloc which it calls.
chkstrdup is declared in chkmalloc.h; both chkmalloc.c and chkmalloc.h can be found in the week5
subdirectory of the source tree.) We'll have much more to say about dynamic memory allocation in upcoming weeks, so don't
worry about the details of chkstrdup too much.
If you chose to use character arrays as the long description fields in the object and room structures, you don't have to worry
about memory allocation, but can instead use something like
strcpy(currentobject->desc, p);
to copy the long description to your array. Just make sure you understand why a single call to strcpy would not work if
desc is declared as char *, and why the simple assignment
currentobject->desc = p;
would not work if desc was an array or a pointer (for a different reason in each case).
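One of those reasons can be sketched as follows. This is only an illustration, not code from the game: dupstr is a hypothetical stand-in for chkstrdup (without the error reporting), and aliaschanges is an invented function which reuses a line buffer the way the data file reader does.

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical helper: copy a string into freshly allocated memory */
char *
dupstr(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if(copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* returns 1 if the plain pointer changed when the buffer was reused,
   while the duplicated string kept its original contents */
int
aliaschanges(void)
{
    char line[40];
    char *alias, *copy;
    int changed;

    strcpy(line, "desc A dark cave.");
    alias = line + 5;           /* points into line itself */
    copy = dupstr(line + 5);    /* independent copy */
    if(copy == NULL)
        return -1;

    strcpy(line, "desc A sunny field.");    /* "read" the next line */

    changed = strcmp(alias, "A sunny field.") == 0
           && strcmp(copy, "A dark cave.") == 0;
    free(copy);
    return changed;
}
```

A pointer assigned directly from p remembers only where the text was, not what it said, so every description stored that way would end up showing whatever line was read last.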
In a real game, room descriptions are likely to be more like a paragraph in length, so cramming them all onto one desc line in
the data file is quite cumbersome. How could we spread them across multiple lines in the data file? One way is by allowing
multiple desc lines, and arranging that they all get concatenated together to form the complete description. Here's how I
implemented that:
else if(strcmp(av[0], "desc") == 0)
{
    char *p;
    char *newstr;

    for(p = line; *p != '\0' && !isspace(*p); p++)    /* skip "desc" */
        ;
    for(; isspace(*p); p++)                           /* skip whitespace */
        ;

    if(currentobject != NULL)
    {
        if(currentobject->desc == NULL)
            currentobject->desc = chkstrdup(p);
        else
        {
            /* room for old text, '\n', new text, and '\0' */
            newstr = chkmalloc(
                    strlen(currentobject->desc) + 1
                        + strlen(p) + 1);
            strcpy(newstr, currentobject->desc);
            strcat(newstr, "\n");
            strcat(newstr, p);
            free(currentobject->desc);
            currentobject->desc = newstr;
        }
    }
    else if(currentroom != NULL)
    {
        if(currentroom->desc == NULL)
            currentroom->desc = chkstrdup(p);
        else
        {
            newstr = chkmalloc(
                    strlen(currentroom->desc) + 1
                        + strlen(p) + 1);
            strcpy(newstr, currentroom->desc);
            strcat(newstr, "\n");
            strcat(newstr, p);
            free(currentroom->desc);
            currentroom->desc = newstr;
        }
    }
    else
        fprintf(stderr, "desc not in object or room\n");
}
Now the memory allocation is more complicated, because when we already have a (partial) long description string for a given
room or object, we have to allocate a new, longer one and free the old one. (There's an easier way to do this, involving another
standard library memory allocation function called realloc. We'll see it later.)
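As a preview, here is a rough sketch of how the append step might look with realloc. The function name appendline is invented for this example, and unlike chkmalloc it simply returns a null pointer if the allocation fails, leaving the error handling to the caller.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Append a newline and `more` to the malloc'ed string `old` (which may
 * be NULL, meaning "no description yet").  Returns the new string, or
 * NULL on allocation failure, in which case `old` is left untouched.
 */
char *
appendline(char *old, const char *more)
{
    size_t oldlen = (old == NULL) ? 0 : strlen(old);
    /* room for old text, '\n' separator, new text, and '\0' */
    char *newstr = realloc(old, oldlen + 1 + strlen(more) + 1);

    if(newstr == NULL)
        return NULL;
    if(oldlen == 0)
        strcpy(newstr, more);
    else
    {
        newstr[oldlen] = '\n';
        strcpy(newstr + oldlen + 1, more);
    }
    return newstr;
}
```

Because realloc preserves the old contents (and accepts a null pointer, in which case it acts like malloc), the explicit copy-and-free steps disappear. A caller would write something like currentroom->desc = appendline(currentroom->desc, p), after checking the result.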
Another option would be to decide that long descriptions span many lines on the data file, followed by some terminator,
without having each descriptive line preceded by any keyword. For example, we might have lines in the data file like this:
desc
You are in a dark and gloomy cave.

The air is stale and damp.
In the shadows you can hear various squeaking and chattering noises.
desc end
Here, the word desc alone on a line indicates that a long description follows, up to the line containing desc end.
I implemented that option like this. Notice that the description-reading code therefore has to make more calls to fgetline
(i.e. besides the main one at the top of the loop in parsedatafile). In this case, we don't have to worry about the allocation
of line vs. line2, and we also don't have to skip over anything to find the start of each descriptive line.
else if(strcmp(av[0], "desc") == 0)
{
    char *newstr;

    while(fgetline(fp, line, MAXLINE) != EOF)
    {
        if(strcmp(line, "desc end") == 0)
            break;

        if(currentobject != NULL)
        {
            if(currentobject->desc == NULL)
                currentobject->desc = chkstrdup(line);
            else
            {
                /* room for old text, '\n', new text, and '\0' */
                newstr = chkmalloc(
                        strlen(currentobject->desc) + 1
                            + strlen(line) + 1);
                strcpy(newstr, currentobject->desc);
                strcat(newstr, "\n");
                strcat(newstr, line);
                free(currentobject->desc);
                currentobject->desc = newstr;
            }
        }
        else if(currentroom != NULL)
        {
            if(currentroom->desc == NULL)
                currentroom->desc = chkstrdup(line);
            else
            {
                newstr = chkmalloc(
                        strlen(currentroom->desc) + 1
                            + strlen(line) + 1);
                strcpy(newstr, currentroom->desc);
                strcat(newstr, "\n");
                strcat(newstr, line);
                free(currentroom->desc);
                currentroom->desc = newstr;
            }
        }
    }
}
Exercise 3. Expand the set of allowable exits from rooms.

I modified the room structure in game.h like this:
#define NEXITS 10

struct room
{
    char name[MAXNAME];
    struct object *contents;
    struct room *exits[NEXITS];
    char *desc;                 /* long description */
};

/* direction indices in exits array: */
#define NORTH       0
#define SOUTH       1
#define EAST        2
#define WEST        3
#define NORTHEAST   4
#define SOUTHEAST   5
#define NORTHWEST   6
#define SOUTHWEST   7
#define UP          8
#define DOWN        9
(By adding the new directions at the end of the array, rather than interspersing them with the original four, any array
initializations such as we used to have in main.c would still work, since it's legal to supply fewer initializers for an array than
the maximum number of elements in the array.)
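Here is a small sketch of that rule in action. The structure struct place is a cut-down stand-in for the game's room structure, and countnullexits is an invented helper; neither is part of the game.

```c
#include <stddef.h>

#define NEXITS 10

/* hypothetical, cut-down stand-in for the game's room structure */
struct place
{
    const char *name;
    struct place *exits[NEXITS];
};

/* only the leading exits are listed; the rest become null pointers */
struct place places[2] =
{
    {"field", {&places[1], NULL, NULL, NULL}},  /* 4 of 10 given */
    {"house", {NULL, &places[0]}},              /* 2 of 10 given */
};

/* count how many of a place's NEXITS slots are null */
int
countnullexits(const struct place *p)
{
    int i, n = 0;

    for(i = 0; i < NEXITS; i++)
        if(p->exits[i] == NULL)
            n++;
    return n;
}
```

Each of the two places has exactly one real exit, so the other nine slots, including all six new directions, come out as null pointers without being mentioned.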
I added this code to docommand in commands.c:
else if(strcmp(verb, "ne") == 0 || strcmp(verb, "northeast") == 0)
    dircommand(player, NORTHEAST);
else if(strcmp(verb, "se") == 0 || strcmp(verb, "southeast") == 0)
    dircommand(player, SOUTHEAST);
else if(strcmp(verb, "nw") == 0 || strcmp(verb, "northwest") == 0)
    dircommand(player, NORTHWEST);
else if(strcmp(verb, "sw") == 0 || strcmp(verb, "southwest") == 0)
    dircommand(player, SOUTHWEST);
else if(strcmp(verb, "up") == 0)
    dircommand(player, UP);
else if(strcmp(verb, "down") == 0)
    dircommand(player, DOWN);
I added this code to the roomexits case in parsedatafile in io.c:

else if(strcmp(av[i], "ne") == 0)
    dir = NORTHEAST;
else if(strcmp(av[i], "se") == 0)
    dir = SOUTHEAST;
else if(strcmp(av[i], "nw") == 0)
    dir = NORTHWEST;
else if(strcmp(av[i], "sw") == 0)
    dir = SOUTHWEST;
else if(strcmp(av[i], "u") == 0)
    dir = UP;
else if(strcmp(av[i], "d") == 0)
    dir = DOWN;
Assignment #3
Handouts:
Assignment #3
Exercises:
1. Appended to this assignment are listings of new versions of commands.c and parser.c. (I've only printed the
first page of commands.c, because the rest doesn't change.) These versions implement a slightly more sophisticated
scheme for parsing and representing commands; they will handle sentences like
hit nail with hammer
The new scheme involves this structure, in game.h:
struct sentence
{
char *verb;
struct object *object;
char *preposition;
struct object *xobject; /* object of preposition */
};
This structure can contain the verb, object, and prepositional phrase of one of these sentences. (Also, the object of the
verb and the object of the preposition are represented by object pointers, filled in by the command parser, rather than
simple object names.)
Replace your copies of parser.c and commands.c with these new ones, which can be found in the week3
subdirectory. Merge any of the commands you added (or other changes you made) from your old commands.c into
the new one. In main.c, replace the declarations of the variables verb and object with the new variable
struct sentence cmd;
which can contain the whole command, and change the calls to parseline and docommand to
if(!parseline(&player, line, &cmd))
    continue;
docommand(&player, &cmd);
You'll also need to change the prototype declarations for parseline and docommand in game.h.
Although we can now parse these more-complicated commands, we need a bit more machinery before we can
implement any of them. For now, here is a little ``hit'' command (to be added to commands.c) which will begin to
give you a feel for what we'll be able to do:
else if(strcmp(verb, "hit") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to hit.\n");
return FALSE;
}
if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to hit with.\n");
return FALSE;
}
if(!contains(player->contents, cmd->xobject))
{
printf("You don't have the %s.\n", cmd->xobject->name);
return FALSE;
}
printf("The %s says, \"Ouch!\"\n", objp->name);
}
Add this command, too.
2. Add some other commands, such as ``break'' or ``cut'' or ``read''. (They won't really do anything yet, other than print
little messages along the lines of the ``hit'' command in exercise 1.)
3. Write a function plural with the prototype
char *plural(char *);
which, given a string, returns a new string with an ``s'' tacked on to the end. Use this new function to rewrite some of
the messages printed by the game from
printf("I see no %s here.\n", word);
to
printf("I don't see any %s.\n", plural(word));
Make an appropriate choice from among the techniques discussed in class for the allocation of the string returned by
plural. (Extra credit: try to make plural a little smarter, by adding ``es'' where appropriate.)
Assignment #3
Exercise 2. Add some other commands, such as ``break'' or ``cut'' or ``read''.

Here is the code I added (to commands.c, of course) to implement all three commands. They are all straightforward
but quite boring, because they don't really do anything yet. (Soon we'll be able to make them more interesting.)
else if(strcmp(verb, "break") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to break.\n");
return FALSE;
}
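if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to break with.\n");
return FALSE;
}
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}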
printf("Oh, dear. Now the %s is broken.\n", objp->name);
}
else if(strcmp(verb, "cut") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to cut.\n");
return FALSE;
}
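if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to cut with.\n");
return FALSE;
}
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}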
printf("The %s is now cut in two.\n", objp->name);
}
else if(strcmp(verb, "read") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to read.\n");
return FALSE;
}
printf("There isn't much to read on the %s.\n", objp->name);
}
Exercise 3. Write a function plural so that the game can print slightly more interesting messages.
Here is my first plural function:
#include <string.h>
#include "game.h"
#define MAXRETBUF 20
char *
plural(char *word)
{
static char retbuf[MAXRETBUF];
strcpy(retbuf, word);
strcat(retbuf, "s");
return retbuf;
}
Notice that the retbuf array is declared static, so that it will persist after plural returns, so that the pointer
that plural returns will remain valid.
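The static buffer has a cost, though, which is worth seeing once. In the sketch below, plural follows the same idea as the function above (with a length check added so that the example is safe on its own), and sharedbuffer is an invented function which shows that every call hands back the same array.

```c
#include <string.h>

#define MAXRETBUF 20

/* same idea as the first plural above, with a length check added */
char *
plural(char *word)
{
    static char retbuf[MAXRETBUF];

    if(strlen(word) + 2 > MAXRETBUF)    /* word + "s" + '\0' won't fit */
        return word;
    strcpy(retbuf, word);
    strcat(retbuf, "s");
    return retbuf;
}

/* returns 1 if two successive calls handed back the same buffer */
int
sharedbuffer(void)
{
    char *a = plural("hammer");
    char *b = plural("kettle");     /* overwrites what a points to */

    return a == b && strcmp(a, "kettles") == 0;
}
```

So the pointer remains valid, but the text it points to lasts only until the next call. A single printf which called plural twice would print the same word both times; for the game's messages, which pluralize one word at a time, this is not a problem.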
Here is the change to (the newer version of) parser.c:
if(ac < 2)
cmd->object = NULL;
else if((cmd->object = findobject(actor, av[1])) == NULL)
{
printf("I don't see any %s.\n", plural(av[1]));

return FALSE;
}
(The only change is to the printf call. You could also make a similar change to the code just below which checks
for the presence of the object of the preposition, if any.)
To keep my compiler happy, I also added the prototype
extern char *plural(char *);
to game.h.
Finally, here is an improved version of the plural function. If the last character of the word being pluralized is s, it
tacks on ``es''. (Unfortunately, if the word is already plural, this means that it ends up changing ``marbles'' to
``marbleses,'' and sounding like Gollum.) Since this version has to compute the length of the original word anyway,
it's easy to make sure that it won't overflow the return buffer. (In this case, since plural's function is relatively
unimportant, I chose to have it return the original word, unpluralized, rather than printing error messages or aborting
or anything.)
#include <string.h>
#include "game.h"
#define MAXRETBUF 20
char *
plural(char *word)
{
    static char retbuf[MAXRETBUF];
    int len = strlen(word);

    if(len + 3 >= MAXRETBUF)
        return word;
    strcpy(retbuf, word);
    /* check len first, so that an empty word never indexes word[-1] */
    if(len == 0 || word[len-1] != 's')
        strcat(retbuf, "s");
    else
    {
        /* It ends in s. */
        /* maybe we should add "es" */
        strcat(retbuf, "es");
        /* (or maybe it's already plural and we should add nothing...) */
    }
    return retbuf;
}

Assignment #4
Handouts:
Assignment #4
Exercises:
1. This week we're going to add containers (so that objects can be put inside of other objects) and attributes, so that we can
remember things about objects (such as whether they're containers, whether they're open, etc.).
The code to implement these changes is scattered, and not all of it is in the distribution directories. You'll have to type some of
this new code in yourself.
Here is the modified object structure, along with a few definitions:
struct object
{
    char name[MAXNAME];
    unsigned int attrs;
    struct object *contents;    /* contents (if container) */
    struct object *lnext;       /* next in list of contained objects */
                                /* (i.e. in this object's container) */
    char *desc;                 /* long description */
};

#define CONTAINER   0x0001
#define CLOSABLE    0x0002
#define OPEN        0x0004
#define HEAVY       0x0008
#define BROKEN      0x0010
#define TOOL        0x0020
#define SOFT        0x0040
#define SHARP       0x0080
#define LOCK        0x0100
#define KEY         0x0200
#define LOCKED      0x0400
#define TRANSPARENT 0x0800
#define IMMOBILE    0x1000

#define Iscontainer(o) ((o)->attrs & CONTAINER)
#define Isopen(o) ((o)->attrs & OPEN)
(This code is in the file game.xh in the week4 subdirectory, although the copy there is missing the new desc field for
objects.)
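If the bitwise operators are still a little unfamiliar, here is a short sketch of the three operations the attribute code relies on. The function names setattr, clearattr, and hasattr are invented for this illustration (the game itself just uses the operators directly), and only three of the attribute values are repeated here.

```c
#define CONTAINER 0x0001
#define CLOSABLE  0x0002
#define OPEN      0x0004

/* turn a bit on: OR it in */
unsigned int
setattr(unsigned int attrs, unsigned int bit)
{
    return attrs | bit;
}

/* turn a bit off: AND with the complement */
unsigned int
clearattr(unsigned int attrs, unsigned int bit)
{
    return attrs & ~bit;
}

/* test a bit: AND it out, and compare against 0 */
int
hasattr(unsigned int attrs, unsigned int bit)
{
    return (attrs & bit) != 0;
}
```

Note that a test like Isopen(o) yields the bit's own value (here 0x0004) rather than 1 when the bit is set. That is fine in an if condition, but it should not be compared against TRUE.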
Here is new code, for commands.c, to implement new ``open,'' ``close,'' and ``put (in)'' commands:
else if(strcmp(verb, "open") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to open.\n");
return FALSE;
}
if(Isopen(objp))
{
printf("The %s is already open.\n", objp->name);
return FALSE;
}
if(!(objp->attrs & CLOSABLE))
{
printf("You can't open the %s.\n", objp->name);
return FALSE;
}
objp->attrs |= OPEN;
printf("The %s is now open.\n", objp->name);
}
else if(strcmp(verb, "close") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to close.\n");
return FALSE;
}
if(!(objp->attrs & CLOSABLE))
{
printf("You can't close the %s.\n", objp->name);
return FALSE;
}
if(!Isopen(objp))
{
printf("The %s is already closed.\n", objp->name);
return FALSE;
}
objp->attrs &= ~OPEN;
printf("The %s is now closed.\n", objp->name);
}
else if(strcmp(verb, "put") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to put.\n");
return FALSE;
}
if(!contains(player->contents, objp))
{
printf("You don't have the %s.\n", objp->name);
return FALSE;
}
if(cmd->preposition == NULL || strcmp(cmd->preposition, "in") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me where to put the %s.\n",
objp->name);
return FALSE;
}
if(!Iscontainer(cmd->xobject))
{
printf("You can't put things in the %s.\n",
cmd->xobject->name);
return FALSE;
}
if((cmd->xobject->attrs & CLOSABLE) && !Isopen(cmd->xobject))
{
printf("The %s is closed.\n", cmd->xobject->name);
return FALSE;
}
if(!putobject(player, objp, cmd->xobject))
{
/* shouldn't happen */
printf("You can't put the %s in the %s!\n",
objp->name, cmd->xobject->name);
return FALSE;
}
printf("Now the %s is in the %s.\n",
objp->name, cmd->xobject->name);
}
(This code is in the file week4/commands.xc.)
Here is a new version of newobject, for object.c, that initializes the new object attribute and contents fields. (It's also
missing any required initialization of the new long description field; make sure you preserve yours.)
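struct object *
newobject(char *name)
{
    struct object *objp;

    if(nobjects >= MAXOBJECTS)
    {
        fprintf(stderr, "too many objects\n");
        exit(1);
    }

    objp = &objects[nobjects++];
    strcpy(objp->name, name);
    objp->lnext = NULL;
    objp->attrs = 0;
    objp->contents = NULL;
    return objp;
}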
Here is the new putobject function, for object.c. (It's in week4/object.xc.)
/* transfer object from actor to container */
int
putobject(struct actor *actp, struct object *objp, struct object *container)
{
struct object *lp;
struct object *prevlp = NULL;
for(lp = actp->contents; lp != NULL; lp = lp->lnext)
{
if(lp == objp)
/* found it */
{
/* splice out of actor's list */
if(lp == actp->contents)
/* head of list */
actp->contents = lp->lnext;
else
prevlp->lnext = lp->lnext;
/* splice into container's list */
lp->lnext = container->contents;
container->contents = lp;
return TRUE;
}
prevlp = lp;
}
/* didn't find it (error) */
return FALSE;
}
Here is new code for the parsedatafile function in io.c, to read attributes for objects:
else if(strcmp(av[0], "attribute") == 0)
{
if(currentobject == NULL)
continue;
if(strcmp(av[1], "container") == 0)
currentobject->attrs |= CONTAINER;
else if(strcmp(av[1], "closable") == 0)
currentobject->attrs |= CLOSABLE;
else if(strcmp(av[1], "open") == 0)
currentobject->attrs |= OPEN;
else if(strcmp(av[1], "heavy") == 0)
currentobject->attrs |= HEAVY;
else if(strcmp(av[1], "soft") == 0)
currentobject->attrs |= SOFT;
else if(strcmp(av[1], "sharp") == 0)
currentobject->attrs |= SHARP;
}
(This code is incomplete; at some point you'll have to add cases for the other attributes defined in game.h.)
Here are the ``break'' and ``cut'' commands from commands.c, modified to make use of a few attributes:
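else if(strcmp(verb, "break") == 0)
{
    if(objp == NULL)
    {
        printf("You must tell me what to break.\n");
        return FALSE;
    }
    if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
            cmd->xobject == NULL)
    {
        printf("You must tell me what to break with.\n");
        return FALSE;
    }
    if(!contains(player->contents, cmd->xobject))
    {
        printf("You have no %s.\n", cmd->xobject->name);
        return FALSE;
    }
    if(!(cmd->xobject->attrs & HEAVY))
    {
        printf("I don't think the %s is heavy enough to break things with.\n",
                cmd->xobject->name);
        return FALSE;
    }
    objp->attrs |= BROKEN;
    printf("Oh, dear. Now the %s is broken.\n", objp->name);
}
else if(strcmp(verb, "cut") == 0)
{
    if(objp == NULL)
    {
        printf("You must tell me what to cut.\n");
        return FALSE;
    }
    if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
            cmd->xobject == NULL)
    {
        printf("You must tell me what to cut with.\n");
        return FALSE;
    }
    if(!contains(player->contents, cmd->xobject))
    {
        printf("You have no %s.\n", cmd->xobject->name);
        return FALSE;
    }
    if(!(cmd->xobject->attrs & SHARP))
    {
        printf("I don't think the %s is sharp enough to cut things with.\n",
                cmd->xobject->name);
        return FALSE;
    }
    if(!(objp->attrs & SOFT))
    {
        printf("I don't think you can cut the %s with the %s.\n",
                objp->name, cmd->xobject->name);
        return FALSE;
    }
    printf("The %s is now cut in two.\n", objp->name);
}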
Try to get all of this code integrated, compiled, and working. You will probably have to add a few prototypes to game.h.
Once it compiles, before you can play with the new features, you will have to make some changes to the data file,
dungeon.dat, too. Make the hammer heavy by adding the line
attribute heavy
just after the line
object hammer
Make the kettle a container by adding the line
attribute container
just after the line
object kettle
Add a knife object by adding the lines
object knife
attribute sharp
in one of the rooms. Add a box object by adding the lines
object box
attribute container
attribute closable
Now you should be able to break things with the hammer, and put things into the kettle or box (once you open the box). The
knife won't cut things unless they're soft, so add an object with
attribute soft
and try cutting it with the knife. Once you put things into containers, they'll seem to disappear; we'll fix that in the next two
exercises.
With these changes, the game suddenly explodes in potential complexity. There are lots and lots of features you'll be able to
add, limited only by your own creativity and the amount of time you care to spend/waste. I've suggested several changes and
improvements below, although you don't have to make all of them, and you're not limited to these suggestions, either. (You
won't be able to add too many more attributes, though, without possibly running out of bits in an int. We'll start using a more
open-ended scheme for implementing attributes next time. In the mean time, you could make the attrs field an unsigned
long int if you were desperate to add more attributes.)
2. Modify the findobject function in object.c so that it can find objects when they're inside containers (both in the
actor's possession and sitting in the room). The original implementation of findobject has separate loops for searching
the actor's list of objects and the room's list of objects. But now, when an object (in either list) is a container, we'll need to
search through its list of (contained) objects, too. It will be easiest if you add a new function, findobj2, which searches for
an object in any list of objects. The code in findobj2 will be just like the old code in findobject for searching the
actor's and room's lists; in fact, once you write findobj2, you'll be able to replace the actor- and room-searching code with
simple calls to
findobj2(actp->contents, name);
and
findobj2(actp->location->contents, name);
Ideally, your scheme should work even to find objects within objects within objects (or deeper); this will involve recursive
calls to findobj2.
3. Modify the takeobject function in object.c so that it can also take objects which are sitting in containers (both in the
actor's possession and sitting in the room). If a container is CLOSABLE, make sure it's OPEN!
4. Rewrite the ``examine'' command (in commands.c) to mention some of the relevant attributes (OPEN, BROKEN, etc.) of the
object being examined. (Hint: this is a good opportunity to use the conditional or ?: operator.)
5. Modify the listobjects function in object.c to list the contents (if any) of objects which are containers. Should it
print the contents of closed containers? (In a more complicated game, we might worry about transparency...)
6. Implement objects which can be locked (if they have the LOCK attribute). Add a ``lock'' command to lock them (i.e. to set the
LOCKED attribute) and an ``unlock'' command to unlock them. Both ``lock'' and ``unlock'' should require that the user use a
tool which has the KEY attribute. (In other words, ``lock strongbox with key'' should work, but ``lock strongbox with broom''
should not work. But don't worry about trying to make certain keys fit only certain locks.) Don't let objects be opened if they
have the LOCK attribute and are locked. (You'll have to update the attribute-reading code in io.c to handle the LOCK, KEY,
and LOCKED attributes.)
7. Implement a ``fix'' command which will let the user fix broken objects, as long as the preposition ``with'' appears and the
implement is an object with the TOOL attribute. (You'll have to update the attribute-reading code in io.c to handle the
TOOL attribute.)
8. Implement immobile objects that can't be picked up. (You'll have to update the attribute-reading code in io.c to handle the
IMMOBILE attribute.)
Exercise 2. Modify the findobject function in object.c so that it can find objects when they're inside
containers.
As suggested in the assignment, we'll need to break the old findobject function up into two functions:
findobject and findobj2. findobject accepts an actor and the name of an object to find (as before);
findobj2 accepts a pointer to a list of objects, and the name of an object to find.
static struct object *findobj2(struct object *, char *);
struct object *
findobject(struct actor *actp, char *name)
{
struct object *lp;
/* first look in actor's possessions: */
lp = findobj2(actp->contents, name);
if(lp != NULL)
return lp;
/* now look in surrounding room: */
if(actp->location != NULL)
{
lp = findobj2(actp->location->contents, name);
if(lp != NULL)
return lp;
}
/* didn't find it */
return NULL;
}
/* find a named object in an object list */
/* (return NULL if not found) */
static struct object *
findobj2(struct object *list, char *name)
{
struct object *lp;

for(lp = list; lp != NULL; lp = lp->lnext)
{
if(strcmp(lp->name, name) == 0)
return lp;
if(lp->contents != NULL)
{
struct object *lp2 = findobj2(lp->contents, name);
if(lp2 != NULL)
return lp2;
}
}
/* didn't find it */
return NULL;
}
The body of findobj2 is the loop that findobject used to use when it searched the player's possessions and the
room's contents; findobject now simply calls findobj2 at those two spots. findobj2 also calls itself,
recursively, when one of the objects it's inspecting (from either list, or any list) is a container object with its own
contents.
I've declared findobj2 as static, which means that it can only be called from within object.c. It's an
auxiliary function, which only findobject calls, so no one outside needs to (or should) be able to call it. The
function prototype for findobj2 includes the keyword static, too.
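To watch the recursion work on a concrete nesting, here is a minimal, self-contained sketch. The cut-down struct object (just name, contents, and lnext), the MAXNAME value, and the demo_nested_find helper are assumptions made for illustration; the game's real structure has more fields.

```c
#include <stddef.h>
#include <string.h>

#define MAXNAME 20  /* assumed value, for illustration only */

/* cut-down stand-in for the game's struct object */
struct object
{
    char name[MAXNAME];
    struct object *contents;  /* first contained object, or NULL */
    struct object *lnext;     /* next object in the same list */
};

/* search a list, descending into every container met along the way */
static struct object *
findobj2(struct object *list, const char *name)
{
    struct object *lp;

    for(lp = list; lp != NULL; lp = lp->lnext)
    {
        if(strcmp(lp->name, name) == 0)
            return lp;
        if(lp->contents != NULL)
        {
            struct object *lp2 = findobj2(lp->contents, name);
            if(lp2 != NULL)
                return lp2;
        }
    }
    return NULL;
}

/* build the list hammer, kettle by hand, with a box inside the kettle
   and a coin inside the box; returns 1 if the search reaches the coin */
static int
demo_nested_find(void)
{
    struct object coin = {"coin", NULL, NULL};
    struct object box = {"box", &coin, NULL};
    struct object kettle = {"kettle", &box, NULL};
    struct object hammer = {"hammer", NULL, &kettle};

    return findobj2(&hammer, "coin") == &coin;
}
```

The top-level call sees only the hammer and the kettle; the coin is found by the second nested call, which is exactly the ``objects within objects within objects'' case the assignment asks about.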
Exercise 3. Modify the takeobject function in object.c so that it can also take objects which are sitting in
containers.
This is harder than it ought to be, because we have to remove the object from the container's list; however, it's not
trivial to find the object which contains another object. Ideally, all objects would also have pointers back to their
containers (just as actors already have pointers back to their rooms). For now, we'll write a findcontainer
function which is kind of like findobject except that it returns the object's container. As for findobject, we'll
also need a recursive findcont2 function.
I added this code at the end of takeobject (just before the ``didn't find it'' comment and the last return
FALSE; line):
/* perhaps it's in a container */
containerp = findcontainer(objp, actp, roomp);
if(containerp != NULL)
{
    /* (this test should probably be up in commands.c) */
    if((containerp->attrs & CLOSABLE) && !Isopen(containerp))
    {
        printf("The %s is closed.\n", containerp->name);
        return FALSE;
    }
    /* re-find in container's list, for splicing */
    prevlp = NULL;
    for(lp = containerp->contents; lp != NULL; lp = lp->lnext)
    {
        if(lp == objp)    /* found it */
        {
            /* splice out of container's list */
            if(lp == containerp->contents)    /* head of list */
                containerp->contents = lp->lnext;
            else
                prevlp->lnext = lp->lnext;
            /* splice into actor's list */
            lp->lnext = actp->contents;
            actp->contents = lp;
            return TRUE;
        }
        prevlp = lp;
    }
}
As the comment indicates, the test for a closed container probably belongs up in commands.c. (That's where the
other similar tests are; the functions in object.c are supposed to be low-level, and just manipulate data structures.)
But, because of the clumsy way we're currently handling actors, rooms, objects, and containers, it's hard to do it right.
(You may not yet have realized the clumsiness of the current scheme. The problem is that actors are really objects and
rooms are really containers, and since containers are objects, rooms are objects, too. We'll have more to say about this
issue later.)
Here are findcontainer and findcont2:
static struct object *findcont2(struct object *, struct object *);

/* find an object's container (in actor's possession, or room) */
/* (return NULL if not found) */
struct object *
findcontainer(struct object *objp, struct actor *actp, struct room *roomp)
{
    struct object *lp, *lp2;

    /* first look in possessions: */
    for(lp = actp->contents; lp != NULL; lp = lp->lnext)
    {
        if(lp->contents != NULL)
        {
            lp2 = findcont2(objp, lp);
            if(lp2 != NULL)
                return lp2;
        }
    }
    /* now look in room: */
    for(lp = roomp->contents; lp != NULL; lp = lp->lnext)
    {
        if(lp->contents != NULL)
        {
            lp2 = findcont2(objp, lp);
            if(lp2 != NULL)
                return lp2;
        }
    }
    return NULL;
}

/* look for objp inside container, or inside anything container holds */
static struct object *
findcont2(struct object *objp, struct object *container)
{
    struct object *lp;

    for(lp = container->contents; lp != NULL; lp = lp->lnext)
    {
        if(lp == objp)
            return container;
        if(lp->contents != NULL)
        {
            struct object *lp2 = findcont2(objp, lp);
            if(lp2 != NULL)
                return lp2;
        }
    }
    return NULL;
}
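The splicing loop in takeobject needs prevlp only so that it can treat the head of the list as a special case. One common alternative, shown here as a sketch rather than as the code these notes use, walks the list with a pointer to the link that points at the current node, so that the head needs no special treatment. The function name unlinkobj and the cut-down struct are hypothetical.

```c
#include <stddef.h>

/* cut-down stand-in for the game's struct object */
struct object
{
    const char *name;
    struct object *lnext;
};

/* remove objp from the list whose head pointer is *listp;
   return 1 if it was found and unlinked, 0 otherwise */
static int
unlinkobj(struct object **listp, struct object *objp)
{
    struct object **lpp;

    /* lpp always points at the pointer that leads to the current node:
       first the head pointer itself, then each node's lnext field */
    for(lpp = listp; *lpp != NULL; lpp = &(*lpp)->lnext)
    {
        if(*lpp == objp)
        {
            *lpp = objp->lnext;   /* bypass the node, head or not */
            objp->lnext = NULL;
            return 1;
        }
    }
    return 0;
}
```

With a helper like this, taking an object out of a container would reduce to one call on &containerp->contents followed by the usual two lines that push the object onto the actor's list.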
Exercise 4. Rewrite the ``examine'' command to mention some of the relevant attributes of the object being examined.
Here is the modified code (from docommand in commands.c):
else if(strcmp(verb, "examine") == 0)
{
    int printedsomething;

    if(objp == NULL)
    {
        printf("You must tell me what to examine.\n");
        return FALSE;
    }
    printedsomething = FALSE;
    if(objp->desc != NULL && *objp->desc != '\0')
    {
        printf("%s\n", objp->desc);
        printedsomething = TRUE;
    }
    if(Iscontainer(objp))
    {
        if(objp->attrs & CLOSABLE)
        {
            printf("The %s is %s.\n", objp->name,
                Isopen(objp) ? "open" : "closed");
            printedsomething = TRUE;
        }
        if((!(objp->attrs & CLOSABLE) || Isopen(objp)) &&
                objp->contents != NULL)
        {
            printf("The %s contains:\n", objp->name);
            listobjects(objp->contents);
            printedsomething = TRUE;
        }
    }
    if(objp->attrs & BROKEN)
    {
        printf("The %s is broken.\n", objp->name);
        printedsomething = TRUE;
    }
    if(objp->attrs & SHARP)
    {
        printf("The %s is quite sharp.\n", objp->name);
        printedsomething = TRUE;
    }
    if(!printedsomething)
        printf("You see nothing special about the %s.\n", objp->name);
}
There are several conditions under which this code prints something ``special'' about the object being examined: if the
object has a long description, if the object is a container, if the object has certain attributes, etc. Whenever the code
prints one of these messages (i.e. under any circumstances) it sets the Boolean variable printedsomething to
TRUE. At the very end, if we haven't found anything interesting to print (i.e. if printedsomething is still
FALSE), we fall back on the generic ``You see nothing special'' message.
In simpler cases, it's nice to arrange conditionals so that you don't need extra Boolean variables. Here, however, the
condition under which we print ``You see nothing special'' would be so complicated if we tried to express it directly
that it's much easier to just use the little printedsomething variable. (Among other things, using
printedsomething means that it will be easier to add more attribute printouts later, as long as they all set
printedsomething, too.)
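The flag pattern can be tried outside the game with a small sketch. Everything here is assumed for illustration only: the describe function, the bit values, and the choice to build the text in a caller-supplied buffer (which makes the result easy to inspect) rather than calling printf directly.

```c
#include <stdio.h>
#include <string.h>

#define CLOSABLE 0x01   /* hypothetical bit values */
#define OPEN     0x02
#define BROKEN   0x04

/* write a description of the named object into buf (bufsize > 0) */
static void
describe(char *buf, size_t bufsize, const char *name, unsigned int attrs)
{
    int printedsomething = 0;
    size_t len = 0;

    buf[0] = '\0';
    if(attrs & CLOSABLE)
    {
        /* ?: picks one of two strings without a separate if/else */
        snprintf(buf + len, bufsize - len, "The %s is %s.\n", name,
            (attrs & OPEN) ? "open" : "closed");
        len = strlen(buf);
        printedsomething = 1;
    }
    if(attrs & BROKEN)
    {
        snprintf(buf + len, bufsize - len, "The %s is broken.\n", name);
        len = strlen(buf);
        printedsomething = 1;
    }
    /* fallback: reached only if none of the tests above fired */
    if(!printedsomething)
        snprintf(buf + len, bufsize - len,
            "You see nothing special about the %s.\n", name);
}
```

Adding another attribute later means adding one more if block that sets printedsomething; the fallback test at the end never has to change.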
Exercise 5. Modify the listobjects function in object.c to list the contents of objects which are containers.
Here is the simple, straightforward implementation:
void
listobjects(struct object *list)
{
    struct object *lp;

    for(lp = list; lp != NULL; lp = lp->lnext)
    {
        printf("%s\n", lp->name);
        if(lp->contents != NULL)
        {
            printf("The %s contains:\n", lp->name);
            listobjects(lp->contents);
        }
    }
}
However, this won't look too good, because when we list, say, the contents of a room, and the room contains a
container and some other objects, the list of objects in the container won't be demarcated from the other objects in the
room. So, a fancier solution would be to indent each container's list relative to the surrounding list. To do this, we add
a ``depth'' parameter which keeps track (with each recursive call) of how deeply the objects and containers we're
listing are. The depth obviously starts out as 0, and to keep all the old callers of listobjects from having to add
this new argument, we rename listobjects as listobjs2, and then have a new, stub version of
listobjects (which everyone else continues to call) which simply calls listobjs2, starting off the recursive
chain at a depth of 0. (Since listobjs2 is only called by listobjects, it's declared static.)
static void listobjs2(struct object *, int);
void
listobjects(struct object *list)
{
listobjs2(list, 0);
}
static void
listobjs2(struct object *list, int depth)
{
    struct object *lp;

    for(lp = list; lp != NULL; lp = lp->lnext)
    {
        printf("%*s%s\n", depth, "", lp->name);
        if(lp->contents != NULL)
        {
            printf("%*sThe %s contains:\n", depth, "", lp->name);
            listobjs2(lp->contents, depth + 1);
        }
    }
}
The indentation is done using a trick: the "%*s" format means to print a string in a field of a given width, where the
width is taken from printf's argument list. The string we ask to be printed is the empty string, because all we want
is a certain number of spaces (that is, the spaces which printf will add to pad our string out to the requested width).
But (once we know the trick, anyway) this is considerably easier than having to write little loops which print a certain
number of spaces.
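The effect of "%*s" is easy to check in isolation. In this sketch the helper name indentline is hypothetical, and snprintf is used instead of printf only so that the formatted line lands in a buffer where it can be examined.

```c
#include <stdio.h>

/* format one list line into buf, preceded by depth spaces;
   depth is consumed by the * in "%*s" as the field width, and the
   empty string "" is what gets padded out to that width */
static int
indentline(char *buf, size_t bufsize, int depth, const char *name)
{
    return snprintf(buf, bufsize, "%*s%s\n", depth, "", name);
}
```

A depth of 0 produces no padding at all, and each additional level adds one space. If one space per level looks too cramped, passing depth * 2 (or more) as the width argument widens the indentation without any other change.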
Exercise 6. Implement objects which can be locked.
Here are the new ``lock'' and ``unlock'' commands:
else if(strcmp(verb, "lock") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to lock.\n");
return FALSE;
}
if(!(objp->attrs & LOCK))
{
printf("You can't lock the %s.\n", objp->name);
return FALSE;
}
if(Isopen(objp))
{
printf("The %s is open.\n", objp->name);
return FALSE;
}
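/* (assumes the parser stores the preposition in cmd->preposition) */
if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to lock with.\n");
return FALSE;
}
/* (assumed test: the key must be in the player's possession) */
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}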
if(!(cmd->xobject->attrs & KEY))
{
printf("The %s won't lock the %s.\n",
cmd->xobject->name, objp->name);
return FALSE;
}
if(objp->attrs & LOCKED)
{
printf("The %s is already locked.\n", objp->name);
return FALSE;
}
objp->attrs |= LOCKED;
printf("The %s is now locked.\n", objp->name);
}
else if(strcmp(verb, "unlock") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to unlock.\n");
return FALSE;
}
if(!(objp->attrs & LOCK))
{
printf("You can't unlock the %s.\n", objp->name);
return FALSE;
}
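/* (assumes the parser stores the preposition in cmd->preposition) */
if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to unlock with.\n");
return FALSE;
}
/* (assumed test: the key must be in the player's possession) */
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}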
if(!(cmd->xobject->attrs & KEY))
{
printf("The %s won't unlock the %s.\n",
cmd->xobject->name, objp->name);
return FALSE;
}
if(!(objp->attrs & LOCKED))
{
printf("The %s is already unlocked.\n", objp->name);
return FALSE;
}
objp->attrs &= ~LOCKED;
printf("The %s is now unlocked.\n", objp->name);
}
Here is the modified ``open'' command:
else if(strcmp(verb, "open") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to open.\n");
return FALSE;
}
if(Isopen(objp))
{
printf("The %s is already open.\n", objp->name);
return FALSE;
}
if(!(objp->attrs & CLOSABLE))
{
printf("You can't open the %s.\n", objp->name);
return FALSE;
}
if((objp->attrs & LOCK) && (objp->attrs & LOCKED))
{
printf("The %s is locked.\n", objp->name);
return FALSE;
}
objp->attrs |= OPEN;
printf("The %s is now open.\n", objp->name);
}
(The ``close'' command doesn't need modifying; we'll let open containers and doors swing and latch closed even if
they were already locked.)
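The three bit-mask idioms these commands rely on (& to test, |= to set, &= ~ to clear) can be exercised on their own. The bit values and the helper names canopen, lockit, and unlockit below are hypothetical; only the operators mirror the command code.

```c
#define LOCK    0x01u  /* hypothetical values: one distinct bit per attribute */
#define LOCKED  0x02u
#define OPEN    0x04u

/* an object resists opening only if it has a lock AND that lock is set */
static int
canopen(unsigned int attrs)
{
    return !((attrs & LOCK) && (attrs & LOCKED));
}

/* set the LOCKED bit, leaving every other bit alone */
static unsigned int
lockit(unsigned int attrs)
{
    return attrs | LOCKED;
}

/* clear the LOCKED bit: ~LOCKED has every bit set except that one */
static unsigned int
unlockit(unsigned int attrs)
{
    return attrs & ~LOCKED;
}
```

Because each attribute owns a single bit, setting or clearing LOCKED never disturbs OPEN, LOCK, or anything else stored in the same word.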
Finally, here are a few more lines for io.c, to read the new attributes needed by this and the following two exercises:
else if(strcmp(av[1], "lock") == 0)

currentobject->attrs |= LOCK;
else if(strcmp(av[1], "locked") == 0)
currentobject->attrs |= LOCKED;
else if(strcmp(av[1], "key") == 0)
currentobject->attrs |= KEY;
else if(strcmp(av[1], "tool") == 0)
currentobject->attrs |= TOOL;
else if(strcmp(av[1], "immobile") == 0)
currentobject->attrs |= IMMOBILE;
Exercise 7. Implement a ``fix'' command which will let the user fix broken objects.
Here is the code (for docommand in commands.c, of course):
else if(strcmp(verb, "fix") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to fix.\n");
return FALSE;
}
if(!(objp->attrs & BROKEN))
{
printf("The %s is not broken.\n", objp->name);
return FALSE;
}
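/* (assumes the parser stores the preposition in cmd->preposition) */
if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to fix with.\n");
return FALSE;
}
/* (assumed test: the tool must be in the player's possession) */
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}
if(!(cmd->xobject->attrs & TOOL))
{
printf("I don't see how you can fix things with a %s.\n",
cmd->xobject->name);
return FALSE;
}
objp->attrs &= ~BROKEN;
printf("Somehow you manage to fix the %s with the %s.\n",
objp->name, cmd->xobject->name);
}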
Exercise 8. Implement immobile objects that can't be picked up.
Here is the modified ``take'' command:
else if(strcmp(verb, "take") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to take.\n");
return FALSE;
}
if(contains(player->contents, objp))
{
printf("You already have the %s.\n", objp->name);
return FALSE;
}
if(objp->attrs & IMMOBILE)
{
printf("The %s cannot be picked up.\n", objp->name);
return FALSE;
}
if(!takeobject(player, objp))
{
printf("You can't pick up the %s.\n", objp->name);
return FALSE;
}
printf("Taken.\n");
}
Assignment #5
Handouts:
Assignment #5
Exercises:
1. Bit-masks (such as the attrs field we added to the object structure last time) are a perfectly reasonable data
structure to use for keeping track of a relatively small, well-defined set of attributes. However, as we've already seen,
once we start trying to keep track of the qualities and states of various objects in the game, we quickly find ourselves
using a welter of attributes. There aren't enough bits in an int (or even a long int) to keep track of all the
attributes we might eventually want. Furthermore, it becomes a nuisance to have to #define a new mask in
game.h every time we invent a new attribute, and to add another if clause to parsedatafile in io.c to map
the attribute line in the data file (a string) to the corresponding mask constant.
Therefore, after using it for only a week or two, we're going to rip out the bitmask-based attribute scheme and replace
it with something more open-ended and flexible. Any object may have a list of attributes, with each attribute
represented by an arbitrary string. We'll write a few functions for setting, testing, and clearing these attributes, so that
they'll be convenient for the rest of the program to manipulate. While we're at it, we'll get some more practice doing
dynamic memory allocation and using linked lists.
Since, under this new scheme, an object may have arbitrarily many attributes, and since each attribute will be an
arbitrary-length string, we won't be able to use fixed-size data structures. We'll use dynamically-allocated ones,
instead, which means that we'll be calling malloc a lot. A cardinal rule of using malloc is that you must check its
return value to make sure that it did not return a null pointer. If the program ever runs out of memory, or some other
problem causes malloc to return a null pointer, and if your program accidentally uses that null pointer as if it points
to usable memory, your program can and will fail in mysterious ways. If, on the other hand, you check malloc's
return value, and print a distinct error message when it fails, you'll at least know more or less exactly what the
problem is.
If you have many places in your program where you're doing dynamic allocation, it may become a nuisance to have to
check return values at all of those places. Therefore, a popular strategy is to implement a wrapper function around
malloc. The one we'll be using looks like this:
#include <stdio.h>
#include <stdlib.h>
#include "chkmalloc.h"
void *
chkmalloc(size_t sz)
{
void *ret = malloc(sz);
if(ret == NULL)
{
fprintf(stderr, "Out of memory\n");
exit(EXIT_FAILURE);
}
return ret;
}
All chkmalloc does is call malloc, check its return value, and return it if it's not NULL. (size_t is an ANSI C
type for representing the sizes of objects, and void * is the generic pointer type which malloc and related
functions return.) But since chkmalloc never returns a null pointer, its caller never has to check, but can begin
using the pointer immediately. chkmalloc can guarantee never to return a null pointer not because it implements
some kind of a magical infinite memory space, but rather because it simply exits, after printing an error message, if
malloc fails. Thus, all tests for malloc failures (and our strategy for dealing with those failures) are centralized in
chkmalloc().
The ``strategy'' of printing an error message and exiting when malloc fails is an expedient one, but it would not be
ideal for all programs. It assumes that (a) an actual out-of-memory condition is unlikely, and (b) the program won't
have any cleanup to do and won't mind if it's summarily terminated. This strategy would not, for example, be
acceptable for a text editor: the editor might use arbitrary amounts of memory when editing a large file, and the user
would certainly not appreciate being dumped out, without having the opportunity to save any work, after trying to add
(say) the 1,000,001st character to a huge file which had just been typed in from scratch over many hours. However,
for our little adventure game, this is a perfectly adequate strategy, for now at least.
You might argue that, if out-of-memory conditions are unlikely, we might as well skip chkmalloc, call malloc
directly, and not check its return value. This would be a bad idea, because while actual out-of-memory conditions
may be rare, other kinds of malloc failures are not so rare, especially in a program under development. If you
misuse the memory which malloc gives you, perhaps by asking for 16 bytes of memory and then writing 17
characters to it (and this is all too easy to do), malloc tends to notice, or at least to be broken by your carelessness,
and will return a null pointer next time you call it. When malloc returns a null pointer for this reason, it can be
difficult to track down the actual error (because it typically occurred somewhere in your code before the call to
malloc that failed), but if we blindly used the null pointer which malloc returned to us, we'd only defer the
eventual crash even farther, and it might be quite mysterious, much more so than a definitive ``out of memory''
message. Therefore, using a wrapper function like chkmalloc is a definite improvement, because we get error
messages as soon as malloc fails, and we don't have to scatter tests all through our code to get them.
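From the caller's side, the payoff looks like the following sketch. The chkmalloc definition is repeated from above only so that the example compiles on its own; copyints is a hypothetical caller invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* same idea as the chkmalloc above, repeated so this sketch stands alone */
static void *
chkmalloc(size_t sz)
{
    void *ret = malloc(sz);
    if(ret == NULL)
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return ret;
}

/* hypothetical caller: return a freshly allocated copy of n ints (n > 0) */
static int *
copyints(const int *src, size_t n)
{
    int *dst = chkmalloc(n * sizeof(int));
    size_t i;

    /* no NULL test here: chkmalloc either succeeded or never returned */
    for(i = 0; i < n; i++)
        dst[i] = src[i];
    return dst;
}
```

One caution when using a wrapper like this: a request for zero bytes is allowed to return a null pointer even when nothing is wrong, so the sketch assumes n is greater than 0.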
All we do have to do, anywhere we call chkmalloc, is #include "chkmalloc.h", which contains
chkmalloc's external function prototype declaration, as well as a second convenience function which we'll meet in
a minute:
extern void *chkmalloc(size_t);
extern char *chkstrdup(char *);
Both chkmalloc.c and chkmalloc.h are in the week5 subdirectory of the source directories.
With chkmalloc in hand, we can begin implementing the new attribute scheme. First, we define this list structure:
struct list
{
char *item;
struct list *next;
};
Then, we rewrite the object structure to use a list, instead of an unsigned int, for attrs:
struct object
{
    char name[MAXNAME];
    struct list *attrs;
    struct object *contents;    /* contents (if container) */
    struct object *lnext;       /* next in list of contained objects */
                                /* (i.e. in this object's container) */
    char *desc;                 /* long description */
};
#define Iscontainer(o) hasattr(o, "container")
#define Isopen(o) hasattr(o, "open")
Notice that we have also rewritten the Iscontainer() and Isopen() macros, to use the new hasattr
function, which we'll see in a minute. (This suggests another benefit of having used those macros: none of the code
that ``called'' Iscontainer() or Isopen() will need rewriting. As we'll see, though, the code that has been
using raw bit operations to test and set the other attributes will need rewriting.) The old attribute bits (CONTAINER,
OPEN, etc.) are no longer needed.
We rewrite the newobject function in object.c slightly, to initialize attrs to a null pointer:
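struct object *
newobject(char *name)
{
    struct object *objp;

    /* (array bookkeeping assumed; use the names from your own object.c) */
    if(nobjects >= MAXOBJECTS)
    {
        fprintf(stderr, "too many objects\n");
        exit(1);
    }
    objp = &objects[nobjects++];
    strcpy(objp->name, name);
    objp->lnext = NULL;
    objp->attrs = NULL;
    objp->contents = NULL;
    objp->desc = NULL;
    return objp;
}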
(Actually, we could have left it as objp->attrs = 0, because 0 is also an acceptable null pointer constant.)
Now, here are the three new functions for testing, setting, and clearing (``unsetting'') attribute strings:
/* see if the object has the attribute */
int
hasattr(struct object *objp, char *attr)
{
struct list *lp;
for(lp = objp->attrs; lp != NULL; lp = lp->next)
{
if(strcmp(lp->item, attr) == 0)
return TRUE;
}
return FALSE;
}
/* set an attribute of an object (if it's not set already) */
void
setattr(struct object *objp, char *attr)
{
struct list *lp;
if(hasattr(objp, attr))
return;
lp = chkmalloc(sizeof(struct list));
lp->item = chkstrdup(attr);
lp->next = objp->attrs;
objp->attrs = lp;
}
/* clear an attribute of an object */
void
unsetattr(struct object *objp, char *attr)
{
struct list *lp;
struct list *prevlp;
for(lp = objp->attrs; lp != NULL; lp = lp->next)
{
if(strcmp(lp->item, attr) == 0)
{
if(lp == objp->attrs)
objp->attrs = lp->next;
else
prevlp->next = lp->next;
free(lp->item);
free(lp);
return;
}
prevlp = lp;
}
}
(These functions are in the file object.xc in the week5 subdirectory.)
hasattr returns TRUE if an object has a certain attribute; it simply searches through the object's attribute list
looking for a matching string. setattr sets an attribute; if the object does not already have the attribute, setattr
allocates a new list node, allocates and copies the string, and splices the new
node onto the head of the attribute list. unsetattr clears an attribute by finding it in the list, unlinking its node, and
freeing both the list node structure and the string (if a matching attribute is found).
To explain the call to chkstrdup in setattr, we must say and think a little more about memory allocation. It
turns out that, to be on the safe side, setattr must allocate memory for the string defining the attribute. It cannot
assume that the string which was passed to it was allocated in such a way that it would be guaranteed to persist for the
life of this attribute on this object. For example, when we're reading a data file, the passed-in attribute string will be
sitting in a buffer holding one line we've just read from the data file, and that buffer will be overwritten when we read
the next line. Even if the passed-in string were allocated in such a way that it would stick around, setattr still
couldn't use it directly, but would still have to make a copy, because unsetattr is always going to free the string,
so setattr better have allocated it. Many times, the string passed to setattr will be a pointer to a string constant
in the source code, and if setattr simply copied the pointer rather than allocating and copying the string,
unsetattr might later try to free that pointer, an operation which would fail since the pointer wasn't originally
obtained from malloc.
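The buffer-reuse hazard is easy to reproduce in miniature. In this sketch the names copystring and democopy are hypothetical, and plain malloc with an explicit check stands in for chkmalloc so that the example compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* return a malloc'ed copy of str, or NULL if allocation fails */
static char *
copystring(const char *str)
{
    char *ret = malloc(strlen(str) + 1);   /* + 1 for the terminating \0 */
    if(ret != NULL)
        strcpy(ret, str);
    return ret;
}

/* simulate reading two data-file lines into one reused buffer;
   returns 1 if the private copy survived while the bare pointer did not */
static int
democopy(void)
{
    char line[40];
    char *borrowed, *owned;
    int ok;

    strcpy(line, "heavy");
    borrowed = line;             /* merely remembers where the buffer is */
    owned = copystring(line);    /* private copy of the characters */
    if(owned == NULL)
        return 0;

    strcpy(line, "container");   /* "read" the next line into the buffer */

    ok = strcmp(borrowed, "container") == 0 &&
         strcmp(owned, "heavy") == 0;
    free(owned);                 /* legal: owned really came from malloc */
    return ok;
}
```

After the second line is read, the borrowed pointer sees ``container'' where the attribute name used to be, while the copy still holds ``heavy'' and can safely be handed to free later.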
setattr could call strlen to get the length of the string, add 1 for the terminating \0, call malloc or
chkmalloc, and copy the string in, but since this is a common pattern (and since it's easy to forget to add 1), we
encapsulate it in the function chkstrdup. chkstrdup accepts a string, and returns a pointer to malloc'ed
memory holding a copy of the string. The ``chk'' in its name reflects the fact that, like chkmalloc, it never returns
a null pointer, even when malloc fails. Here is chkstrdup's definition:
#include <string.h>
char *
chkstrdup(char *str)
{
char *ret = chkmalloc(strlen(str) + 1);
strcpy(ret, str);
return ret;
}
(chkstrdup is also in chkmalloc.c.)
Now that we have the hasattr, setattr, and unsetattr functions, we unfortunately have quite a few changes
to make. Everywhere we have the pattern
object->attrs & ATTRIBUTE
(where object is any struct object pointer and ATTRIBUTE is any attribute), we must replace it with
hasattr(object, "attribute")
Everywhere we have the pattern

object->attrs |= ATTRIBUTE
we must replace it with
setattr(object, "attribute")
And everywhere we have the pattern
object->attrs &= ~ATTRIBUTE
we must replace it with
unsetattr(object, "attribute")
(As we mentioned, though, anywhere we ``call'' the Iscontainer() or Isopen() macros to test the
CONTAINER or OPEN attributes, we won't have to change anything.)
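As a worked instance of these three substitutions, here is a self-contained sketch of the locking logic rewritten for string attributes. The helpers trylock and tryunlock and the cut-down struct object are hypothetical, and the three attribute functions are condensed versions that call plain malloc (aborting on failure) so that the example does not depend on chkmalloc.c.

```c
#include <stdlib.h>
#include <string.h>

struct list { char *item; struct list *next; };
struct object { const char *name; struct list *attrs; };  /* cut down */

static int
hasattr(struct object *objp, const char *attr)
{
    struct list *lp;
    for(lp = objp->attrs; lp != NULL; lp = lp->next)
        if(strcmp(lp->item, attr) == 0)
            return 1;
    return 0;
}

static void
setattr(struct object *objp, const char *attr)
{
    struct list *lp;
    if(hasattr(objp, attr))
        return;
    lp = malloc(sizeof(struct list));
    if(lp == NULL)
        abort();
    lp->item = malloc(strlen(attr) + 1);
    if(lp->item == NULL)
        abort();
    strcpy(lp->item, attr);
    lp->next = objp->attrs;
    objp->attrs = lp;
}

static void
unsetattr(struct object *objp, const char *attr)
{
    struct list **lpp;
    for(lpp = &objp->attrs; *lpp != NULL; lpp = &(*lpp)->next)
    {
        if(strcmp((*lpp)->item, attr) == 0)
        {
            struct list *lp = *lpp;
            *lpp = lp->next;
            free(lp->item);
            free(lp);
            return;
        }
    }
}

/* was: (objp->attrs & LOCK), (objp->attrs & LOCKED), objp->attrs |= LOCKED */
static int
trylock(struct object *objp)
{
    if(!hasattr(objp, "lock") || hasattr(objp, "locked"))
        return 0;
    setattr(objp, "locked");
    return 1;
}

/* was: (objp->attrs & LOCKED), objp->attrs &= ~LOCKED */
static int
tryunlock(struct object *objp)
{
    if(!hasattr(objp, "locked"))
        return 0;
    unsetattr(objp, "locked");
    return 1;
}
```

Note that the attribute names are now just strings, so a misspelling such as "lokced" compiles without complaint; that is one price of giving up the #defined masks.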
Finally, the attribute-reading code in io.c is considerably simplified. Here is the relevant code from
parsedatafile:
else if(strcmp(av[0], "attribute") == 0)
{
if(currentobject == NULL)
continue;
setattr(currentobject, av[1]);
}
So, exercise 1 (which all of this text so far has been an elaborate prelude to) is to add the functions and source files
mentioned so far, change all of the code that uses attrs to use the new hasattr, setattr, and unsetattr
functions (the changes should be confined to commands.c, and to one line in object.c), and make sure that the
program still works.
2. If you didn't use dynamically-allocated memory to hold long object and room descriptions, make that change now.
3. Rewrite newobject in object.c and newroom in rooms.c to dynamically allocate new structures, rather than
parceling them out of the static objects and rooms arrays. (Eliminate those arrays.)
Eliminating the rooms array has one unintended consequence. Although using arrays to hold the objects and rooms
was generally a nuisance in the long run, we did take advantage of it in one situation: during the initial setup of the
dungeon, while reading the data file, we hooked up named room exits to other rooms by calling the findroom
function. findroom had to be able to find a named room anywhere in the dungeon, which meant that it needed a
way to scan through the list of all rooms. When rooms were held in an array, scanning through the array was easy.
When we move to dynamically allocated room structures, we must preserve this ability.
The fact that what we need is the ability to ``scan through the list of all rooms'' suggests the solution: we'll keep a
linked list of all the rooms. We don't need to define and implement a new list structure or anything; we can just add a
``next'' pointer to struct room. This next pointer won't be used for any purpose which the user playing the game
will be able to see--as far as the user is concerned, the only connections (pointers) between rooms are the ones
described by the room's explicit exits. This new next field for struct room will be for our own, internal,
behind-the-scenes use, so that we can implement this list of all rooms.
4. Improve the code in io.c so that room exit lists can be placed directly in the room descriptions, rather than at the
end. If an exit refers to a room you've already seen, you can hook up the exit immediately. But if the exit refers to a
room you haven't seen yet, you'll have to allocate some temporary data structure to keep track of the source room,
direction, and destination. Then, after you've read the entire data file, you can go back through that list of unresolved
exits, since all of the destinations should exist by that point. (If any don't, you can print an error message.)
Exercise 2. If you didn't use dynamically-allocated memory to hold long object and room descriptions, make that
change now.
See the published answer to assignment 2, exercise 2.
Exercise 3. Rewrite newobject and newroom to dynamically allocate new structures, rather than parceling
them out of the static objects and rooms arrays.
Rewriting newobject in object.c is easy enough:
struct object *
newobject(char *name)
{
    struct object *objp;

    objp = chkmalloc(sizeof(struct object));
    strcpy(objp->name, name);
    objp->lnext = NULL;
    objp->attrs = NULL;
    objp->contents = NULL;
    objp->desc = NULL;
    return objp;
}
However, it isn't quite so simple for newroom, because we need a way of looking through all the rooms we've got,
for example in the findroom function. Therefore, we can't allocate room structures in isolation.
As suggested in the assignment, one possibility is to keep a linked list of all rooms allocated, by adding an extra
``next'' field to struct room in game.h:
struct room
{
    char name[MAXNAME];
    struct object *contents;
    struct room *exits[NEXITS];
    char *desc;
    struct room *next;          /* list of all rooms */
};
Now I can rewrite newroom in rooms.c:

static struct room *roomlist = NULL;
struct room *
newroom(char *name)
{
struct room *roomp;
int i;
roomp = chkmalloc(sizeof(struct room));
roomp->next = roomlist; /* splice into list of all rooms */
roomlist = roomp;
strcpy(roomp->name, name);
roomp->contents = NULL;
for(i = 0; i < NEXITS; i++)
roomp->exits[i] = NULL;
roomp->desc = NULL;
return roomp;
}
Splicing the new room into the list of all rooms is straightforward.
findroom must be rewritten slightly, to walk over the new room list rather than the old rooms array:
struct room *
findroom(char *name)
{
struct room *roomp;
for(roomp = roomlist; roomp != NULL; roomp = roomp->next)
{
if(strcmp(roomp->name, name) == 0)
return roomp;
}
return NULL;
}
We also have to decide what to do with the getentryroom function. (This is the function that gets called to
decide which room the actor should initially be placed in.) It used to return, rather arbitrarily, the first room in the
rooms array, which always happened to correspond to the first room in the data file, which was a reasonable
choice. If we make the obvious change to getentryroom, and have it return the ``first'' room in the new room
list:
struct room *
getentryroom(void)
{
return roomlist; /* temporary */
}
it will not be so reasonable a choice, because our easy implementation of the list-splicing code in newroom above
adds new rooms to the head of the list, such that the first room added, i.e. the first room in the data file, ends up at
the end of the list. Depending on what your data file looks like, the game with the modifications shown so far will
start out by dumping the player unceremoniously onto the back porch or into the basement (which might actually
end up being an advantage for the player, if the basement tunnel is where the treasure is!).
If you wish, you can rewrite newroom to place new rooms at the end of the list, or getentryroom to return the
tail of the list, to solve this problem. However, a better fix would be to allow the data file to explicitly specify what
the entry room should be, rather than making the game code assume anything. We'll leave that for another exercise.
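If you do want newroom to append at the end of the list, one way (a sketch under assumptions, not the version in the course sources) is to remember where the next link belongs. The roomtail variable is hypothetical, struct room is cut down to two fields, and plain malloc replaces chkmalloc so that the example compiles alone.

```c
#include <stdlib.h>
#include <string.h>

#define MAXNAME 20   /* assumed value, for illustration only */

/* cut-down stand-in for the game's struct room */
struct room
{
    char name[MAXNAME];
    struct room *next;    /* list of all rooms */
};

static struct room *roomlist = NULL;
/* always points at the pointer that the next new room must be stored in:
   roomlist itself while the list is empty, else the last room's next */
static struct room **roomtail = &roomlist;

static struct room *
newroom(const char *name)
{
    struct room *roomp = malloc(sizeof(struct room));
    if(roomp == NULL)
        return NULL;
    strncpy(roomp->name, name, MAXNAME - 1);
    roomp->name[MAXNAME - 1] = '\0';
    roomp->next = NULL;
    *roomtail = roomp;          /* hook onto the end of the list */
    roomtail = &roomp->next;    /* the next room goes after this one */
    return roomp;
}

static struct room *
getentryroom(void)
{
    return roomlist;    /* now the first room created */
}
```

With this arrangement the rooms stay in data-file order, so getentryroom can keep returning roomlist and the player starts in the first room listed, as before.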
Exercise 4. Improve the code in io.c so that room exit lists can be placed directly in the room descriptions,
rather than at the end.
The basic change is to the parsedatafile function. Whereas it used to parse lines at the end of the data file
like
roomexits kitchen s:hall e:stairway n:porch
we'll now make it parse lines like
exit s hall
exit e stairway
exit n porch
which will occur within the description of the particular room (in this case, the kitchen) itself. (I've simplified
things by putting three exits on three separate lines and eliminating the colon syntax; there was no particular reason
for me to have done it in that more complicated way in the first place.)
Here is the new code for parsedatafile:
else if(strcmp(av[0], "exit") == 0)
{
struct room *roomp;
if(ac < 3)
{
fprintf(stderr, "missing exit or room name\n");
continue;
}
if(currentroom == NULL)
{
fprintf(stderr, "exit not in room\n");
continue;
}
roomp = findroom(av[2]);
if(roomp != NULL)
{
/* already have room, so connect */
connectexit(currentroom, av[1], roomp);
}
else
{
/* haven't seen room yet; stash for later */
stashexit(currentroom, av[1], av[2]);
}
}
This makes use of two auxiliary functions:
static void
connectexit(struct room *room, char *dirname, struct room *nextroom)
{
int dir;
if(strcmp(dirname, "n") == 0)
dir = NORTH;
else if(strcmp(dirname, "e") == 0)
dir = EAST;
else if(strcmp(dirname, "w") == 0)
dir = WEST;
else if(strcmp(dirname, "s") == 0)
dir = SOUTH;
else if(strcmp(dirname, "ne") == 0)
dir = NORTHEAST;
else if(strcmp(dirname, "se") == 0)
dir = SOUTHEAST;
else if(strcmp(dirname, "nw") == 0)
dir = NORTHWEST;
else if(strcmp(dirname, "sw") == 0)
dir = SOUTHWEST;
else if(strcmp(dirname, "u") == 0)
dir = UP;
else if(strcmp(dirname, "d") == 0)
dir = DOWN;
else
{
fprintf(stderr, "no such direction \"%s\"\n", dirname);

return;
}
room->exits[dir] = nextroom;
}
struct stashedexit
{
struct room *room;
char *dirname;
char *nextroom;
struct stashedexit *next;
};
static struct stashedexit *stashedexits = NULL;
static void
stashexit(struct room *room, char *dirname, char *nextroom)
{
struct stashedexit *ep = chkmalloc(sizeof(struct stashedexit));
ep->room = room;
ep->dirname = chkstrdup(dirname);
ep->nextroom = chkstrdup(nextroom);
ep->next = stashedexits;
stashedexits = ep;
}
The stashexit function builds a linked list of stashed exits, allocating a new instance of the new struct
stashedexit for each one, each containing the room pointer and the names of the exit direction and destination
room. (The strings must also be copied to dynamically-allocated memory, because on entry to stashexit they
are pointers back into the line or line2 array in parsedatafile, and those strings will be overwritten when
we read the next line in the data file.)
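The need for the copies can be seen in miniature. In this sketch, dupstring is a hypothetical stand-in for chkstrdup (I'm assuming it simply exits on failure), and copysurvives imitates two successive reads into the same line buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for chkstrdup: copy a string into fresh memory. */
char *
dupstring(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if(copy == NULL)
        exit(EXIT_FAILURE);
    strcpy(copy, s);
    return copy;
}

/*
 * Imitate reading two lines into one buffer.  Returns 1 if the copy
 * made after the first "read" still holds the first line's text.
 */
int
copysurvives(void)
{
    char line[32];
    char *saved;
    int ok;

    strcpy(line, "hall");           /* first line read */
    saved = dupstring(line);        /* private copy, independent of line[] */
    strcpy(line, "porch");          /* next line overwrites the buffer */

    ok = (strcmp(saved, "hall") == 0 && strcmp(line, "porch") == 0);
    free(saved);
    return ok;
}
```

Had saved merely pointed at line, it would have read ``porch'' after the second copy.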
Finally, we must arrange to resolve the stashed exits when we're done reading the data file and have had a chance
to see the descriptions for all the rooms. Here is the change to the readdatafile function:
static void connectexit(struct room *, char *, struct room *);
static void stashexit(struct room *, char *, char *);
static void resolveexits(void);
readdatafile()
{
char *datfile = "dungeon.dat";
FILE *fp = fopen(datfile, "r");
if(fp == NULL)
{
fprintf(stderr, "can't open %s\n", datfile);
return FALSE;
}
parsedatafile(fp);
resolveexits();
fclose(fp);
return TRUE;
}
And here is the third auxiliary function:
static void
resolveexits()
{
struct stashedexit *ep, *nextep;
for(ep = stashedexits; ep != NULL; ep = nextep)
{
struct room *roomp = findroom(ep->nextroom);
if(roomp == NULL)
fprintf(stderr, "still no such room \"%s\"\n", ep->nextroom);
else
connectexit(ep->room, ep->dirname, roomp);
nextep = ep->next;
free(ep->dirname);
free(ep->nextroom);
free(ep);
}
stashedexits = NULL;
}
One aspect of this last function deserves mention. Why does it not use a more usual list-traversal loop, such as
for(ep = stashedexits; ep != NULL; ep = ep->next)
What's with that temporary variable nextep? This loop is doing two things at once: traversing the linked list in
order to resolve the stashed exits, and freeing the list as it goes. But when it frees the node (the struct
stashedexit) it's working on, it also frees--and hence loses--the next pointer which that structure contains!
Therefore, it makes a copy of the next pointer before it frees the structure.
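The same save-the-pointer-first idiom applies to any loop that frees a list while walking it. Here is a small self-contained sketch; struct node, pushnode, and freelist are hypothetical names and are not part of the game.

```c
#include <stdlib.h>

struct node
{
    int value;
    struct node *next;
};

/* Push a new node on the front of the list; returns the new head. */
struct node *
pushnode(struct node *head, int value)
{
    struct node *np = malloc(sizeof(struct node));
    if(np == NULL)
        exit(EXIT_FAILURE);
    np->value = value;
    np->next = head;
    return np;
}

/* Free every node in the list; returns the number of nodes freed. */
int
freelist(struct node *head)
{
    struct node *np, *nextnp;
    int n = 0;

    for(np = head; np != NULL; np = nextnp)
    {
        nextnp = np->next;      /* copy the link before the node goes away */
        free(np);
        n++;
    }
    return n;
}
```

Writing np = np->next in the loop header instead would read from memory that has already been freed.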
Assignment #6
Handouts:
Assignment #6
Exercises:
1. In this exercise, we'll almost completely rewrite commands.c so that instead of having a giant if/else
chain, with many calls to strcmp and the code for all the actions interspersed, there will be one function per
command, a table relating command verbs to the corresponding functions, and some much simpler code
which will look up a command verb in the table and decide which function to call. The functions in the table
will, of course, be referred to by function pointers.
In the week6 subdirectory of the distribution materials are two new files, cmdtab.c and cmdtab.h, and
a modified (though incomplete) copy of commands.c. cmdtab.h defines the command table structure,
struct cmdtab:
struct cmdtab
{
char *name;
int (*func)(struct actor *, struct object *, struct sentence *);
};
extern struct cmdtab *findcmd(char *, struct cmdtab [], int);
With this structure in hand, the first thing to do is to declare a bunch of functions, one for each command,
and then build the ``command table,'' which is an array of struct cmdtab. All of the command functions
are going to take as arguments the actor (struct actor *), the object being acted upon (struct
object *, if any), and the complete sentence structure (struct sentence *) describing the
command we're trying to interpret/implement. (The object being acted upon is the direct object of the
sentence and could also be pulled out of the sentence structure; we're passing it as a separate argument
because that might make things more convenient for the command functions themselves.) Here, then, is the
command table, incorporating all the commands we've ever used (your copy of the game might not include
all of the commands referenced here):
static int dircmd(struct actor *, struct object *, struct sentence *);
static int takecmd(struct actor *, struct object *, struct sentence *);
static int dropcmd(struct actor *, struct object *, struct sentence *);
static int lookcmd(struct actor *, struct object *, struct sentence *);
static int invcmd(struct actor *, struct object *, struct sentence *);
static int examcmd(struct actor *, struct object *, struct sentence *);
static int hitcmd(struct actor *, struct object *, struct sentence *);
static int breakcmd(struct actor *, struct object *, struct sentence *);
static int fixcmd(struct actor *, struct object *, struct sentence *);
static int cutcmd(struct actor *, struct object *, struct sentence *);
static int readcmd(struct actor *, struct object *, struct sentence *);
static int opencmd(struct actor *, struct object *, struct sentence *);
static int closecmd(struct actor *, struct object *, struct sentence *);
static int lockcmd(struct actor *, struct object *, struct sentence *);
static int unlockcmd(struct actor *, struct object *, struct sentence *);
static int putcmd(struct actor *, struct object *, struct sentence *);
static int helpcmd(struct actor *, struct object *, struct sentence *);
static int quitcmd(struct actor *, struct object *, struct sentence *);
static struct cmdtab commands[] =
{
    "n",          dircmd,
    "north",      dircmd,
    "s",          dircmd,
    "south",      dircmd,
    "e",          dircmd,
    "east",       dircmd,
    "w",          dircmd,
    "west",       dircmd,
    "ne",         dircmd,
    "northeast",  dircmd,
    "se",         dircmd,
    "southeast",  dircmd,
    "nw",         dircmd,
    "northwest",  dircmd,
    "sw",         dircmd,
    "southwest",  dircmd,
    "up",         dircmd,
    "down",       dircmd,
    "take",       takecmd,
    "drop",       dropcmd,
    "look",       lookcmd,
    "i",          invcmd,
    "inventory",  invcmd,
    "examine",    examcmd,
    "hit",        hitcmd,
    "break",      breakcmd,
    "fix",        fixcmd,
    "cut",        cutcmd,
    "read",       readcmd,
    "open",       opencmd,
    "close",      closecmd,
    "lock",       lockcmd,
    "unlock",     unlockcmd,
    "put",        putcmd,
    "?",          helpcmd,
    "help",       helpcmd,
    "quit",       quitcmd,
};
With this structure defined, the body of docommand is drastically simplified. In fact, since we'll break the
searching of the command table out to a separate function, all the new docommand has to do is call that
function, then (if the command is found) call the command function associated with the command:
docommand(struct actor *player, struct sentence *cmd)
{
struct cmdtab *cmdtp;
cmdtp = findcmd(cmd->verb, commands, Sizeofarray(commands));
if(cmdtp != NULL)
{
(*cmdtp->func)(player, cmd->object, cmd);
return TRUE;
}
else
{
printf("I don't know the word \"%s\".\n", cmd->verb);
return FALSE;
}
}
The new findcmd function wants to know (as its third argument) the number of entries in the command
table. It would be a nuisance if we had to count them, and we'd have to keep updating our count each time
we added an entry to the table. The built-in sizeof operator can tell us the size of the array, but it tells us
the size in bytes, while we want to know the number of entries. But the number of entries is the size of the
array in bytes divided by the size of one entry in bytes, or
sizeof(commands) / sizeof(commands[0])
(here we get a handle on ``the size of one entry'' by arbitrarily taking sizeof(commands[0])). Since
this is a common operation, I like to define a function-like macro to encapsulate it:
#define Sizeofarray(a) (sizeof(a) / sizeof(a[0]))

With that definition in place, the number of entries in the commands array is simply
Sizeofarray(commands).
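A quick way to convince yourself that the macro works, and to see its one limitation, is a sketch like the following. The dirnames array and both function names are made up for the example, and I've written the macro with an extra pair of parentheses around its argument, which is a small defensive variation on the definition above.

```c
#include <stddef.h>

#define Sizeofarray(a) (sizeof(a) / sizeof((a)[0]))

static const char *dirnames[] = { "n", "s", "e", "w", "u", "d" };

/* Number of direction names; tracks the initializer automatically. */
size_t
ndirnames(void)
{
    return Sizeofarray(dirnames);
}

/*
 * Caution: inside a function, an array parameter is really a pointer,
 * so Sizeofarray(names) would be wrong here.  The count has to be
 * passed in separately, just as it is in findcmd's third argument.
 */
size_t
countnonempty(const char *names[], size_t n)
{
    size_t i, count = 0;
    for(i = 0; i < n; i++)
    {
        if(names[i][0] != '\0')
            count++;
    }
    return count;
}
```

If you add a seventh name to dirnames, ndirnames follows along without any other change.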
Here is the findcmd function, from cmdtab.c:
#include <string.h>
#include "cmdtab.h"
struct cmdtab *
findcmd(char *cmd, struct cmdtab cmdtab[], int ncmds)
{
int i;
for(i = 0; i < ncmds; i++)
{
if(strcmp(cmdtab[i].name, cmd) == 0)
return &cmdtab[i];
}
return NULL;
}
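To see the whole lookup-then-call pattern in one place, here is a deliberately tiny sketch. It is not the game's code: struct minitab, the two sample functions, and runmini are hypothetical, and the functions take a plain int rather than the actor, object, and sentence pointers.

```c
#include <stddef.h>
#include <string.h>

#define Sizeofarray(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified table entry; the real func field takes three pointers. */
struct minitab
{
    const char *name;
    int (*func)(int);
};

static int doublecmd(int x) { return 2 * x; }
static int negatecmd(int x) { return -x; }

static struct minitab minicmds[] =
{
    { "double", doublecmd },
    { "twice",  doublecmd },    /* synonyms share one function */
    { "negate", negatecmd },
};

/* Linear search, the same shape as findcmd. */
static struct minitab *
findmini(const char *name)
{
    size_t i;
    for(i = 0; i < Sizeofarray(minicmds); i++)
    {
        if(strcmp(minicmds[i].name, name) == 0)
            return &minicmds[i];
    }
    return NULL;
}

/* Look the verb up and call through the pointer; *found reports the lookup. */
int
runmini(const char *verb, int arg, int *found)
{
    struct minitab *tp = findmini(verb);
    *found = (tp != NULL);
    if(tp == NULL)
        return 0;
    return (*tp->func)(arg);
}
```

Adding a new verb means adding one function and one table line; the lookup and calling code does not change.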
The ``only'' thing left to do is to write all of the individual command functions. The code for all of them will
come from the corresponding if clause from the old, bulky version of docommand. For example, here is
takecmd:
static int
takecmd(struct actor *player, struct object *objp,
struct sentence *cmd)
{
if(objp == NULL)
{
printf("You must tell me what to take.\n");
return FALSE;
}
if(contains(player->contents, objp))
{
printf("You already have the %s.\n", objp->name);
return FALSE;
}
if(hasattr(objp, "immobile"))
{
printf("The %s cannot be picked up.\n", objp->name);
return FALSE;
}
if(!takeobject(player, objp))
{
printf("You can't pick up the %s.\n", objp->name);
return FALSE;
}
printf("Taken.\n");
return SUCCESS;
}
With one exception, all of the rest of the command functions are similar; they just have
static int
xxxcmd(struct actor *actor, struct object *objp,
                    struct sentence *cmd)
{
    ...
    return SUCCESS;
}
wrapped around the code that used to be in one of the
else if(strcmp(verb, "xxx") == 0)
{
...
}
blocks.
(Although the copy of commands.c in the week6 subdirectory reflects this structure, you probably won't
be able to use that file directly, because it does not reflect all of the many changes and additions we've been
making to commands.c over the past several weeks.)
The integer code SUCCESS that each of these functions is returning isn't really used yet, so you don't have to
worry about what it's for. (Right now, docommand is ignoring the return value of the called command
function.)
The one command function that doesn't quite fit the pattern is dircmd, because it's called for any of the
direction commands, and has to take another look at the command verb to see what direction it represents:
static int
dircmd(struct actor *player, struct object *objp,
                    struct sentence *cmd)
{
    struct room *roomp = player->location;
    char *verb = cmd->verb;
    int dir;

    if(strcmp(verb, "n") == 0 || strcmp(verb, "north") == 0)
        dir = NORTH;
    else if(strcmp(verb, "s") == 0 || strcmp(verb, "south") == 0)
        dir = SOUTH;
    else if(strcmp(verb, "e") == 0 || strcmp(verb, "east") == 0)
        dir = EAST;
    else if(strcmp(verb, "w") == 0 || strcmp(verb, "west") == 0)
        dir = WEST;
    else if(strcmp(verb, "ne") == 0 || strcmp(verb, "northeast") == 0)
        dir = NORTHEAST;
    else if(strcmp(verb, "se") == 0 || strcmp(verb, "southeast") == 0)
        dir = SOUTHEAST;
    else if(strcmp(verb, "nw") == 0 || strcmp(verb, "northwest") == 0)
        dir = NORTHWEST;
    else if(strcmp(verb, "sw") == 0 || strcmp(verb, "southwest") == 0)
        dir = SOUTHWEST;
    else if(strcmp(verb, "up") == 0)
        dir = UP;
    else if(strcmp(verb, "down") == 0)
        dir = DOWN;
    else
        return ERROR;

    if(roomp == NULL)
    {
        printf("Where are you?\n");
        return ERROR;
    }

    /* If the exit does not exist, or if gotoroom() fails, */
    /* the player can't go that way.                        */
    if(roomp->exits[dir] == NULL || !gotoroom(player, roomp->exits[dir]))
    {
        printf("You can't go that way.\n");
        return ERROR;
    }

    /* player now in new room */
    listroom(player);
    return SUCCESS;
}
This function also returns another status code, ERROR (which is also ignored, for now, by docommand). If
your copy of game.h doesn't include them, you can put the lines
#define ERROR   0
#define SUCCESS 1
in it to #define these values.
2. Modify io.c to recognize a new entrypoint line in the data file, which will list the name of the room
which the actor should be placed in to start. What if the entrypoint line precedes the description of the
room it names? How should the code in io.c record the entry-point room so that getentryroom in
rooms.c can use it? (There are many answers to these questions; pick one, and implement it.)
3. Another part of the game that has the same sort of large, if/strcmp/else structure as did docommand in
commands.c is parsedatafile in io.c. Think about what it would take to use the cmdtab structure,
and the findcmd function, to streamline the code that processes lines in the data file.
(If you try it, you'll probably run into some difficulties which will make it rather hard to finish, so save a
copy of io.c before you start! In the answers handed out next week, I'll describe the difficulties, and outline
how they might be circumvented.)
Exercise 2. Modify io.c to recognize a new entrypoint line in the data file.
Here is the code I added to io.c. Rather than calling findroom right away, and facing the possibility of the
room not being defined yet, I chose to just stash the name of the room away (as a string), and then try
looking it up after the data file had been completely read. I added this declaration at the beginning of
parsedatafile:
char *entryroom = NULL;
I added this case to the main if/else chain in parsedatafile:
else if(strcmp(av[0], "entrypoint") == 0)
{
struct room *roomp;
if(ac < 2)
{
fprintf(stderr, "missing entry room name\n");
continue;
}
/* don't bother to look up yet; just save name */
entryroom = chkstrdup(av[1]);
}
I added this code at the end of parsedatafile:
if(entryroom != NULL)
{
struct room *roomp = findroom(entryroom);
if(roomp != NULL)
setentryroom(roomp);
else
fprintf(stderr, "can't find entry room %s\n", entryroom);
}
Finally, I added the new setentryroom function in rooms.c:
void
setentryroom(struct room *roomp)
{
entryroom = roomp;
}
Exercise 3. Think about what it would take to use the cmdtab structure, and the findcmd function, to
streamline the code that processes lines in the data file.
It would initially seem straightforward to take each case in the if/else chain, break it out to a separate
function, build a table (an array of struct cmdtab) linking the first words on lines in the data file to the
new functions for parsing those lines, and finally call findcmd (after reading each line) to decide which
function to call.
The first problem you might face, though, would be breaking up the old parsedatafile function. It
contains a number of local variables (especially currentroom and currentobject) which several of
the parsing cases need access to. Once you broke those parsing cases out into separate functions, they'd need
access to the (formerly local) variables somehow. You'd probably have to make them global, although you
could restrict them to the source file io.c by declaring them static.
The next problem you'd face would be deciding which information to pass to the broken-out functions. Most
of them use the av array to inspect the various arguments and other words which appeared on the line they're
parsing, but for at least one way I wrote the long description reading code, it needed to use the original copy
of the complete data file input line (that is, the line array), the copy that hadn't been broken apart by
getwords. The obvious information to pass to each parsing function is the ac count and the av array
(which are, not coincidentally, quite similar to the argc and argv with which main is traditionally called).
Would you have to pass along the unbroken line to all functions, for the benefit of just the one or two that
needed it? (Remember, all the functions must accept the same argument list, because one function call, using
the function pointer in the func field of the matching struct cmdtab entry, will be calling all of them.)
Finally, once you decided what information to pass to each individual line parsing function, you'd discover
(if you've got strict prototype checking turned on in your compiler, and if you used cmdtab.c and
cmdtab.h as they appeared in last week's handout), that the compiler doesn't want you to create an array of
struct cmdtab with the func fields pointing at your shiny new functions, because the func field
(again, as it appeared in last week's handout), is specifically a pointer to a function accepting a struct
actor *, a struct object *, and a struct sentence *, which is almost certainly not the set of
data you chose to pass to each data file parsing function. You'd either have to revert the declaration of the
func field to
int (*func)();
(as it was on the disk) and turn off strict prototype checking, or write a second whole version of struct
cmdtab and a second whole version of findcmd that used func fields with a different prototype, or play
some (fairly ugly) games with function pointer casts.
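For what it's worth, the ``second whole version'' option might look something like the sketch below. Everything in it is an assumption made for illustration: the struct parsetab type, the (ac, av, line) prototype, the two sample parsers, and the file-scope variables standing in for the formerly local parse state.

```c
#include <stddef.h>
#include <string.h>

/* Second table type, whose func field has the parser prototype. */
struct parsetab
{
    const char *name;
    int (*func)(int ac, char *av[], const char *line);
};

/* Parse state that used to be local to parsedatafile, now file-scope. */
static const char *currentroomname = NULL;
static int nexits = 0;

static int
parseroom(int ac, char *av[], const char *line)
{
    (void)line;                     /* this parser doesn't need the whole line */
    if(ac < 2)
        return 0;
    currentroomname = av[1];        /* real code would copy the string */
    return 1;
}

static int
parseexit(int ac, char *av[], const char *line)
{
    (void)av;
    (void)line;
    if(ac < 3 || currentroomname == NULL)
        return 0;                   /* malformed, or exit not in a room */
    nexits++;
    return 1;
}

static struct parsetab parsers[] =
{
    { "room", parseroom },
    { "exit", parseexit },
};

/* Dispatch one already-split line; returns -1 for an unknown keyword. */
int
parseline(int ac, char *av[], const char *line)
{
    size_t i;
    if(ac < 1)
        return -1;
    for(i = 0; i < sizeof(parsers) / sizeof(parsers[0]); i++)
    {
        if(strcmp(parsers[i].name, av[0]) == 0)
            return (*parsers[i].func)(ac, av, line);
    }
    return -1;
}
```

Every parser receives the unbroken line whether it wants it or not, which is the price of calling them all through one function pointer type.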
Assignment #7
Handouts:
Assignment #7
Exercises:
1. Adding new command verbs to the game quickly becomes cumbersome. As long as they all go into commands.c,
they can potentially be invoked on any object and with any tool. So if a new command is (or ought to be) specific to
some tool or direct object, the code to implement it has to spend most of its time verifying that the overcreative player
isn't trying to apply the new command verb in some wildly imaginative but utterly inappropriate way (``sharpen gum
with feather''; ``pierce idea with boot''; etc.). It would be far better, in several respects, if scraps of code, for
implementing certain command verbs, could be attached directly to objects, so that they would only be invoked if
those objects were involved (and, additionally, so that the same verb might possibly perform different actions if
applied to different objects). The notion of attaching code to data structures is one of the fundamentals of Object
Oriented Programming, and it's a curious coincidence (or is it?) that the data structures which in this program we'd
like to attach code to are in fact what we call ``objects.''
We're going to add a new field, of type pointer-to-function, to our struct object. This field will contain a
pointer to a per-object function which handles one or more command verbs unique to that object. (If an object has no
unique verbs it wishes to process, its function pointer will be null.) Here is the new field for struct object in
game.h:
int (*func)(struct actor *, struct object *, struct sentence *);
Notice that func takes the same arguments as all the other command functions which we split commands.c up into
last week.
If an object contains some custom verb-handling code, it will be called from docommand in commands.c. Before
trying all of the default commands, it first sees if either of the two objects which a command might reference (the
direct object, or the object of the preposition) has its own function. If so, it calls that (those) function(s).
Now, those functions might or might not know how to handle the particular verb the user typed. If not, control should
flow through to the default, global command functions. (For example, you can still take or drop or examine just
about any object, even if it has its own special code for when you try to do other things with it, and the object's
function shouldn't have to--and couldn't--repeat all the other code for all the other verbs.) To coordinate things, then,
each function that tries to implement one of the user's commands can return various status codes.
There are four status codes (which go in game.h):
#define FAILURE   0   /* command completed unsuccessfully */
#define SUCCESS   1   /* command completed successfully */
#define CONTINUE  2   /* command not completed */
#define ERROR     3   /* internal error */
FAILURE means that the command completed, but unsuccessfully (the player couldn't do what he tried to do).
SUCCESS means that the command completed, successfully. CONTINUE means that the function which was just
called did not implement the command, and that the game system (i.e. the docommand function) should keep trying,
by calling other functions (if any). Finally, ERROR indicates an internal error of some kind.
Here is the new code for docommand. This goes at the top of the function, before it calls findcmd:
if(cmd->object != NULL && cmd->object->func != NULL)
{
r = (*cmd->object->func)(player, cmd->object, cmd);
if(r != CONTINUE)
return r;
}
if(cmd->xobject != NULL && cmd->xobject->func != NULL)
{
r = (*cmd->xobject->func)(player, cmd->xobject, cmd);
if(r != CONTINUE)
return r;
}
You'll also need to add the line

objp->func = NULL;
to the newobject function in object.c.
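The fall-through behavior is easier to see in a stripped-down model. This sketch is not the game's code: struct miniobj, violinfunc, defaultcmd, and dispatch are hypothetical, and the handlers take only the object and the verb instead of the full set of arguments.

```c
#include <stddef.h>
#include <string.h>

#define FAILURE   0
#define SUCCESS   1
#define CONTINUE  2
#define ERROR     3

/* Cut-down object: just a name and an optional verb handler (assumption). */
struct miniobj
{
    const char *name;
    int (*func)(struct miniobj *, const char *verb);
};

/* Per-object handler: claims only "play", defers everything else. */
int
violinfunc(struct miniobj *objp, const char *verb)
{
    (void)objp;
    if(strcmp(verb, "play") != 0)
        return CONTINUE;
    return SUCCESS;
}

/* Stand-in for the default command table: knows only "take". */
static int
defaultcmd(struct miniobj *objp, const char *verb)
{
    (void)objp;
    if(strcmp(verb, "take") == 0)
        return SUCCESS;
    return ERROR;
}

/* Try the object's own function first; fall through on CONTINUE. */
int
dispatch(struct miniobj *objp, const char *verb)
{
    if(objp != NULL && objp->func != NULL)
    {
        int r = (*objp->func)(objp, verb);
        if(r != CONTINUE)
            return r;
    }
    return defaultcmd(objp, verb);
}
```

Here ``play violin'' is handled by the violin's own function, while ``take violin'' passes through it untouched and reaches the default code.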
Next we'll need some actual functions (special-purpose ones) for various objects to use. Here are a few examples:
int hammerfunc(struct actor *player, struct object *objp,
                    struct sentence *cmd)
{
    if(strcmp(cmd->verb, "break") != 0)
        return CONTINUE;
    if(objp != cmd->xobject)
        return CONTINUE;
    if(cmd->object == NULL)
    {
        return FAILURE;
    }
    ...
    {
        return FAILURE;
    }
    setattr(cmd->object, "broken");
    printf("Oh, dear. Now the %s is broken.\n", cmd->object->name);
    return SUCCESS;
}
int toolfunc(struct actor *player, struct object *objp,
                    struct sentence *cmd)
{
    if(strcmp(cmd->verb, "fix") != 0)
        return CONTINUE;
    if(objp != cmd->xobject)
        return CONTINUE;
    if(cmd->object == NULL)
    {
        printf("You must tell me what to fix.\n");
        return FAILURE;
    }
    if(!hasattr(cmd->object, "broken"))
    {
        printf("The %s is not broken.\n", cmd->object->name);
        return FAILURE;
    }
    ...
    {
        return FAILURE;
    }
    unsetattr(cmd->object, "broken");
    printf("Somehow you manage to fix the %s.\n", cmd->object->name);
    return SUCCESS;
}
int cutfunc(struct actor *player, struct object *objp,
                    struct sentence *cmd)
{
    if(strcmp(cmd->verb, "cut") != 0)
        return CONTINUE;
    if(objp != cmd->xobject)
        return CONTINUE;
    ...
    {
        return FAILURE;
    }
    ...
    {
        return FAILURE;
    }
    if(!hasattr(cmd->object, "soft"))
    {
        printf("I don't think you can cut the %s with the %s.\n",
            cmd->object->name, cmd->xobject->name);
        return FAILURE;
    }
    printf("The %s is now cut in two.\n", cmd->object->name);
    return SUCCESS;
}
int playfunc(struct actor *player, struct object *objp,
                    struct sentence *cmd)
{
    if(strcmp(cmd->verb, "play") != 0)
        return CONTINUE;
    if(objp != cmd->object)
        return CONTINUE;
    if(!hasattr(cmd->object, "immobile"))
    {
        if(!contains(player->contents, cmd->object))
        {
            printf("You don't have the %s.\n", cmd->object->name);
            return FAILURE;
        }
    }
    printf("You're not quite ready for Carnegie Hall,\n");
    printf("but you do manage to force out a recognizable tune.\n");
    return SUCCESS;
}
I put these functions in a new source file, objfuncs.c. (It is not on the disk.)
The first three of these are basically simplifications of the breakcmd, fixcmd, and cutcmd functions (or, at any
rate, the code for ``break'', ``fix'', and ``cut'') from commands.c. Since these functions will be attached to certain
objects, and called only when those objects are involved in the command, these functions do not have to do quite so
much checking. For example, hammerfunc does not have to make sure that the implement being used is heavy,
because hammerfunc will only be attached to the hammer (or perhaps to other objects which it might be appropriate
to smash things with).
However, there are two new things these functions do need to check. Since they will be called first, before the default
functions in commands.c, they must defer to those default functions for most verbs. They must check to see that
they're being called for a verb they recognize, and if not, return the status code CONTINUE indicating that some other
code down the line should keep trying. Also, these functions can be called when the object they're attached to is either
the direct object of the sentence or the object of a preposition. When one of these functions is called for an object
which appears as the direct object, the objp parameter is a copy of cmd->object, as before. When they're called
for objects which are objects of prepositions, however, objp is a copy of that object pointer (see the new code for
docommand above). So these commands must typically check that they're being used in the way they expect--for
example, if hammerfunc didn't check to make sure that it was being used as a prepositional object, it might
incorrectly attempt to handle a nonsense sentence like
break hammer
The next step is to hook these functions up to the relevant objects. To do that, we'll need to invent a new object
descriptor line for the data file, and add some new code to readdatafile in io.c to parse it. We'll rig it up so
that we can say
object hammer
attribute heavy
func hammerfunc
(We'll keep the attribute ``heavy'' on the hammer, because it might be useful elsewhere, although we won't be needing
it for the old breakcmd function any more.) The new code for io.c is pretty simple, but before we can write it, we
have to remember that there's a significant distinction between the name by which we know a function and the
internal address, or function pointer, by which the compiler knows it. When we write
func hammerfunc
in the data file, it's obvious to us that we want the hammer object's func pointer to be hooked up to hammerfunc,
but it is not at all obvious to the compiler. In particular, the compiler is not going to be looking at our data file at all!
Code in io.c is going to be looking at the data file, and it's going to be doing it as the program is running, well after
the compiler has finished its work. So we're going to have to ``play compiler'' just a bit, to match up the name of a
function with the function itself.
This will actually be rather simple to do, because matching up a name (i.e. a string) with a function is exactly what the
struct cmdtab structure and findcmd function do. So if we build a second array of cmdtab structures, holding
the names and function pointers for functions which it's appropriate to attach to objects, all we have to add to
readdatafile is this scrap of code:
else if(strcmp(av[0], "func") == 0)
{
struct cmdtab *cmdtp = findcmd(av[1], objfuncs, nobjfuncs);
if(cmdtp != NULL)
currentobject->func = cmdtp->func;
else
fprintf(stderr, "unknown object func %s\n", av[1]);
}
The new array of cmdtab structures is called objfuncs. I chose to define it at the end of the new source file
objfuncs.c:
struct cmdtab objfuncs[] =
{
    "hammerfunc", hammerfunc,
    "toolfunc",   toolfunc,
    "cutfunc",    cutfunc,
    "playfunc",   playfunc,
};
#define Sizeofarray(a) (sizeof(a) / sizeof(a[0]))
int nobjfuncs = Sizeofarray(objfuncs);
So that readdatafile can access the objfuncs array, we need these two lines at the top of io.c:
extern struct cmdtab objfuncs[];
extern int nobjfuncs;
An explanation of the nobjfuncs variable is in order. When we called findcmd in commands.c to search our
main list of commands, the call looked like
cmdtp = findcmd(cmd->verb, commands, Sizeofarray(commands));
But our little Sizeofarray() macro uses sizeof, and sizeof works (rather obviously) only for objects which
the compiler knows the size of. In io.c, where we have
extern struct cmdtab objfuncs[];
the code sizeof(objfuncs) would not work, because in that source file, all the compiler knows about
objfuncs is that it is an array which is defined (and which therefore has its size set) somewhere else. So we need to
do the computation of the number of elements in the array within the source file objfuncs.c where the array is
defined, and store this number in a second global variable, nobjfuncs, so that code in io.c can get its hands on it.
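The situation can be imitated in a single source file, since an extern declaration with empty brackets leaves the array's type incomplete until the definition is seen. In this sketch the names struct entry, table, ntable, and sumvalues are hypothetical; the top half plays the role of io.c and the bottom half the role of objfuncs.c.

```c
#include <stddef.h>

struct entry
{
    const char *name;
    int value;
};

/* What a file like io.c sees: an array of unknown size, plus its count. */
extern struct entry table[];
extern int ntable;

/* sizeof(table) would not compile at this point: the type is incomplete. */
int
sumvalues(void)
{
    int i, sum = 0;
    for(i = 0; i < ntable; i++)
        sum += table[i].value;
    return sum;
}

/* What the defining file provides: the array itself and the computed count. */
struct entry table[] =
{
    { "one",   1 },
    { "two",   2 },
    { "three", 3 },
};

int ntable = sizeof(table) / sizeof(table[0]);
```

Because ntable is computed next to the initializer, adding an entry to the table updates the count for every file that uses it.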
Having made these changes, you should be able to add the line
func hammerfunc
to the description of the hammer in dungeon.dat, and the line
func toolfunc
to the description of the pliers or some other tool-like object, and the line
func cutfunc
to the description of the knife or some other sharp object. You can also add the lines
object violin
func playfunc
object end
object piano
attribute immobile
func playfunc
object end
to try out the sample ``play'' function. Notice that playfunc makes you pick up the violin before playing it, but
since the piano is immobile, you can just walk up to it and start playing.
2. Rewrite the lockcmd and unlockcmd functions (from Assignment 4) as object functions.
3. Invent some more object functions and some objects for them to be attached to. Examples: weapons, other tools,
magic wands...
4. Since object functions (if present) are called before the default functions in commands.c, they can not only
supplement the commands in commands.c, they can also override (on a per-object basis) the default behavior of an
existing verb. Invent a shower object. (If your dungeon doesn't already have a bathroom, perhaps you'll need to add
that, too.) Rig it up so that if the player types
take shower
the game will print neither ``Taken'' nor ``The shower cannot be picked up'' but rather something cute like ``After
spending 20 luxurious minutes in the shower, singing three full-length show tunes, and using up all the hot water, you
emerge clean and refreshed.''
5. Since per-object functions are called before default functions, and since they can defer to those default functions by
returning CONTINUE, another kind of customization is possible. A per-object function can recognize a verb, do some
processing based upon it, but then return CONTINUE, so that the default machinery will also act. Implement a magic
sword object so that when the player picks it up, the messages ``As you pick it up, the sword briefly glows blue'' and
``Taken'' are printed. The first message is printed by the sword's custom function, and the second one by the normal
``take'' machinery in commands.c.
6. Implement a container of radioactive waste. Arrange that if the player tries to pick it up, the message ``the container is
far too dangerous to pick up'' is printed, unless the player is wearing a Hazmat suit, in which case the container can be
picked up normally. Also implement the Hazmat suit object. (You can either just have the player pick up the Hazmat
suit, or for extra credit, arrange that the Hazmat suit implement a custom ``wear'' verb.)
Exercise 2. Rewrite the lockcmd and unlockcmd functions as object functions.

I stated this problem slightly incorrectly. You only need to write one function, which will recognize and implement both the
verbs ``lock'' and ``unlock''. You can write the function in two ways, one way so that it can be attached to objects that can
be locked and unlocked, and one way so that it can be attached to objects like keys that do the locking and unlocking. Here
is the function written so as to be attachable to lockable objects:
int lockfunc(struct actor *player, struct object *objp,
                    struct sentence *cmd)
{
    if(strcmp(cmd->verb, "lock") != 0 && strcmp(cmd->verb, "unlock") != 0)
        return CONTINUE;
    if(objp != cmd->object)
        return CONTINUE;
    if(cmd->xobject == NULL)
    {
        printf("You must tell me what to %s with.\n", cmd->verb);
        return FAILURE;
    }
    if(!hasattr(cmd->object, "lock"))
    {
        printf("You can't lock the %s.\n", cmd->object->name);
        return FAILURE;
    }
    if(Isopen(cmd->object))
    {
        printf("The %s is open.\n", cmd->object->name);
        return FAILURE;
    }
    ...
    {
        return FAILURE;
    }
    if(!hasattr(cmd->xobject, "key"))
    {
        printf("The %s won't %s the %s.\n",
            cmd->xobject->name, cmd->verb, objp->name);
        return FAILURE;
    }
    if(strcmp(cmd->verb, "lock") == 0)
    {
        if(hasattr(cmd->object, "locked"))
        {
            printf("The %s is already locked.\n", cmd->object->name);
            return FAILURE;
        }
        setattr(cmd->object, "locked");
        printf("The %s is now locked.\n", cmd->object->name);
        return SUCCESS;
    }
    else if(strcmp(cmd->verb, "unlock") == 0)
    {
        if(!hasattr(cmd->object, "locked"))
        {
            printf("The %s is already unlocked.\n", cmd->object->name);
            return FAILURE;
        }
        unsetattr(cmd->object, "locked");
        printf("The %s is now unlocked.\n", cmd->object->name);
        return SUCCESS;
    }
    /* shouldn't get here */
    return CONTINUE;
}
Here it is written so as to be attachable to keylike objects:
int keyfunc(struct actor *player, struct object *objp,
                struct sentence *cmd)
{
    if(strcmp(cmd->verb, "lock") != 0 && strcmp(cmd->verb, "unlock") != 0)
        return CONTINUE;
    if(objp != cmd->xobject)
        return CONTINUE;
    if(cmd->object == NULL)
    {
        printf("You must tell me what to %s.\n", cmd->verb);
        return FAILURE;
    }
    if(!hasattr(cmd->object, "lock"))
    {
        printf("You can't %s the %s.\n", cmd->verb, cmd->object->name);
        return FAILURE;
    }
    if(Isopen(cmd->object))
    {
        printf("The %s is open.\n", cmd->object->name);
        return FAILURE;
    }
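    /* assumed check: the player must be holding the key */
    if(!contains(player->contents, cmd->xobject))
    {
        printf("You have no %s.\n", cmd->xobject->name);
        return FAILURE;
    }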
    if(strcmp(cmd->verb, "lock") == 0)
    {
        if(hasattr(cmd->object, "locked"))
        {
            printf("The %s is already locked.\n", cmd->object->name);
            return FAILURE;
        }
        setattr(cmd->object, "locked");
        printf("The %s is now locked.\n", cmd->object->name);
        return SUCCESS;
    }
    else if(strcmp(cmd->verb, "unlock") == 0)
    {
        if(!hasattr(cmd->object, "locked"))
        {
            printf("The %s is already unlocked.\n", cmd->object->name);
            return FAILURE;
        }
        unsetattr(cmd->object, "locked");
        printf("The %s is now unlocked.\n", cmd->object->name);
        return SUCCESS;
    }
    /* shouldn't get here */
    return CONTINUE;
}
Exercise 4. Invent a (silly) shower object.
int showerfunc(struct actor *player, struct object *objp,
                struct sentence *cmd)
{
    if(strcmp(cmd->verb, "take") != 0)
        return CONTINUE;
    if(objp != cmd->object)
        return CONTINUE;
    printf("After spending 20 luxurious minutes in the shower,\n");
    printf("singing three full-length show tunes, and using up\n");
    printf("all the hot water, you emerge clean and refreshed.\n");
    return SUCCESS;
}
Exercise 5. Implement a magic sword object.
int swordfunc(struct actor *player, struct object *objp,
                struct sentence *cmd)
{
    if(strcmp(cmd->verb, "take") != 0)
        return CONTINUE;
    if(objp != cmd->object)
        return CONTINUE;
    printf("As you pick it up, the sword briefly glows blue.\n");
    return CONTINUE;
}
Exercise 6. Implement a container of radioactive waste.
Here are sample descriptions for dungeon.dat:
object barrel
desc It is a large yellow barrel with a funny symbol and
the word "HAZARDOUS" on it.
func wastebarrelfunc
object suit
desc It is a hooded suit made out of some strange plasticlike yellow fabric.
func hazmatsuitfunc
Here is the waste barrel function:
int wastebarrelfunc(struct actor *player, struct object *objp,
                struct sentence *cmd)
{
    struct object *hazmatsuit;
    if(strcmp(cmd->verb, "take") != 0)
        return CONTINUE;
    if(objp != cmd->object)
        return CONTINUE;
    hazmatsuit = findobject(player, "suit");
    if(hazmatsuit == NULL || !contains(player->contents, hazmatsuit))
    {
        printf("The container is far too dangerous to pick up.\n");
        return FAILURE;
    }
    return CONTINUE;
}
Here is a function so that you can wear the Hazmat suit:

int hazmatsuitfunc(struct actor *player, struct object *objp,
                struct sentence *cmd)
{
    if(strcmp(cmd->verb, "wear") != 0)
        return CONTINUE;
    if(objp != cmd->object)
        return CONTINUE;
    /* implicitly pick up, if necessary */
    if(!contains(player->contents, cmd->object))
    {
        if(!takeobject(player, cmd->object))
            return FAILURE;
    }
    printf("You feel much safer wearing the bright yellow Hazmat suit.\n");
    return SUCCESS;
}
Assignment #8
This handout presents several more potential additions and improvements to the game, not couched so much in terms of an
assignment, but more as food for thought.
1. Last week we added a func field to struct object, so that an object could contain a pointer to a function
implementing actions specific to that object. Another ability this gives us is to write new, object-specific code which
is intended to be fired up not directly, in response to a user's typed command, but rather, indirectly, at other spots in
the game where we're working with objects. We'll do this by letting objects optionally define special ``verbs'' (with
names beginning with periods, by convention) which we'll ``call'' when we need to.
Suppose we wanted the name or description of an object (printed in the listing of a room's contents or the actor's
possessions, or in response to the ``examine'' command) to vary, depending on the state of the object. (We've already
done that, in a crude way, by having the ``examine'' command look at a few of the object attributes we've defined.)
For example, we might want the player to be able to find an object which is described only as
a wad of rubberized fabric
until the player also finds the ``air pump'' object, and thinks to type
inflate wad with pump
at which point the object will be described as
an inflatable boat
To do this, we'll let an object define a pseudoverb ``.list''. Then, any time we would have printed the object's name
field in a list of objects, we'll let it print its own name, if it wants to. For example, the hypothetical rubber boat might
have object-specific code like this:
int boatfunc(struct actor *actp, struct object *objp,
                struct sentence *cmd)
{
    if(strcmp(cmd->verb, ".list") == 0)
    {
        if(hasattr(objp, "inflated"))
            printf("an inflatable boat");
        else
            printf("a wad of rubberized fabric");
        return SUCCESS;
    }
    return CONTINUE;
}
When we're printing a list of objects, we'll have to ``call'' the ``.list'' command (more specifically, call the object's
func, if it has one, passing a command verb of ``.list''). Setting up one of these special-purpose commands as if it
were a ``sentence'' the user typed will be a tiny bit of a nuisance, so we'll write an intermediate function to do it:
int objcall(struct actor *actp, struct object *objp, char *command)
{
    struct sentence cmd;
    if(objp->func == NULL)
        return ERROR;
    cmd.verb = command;
    cmd.object = objp;
    cmd.preposition = NULL;
    cmd.xobject = NULL;
    return (*objp->func)(actp, objp, &cmd);
}
Now we can rewrite the object-listing code in object.c. Where it used to say something like
    printf("%s\n", lp->name);
it can now say
    /* if obj has a function, try letting it list itself */
    if(lp->func != NULL && objcall(actp, lp, ".list") == SUCCESS)
        printf("\n");
    else
        printf("%s\n", lp->name);
If objcall returns SUCCESS, the object has printed itself, and all we have to do is append the trailing newline.
Otherwise, we print the object's name, as before.
(This modification to the object-listing code has one problem. If the listobjects function in object.c is going
to call objcall in this way, it needs a pointer to the actor, but it doesn't have it. So we're going to have to rewrite
the listobjects function, and each of the places it's called, to pass the actor pointer as an additional argument. As
it happens, two of the other improvements suggested in this assignment end up requiring exactly the same change. It
turns out that listobjects probably should have been accepting the actor pointer as an extra argument all along, if
there are so many good reasons why it ends up needing it.)
There are several other situations where we can use custom, ``internal'' functions like these. We can arrange that an
object also be able to print its long description, by interpreting a ``.describe'' verb. (We'd rewrite the ``examine''
command to print an object's desc field only if the object didn't succeed at performing the ``.describe'' verb.) We
could also implement customized ``hooks'' into the action of picking up an object. In last week's assignment (exercise
5), we implemented an object with a custom ``take'' command which printed some text but then returned CONTINUE,
so that the rest of the default ``take'' machinery would handle the actual picking up of the object. But we might want
the custom ``take'' action (and any messages it prints) to happen after the default machinery has performed most of
the picking up of the object. To do this, we could define a new special verb ``.take2'', and rewrite the ``take'' code in
commands.c to call .take2 at the very end, after the call to takeobject.
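The pseudoverb mechanism can be tried out in isolation. What follows is only a minimal sketch, not the game's actual code: the structures are cut down to the fields needed here, the inflated flag stands in for hasattr(objp, "inflated"), the listobject helper is hypothetical, and the numeric status codes other than SUCCESS = 1 are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Status codes; only SUCCESS == 1 comes from the notes, the rest are assumed. */
enum { ERROR = -1, FAILURE = 0, SUCCESS = 1, CONTINUE = 2 };

struct actor;                       /* left incomplete; unused by this sketch */
struct object;

struct sentence {
    const char *verb;
    struct object *object;
    const char *preposition;
    struct object *xobject;
};

struct object {
    const char *name;
    int inflated;                   /* stand-in for hasattr(objp, "inflated") */
    int (*func)(struct actor *, struct object *, struct sentence *);
};

/* Package a bare verb as a "sentence" and pass it to the object's function. */
int objcall(struct actor *actp, struct object *objp, const char *command)
{
    struct sentence cmd;

    assert(objp != NULL);
    if (objp->func == NULL)
        return ERROR;
    cmd.verb = command;
    cmd.object = objp;
    cmd.preposition = NULL;
    cmd.xobject = NULL;
    return (*objp->func)(actp, objp, &cmd);
}

/* Object function that answers ".list" and passes on every other verb. */
int boatfunc(struct actor *actp, struct object *objp, struct sentence *cmd)
{
    (void)actp;
    if (strcmp(cmd->verb, ".list") == 0) {
        fputs(objp->inflated ? "an inflatable boat"
                             : "a wad of rubberized fabric", stdout);
        return SUCCESS;
    }
    return CONTINUE;
}

/* Print one listing line; returns 1 if the object supplied its own name. */
int listobject(struct actor *actp, struct object *objp)
{
    int custom = (objcall(actp, objp, ".list") == SUCCESS);

    if (!custom)
        fputs(objp->name, stdout);
    putchar('\n');
    return custom;
}
```

A hook such as ``.take2'' would go through the same objcall path; only the verb string and the place it is called from would differ.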
2. When we started writing object-specific functions last week, we placed them in a file called objfuncs.c. Although
(as we've begun to see) it can be a fantastically powerful ability to allow objects to ``point at'' arbitrarily-complicated
functions written just for them, in practice, writing these functions will still be a nuisance. Suppose we've just been
playing with the data file, and we've come up with a fun new object to put in the dungeon for the player to play with,
and the object needs some new, special-purpose code to make it do its stuff. We'd have to write that code (in C, of
course), and then recompile the whole game, before we could hook the new function up to the new object. The whole
point of the data file is that the information about objects (and the rest of the dungeon) can be read out of it; we
stopped editing the source files to change the dungeon way back during the first week, when we introduced the data
file in the first place.
So, another major leap will be to put actual code (not just a pointer to a function) in objects. This code will, however,
not be written in C, for a variety of reasons. (A sufficient reason is that there's no easy way to get the C compiler to
compile scraps of source code we're reading from a data file and to attach the resultant object code to some data
structures in the already-running program.) So, we'll take the (seemingly big, but it's not that bad) step of defining our
own little miniature language for describing what objects can do, writing an interpreter (not a compiler) which reads
and implements this language, and then writing the code scraps for objects in the little language.
Therefore, we'll add another field to struct object, right next to the func field we added last week:
char *script;
For objects which contain these new, ``scripted'' functions, func will be a pointer to the script-interpreting function
(one function, the same function, for all objects which are scripted), and the script field will contain the text of the
script to be interpreted.
First, a description of the little language. It is optimized for the kinds of things that we want our object functions (at
least the simple ones) to be able to do: check attributes of the tool or direct object, set attributes of the tool or direct
object, and print messages. Here is a sample script:
entry "break"
chktoolqual "heavy"
csetobjstate "broken"
message "It is broken."
return 1
This is some code which might be suitable for the hammer object. The entry line says that this is code for the verb
``break''. (There can potentially be many entry lines, if a tool can be used in multiple ways.) The chktoolqual
line makes sure that the tool has the given attribute, and prints an appropriate message (and fails) if it does not. The
csetobjstate conditionally sets the state of the direct object, unless the object already has that state, in which
case it prints a message like ``The x is already broken'', and fails. Finally, the message line prints a message, and the
return line returns.
The code for the interpreter which will execute these little scripts is too large to print out, but it's in the week8
subdirectory of the distributed source code. It interfaces with the rest of the game in two places. We must arrange for
the interpreter to be called at the right time(s), but that's done already: if an object's func pointer points to the new
interpreter function, the code we added to docommand last week will call it. But we must also be able to read in a
script (from the data file) along with the rest of a description of an object. Here is a new case for parsedatafile:
else if(strcmp(av[0], "script") == 0)
{
    /* XXX cumbersome un-getwords required */
    int i, l = 0;
    if(currentobject == NULL)
        continue;
    for(i = 1; i < ac; i++)
        l += strlen(av[i]) + 1;
    /* + 1 leaves room for the '\0' even when no words follow "script" */
    currentobject->script = chkmalloc(l + 1);
    *currentobject->script = '\0';
    for(i = 1; i < ac; i++)
    {
        if(i > 1)
            strcat(currentobject->script, " ");
        strcat(currentobject->script, av[i]);
    }
    currentobject->func = interp;
}
The script is read all from one line (which is a nuisance, but this is a preliminary implementation). The long
description reading code faced the same problem; this code uses a different solution, by un-doing the action of
getwords and rebuilding a single string (in malloc'ed memory) which the script field can point to.
Integrating the interpreter code requires paying attention to a few other details. You'll need to add the prototype
extern int interp(struct actor *, struct object *, struct sentence *);
to game.h if it's not there already, and the line
objp->script = NULL;
to the newobject function in object.c. Also, for the moment, we'll have to pay attention to the numeric values
of the status codes (SUCCESS, FAILURE, CONTINUE, ERROR) we defined last week, because the simple interpreter
doesn't know how to handle the symbolic names. Instead, we'll have to say return 1 for SUCCESS, etc.
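The ``un-getwords'' step in the script case above can be pulled out into a small function of its own, which makes it easy to test separately. This is just a sketch: joinwords is a hypothetical name (it is not in the distributed source), and it uses plain malloc where the game would presumably use chkmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Rebuild a single string from the words av[1] .. av[ac-1], separated by
 * single spaces.  av[0] (the keyword, e.g. "script") is skipped.
 * Returns malloc'ed memory which the caller must free, or NULL on failure.
 */
char *joinwords(int ac, char *av[])
{
    size_t len = 1;                     /* room for the terminating '\0' */
    char *buf;
    int i;

    assert(ac >= 1 && av != NULL);
    for (i = 1; i < ac; i++)
        len += strlen(av[i]) + 1;       /* each word plus one separator */

    buf = malloc(len);
    if (buf == NULL)
        return NULL;

    buf[0] = '\0';
    for (i = 1; i < ac; i++) {
        if (i > 1)
            strcat(buf, " ");
        strcat(buf, av[i]);
    }
    return buf;
}
```

With a helper like this, the script case would shrink to a single assignment to currentobject->script followed by setting func. Any run of whitespace in the data file still comes back as a single space, since the original spacing was already discarded when the line was split into words.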
3. This game is a prime candidate for moving from C to C++. Actors and rooms are really special cases of objects:
Rooms contain objects, just like actors and container objects do, and rooms also contain actors. Rather than having
separate structures for rooms, actors, and objects, we'd like to say things like ``a room is just like an object, but it also
has exits.'' If we rigged it up right (and cleaned up a number of messes which we've perpetrated along the way
because of the fact that we hadn't been using this structure) we could have a single piece of code which would transfer
objects between rooms and actors (``take''), between actors and rooms (``drop''), between actors and (container)
objects (``put in'') and which would even transfer actors between rooms (when the actor moved from one room to
another). This function (and much of the rest of the game) could operate on generic ``objects,'' without worrying
about whether they were simple objects, actors, or rooms. Only those portions of the game specific to operating on
actors or rooms would look at the additional fields differentiating an actor or room from an object.
The notion that some data structures are extensions of others, that an ``x'' is just like a ``y'' except for some extra stuff,
is another cornerstone (perhaps the cornerstone) of Object-Oriented Programming. The formal term for this idea is
inheritance, and the data structures are usually spoken of as being objects, which is precisely what the word ``object''
is doing in ``Object-Oriented Programming.'' (It's again more than a coincidence, but rather an indication that our
game is a perfect application of C++ or another object-oriented language, that we were already calling our
fundamental data structures ``objects.'')
The changes to the game to let it use C++ are extensive, and I'm not going to try to present them all here. (I've
completed many of the changes, however, and you can find them in subdirectories of the week8 directory associated
with the on-line web pages for this class.) The basic idea is that our old struct object is no longer just a
structure; it is a class:
class object
{
public:
    object(const char *);
    ~object(void);
    char name[MAXNAME];
    struct list *attrs;
    object *contents;
    object *container;
    object *lnext;      /* next in list of contained objects */
                        /* (i.e. in this object's container) */
    char *desc;
};
(Among other things, all objects now contain pointers back to their containers, analogous to the way struct
actor used to contain a pointer back to its location.)
We write a constructor for new instances of this class, replacing our old newobject function in object.c:
object::object(const char *newname)
{
    strcpy(name, newname);
    lnext = NULL;
    attrs = NULL;
    contents = NULL;
    container = NULL;
    desc = NULL;
}
Wherever we used to write something like
objp = newobject(name);
we instead use the C++ new operator:
objp = new object(name);
(Actually, the only place we called newobject was in readdatafile in io.c, when setting
currentobject. It is entirely a coincidence, though not a surprising one, that the use of the C++ new operator
here looks so eerily similar to the old newobject call.)
Going back to game.h, we define the actor and room classes as being derived from class object:
class actor : public object
{
public:
    actor();
};
class room : public object
{
public:
    room(const char *);
    room *exits[NEXITS];
    struct room *next;      /* list of all rooms */
};
These say that an actor is just like an object (we don't actually have any actor-specific information at the moment),
and that a room is just like an object except that it has an exits array and an extra next pointer so that we can construct
a list of all rooms.
Here is the new, general-purpose object-transferring function, for object.c:
/* transfer object from one general container to another */
int
transferobject(object *objp, object *newcontainer)
{
    object *lp;
    object *prevlp = NULL;
    if(objp->container != NULL)
    {
        object *oldc = objp->container;
        for(lp = oldc->contents; lp != NULL; lp = lp->lnext)
        {
            if(lp == objp)                      /* found it */
            {
                /* splice out of old container's list */
                if(lp == oldc->contents)        /* head of list */
                    oldc->contents = lp->lnext;
                else
                    prevlp->lnext = lp->lnext;
                break;
            }
            prevlp = lp;
        }
    }
    /* splice into new container's list */
    if(newcontainer != NULL)
    {
        objp->lnext = newcontainer->contents;
        newcontainer->contents = objp;
    }
    objp->container = newcontainer;
    return TRUE;
}
Notice that the parameters are declared as being of type ``object *''. In C++, every structure and class you declare
has its tag name implicitly defined as a typedef, so that the keyword struct or class is no longer needed in later
declarations. (Theoretically, we should seek out every instance of struct object in the game and replace them
with object or perhaps class object, and similarly for every struct actor and struct room. The
compiler I'm using, the C++ version of the GNU C Compiler, namely g++, doesn't seem to be complaining about
stray struct keywords referring to what I've actually redefined as classes, but I'm not sure what the formal rules of
C++ say.)
In any case, even though transferobject looks like it's defined only for use on objects, since actors and rooms
are now also objects, we can also use transferobject to move objects to and from the actor, and even to move
the actor between rooms. That is, we can rewrite the other transfer functions from object.c and rooms.c very
simply:
int
takeobject(actor *actp, object *objp)
{
    return transferobject(objp, actp);
}
int
dropobject(actor *actp, object *objp)
{
    return transferobject(objp, actp->container);
}
int
putobject(actor *actp, object *objp, object *container)
{
    return transferobject(objp, container);
}
int
gotoroom(actor *actor, room *room)
{
    return transferobject(actor, room);
}
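The splicing logic itself doesn't depend on C++, so it can be exercised in plain C with a cut-down structure. The sketch below is an illustration under assumptions, not the code from the week8 directory: struct thing and transfer are hypothetical names, and the list walk uses a pointer-to-pointer instead of the prevlp variable, so that removing the head of the list and removing an interior element are the same operation.

```c
#include <assert.h>
#include <stddef.h>

/* A generic container: rooms, actors, and plain objects all look like this. */
struct thing {
    const char *name;
    struct thing *contents;     /* head of the list of things held */
    struct thing *container;    /* who holds this thing, or NULL */
    struct thing *lnext;        /* next thing in the same container */
};

/* Move objp out of its current container (if any) and into newcontainer. */
int transfer(struct thing *objp, struct thing *newcontainer)
{
    assert(objp != NULL);

    if (objp->container != NULL) {
        struct thing **pp;      /* the link that might point at objp */

        for (pp = &objp->container->contents; *pp != NULL; pp = &(*pp)->lnext) {
            if (*pp == objp) {
                *pp = objp->lnext;      /* unlink: head or interior alike */
                break;
            }
        }
    }

    objp->lnext = NULL;
    if (newcontainer != NULL) {
        objp->lnext = newcontainer->contents;   /* push on front of new list */
        newcontainer->contents = objp;
    }
    objp->container = newcontainer;
    return 1;
}
```

``Taking'' is then transfer(obj, player), ``dropping'' is transfer(obj, player->container), and moving the player is transfer(player, room), just as in the C++ wrappers above.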
4. Another improvement which you might be interested in (and another sweeping one) would be to rewrite the game so
that several people could play it at once, over the network. Instead of one instance of struct actor, representing
one player sitting at one keyboard and viewing output on one screen, we could have arbitrarily many actor
structures (just as we now have arbitrarily many objects and rooms), with each actor representing a player sitting
somewhere on the network, typing input and receiving output over a network connection.
The changes to make a multiplayer version of the game are actually rather straightforward and self-contained, with
the glaring exception of the fact that everywhere we used to call printf to print some text to the user's screen, we
must instead call some special output function of our own which knows how to send the text to the network
connection of the appropriate player.
To represent the network connection, we'll add a file descriptor to the actor structure. If we're using Unix-style
networking, the file descriptor will just be an integer:
int remotefd;
If we've gone over to the C++ style of doing things, this means that class actor now does have something to
distinguish it from a plain object:
class actor : public object
{
public:
    actor();
    int remotefd;
    struct actor *next;     /* list of all actors */
};
(It turns out we're also going to need a list of all players, just like we need a list of all rooms.)
Then, using what we learned about variable-length argument lists in chapter 25 of the class notes, we can write an
output function which is like printf except that it writes the message to a particular player's network connection:
void output(struct actor *actp, char *msg, ...)
{
    char tmpbuf[200];       /* XXX */
    va_list argp;
    va_start(argp, msg);
    vsprintf(tmpbuf, msg, argp);
    va_end(argp);
    write(actp->remotefd, tmpbuf, strlen(tmpbuf));
}
We call vsprintf to ``print'' the printf-style message to a temporary string buffer, then use the low-level Unix
write function to write the string to the network connection. (This function has a significant limitation as written: if
a single message is ever more than 200 characters long, the tmpbuf array will overflow, with results which might
range from annoying to catastrophic. It's unfortunately tricky to get this sort of thing right; the comment /* XXX */ is a
reminder that the tmpbuf array with its fixed size of 200 is a weakness in the program.)
The rest of the code for the multiplayer version of the game is too bulky to include here, but I've placed it in
subdirectories of the week8 directory, too. See the README file there for more information.
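One way to get rid of the fixed-size tmpbuf, assuming a C99 library is available, is to let vsnprintf measure the message first and then allocate exactly as much as is needed. The sketch below is not the code from the week8 directory; vfmtmessage and output_fd are hypothetical names, and output_fd takes a bare file descriptor rather than an actor pointer so that it stands alone.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Format a printf-style message into malloc'ed memory of exactly the
 * right size.  Returns NULL on failure; the caller frees the result.
 */
char *vfmtmessage(const char *fmt, va_list argp)
{
    va_list copy;
    int n;
    char *buf;

    assert(fmt != NULL);
    va_copy(copy, argp);                    /* argp is needed again below */
    n = vsnprintf(NULL, 0, fmt, copy);      /* first pass: measure only */
    va_end(copy);
    if (n < 0)
        return NULL;

    buf = malloc((size_t)n + 1);
    if (buf == NULL)
        return NULL;
    vsnprintf(buf, (size_t)n + 1, fmt, argp);   /* second pass: format */
    return buf;
}

/* Like printf, but writes to a file descriptor; returns bytes written or -1. */
long output_fd(int fd, const char *fmt, ...)
{
    va_list argp;
    char *msg;
    long n;

    va_start(argp, fmt);
    msg = vfmtmessage(fmt, argp);
    va_end(argp);
    if (msg == NULL)
        return -1;

    n = (long)write(fd, msg, strlen(msg));
    free(msg);
    return n;
}
```

In the game, output would pass actp->remotefd as the descriptor. A production version would also want to loop when write accepts only part of the message, which this sketch does not do.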
C Programming
the comp.lang.c FAQ List, by yours truly

the Lysator C archive
The official ISO/IEC JTC1/SC22/WG14-C home page, including C9X drafts and related material
(see also Dave Keaton's home page, which seems to contain some of the same stuff)
the notes for some C classes I teach (a sort of on-line tutorial)
J. Blustein's C Programming Language Resources
Vinit Carpenter's ``Learn C/C++ Today''
A C and C++ page by Jack Klein
other tutorials and resources listed in question 18.9 of the FAQ List (this link will break soon,
when I forget to change it; please mail me when it does)
the International Obfuscated C Code Contest
Ted Nokonechny's ``C Programming Tips''
Peter Seebach's ``Useless C Page''
http://www.eskimo.com/~scs/C.html [22/07/2003 5:38:45 PM]
Essays
This is a (somewhat random) collection of some of the longer essays I've posted to Usenet over the years,
along with a few similarly long e-mail messages.
On correctness:
comp.lang.c, 1995-08-17
On precedence vs. order of evaluation:
comp.lang.c, 1996-07-25
comp.lang.c, 1996-07-28
comp.lang.c, 2001-05-12
On undefined behavior:
comp.lang.c, 1995-03-11
comp.lang.c, 1995-03-21
comp.lang.c, 1998-11-05
On strings in C, and assigning them, and the choice between char * and char [] for holding them:
comp.lang.c, 1998-09-26
On the difference between prefix and postfix ++ in C:
comp.lang.c, 1999-01-18
On the eternal question of whether "void main()" is acceptable, or whether main() must be
declared as returning int:
comp.lang.c, 1996-08-23
comp.lang.c, 1999-03-01
On the obscure but fascinating "inverse varargs problem":
comp.unix.wizards/comp.lang.c, 1989-06-04
comp.lang.c, 1992-07-14
mail message, 1993-03-07
On elegant vs. mundane software:
eskimo's lobby newsgroup, 1998-01-17
I probably won't always keep this index up-to-date, so if you like, you can "click here" for a guaranteed
current list of files in this directory.
[back to scs home page]
http://www.eskimo.com/~scs/readings/index2.html (2 of 2) [22/07/2003 5:38:48 PM]
Correctness
[This article was originally posted on August 17, 1995, in the midst of a thread on whether it's a good
idea to use void main() and the like.]
Newsgroups: comp.lang.c
From: scs@eskimo.com (Steve Summit)
Subject: Re: Why the warning???
Message-ID: <DDGMr2.My3@eskimo.com>
X-Scs-References: <199508171432.HAA14633@mail.eskimo.com>
References: <00001a80+00003293@msn.com> <dqua.808547526@dqua>
<40s7l2$mcr@garuda.csulb.edu>
Date: Thu, 17 Aug 1995 14:54:37 GMT
Earlier in the thread, someone expressed a popular sentiment: "I use what works." Raise your hand if
you've ever said this. (My hand's up.) Now raise your hand if you've ever spent hours going through
some code weeding out gratuitous nonportabilities so that you could get it to compile under a compiler or
operating system other than the one the original author used. (My hand's still up.)
There's always a tradeoff to be made between portability and convenience. Almost no code is truly
portable, even code that tries very hard to be, so it's reasonable to accept certain nonportabilities in one's
code, in the interests of expediency. But unilaterally saying "I use what works" is an extremely risky
attitude. It's one thing to make an informed decision: to weigh the costs of being strictly portable against
the risks of using something nonportable. It's quite another thing to declare that you can't even be
bothered to think about the tradeoffs, and to throw yourself and your code (and perhaps your company) at
the mercy of future compilers you'll want to use.
Ask yourself: how long do you expect your code to last? If it's a quick hack that you don't expect to use
past next week, then most of this is moot. But if it's a production program, a product, that you or someone
depends on or makes money on, you probably hope (you ought to hope) that it will be successful and
long-lived. And it's likely, even though you can't imagine it today, that some day you'll find yourself
wanting to compile it under some completely different compiler, or for some completely different
machine or operating system. And, when that day comes, the more your attitude during development has
been "I use what works," the more miserable the porting task will be.
When, for the platform and compiler you're using today, there's no alternative to a given bit of
nonportability (for instance, some operating-system-dependent interaction which C doesn't define),
there's no point spending lots of time desperately trying to make it portable. You can spend the time
porting that bit of the code when and if the time comes. (You might spend a bit of time today making
sure that the nonportable bit is cleanly isolated.)
When, on the other hand, there's a perfectly valid, equally easy, portable alternative, such as using int
main() in preference to void main(), or /* */ comments in preference to // comments, it really
seems foolish not to avail yourself of as many Standards as you can.
You can't imagine what kind of compiler you might suddenly find yourself wanting to use a year from
now. Neither can I. But one thing we do know is that if it is a C compiler, it is extremely likely to comply
as closely as it can to the ANSI/ISO C Standard. So the more closely we can make the code we write
today conform to that Standard, the fewer problems we are likely to have down the road.
void main() and // comments aren't perfect examples, because // comments (for no good reason
that I know of) are likely to be in the next major revision of the C Standard, and void main() works
under most compilers and is likely to remain so (even though it doesn't have to) if for no other reason
than that so terribly much code uses it. One could, I suppose, make an informed decision to keep using
those two, if one really thought they were preferable to the strictly-portable alternatives. But in the more
general case -- for the other errors and nonportabilities your code is all too likely to contain if all you care
about is that it works today -- someone, someday, is likely to curse you if you've made their life
miserable, or to sing your praises if you've expended a modicum of thought and effort towards giving
your code a fighting chance of being portable down the road.
Steve Summit
scs@eskimo.com
http://www.eskimo.com/~scs/readings/correctness.950817.html (2 of 2) [22/07/2003 5:38:50 PM]
precedence vs. order of evaluation, #3
[This article was originally posted on July 26, 1996, in the midst of a discussion on precedence and order of
evaluation. I have corrected one or two errors which appeared in the original posting.]
Subject: precedence vs. order of evaluation (was: HELP: Comma operator)
Message-ID: <1996Jul25.1157.scs.0001@eskimo.com>
Lines: 434
References: <BRAD.96Jul23005938@rosalyn.cs.washington.edu>
<01bb78bb$62355360$87ee6fce@timpent.airshields.com>
<TANMOY.96Jul23132821@qcd.lanl.gov>
<01bb796a$4e873a40$87ee6fce@timpent.airshields.com>
<01bb798a$11913f80$87ee6fce@timpent.airshields.com>
Date: Thu, 25 Jul 1996 18:57:00 GMT
Tim Behrendsen (tim@airshields.com) wrote:
>>> I think I'm going to have to disagree with you; what do you think
>>> side effects are? The assignment operator is just an operator with
>>> it's own precedence. Side effects come into play based on the order
>>> of evaluation, which is *purely* defined by the precedence rules.
This is a common misconception, but it's quite false. Order of evaluation is affected by much more than
"precedence rules"; precedence affects order of evaluation much less than many people think.
> There is only one thing that is relevent when you are talking about
> expression evaluation, and that is the precedence rules. Everything else
> falls out from those.
There's the misconception again. Order of evaluation does not "fall completely out of" precedence rules. It's true
that precedence seems to have something to do with expression evaluation; in the expression
a + b * c
we frequently say things like, "The multiplication happens before the addition, because * has higher precedence
than +." But it turns out this is a risky thing to say, because it gives the listener the impression that precedence
dictates order of evaluation, which as we'll see it does not. In my classes, I try desperately not to use the words
"happens before" when explaining operator precedence, although I'm afraid I rarely succeed.
Let's imagine that we had some kind of omniscient processor, which could look at an expression and instantly give
its value, without doing step-by-step evaluations. If we gave it the expression
1 + 2 * 3
it would give us the result 7. Precedence tells us why it gave us that answer and not 9. That is, even if thoughts of
"order of evaluation" never enter our heads, precedence is an important and independent concept, because it tells
us what an expression means.

Another way to think about precedence is that it controls how the compiler parses expressions. The expression
1 + 2 * 3
results in the "parse tree"
    +
   / \
  1   *
     / \
    2   3
Once it has determined the parse tree, the compiler sets about emitting instructions (in some order) which will
implement that parse tree. The shape of the tree may determine the ordering of some of the instructions, and
precedence affects the shape of the tree. But the order of some operations may not even be strictly determined by
the shape of the tree, let alone the precedence. So while precedence certainly has an influence on order of
evaluation, precedence does not "determine" order of evaluation. I'd say that precedence is about 40% related to
order of evaluation -- order of evaluation depends about 40% on precedence, and 60% on other things. (This 40%
figure is of course meaningless; it's just a number I pulled out of the air that "sounds right" [Barry, 1994]. The
point I'm trying to make is that precedence is not entirely disconnected from order of evaluation, but the
connection is nowhere near 100%.)
We've seen how precedence can affect or influence order of evaluation. Now let's start looking at all of the ways it
does not, or, stated another way, at all of the ways that order of evaluation is determined by things other than
precedence. (As we'll see, many of the "other things" turn out to be the whim of the compiler.)
Most of you have seen this example before, but I'll drag it out again. Suppose we have an expression containing
three function calls:
f() + g() * h()
Now, the result of calling g() will be multiplied by the result of calling h() "before" that product is added to the
result of calling f(), and indeed, precedence tells us that. But, unless we have multiple parallel processors, those
three functions f(), g(), and h() are going to be called in some order. What order will they be called in?
I don't know what order they'll be called in, and you don't know what order they'll be called in. Precedence doesn't
tell us, and in fact nothing in K&R, the ANSI/ISO C Standard, or Dan Streetmentioner's Boffo C Primer Triple
Plus will tell us. (C: The Complete Reference might tell us, but that's another story.) There's absolutely nothing
preventing the compiler from calling f() first, even though its result will be needed "last."
If you have any doubts about this, I encourage you to compile and run this little program:
#include <stdio.h>
int f(void), g(void), h(void);
int main(void) { printf("%d\n", f() + g() * h()); return 0; }
int f(void) { printf("f\n"); return 1; }
int g(void) { printf("g\n"); return 2; }
int h(void) { printf("h\n"); return 3; }
For the very first two compilers I tried this program with, one printed:
g
h
f
7
But the other printed:
f
g
h
7
Even if your compiler(s) prints g h f 7 "as expected," bear with me while we look at this from yet another
angle.
What do you do when the precedence is "wrong," that is, when you want to override the default precedence? You
use explicit parentheses, of course. Parentheses force the operators and operands within an expression to be
grouped in a certain way, and if you believe that precedence dictates order of evaluation, you might believe that
parentheses dictate order of evaluation, too. But they do not.
Let's go back to the f() + g() * h() example. Suppose your compiler were emitting code which called f()
first, and for some reason you wanted it to call f() last. How could you force this? Could you write
f() + (g() * h())
where the parentheses are supposed to force the stuff inside them to happen first? You could try it, but I doubt it
would make any difference. Those parentheses would tell the compiler that the result of g() was supposed to be
multiplied by the result of h() before the product was added to the result of f(), but the compiler was already
going to do it that way, anyway, based on the default precedence. It would still be free to call f() first. (Indeed,
for the compiler of mine that calls f() first, it still calls f() first even if I add the extra parentheses.)
Suppose, on the other hand, that your compiler is calling g() and h() first, and f() last. What if, for some
reason, you want it to call f() first? How can you force this? If you belong to the hit-or-miss, aimless
meandering, or drug-induced hallucination schools of programming, you might try
(f()) + g() * h()
but I'll pay you (or anyone) $100 if you can show me a compiler (other than one you wrote for the purpose) that
can be forced to call f() first by putting a pair of parentheses around it like that.
In ANSI/ISO Standard C, when you care about the order in which things happen, you must take care to ensure that
order by using "sequence points." Sequence points have nothing to do with precedence, and only partly to do with
expressions; in my head they're up there next to statements. In general, when you want to control the order in
which two things happen, you make the two things separate statements, and you put one statement after the other
in the order you desire, perhaps using control flow constructs (e.g. if statements, loops, etc.) to control whether a
statement gets executed at all, or how many times. Indeed, "the end of an expression statement" is essentially one
of the defined sequence points in Standard C.
K&R C didn't have "sequence points"; Kernighan and Ritchie said that "explicit temporary variables can be used to
force a particular order of evaluation" and "intermediate results can be stored in temporary variables to ensure a
particular sequence" [K&R1 p. 49; K&R2 p. 53]. In fact, although Standard C gives us sequence points, it doesn't
give us a tool to get a hold on f() in f() + g() * h(); if we needed f() called first, we'd have to write
t = f(); t + g() * h()
or
t = f(), t + g() * h()
The most obvious sequence points in Standard C are the ones at the ends of full expressions in expression
statements, that is, as exemplified by the semicolon in the first line just above. Other sequence points are at the
comma operator (as in the second line just above), at the && and || operators, in the ?: operator, and just before a
function call. (I do not claim that this is an exhaustive list.) On those (hopefully rare) occasions when you care
about the ordering of operations within a single expression, you have to make sure that the expression contains one
or more sequence points, and in places which will in fact constrain the order to the one you want. Most of the time,
though, when you have a troublesome expression with too many ordering dependencies, the right response is the
same as that of the doctor in the old joke about the patient who complains that his hand hurts if he shakes it in a
certain way: "Well, don't do that, then." Break the expression up into separate statements, using temporary
variables if you have to, and you'll be much happier.
(I confess that this advice may be easier to give than to apply. In the article that started this thread, the poster
thought that the insertion of a sequence point or two, in the form of comma operators, should have constrained the
order of evaluation sufficiently, but unfortunately it did not, because there still weren't enough sequence points. It
didn't help that the poster thought that full parenthesization should have resolved any remaining evaluation-order
problems.)
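Here is a minimal sketch of the temporary-variable approach. The call_log buffer and the helper names (note, f_first, all_ordered, last_call_order) are inventions for this example; f(), g(), and h() stand in for the functions discussed above and simply record the order in which they were called.

```c
#include <string.h>

/* Hypothetical scaffolding: records the order in which f, g, h are called. */
static char call_log[8];

static int note(char c, int value)
{
    size_t n = strlen(call_log);
    if (n + 1 < sizeof call_log) {
        call_log[n] = c;
        call_log[n + 1] = '\0';
    }
    return value;
}

static int f(void) { return note('f', 1); }
static int g(void) { return note('g', 2); }
static int h(void) { return note('h', 3); }

/* f() is called first because its call is a separate statement.
   The relative order of g() and h() is still up to the compiler. */
int f_first(void)
{
    int t;
    call_log[0] = '\0';
    t = f();
    return t + g() * h();
}

/* Three separate statements pin down all three calls: f, then g, then h. */
int all_ordered(void)
{
    int t, u, v;
    call_log[0] = '\0';
    t = f();
    u = g();
    v = h();
    return t + u * v;
}

/* Returns the order recorded by the most recent call above. */
const char *last_call_order(void)
{
    return call_log;
}
```

Both functions return 7. After f_first() the log begins with f, but the order of g and h may differ between compilers; after all_ordered() the log is always fgh.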
If the f() + g() * h() example seemed artificial and hence unconvincing, let's look at a morphologically
identical but eminently realistic example. Suppose we're reading a two-byte integer from a binary data file in big-endian (most significant byte first) order. We could call fread() and then swap bytes if necessary, but that's a
cheesy solution, because if (as is conventional) we implement the "if necessary" test with an #ifdef, we've
condemned everyone who ever compiles our program to choose an appropriate setting for the #ifdef macro.
Instead, let's call getc() twice, to read the first and then the second byte, and combine them like this:
i = (firstbyte << 8) | secondbyte;
(Remember, the first byte we read is the most-significant byte, and the second byte is the least-significant.) So
could we write that as

i = (getc(ifp) << 8) | getc(ifp);    /* WRONG */
? No! We could not! We do not know which call to getc() will happen first, and we very definitely care, because
the order of those two calls is precisely what will determine whether we read the integer in big-endian or little-endian order. So, instead, we should write
i = getc(ifp) << 8;     /* MSB */
i |= getc(ifp);         /* LSB */
By writing this as two separate statements, we get a sequence point, which ensures the order of evaluation we
need. If, on the other hand, we wanted to read in little-endian order, we'd write
i = getc(ifp);          /* LSB */
i |= getc(ifp) << 8;    /* MSB */
But, to repeat, we could not write

i = (getc(ifp) << 8) | getc(ifp);    /* "big-endian", but WRONG */
or
i = getc(ifp) | (getc(ifp) << 8);    /* "little-endian", WRONG */
Both of these seem to depend (and, in fact, would depend, if we wrote them) on a left-to-right ordering of the
actual calls to getc(), which is not guaranteed. (And yes, I know that getc() is usually implemented as a
macro, which means that both of these last two expressions would in fact expand to big, complicated expressions
without any necessary function calls, but they would still not have any internal sequence points which would force
the two getc's to happen in a well-defined order.)
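The two-statement approach can be packaged up along these lines. This is only a sketch: the names read_be16 and read_le16 and the error convention (0 for success, -1 on end-of-file or error) are assumptions of mine, not anything from a standard library.

```c
#include <stdio.h>

/* Hypothetical helpers. Each returns 0 on success, or -1 if either
   getc() call reports end-of-file or an error. */

/* Big-endian: the first byte read is the most significant. */
int read_be16(FILE *ifp, unsigned *out)
{
    int msb, lsb;

    msb = getc(ifp);            /* separate statement: happens first */
    if (msb == EOF)
        return -1;
    lsb = getc(ifp);            /* separate statement: happens second */
    if (lsb == EOF)
        return -1;
    *out = ((unsigned)msb << 8) | (unsigned)lsb;
    return 0;
}

/* Little-endian: the first byte read is the least significant. */
int read_le16(FILE *ifp, unsigned *out)
{
    int lsb, msb;

    lsb = getc(ifp);
    if (lsb == EOF)
        return -1;
    msb = getc(ifp);
    if (msb == EOF)
        return -1;
    *out = ((unsigned)msb << 8) | (unsigned)lsb;
    return 0;
}
```

Storing each byte in its own variable also makes it easy to check for EOF, which the one-line versions never could.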
***
Unless you want to be a language lawyer, you don't have to memorize the definition of a sequence point or the list
of sequence points which Standard C guarantees, or even use the words "sequence point" in casual conversation.
(As you saw above, I can't remember the list off the top of my head with confidence, either, and that's because I'm
not usually interested in being a language lawyer.) What I recommend you do, instead, is develop a sense of
expressions that are "clean" versus expressions that are "ugly," and shy away from the ugly ones. ("Well, don't do
that, then.") Ugly expressions are those that have multiple assignments in them, or comma operators, or multiple
modifications of the same object, or which are so complicated and hard to understand that all the king's horses and
all of comp.std.c require at least a week to figure out what they're guaranteed to do (or not). The cleanest
expressions are those that calculate a single value, and perhaps assign it to a single location, without caring about
the order in which various sub-operations (such as interior function calls) take place.
In between the cleanest expressions and those that are unremittingly ugly are several classes of mildly-tricky
expressions which are well-defined and are useful and are commonly used and which I'm not trying to discourage
you from using. It's easier to present these by example, as exceptions to my qualifications above of ugly
expressions as those containing comma operators, ordering dependencies, or multiple assignments.
1. It's perfectly okay to depend on precedence, as long as you understand what it does and doesn't guarantee
you. Precedence tells you that in the expression a + b * c, the left-hand operand of the * operator is b,
not a + b. If you insist, precedence tells you that in that expression, "the multiplication happens before the
addition." But precedence does not tell you what order the function calls would happen in in f() + g()
* h(), and precedence would not give meaning to the expression
i++ + j * i++    /* XXX WRONG */
(In particular, you cannot say that since the multiplication happens "first," the second i++ must happen
before the first.) This last expression is, you guessed it, unremittingly ugly.
2. You can use the && and || operators to guarantee that thing B happens after thing A, and not at all if thing
A tells the whole story. Both
n > 0 && sum / n > 0
and
p == NULL || *p == '\0'
are perfectly valid, perfectly acceptable expressions.
3. You can use the comma operator when you have two (or more) things to do in the first or third controlling
expressions of a for() loop. There are precious few other realistic opportunities for using comma
operators; chances are, if you find yourself using comma operators anywhere else, you're doing something
tricky and are skirting the edge of ugliness. (It's no coincidence that Java, I gather, doesn't allow comma
operators anywhere but the first and third expressions of a for loop.)
4. You can use multiple ++ and -- operators in a single expression, perhaps in conjunction with an
assignment operator, as long as you're certain that the several things being modified are all distinct, and that
you aren't ever in a position of using something that you've already modified. The expressions
a[i++] = 0
and
*p++ = 0
are perfectly acceptable, because the things being modified, i and a[i] in the first expression and p and
*p in the second, are probably distinct. For the same reason, a[0] = i++ is okay, too. Expressions like
a[i++] = b[j++]
and
*p++ = *q++
are also okay, because i, j, a[i], and b[j], and p, q, *p, and *q, are probably all distinct. Finally, the
old standby
i = i + 1
is perfectly acceptable, because although it both uses and modifies i, it demonstrably uses the old value to
compute the new value.
On the other hand, the expressions
i++ * i++    /* XXX WRONG */
and
i = i++    /* XXX WRONG */
are no good, among other things because they modify i twice. And
a[i] = i++    /* XXX WRONG */
and
a[i++] = i    /* XXX WRONG */
are both wrong, because there's no telling whether i is used before it's modified, or vice versa. (If you think
that either of these somehow does guarantee whether the plain i on one side means the value of i before or
after the incrementation performed by the i++ on the other, think again.)
5. Finally, it's perfectly acceptable to write something like
i = j = 0
even though it contains multiple assignments, because again, it's an unambiguous idiom and it's clear that
two different variables are being assigned to.
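To make a few of these acceptable idioms concrete, here is a sketch that wraps some of them in small functions. The function names are hypothetical; the expressions inside them are the ones from items 2 through 4 above.

```c
#include <stddef.h>

/* Item 2: && guarantees that n > 0 is tested before the division,
   and that the division is skipped entirely when n is not positive. */
int mean_is_positive(int sum, int n)
{
    return n > 0 && sum / n > 0;
}

/* Item 2: || guarantees that p is checked before it is dereferenced. */
int is_empty(const char *p)
{
    return p == NULL || *p == '\0';
}

/* Item 3: comma operators in the first and third expressions of a for loop.
   Reverses the first n characters of s in place. */
void reverse(char *s, size_t n)
{
    size_t i, j;
    char c;

    if (n == 0)
        return;
    for (i = 0, j = n - 1; i < j; i++, j--) {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}

/* Item 4: *p++ = *q++ modifies p, q, and *p, which are all distinct
   as long as the two buffers do not overlap. */
void copy_string(char *p, const char *q)
{
    while ((*p++ = *q++) != '\0')
        ;
}
```

None of these depends on anything beyond the guarantees described above: the short-circuit operators and the comma operator supply sequence points, and the copy loop never modifies the same object twice or reads something it has just modified.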
Now, I'm acutely aware that these guidelines (or any like them) can never be complete, because there are an
infinite number of expressions out there, and some of them are clean and legal even though my "rules"
might seem to disallow them and my exceptions don't cover them, and others of them are unremittingly
ugly even though my "rules" don't disallow them or my exceptions seem to allow them. Somehow,
successful programmers learn to calibrate their own inner aesthetic such that the expressions which aren't
"ugly" are the ones that (a) are well-defined, and (b) are sufficient for writing real programs. I encourage all
of you to nurture such an aesthetic as well, and to learn to write programs without "ugly" expressions, rather
than wasting time trying to figure out what they do (and whether they're guaranteed to do it). If you've got
an example of a well-defined expression which you feel is acceptable but which the guidelines I've
presented seem to discourage, or of an ill-defined expression which these guidelines don't seem to
discourage, I encourage you to post it here or mail it to me; I'd love to expand and refine these guidelines.
(Equally importantly, I'd love to figure out how to word them so that they're useful and meaningful to
someone who hasn't already developed a sense for good vs. ugly expressions.)
Also, if you're still with me, I'd like to repeat the main moral of this article, which is that "precedence" is
not the same thing as "order of evaluation." The one certainly has something to do with the other (sort of
like pointers and arrays), but it's a delicate relationship to describe precisely, so unless you care to figure out
the whole story (and figure out what number I should have used above instead of 40%), you might actually
be better off if you remember the statement that "Precedence has nothing to do with order of evaluation."
This statement isn't true, of course, but it's much closer to the truth than "Precedence has everything to do
with order of evaluation," /* XXX WRONG */
which is not only false, but quite misleading.
Finally, for anyone who is still unsure on any of this, I urge you to read section 3 of the FAQ list, or chapter
3 of the FAQ list book, because there's more on this in there. (Of course, it's written by the same mope
you're reading now, so if you don't like something I've written here, I can't guarantee that you'll find relief
in the FAQ list.)
Steve Summit
scs@eskimo.com
Expressions
3. Expressions
3.1 Why doesn't the code "a[i] = i++;" work?
3.2 Under my compiler, the code "int i = 7; printf("%d\n", i++ * i++);" prints 49.
Regardless of the order of evaluation, shouldn't it print 56?
3.3 How could the code "int i = 3; i = i++;" ever give 7?
3.4 Don't precedence and parentheses dictate order of evaluation?
3.5 But what about the && and || operators?
3.8 What's a ``sequence point''?
3.9 So given "a[i] = i++;" we don't know which cell of a[] gets written to, but i does get
incremented by one.
3.12 If I'm not using the value of the expression, should I use i++ or ++i to increment a variable?
3.14 Why doesn't the code "int a = 1000, b = 1000; long int c = a * b;" work?
3.16 Can I use ?: on the left-hand side of an assignment expression?
Question 3.1
Why doesn't this code:
a[i] = i++;
work?
The subexpression i++ causes a side effect--it modifies i's value--which leads to undefined behavior
since i is also referenced elsewhere in the same expression. (Note that although the language in K&R
suggests that the behavior of this expression is unspecified, the C Standard makes the stronger statement
that it is undefined--see question 11.33.)
References: K&R1 Sec. 2.12
K&R2 Sec. 2.12
ANSI Sec. 3.3
ISO Sec. 6.3
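The usual repair is to decide which meaning was intended and write it as separate statements. A minimal sketch, with hypothetical function names; each returns the new value of i:

```c
/* Hypothetical helpers showing two well-defined replacements
   for a[i] = i++. */

/* Meaning 1: store the old i in the old slot, then increment. */
int store_then_increment(int a[], int i)
{
    a[i] = i;
    i++;
    return i;
}

/* Meaning 2: increment first, then store the old value in the new slot. */
int increment_then_store(int a[], int i)
{
    int old = i;
    i++;
    a[i] = old;
    return i;
}
```

Either version says exactly one thing, and every compiler has to agree on what it is.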
Question 11.33
People seem to make a point of distinguishing between implementation-defined, unspecified, and
undefined behavior. What's the difference?
Briefly: implementation-defined means that an implementation must choose some behavior and
document it. Unspecified means that an implementation should choose some behavior, but need not
document it. Undefined means that absolutely anything might happen. In no case does the Standard
impose requirements; in the first two cases it occasionally suggests (and may require a choice from
among) a small set of likely behaviors.
Note that since the Standard imposes no requirements on the behavior of a compiler faced with an
instance of undefined behavior, the compiler can do absolutely anything. In particular, there is no
guarantee that the rest of the program will perform normally. It's perilous to think that you can tolerate
undefined behavior in a program; see question 3.2 for a relatively simple example.
If you're interested in writing portable code, you can ignore the distinctions, as you'll want to avoid code
that depends on any of the three behaviors.
See also questions 3.9 and 11.34.
ISO Sec. 3.10, Sec. 3.16, Sec. 3.17
Rationale Sec. 1.6
Question 3.2
Under my compiler, the code
int i = 7;
printf("%d\n", i++ * i++);
prints 49. Regardless of the order of evaluation, shouldn't it print 56?
Although the postincrement and postdecrement operators ++ and -- perform their operations after
yielding the former value, the implication of ``after'' is often misunderstood. It is not guaranteed that an
increment or decrement is performed immediately after giving up the previous value and before any
other part of the expression is evaluated. It is merely guaranteed that the update will be performed
sometime before the expression is considered ``finished'' (before the next ``sequence point,'' in ANSI C's
terminology; see question 3.8). In the example, the compiler chose to multiply the previous value by
itself and to perform both increments afterwards.
The behavior of code which contains multiple, ambiguous side effects has always been undefined.
(Loosely speaking, by ``multiple, ambiguous side effects'' we mean any combination of ++, --, =, +=, -=, etc. in a single expression which causes the same object either to be modified twice or modified and
then inspected. This is a rough definition; see question 3.8 for a precise one, and question 11.33 for the
meaning of ``undefined.'') Don't even try to find out how your compiler implements such things (contrary
to the ill-advised exercises in many C textbooks); as K&R wisely point out, ``if you don't know how they
are done on various machines, that innocence may help to protect you.''
K&R2 Sec. 2.12 p. 54
ANSI Sec. 3.3
ISO Sec. 6.3
CT&P Sec. 3.7 p. 47
PCS Sec. 9.5 pp. 120-1
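If the intent really was ``the old value times the next value, with i advanced twice,'' one well-defined way to say so is to give each increment its own statement. The function name below is hypothetical:

```c
/* Hypothetical helper: multiplies two successive values of *ip,
   leaving *ip incremented by two. */
int multiply_successive(int *ip)
{
    int first, second;

    first = (*ip)++;    /* sequence point at the end of this statement */
    second = (*ip)++;   /* so this increment sees the updated value */
    return first * second;
}
```

Starting from i equal to 7, multiply_successive(&i) returns 56 and leaves i equal to 9, on every conforming compiler.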

Question 3.8
How can I understand these complex expressions? What's a ``sequence point''?
A sequence point is the point (at the end of a full expression, or at the ||, &&, ?:, or comma operators,
or just before a function call) at which the dust has settled and all side effects are guaranteed to be
complete. The ANSI/ISO C Standard states that
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be stored.
The second sentence can be difficult to understand. It says that if an object is written to within a full
expression, any and all accesses to it within the same expression must be for the purposes of computing
the value to be written. This rule effectively constrains legal expressions to those in which the accesses
demonstrably precede the modification.
References: ANSI Sec. 2.1.2.3, Sec. 3.3, Appendix B
ISO Sec. 5.1.2.3, Sec. 6.3, Annex C
H&S Sec. 7.12.1 pp. 228-9
Question 3.9
So given
a[i] = i++;
we don't know which cell of a[] gets written to, but i does get incremented by one.
No. Once an expression or program becomes undefined, all aspects of it become undefined. See
questions 3.2, 3.3, 11.33, and 11.35.
Question 3.3
I've experimented with the code
int i = 3;
i = i++;
on several compilers. Some gave i the value 3, some gave 4, but one gave 7. I know the behavior is
undefined, but how could it give 7?
Undefined behavior means anything can happen. See questions 3.9 and 11.33. (Also, note that neither
i++ nor ++i is the same as i+1. If you want to increment i, use i=i+1 or i++ or ++i, not some
combination. See also question 3.12.)
Question 11.34
I'm appalled that the ANSI Standard leaves so many issues undefined. Isn't a Standard's whole job to
standardize these things?
It has always been a characteristic of C that certain constructs behaved in whatever way a particular
compiler or a particular piece of hardware chose to implement them. This deliberate imprecision often
allows compilers to generate more efficient code for common cases, without having to burden all
programs with extra code to assure well-defined behavior of cases deemed to be less reasonable.
Therefore, the Standard is simply codifying existing practice.
A programming language standard can be thought of as a treaty between the language user and the
compiler implementor. Parts of that treaty consist of features which the compiler implementor agrees to
provide, and which the user may assume will be available. Other parts, however, consist of rules which
the user agrees to follow and which the implementor may assume will be followed. As long as both sides
uphold their guarantees, programs have a fighting chance of working correctly. If either side reneges on
any of its commitments, nothing is guaranteed to work.
References: Rationale Sec. 1.1
ANSI/ISO Standard C

11.1 What is the ``ANSI C Standard?''
11.2 How can I get a copy of the Standard?
11.3 My ANSI compiler is complaining about prototype mismatches for parameters declared float.
11.4 Can you mix old-style and new-style function syntax?
11.5 Why does the declaration "extern f(struct x *p);" give me a warning message?
11.8 Why can't I use const values in initializers and array dimensions?
11.9 What's the difference between const char *p and char * const p?
11.10 Why can't I pass a char ** to a function which expects a const char **?
11.12 Can I declare main as void, to shut off these annoying ``main returns no value'' messages?
11.13 But what about main's third argument, envp?
11.14 I believe that declaring void main() can't fail, since I'm calling exit instead of returning.
11.15 The book I've been using always uses void main().
11.16 Is exit(status) truly equivalent to returning the same status from main?
11.17 How do I get the ANSI ``stringizing'' preprocessing operator `#' to stringize the macro's value
instead of its name?
11.18 What does the message ``warning: macro replacement within a string literal'' mean?
11.19 I'm getting strange syntax errors inside lines I've #ifdeffed out.
11.20 What are #pragmas ?
11.21 What does ``#pragma once'' mean?

11.22 Is char a[3] = "abc"; legal?
11.24 Why can't I perform arithmetic on a void * pointer?
11.25 What's the difference between memcpy and memmove?
11.26 What should malloc(0) do?
11.27 Why does the ANSI Standard not guarantee more than six case-insensitive characters of external
identifier significance?
11.29 My compiler is rejecting the simplest possible test programs, with all kinds of syntax errors.
11.30 Why are some ANSI/ISO Standard library routines showing up as undefined, even though I've got
an ANSI compiler?
11.31 Does anyone have a tool for converting old-style C programs to ANSI C, or for automatically
generating prototypes?
11.32 Why won't frobozz-cc, which claims to be ANSI compliant, accept this code?
11.33 What's the difference between implementation-defined, unspecified, and undefined behavior?
11.34 I'm appalled that the ANSI Standard leaves so many issues undefined.
11.35 I just tried some allegedly-undefined code on an ANSI-conforming compiler, and got the results I
expected.
Question 12.9
Someone told me it was wrong to use %lf with printf. How can printf use %f for type double, if
scanf requires %lf?
It's true that printf's %f specifier works with both float and double arguments. Due to the
``default argument promotions'' (which apply in variable-length argument lists such as printf's,
whether or not prototypes are in scope), values of type float are promoted to double, and printf
therefore sees only doubles. See also questions 12.13 and 15.2.
References: K&R1 Sec. 7.3 pp. 145-47, Sec. 7.4 pp. 147-50
K&R2 Sec. 7.2 pp. 153-55, Sec. 7.4 pp. 157-59
ANSI Sec. 4.9.6.1, Sec. 4.9.6.2
ISO Sec. 7.9.6.1, Sec. 7.9.6.2
H&S Sec. 15.8 pp. 357-64, Sec. 15.11 pp. 366-78
CT&P Sec. A.1 pp. 121-33
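A minimal sketch of the promotion at work. The function name format_both is hypothetical, and snprintf (a C99 addition, newer than the ANSI C this list mostly describes) is used so that the result lands in a buffer:

```c
#include <stdio.h>

/* Hypothetical helper: formats a float and a double with the same %f.
   The float argument is promoted to double before snprintf sees it. */
int format_both(char *buf, size_t size, float f, double d)
{
    return snprintf(buf, size, "%f %f", f, d);
}
```

With the arguments 1.5f and 2.25, the buffer receives "1.500000 2.250000" in the default "C" locale.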
Question 12.13
Why doesn't this code:
double d;
scanf("%f", &d);
work?
Unlike printf, scanf uses %lf for values of type double, and %f for float. See also question
12.9.
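For illustration, here is a sketch that reads one float and one double from a string with sscanf, which takes the same conversion specifiers as scanf. The function name parse_pair is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: parses "<float> <double>" from text.
   Returns the number of items converted (2 on success). */
int parse_pair(const char *text, float *f, double *d)
{
    /* %f stores through a float *, %lf stores through a double * */
    return sscanf(text, "%f %lf", f, d);
}
```

There are no promotions on the scanf side because the arguments are pointers, so the specifier has to match the pointed-to type exactly.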
Question 12.12
Why doesn't the call scanf("%d", i) work?
The arguments you pass to scanf must always be pointers. To fix the fragment above, change it to
scanf("%d", &i).
Question 12.11
How can I print numbers with commas separating the thousands?
What about currency formatted numbers?
The routines in <locale.h> begin to provide some support for these operations, but there is no
standard routine for doing either task. (The only thing printf does in response to a custom locale
setting is to change its decimal-point character.)
ISO Sec. 7.4
H&S Sec. 11.6 pp. 301-4
Question 12.10
How can I implement a variable field width with printf? That is, instead of %8d, I want the width to
be specified at run time.
printf("%*d", width, n) will do just what you want. See also question 12.15.
K&R2 Sec. 7.2
ANSI Sec. 4.9.6.1
ISO Sec. 7.9.6.1
H&S Sec. 15.11.6
CT&P Sec. A.1
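A short sketch, again using the C99 function snprintf so that the result can be inspected; the function name format_width is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: formats n in a field whose width is chosen at
   run time. The * in the format consumes the int argument width. */
int format_width(char *buf, size_t size, int width, int n)
{
    return snprintf(buf, size, "%*d", width, n);
}
```

A width of 5 with n equal to 42 produces three spaces followed by 42. A negative width is treated as the - flag followed by a positive width, so -5 left-justifies the number instead.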
Question 12.15
How can I specify a variable width in a scanf format string?
You can't; an asterisk in a scanf format string means to suppress assignment. You may be able to use
ANSI stringizing and string concatenation to accomplish about the same thing, or to construct a scanf
format string on-the-fly.
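One way to construct the format string on the fly is sketched below. The function name scan_word_limited is hypothetical, and snprintf is a C99 function; under strict ANSI C, sprintf into a sufficiently large buffer would serve.

```c
#include <stdio.h>

/* Hypothetical helper: reads at most width non-whitespace characters
   from input into out, which must have room for width + 1 bytes.
   Returns sscanf's result (1 on success). */
int scan_word_limited(const char *input, char *out, int width)
{
    char fmt[32];

    if (width < 1)
        return 0;
    /* %% produces a literal %, so a width of 3 gives the format "%3s" */
    snprintf(fmt, sizeof fmt, "%%%ds", width);
    return sscanf(input, fmt, out);
}
```

With a width of 3, scanning "abcdefgh" stores "abc" and leaves the rest of the input unread.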
Question 12.17
When I read numbers from the keyboard with scanf "%d\n", it seems to hang until I type one extra
line of input.
Perhaps surprisingly, \n in a scanf format string does not mean to expect a newline, but rather to read
and discard characters as long as each is a whitespace character. See also question 12.20.
References: K&R2 Sec. B1.3 pp. 245-6
ANSI Sec. 4.9.6.2
ISO Sec. 7.9.6.2
H&S Sec. 15.8 pp. 357-64
Question 12.20
Why does everyone say not to use scanf? What should I use instead?
scanf has a number of problems--see questions 12.17, 12.18, and 12.19. Also, its %s format has the
same problem that gets() has (see question 12.23)--it's hard to guarantee that the receiving buffer
won't overflow.
More generally, scanf is designed for relatively structured, formatted input (its name is in fact derived
from ``scan formatted''). If you pay attention, it will tell you whether it succeeded or failed, but it can tell
you only approximately where it failed, and not at all how or why. It's nearly impossible to do decent
error recovery with scanf; usually it's far easier to read entire lines (with fgets or the like), then
interpret them, either using sscanf or some other techniques. (Routines like strtol, strtok, and
atoi are often useful; see also question 13.6.) If you do use sscanf, don't forget to check the return
value to make sure that the expected number of items were found.
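The read-a-line-then-interpret approach might look like the sketch below, using fgets and strtol. The function names parse_long and read_long, the 100-byte line buffer, and the 0 or -1 return convention are all assumptions made for this example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: interprets line as a single decimal integer.
   Returns 0 and stores the value on success, -1 on any problem. */
int parse_long(const char *line, long *out)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(line, &end, 10);
    if (end == line)
        return -1;                      /* no digits at all */
    if (errno == ERANGE)
        return -1;                      /* out of range for long */
    while (isspace((unsigned char)*end))
        end++;                          /* allow trailing newline/blanks */
    if (*end != '\0')
        return -1;                      /* trailing junk such as "12x" */
    *out = val;
    return 0;
}

/* Hypothetical helper: reads one whole line, then interprets it. */
int read_long(FILE *fp, long *out)
{
    char line[100];

    if (fgets(line, sizeof line, fp) == NULL)
        return -1;                      /* end-of-file or read error */
    return parse_long(line, out);
}
```

Because the whole line has already been consumed by the time it is interpreted, a bad line can simply be reported and the next one read, with no leftover input to trip over.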
Question 12.18
I'm reading a number with scanf %d and then a string with gets(), but the compiler seems to be
skipping the call to gets()!
scanf %d won't consume a trailing newline. If the input number is immediately followed by a newline,
that newline will immediately satisfy the gets().
As a general rule, you shouldn't try to interlace calls to scanf with calls to gets() (or any other input
routines); scanf's peculiar treatment of newlines almost always leads to trouble. Either use scanf to
read everything or nothing.
ISO Sec. 7.9.6.2
H&S Sec. 15.8 pp. 357-64
Question 12.23
Why does everyone say not to use gets()?
Unlike fgets(), gets() cannot be told the size of the buffer it's to read into, so it cannot be
prevented from overflowing that buffer. As a general rule, always use fgets(). See question 7.1 for a
code fragment illustrating the replacement of gets() with fgets().
References: Rationale Sec. 4.9.7.2
H&S Sec. 15.7 p. 356
Question 7.1
Why doesn't this fragment work?
char *answer;
printf("Type something:\n");
gets(answer);
printf("You typed \"%s\"\n", answer);
The pointer variable answer, which is handed to gets() as the location into which the response
should be stored, has not been set to point to any valid storage. That is, we cannot say where the pointer
answer points. (Since local variables are not initialized, and typically contain garbage, it is not even
guaranteed that answer starts out as a null pointer. See questions 1.30 and 5.1.)
The simplest way to correct the question-asking program is to use a local array, instead of a pointer, and
let the compiler worry about allocation:
#include <stdio.h>
#include <string.h>
char answer[100], *p;
printf("Type something:\n");
fgets(answer, sizeof answer, stdin);
if((p = strchr(answer, '\n')) != NULL)
        *p = '\0';
printf("You typed \"%s\"\n", answer);
This example also uses fgets() instead of gets(), so that the end of the array cannot be overwritten.
(See question 12.23. Unfortunately for this example, fgets() does not automatically delete the trailing
\n, as gets() would.) It would also be possible to use malloc() to allocate the answer buffer.
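A sketch of the malloc() variant follows. The function name read_answer and the ANSWER_MAX limit are hypothetical choices for this example; the caller is responsible for calling free() on the result.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ANSWER_MAX 100   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: reads one line from fp into freshly allocated
   storage, with the trailing newline removed. Returns NULL if the
   allocation fails or nothing could be read. */
char *read_answer(FILE *fp)
{
    char *answer = malloc(ANSWER_MAX);
    char *p;

    if (answer == NULL)
        return NULL;
    if (fgets(answer, ANSWER_MAX, fp) == NULL) {
        free(answer);
        return NULL;
    }
    if ((p = strchr(answer, '\n')) != NULL)
        *p = '\0';
    return answer;
}
```

Unlike the original fragment, the pointer here is set to valid storage before anything is read into it.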
Question 7.2
I can't get strcat to work. I tried
char *s1 = "Hello, ";
char *s2 = "world!";
char *s3 = strcat(s1, s2);
but I got strange results.
As in question 7.1, the main problem here is that space for the concatenated result is not properly
allocated. C does not provide an automatically-managed string type. C compilers only allocate memory
for objects explicitly mentioned in the source code (in the case of ``strings,'' this includes character arrays
and string literals). The programmer must arrange for sufficient space for the results of run-time
operations such as string concatenation, typically by declaring arrays, or by calling malloc.
strcat performs no allocation; the second string is appended to the first one, in place. Therefore, one
fix would be to declare the first string as an array:
char s1[20] = "Hello, ";
Since strcat returns the value of its first argument (s1, in this case), the variable s3 is superfluous.
The original call to strcat in the question actually has two problems: the string literal pointed to by
s1, besides not being big enough for any concatenated text, is not necessarily writable at all. See
question 1.32.
References: CT&P Sec. 3.2 p. 32
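When the combined length is not known in advance, the space can be obtained from malloc instead of a fixed-size array. A minimal sketch, with the hypothetical function name concat; the caller must free() the result:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding s1
   followed by s2, or NULL if the allocation fails. */
char *concat(const char *s1, const char *s2)
{
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    char *result = malloc(len1 + len2 + 1);   /* +1 for the '\0' */

    if (result == NULL)
        return NULL;
    memcpy(result, s1, len1);
    memcpy(result + len1, s2, len2 + 1);      /* copies the '\0' too */
    return result;
}
```

Neither argument is modified, so passing string literals such as "Hello, " and "world!" is safe.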
Question 1.32
What is the difference between these initializations?
char a[] = "string literal";
char *p = "string literal";
My program crashes if I try to assign a new value to p[i].
A string literal can be used in two slightly different ways. As an array initializer (as in the declaration of
char a[]), it specifies the initial values of the characters in that array. Anywhere else, it turns into an
unnamed, static array of characters, which may be stored in read-only memory, which is why you can't
safely modify it. In an expression context, the array is converted at once to a pointer, as usual (see section
6), so the second declaration initializes p to point to the unnamed array's first element.
(For compiling old code, some compilers have a switch controlling whether strings are writable or not.)
See also questions 1.31, 6.1, 6.2, and 6.8.
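The distinction can be sketched as follows. The function name is hypothetical, and declaring p as const char * is an addition not present in the question; it simply lets the compiler reject any attempt to write through p.

```c
#include <string.h>

/* modify_copy is a hypothetical example function.
 * Returns 1 if the array could be modified as expected. */
int modify_copy(void)
{
    char a[] = "string literal";        /* a is a writable array of 15 chars */
    const char *p = "string literal";   /* p points to an unnamed static array;
                                           const makes p[0] = 'S' a compile error */

    a[0] = 'S';                         /* fine: a is our own array */
    return strcmp(a, "String literal") == 0 && p[0] == 's';
}
```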
ANSI Sec. 3.1.4, Sec. 3.5.7
ISO Sec. 6.1.4, Sec. 6.5.7
H&S Sec. 2.7.4 pp. 31-2
Arrays and Pointers

6.1 I had the definition char a[6] in one source file, and in another I declared extern char *a.
Why didn't it work?
6.2 But I heard that char a[] was identical to char *a.
6.3 So what is meant by the ``equivalence of pointers and arrays'' in C?
6.4 Why are array and pointer declarations interchangeable as function formal parameters?
6.7 How can an array be an lvalue, if you can't assign to it?
6.8 What is the real difference between arrays and pointers?
6.9 Someone explained to me that arrays were really just constant pointers.
6.11 I came across some ``joke'' code containing the ``expression'' 5["abcdef"] . How can this be
legal C?
6.12 What's the difference between array and &array?
6.13 How do I declare a pointer to an array?
6.14 How can I set an array's size at run time?
6.15 How can I declare local arrays of a size matching a passed-in array?
6.16 How can I dynamically allocate a multidimensional array?
6.17 Can I simulate a non-0-based array with a pointer?
6.18 My compiler complained when I passed a two-dimensional array to a function expecting a pointer to
a pointer.
6.19 How do I write functions which accept two-dimensional arrays when the ``width'' is not known at
compile time?
6.20 How can I use statically- and dynamically-allocated multidimensional arrays interchangeably when
passing them to functions?
6.21 Why doesn't sizeof properly report the size of an array which is a parameter to a function?
Question 16.8
What do ``Segmentation violation'' and ``Bus error'' mean?
These generally mean that your program tried to access memory it shouldn't have, invariably as a result
of improper pointer use. Likely causes are inadvertent use of null pointers (see also questions 5.2 and
5.20) or uninitialized, misaligned, or otherwise improperly allocated pointers (see questions 7.1 and 7.2);
corruption of the malloc arena (see question 7.19); and mismatched function arguments, especially
involving pointers; two possibilities are scanf (see question 12.12) and fprintf (make sure it
receives its first FILE * argument).
Question 6.7
How can an array be an lvalue, if you can't assign to it?
The ANSI C Standard defines a ``modifiable lvalue,'' which an array is not.

ISO Sec. 6.2.2.1
H&S Sec. 7.1 p. 179
Question 6.8
Practically speaking, what is the difference between arrays and pointers?
Arrays automatically allocate space, but can't be relocated or resized. Pointers must be explicitly assigned
to point to allocated space (perhaps using malloc), but can be reassigned (i.e. pointed at different
objects) at will, and have many other uses besides serving as the base of blocks of memory.
Due to the so-called equivalence of arrays and pointers (see question 6.3), arrays and pointers often seem
interchangeable, and in particular a pointer to a block of memory assigned by malloc is frequently
treated (and can be referenced using []) exactly as if it were a true array. See questions 6.14 and 6.16.
(Be careful with sizeof, though.)
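The caution about sizeof can be illustrated with a short sketch (the function name is hypothetical): applied to a true array, sizeof yields the size of the whole array, but applied to a pointer it yields only the size of the pointer, however large the block it points to.

```c
#include <stdlib.h>

/* sizeof_differs is a hypothetical example function.
 * Returns 1 if sizeof behaves as described for an array and a pointer. */
int sizeof_differs(void)
{
    int a[10];
    int *p = malloc(10 * sizeof(int));
    int ok;

    /* sizeof a is the size of all ten ints;
       sizeof p is just the size of one pointer. */
    ok = sizeof a == 10 * sizeof(int) && sizeof p == sizeof(int *);
    free(p);        /* free(NULL) is harmless if malloc failed */
    return ok;
}
```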
Question 6.14
How can I set an array's size at run time?
How can I avoid fixed-sized arrays?
The equivalence between arrays and pointers (see question 6.3) allows a pointer to malloc'ed memory
to simulate an array quite effectively. After executing
#include <stdlib.h>
int *dynarray = (int *)malloc(10 * sizeof(int));
(and if the call to malloc succeeds), you can reference dynarray[i] (for i from 0 to 9) just as if
dynarray were a conventional, statically-allocated array (int a[10]). See also question 6.16.
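A slightly fuller sketch, with the size chosen at run time and the result of malloc checked, might look like this. The function name make_squares and the values stored are purely illustrative.

```c
#include <stdlib.h>

/* make_squares is a hypothetical example function.  It returns a
 * malloc'ed "array" of n ints holding 0, 1, 4, 9, ..., or NULL if
 * malloc fails.  The caller must free the result. */
int *make_squares(size_t n)
{
    int *dynarray = malloc(n * sizeof *dynarray);
    size_t i;

    if(dynarray == NULL)
        return NULL;
    for(i = 0; i < n; i++)
        dynarray[i] = (int)(i * i);     /* subscripted like a real array */
    return dynarray;
}
```

Unlike a true array, the block stays allocated until it is passed to free.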
Question 6.13
How do I declare a pointer to an array?
Usually, you don't want to. When people speak casually of a pointer to an array, they usually mean a
pointer to its first element.
Instead of a pointer to an array, consider using a pointer to one of the array's elements. Arrays of type T
decay into pointers to type T (see question 6.3), which is convenient; subscripting or incrementing the
resultant pointer will access the individual members of the array. True pointers to arrays, when
subscripted or incremented, step over entire arrays, and are generally useful only when operating on
arrays of arrays, if at all. (See question 6.18.)
If you really need to declare a pointer to an entire array, use something like ``int (*ap)[N];'' where
N is the size of the array. (See also question 1.21.) If the size of the array is unknown, N can in principle
be omitted, but the resulting type, ``pointer to array of unknown size,'' is useless.
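The difference in stepping can be made concrete with a small sketch. The function names and the value of N are assumptions chosen for illustration; each function reports how many bytes its pointer advances when incremented once.

```c
#include <stddef.h>

#define N 4     /* hypothetical array size for this sketch */

/* A pointer to an element steps over one element. */
size_t step_of_element_pointer(void)
{
    int a[N] = {0};
    int *ip = a;            /* pointer to the first element */
    return (size_t)((char *)(ip + 1) - (char *)ip);
}

/* A true pointer to an array steps over an entire array. */
size_t step_of_array_pointer(void)
{
    int a[2][N] = {{0}};
    int (*ap)[N] = a;       /* pointer to an array of N ints */
    return (size_t)((char *)(ap + 1) - (char *)ap);
}
```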
ISO Sec. 6.2.2.1
Question 1.22
How can I declare a function that can return a pointer to a function of the same type? I'm building a state
machine with one function for each state, each of which returns a pointer to the function for the next
state. But I can't find a way to declare the functions.
You can't quite do it directly. Either have the function return a generic function pointer, with some
judicious casts to adjust the types as the pointers are passed around; or have it return a structure
containing only a pointer to a function returning that structure.
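Here is a minimal sketch of the second approach, the structure wrapper. The two states, their names, and the int input parameter are all hypothetical; only the shape of the declarations matters.

```c
/* A structure containing only a pointer to a function
 * returning that structure. */
struct state {
    struct state (*next)(int);
};

struct state state_a(int input);
struct state state_b(int input);

/* In state A, a nonzero input moves to state B. */
struct state state_a(int input)
{
    struct state s;
    s.next = input ? state_b : state_a;
    return s;
}

/* In state B, any input returns to state A. */
struct state state_b(int input)
{
    struct state s;
    (void)input;            /* unused in this state */
    s.next = state_a;
    return s;
}
```

The driving loop then repeatedly executes something like s = s.next(input), with no casts needed.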
Question 1.25
My compiler is complaining about an invalid redeclaration of a function, but I only define it once and
call it once.
Functions which are called without a declaration in scope (perhaps because the first call precedes the
function's definition) are assumed to be declared as returning int (and without any argument type
information), leading to discrepancies if the function is later declared or defined otherwise. Non-int
functions must be declared before they are called.
Another possible source of this problem is that the function has the same name as another one declared in
some header file.
K&R2 Sec. 4.2 p. 72
ANSI Sec. 3.3.2.2
ISO Sec. 6.3.2.2
H&S Sec. 4.7 p. 101
Question 15.1
I heard that you have to #include <stdio.h> before calling printf. Why?
So that a proper prototype for printf will be in scope.

A compiler may use a different calling sequence for functions which accept variable-length argument
lists. (It might do so if calls using variable-length argument lists were less efficient than those using
fixed-length.) Therefore, a prototype (indicating, using the ellipsis notation ``...'', that the argument list is of
variable length) must be in scope whenever a varargs function is called, so that the compiler knows to use
the varargs calling mechanism.
References: ANSI Sec. 3.3.2.2, Sec. 4.1.6
ISO Sec. 6.3.2.2, Sec. 7.1.7
Rationale Sec. 3.3.2.2, Sec. 4.1.6
H&S Sec. 9.2.4 pp. 268-9, Sec. 9.6 pp. 275-6
Question 14.13
I'm having trouble with a Turbo C program which crashes and says something like ``floating point
formats not linked.''
Some compilers for small machines, including Borland's (and Ritchie's original PDP-11 compiler), leave
out certain floating point support if it looks like it will not be needed. In particular, the non-floating-point
versions of printf and scanf save space by not including code to handle %e, %f, and %g. It happens
that Borland's heuristics for determining whether the program uses floating point are insufficient, and the
programmer must sometimes insert an extra, explicit call to a floating-point library routine to force
loading of floating-point support. (See the comp.os.msdos.programmer FAQ list for more information.)
14. Floating Point

14.1 When I set a float variable to 3.1, why is printf printing it as 3.0999999?
14.2 Why is sqrt(144.) giving me crazy numbers?
14.3 I keep getting ``undefined: sin'' compilation errors.
14.4 My floating-point calculations are acting strangely and giving me different answers on different
machines.
14.5 What's a good way to check for ``close enough'' floating-point equality?
14.6 How do I round numbers?
14.7 Where is C's exponentiation operator?
14.8 The pre-#defined constant M_PI seems to be missing from <math.h>.
14.9 How do I test for IEEE NaN and other special values?
14.11 What's a good way to implement complex numbers in C?
14.12 I'm looking for some mathematical library code.
14.13 I'm having trouble with a Turbo C program which crashes and says something like ``floating point
formats not linked.''
Question 14.1
When I set a float variable to, say, 3.1, why is printf printing it as 3.0999999?
Most computers use base 2 for floating-point numbers as well as for integers. In base 2, 1/1010 (that is,
1/10 decimal) is an infinitely-repeating fraction: its binary representation is 0.0001100110011... .
Depending on how carefully your compiler's binary/decimal conversion routines (such as those used by
printf) have been written, you may see discrepancies when numbers (especially low-precision
floats) not exactly representable in base 2 are assigned or read in and then printed (i.e. converted from
base 10 to base 2 and back again). See also question 14.6.
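The effect can be observed with a sketch like the following, which assumes binary floating point (as on IEEE 754 machines). The function names are hypothetical, and the exact digits printed will depend on the implementation.

```c
#include <stdio.h>

/* Returns 1 if the float nearest 3.1 differs from the double nearest
 * 3.1, as it does when neither type can represent 3.1 exactly. */
int float_differs_from_double(void)
{
    float f = 3.1f;
    double d = 3.1;
    return (double)f != d;
}

/* Prints the same float with few and with many digits; the longer
 * form exposes the representation error that the shorter one hides. */
void print_float_digits(void)
{
    float f = 3.1f;
    printf("%.1f\n", f);
    printf("%.10f\n", f);
}
```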
Question 14.6
How do I round numbers?
The simplest and most straightforward way is with code like

(int)(x + 0.5)
This technique won't work properly for negative numbers, though.
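One common adjustment for negative numbers is sketched below, assuming that halves should round away from zero. The name round_to_int is hypothetical, and the result is meaningful only when the rounded value fits in an int.

```c
/* round_to_int is a hypothetical helper.  It rounds to the nearest
 * integer, with halves rounded away from zero.  The behavior is
 * undefined if the result does not fit in an int. */
int round_to_int(double x)
{
    return (int)(x < 0 ? x - 0.5 : x + 0.5);
}
```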
Question 14.5
What's a good way to check for ``close enough'' floating-point equality?
Since the absolute accuracy of floating point values varies, by definition, with their magnitude, the best
way of comparing two floating point values is to use an accuracy threshold which is relative to the
magnitude of the numbers being compared. Rather than
double a, b;
...
if(a == b)      /* WRONG */
use something like

#include <math.h>
if(fabs(a - b) <= epsilon * fabs(a))
for some suitably-chosen epsilon.
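Wrapped up as a function, the relative test might be sketched as follows. The name close_enough is hypothetical, and scaling by the larger of the two magnitudes (rather than by fabs(a) alone) is a variation chosen here so that the result does not depend on the order of the arguments.

```c
#include <math.h>

/* close_enough is a hypothetical helper.  It returns 1 if a and b
 * differ by no more than epsilon times the larger of their
 * magnitudes, and 0 otherwise. */
int close_enough(double a, double b, double epsilon)
{
    double scale = fabs(a) > fabs(b) ? fabs(a) : fabs(b);
    return fabs(a - b) <= epsilon * scale;
}
```

Note that when both values are exactly 0 the test reduces to 0 <= 0 and succeeds, but a value merely near 0 will never be ``close'' to 0 under a purely relative test.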
References: Knuth Sec. 4.2.2 pp. 217-8
Question 14.4
My floating-point calculations are acting strangely and giving me different answers on different
machines.
First, see question 14.2.

If the problem isn't that simple, recall that digital computers usually use floating-point formats which
provide a close but by no means exact simulation of real number arithmetic. Underflow, cumulative
precision loss, and other anomalies are often troublesome.
Don't assume that floating-point results will be exact, and especially don't assume that floating-point
values can be compared for equality. (Don't throw haphazard ``fuzz factors'' in, either; see question 14.5.)
These problems are no worse for C than they are for any other computer language. Certain aspects of
floating-point are usually defined as ``however the processor does them'' (see also question 11.34),
otherwise a compiler for a machine without the ``right'' model would have to do prohibitively expensive
emulations.
This article cannot begin to list the pitfalls associated with, and workarounds appropriate for,
floating-point work. A good numerical programming text should cover the basics; see also the references below.
References: Kernighan and Plauger, The Elements of Programming Style Sec. 6 pp. 115-8
Knuth, Volume 2 chapter 4
David Goldberg, ``What Every Computer Scientist Should Know about Floating-Point Arithmetic''
Question 14.2
I'm trying to take some square roots, but I'm getting crazy numbers.
Make sure that you have #included <math.h>, and correctly declared other functions returning
double. (Another library routine to be careful with is atof, which is declared in <stdlib.h>.) See
also question 14.3.
Question 14.3
I'm trying to do some simple trig, and I am #including <math.h>, but I keep getting ``undefined: sin''
compilation errors.
Make sure you're actually linking with the math library. For instance, under Unix, you usually need to
use the -lm option, at the end of the command line, when compiling/linking. See also questions 13.25
and 13.26.
Question 13.25
I keep getting errors due to library functions being undefined, but I'm #including all the right header files.
In some cases (especially if the functions are nonstandard) you may have to explicitly ask for the correct
libraries to be searched when you link the program. See also questions 11.30, 13.26, and 14.3.
Question 11.30
Why are some ANSI/ISO Standard library routines showing up as undefined, even though I've got an
ANSI compiler?
It's possible to have a compiler available which accepts ANSI syntax, but not to have ANSI-compatible
header files or run-time libraries installed. (In fact, this situation is rather common when using a
non-vendor-supplied compiler such as gcc.) See also questions 11.29, 13.25, and 13.26.
Question 11.29
My compiler is rejecting the simplest possible test programs, with all kinds of syntax errors.
Perhaps it is a pre-ANSI compiler, unable to accept function prototypes and the like.
Question 1.31
This code, straight out of a book, isn't compiling:
f()
{
char a[] = "Hello, world!";
}
Perhaps you have a pre-ANSI compiler, which doesn't allow initialization of ``automatic aggregates'' (i.e.
non-static local arrays, structures, and unions). As a workaround, you can make the array global or
static (if you won't need a fresh copy during any subsequent calls), or replace it with a pointer (if the
array won't be written to). (You can always initialize local char * variables to point to string literals,
but see question 1.32.) If neither of these conditions holds, you'll have to initialize the array by hand with
strcpy when f is called. See also question 11.29.
Declarations and Initializations

1.1 How do you decide which integer type to use?
1.4 What should the 64-bit type on new, 64-bit machines be?
1.7 What's the best way to declare and define global variables?
1.11 What does extern mean in a function declaration?
1.12 What's the auto keyword good for?
1.14 I can't seem to define a linked list node which contains a pointer to itself.
1.21 How do I declare an array of N pointers to functions returning pointers to functions returning
pointers to characters?
1.22 How can I declare a function that returns a pointer to a function of its own type?
1.25 My compiler is complaining about an invalid redeclaration of a function, but I only define it once
and call it once.
1.30 What can I safely assume about the initial values of variables which are not explicitly initialized?
1.31 Why can't I initialize a local array with a string?
1.32 What is the difference between char a[] = "string"; and char *p = "string"; ?
1.34 How do I initialize a pointer to a function?
Question 12.2
Why does the code while(!feof(infp)) { fgets(buf, MAXLINE, infp);
fputs(buf, outfp); } copy the last line twice?
In C, EOF is only indicated after an input routine has tried to read, and has reached end-of-file. (In other
words, C's I/O is not like Pascal's.) Usually, you should just check the return value of the input routine
(fgets in this case); often, you don't need to use feof at all.
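A corrected copying loop, controlled by the return value of fgets, can be sketched like this. The function name copy_lines, the returned count, and the value of MAXLINE are assumptions added for illustration.

```c
#include <stdio.h>

#define MAXLINE 256     /* assumed buffer size for this sketch */

/* copy_lines is a hypothetical helper.  It copies infp to outfp and
 * returns the number of successful fgets calls.  The loop ends as
 * soon as fgets reports end-of-file or an error, so nothing is
 * written twice. */
long copy_lines(FILE *infp, FILE *outfp)
{
    char buf[MAXLINE];
    long n = 0;

    while(fgets(buf, MAXLINE, infp) != NULL) {
        fputs(buf, outfp);
        n++;
    }
    return n;
}
```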
ANSI Sec. 4.9.3, Sec. 4.9.7.1, Sec. 4.9.10.2
ISO Sec. 7.9.3, Sec. 7.9.7.1, Sec. 7.9.10.2
H&S Sec. 15.14 p. 382
Question 12.4
My program's prompts and intermediate output don't always show up on the screen, especially when I
pipe the output through another program.
It's best to use an explicit fflush(stdout) whenever output should definitely be visible. Several
mechanisms attempt to perform the fflush for you, at the ``right time,'' but they tend to apply only
when stdout is an interactive terminal. (See also question 12.24.)
ISO Sec. 7.9.5.2
Question 12.24
Why does errno contain ENOTTY after a call to printf?
Many implementations of the stdio package adjust their behavior slightly if stdout is a terminal. To
make the determination, these implementations perform some operation which happens to fail (with
ENOTTY) if stdout is not a terminal. Although the output operation goes on to complete successfully,
errno still contains ENOTTY. (Note that it is only meaningful for a program to inspect the contents of
errno after an error has been reported.)
References: ANSI Sec. 4.1.3, Sec. 4.9.10.3
ISO Sec. 7.1.4, Sec. 7.9.10.3
CT&P Sec. 5.4 p. 73
PCS Sec. 14 p. 254
Question 1.7
What's the best way to declare and define global variables?
First, though there can be many declarations (and in many translation units) of a single ``global'' (strictly
speaking, ``external'') variable or function, there must be exactly one definition. (The definition is the
declaration that actually allocates space, and provides an initialization value, if any.) The best
arrangement is to place each definition in some relevant .c file, with an external declaration in a header
(``.h'') file, which is #included wherever the declaration is needed. The .c file containing the definition
should also #include the same header file, so that the compiler can check that the definition matches
the declarations.
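The arrangement can be sketched in a single listing, with comments marking which lines would live in which file. The file names globals.h and globals.c and the identifiers error_count and note_error are hypothetical. (The listing also compiles as one translation unit, since a declaration may precede the matching definition.)

```c
/* ---- globals.h (hypothetical file name) ---- */
extern int error_count;         /* declaration: allocates no space */
extern void note_error(void);   /* declaration of an external function */

/* ---- globals.c, which would begin with #include "globals.h" ---- */
int error_count = 0;            /* the one definition, with its initializer */

void note_error(void)
{
    error_count++;
}
```

Because globals.c sees the declarations from the header, the compiler can complain if, say, error_count were defined there as a long.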
This rule promotes a high degree of portability: it is consistent with the requirements of the ANSI C
Standard, and is also consistent with most pre-ANSI compilers and linkers. (Unix compilers and linkers
typically use a ``common model'' which allows multiple definitions, as long as at most one is initialized;
this behavior is mentioned as a ``common extension'' by the ANSI Standard, no pun intended. A few very
odd systems may require an explicit initializer to distinguish a definition from an external declaration.)
It is possible to use preprocessor tricks to arrange that a line like
DEFINE(int, i);
need only be entered once in one header file, and turned into a definition or a declaration depending on
the setting of some macro, but it's not clear if this is worth the trouble.
It's especially important to put global declarations in header files if you want the compiler to catch
inconsistent declarations for you. In particular, never place a prototype for an external function in a .c
file: it wouldn't generally be checked for consistency with the definition, and an incompatible prototype
is worse than useless.
K&R2 Sec. 4.4 pp. 80-1
ANSI Sec. 3.1.2.2, Sec. 3.7, Sec. 3.7.2, Sec. F.5.11
ISO Sec. 6.1.2.2, Sec. 6.7, Sec. 6.7.2, Sec. G.5.11
H&S Sec. 4.8 pp. 101-104, Sec. 9.2.3 p. 267
CT&P Sec. 4.2 pp. 54-56
Question 10.6
I'm splitting up a program into multiple source files for the first time, and I'm wondering what to put in .c
files and what to put in .h files. (What does ``.h'' mean, anyway?)
As a general rule, you should put these things in header (.h) files:
macro definitions (preprocessor #defines)
structure, union, and enumeration declarations
typedef declarations
external function declarations (see also question 1.11)
global variable declarations
It's especially important to put a declaration or definition in a header file when it will be shared between
several other files. (In particular, never put external function prototypes in .c files. See also question 1.7.)
On the other hand, when a definition or declaration should remain private to one source file, it's fine to
leave it there.
H&S Sec. 9.2.3 p. 267
CT&P Sec. 4.6 pp. 66-7
Question 1.11
What does extern mean in a function declaration?
It can be used as a stylistic hint to indicate that the function's definition is probably in another source file,
but there is no formal difference between
extern int f();
and
int f();
ISO Sec. 6.1.2.2, Sec. 6.5.1
H&S Secs. 4.3,4.3.1 pp. 75-6
Question 1.12
What's the auto keyword good for?
Nothing; it's archaic. See also question 20.37.

ANSI Sec. 3.1.2.4, Sec. 3.5.1
ISO Sec. 6.1.2.4, Sec. 6.5.1
H&S Sec. 4.3 p. 75, Sec. 4.3.1 p. 76
Question 20.37
What was the entry keyword mentioned in K&R1?
It was reserved to allow the possibility of having functions with multiple, differently-named entry points,
à la FORTRAN. It was not, to anyone's knowledge, ever implemented (nor does anyone remember what
sort of syntax might have been imagined for it). It has been withdrawn, and is not a keyword in ANSI C.
References: K&R2 p. 259 Appendix C
Question 20.36
When will the next International Obfuscated C Code Contest (IOCCC) be held? How can I get a copy of
the current and previous winning entries?
The contest schedule is tied to the dates of the USENIX conferences at which the winners are announced.
At the time of this writing, it is expected that the yearly contest will open in October. To obtain a current
copy of the rules and guidelines, send e-mail with the Subject: line ``send rules'' to:
{apple,pyramid,sun,uunet}!hoptoad!judges or judges@toad.com
(Note that these are not the addresses for submitting entries.)
Contest winners should be announced at the winter USENIX conference in January, and are posted to the
net sometime thereafter. Winning entries from previous years (back to 1984) are archived at ftp.uu.net
(see question 18.16) under the directory pub/ioccc/.
As a last resort, previous winners may be obtained by sending e-mail to the above address, using the
Subject: ``send YEAR winners'', where YEAR is a single four-digit year, a year range, or ``all''.
Question 20.35
What is ``Duff's Device''?
It's a devastatingly deviously unrolled byte-copying loop, devised by Tom Duff while he was at
Lucasfilm. In its ``classic'' form, it looks like:
register n = (count + 7) / 8;      /* count > 0 assumed */
switch (count % 8)
{
case 0:    do { *to = *from++;
case 7:         *to = *from++;
case 6:         *to = *from++;
case 5:         *to = *from++;
case 4:         *to = *from++;
case 3:         *to = *from++;
case 2:         *to = *from++;
case 1:         *to = *from++;
              } while (--n > 0);
}
where count bytes are to be copied from the array pointed to by from to the memory location pointed
to by to (which is a memory-mapped device output register, which is why to isn't incremented). It
solves the problem of handling the leftover bytes (when count isn't a multiple of 8) by interleaving a
switch statement with the loop which copies bytes 8 at a time. (Believe it or not, it is legal to have
case labels buried within blocks nested in a switch statement like this. In his announcement of the
technique to C's developers and the world, Duff noted that C's switch syntax, in particular its ``fall
through'' behavior, had long been controversial, and that ``This code forms some sort of argument in that
debate, but I'm not sure whether it's for or against.'')
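For experimenting with the technique on ordinary memory, here is a sketch of a memory-to-memory variant. It is not Duff's original: the name duff_copy is hypothetical, and to is incremented because the destination here is an array rather than a device register. As in the classic form, count > 0 is assumed.

```c
/* duff_copy is a hypothetical memory-to-memory variant of Duff's
 * Device.  Unlike the classic form, to is incremented.
 * count > 0 is assumed. */
void duff_copy(char *to, const char *from, int count)
{
    int n = (count + 7) / 8;    /* number of passes through the loop */

    switch(count % 8) {         /* jump into the middle of the loop */
    case 0: do {    *to++ = *from++;
    case 7:         *to++ = *from++;
    case 6:         *to++ = *from++;
    case 5:         *to++ = *from++;
    case 4:         *to++ = *from++;
    case 3:         *to++ = *from++;
    case 2:         *to++ = *from++;
    case 1:         *to++ = *from++;
            } while(--n > 0);
    }
}
```

In practice, memcpy is the sensible way to copy memory; the sketch is only meant to show that the interleaved switch and loop do copy exactly count bytes.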
Question 20.34
Here's a good puzzle: how do you write a program which produces its own source code as its output?
It is actually quite difficult to write a self-reproducing program that is truly portable, due particularly to
quoting and character set difficulties.
Here is a classic example (which is normally presented on one line, although it will ``fix'' itself the first
time it's run):
char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";
main(){printf(s,34,s,34);}
(This program, like many of the genre, assumes that the double-quote character " has the value 34, as it
does in ASCII.)
Question 20.32
Will 2000 be a leap year? Is (year % 4 == 0) an accurate test for leap years?
Yes and no, respectively. The full expression for the present Gregorian calendar is
year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)
See a good astronomical almanac or other reference for details. (To forestall an eternal debate: references
which claim the existence of a 4000-year rule are wrong.)
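Packaged as a function (the name is_leap_year is hypothetical), the full expression looks like this:

```c
/* is_leap_year is a hypothetical helper.  Returns 1 if year is a
 * leap year in the Gregorian calendar, and 0 otherwise. */
int is_leap_year(int year)
{
    return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}
```

Century years are the interesting cases: 1900 is rejected by the year % 100 test, while 2000 is rescued by the year % 400 test.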
Question 20.31
How can I find the day of the week given the date?
Use mktime or localtime (see questions 13.13 and 13.14, but beware of DST adjustments if
tm_hour is 0), or Zeller's congruence (see the sci.math FAQ list), or this elegant code by Tomohiko
Sakamoto:
dayofweek(y, m, d)      /* 0 = Sunday */
int y, m, d;            /* 1 <= m <= 12, y > 1752 or so */
{
        static int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
        y -= m < 3;
        return (y + y/4 - y/100 + y/400 + t[m-1] + d) % 7;
}
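The mktime approach can be sketched as follows. The name weekday_of is hypothetical; setting tm_hour to 12 is one way of staying clear of the DST adjustments mentioned above, and the dates accepted are limited to whatever range the implementation's time_t can represent.

```c
#include <time.h>

/* weekday_of is a hypothetical helper built on mktime.
 * Returns 0 = Sunday ... 6 = Saturday, or -1 if mktime cannot
 * represent the date.  m is 1..12. */
int weekday_of(int y, int m, int d)
{
    struct tm tm = {0};

    tm.tm_year = y - 1900;
    tm.tm_mon = m - 1;
    tm.tm_mday = d;
    tm.tm_hour = 12;        /* noon, to stay clear of DST adjustments */
    tm.tm_isdst = -1;       /* let the library decide about DST */
    if(mktime(&tm) == (time_t)-1)
        return -1;
    return tm.tm_wday;      /* filled in by mktime */
}
```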
ISO Sec. 7.12.2.3
Question 13.13
I know that the library routine localtime will convert a time_t into a broken-down struct tm,
and that ctime will convert a time_t to a printable string. How can I perform the inverse operations
of converting a struct tm or a string into a time_t?
ANSI C specifies a library routine, mktime, which converts a struct tm to a time_t.

Converting a string to a time_t is harder, because of the wide variety of date and time formats which
might be encountered. Some systems provide a strptime function, which is basically the inverse of
strftime. Other popular routines are partime (widely distributed with the RCS package) and
getdate (and a few others, from the C news distribution). See question 18.16.
References: K&R2 Sec. B10 p. 256
ANSI Sec. 4.12.2.3
ISO Sec. 7.12.2.3
H&S Sec. 18.4 pp. 401-2
Question 13.12
How can I get the current date or time of day in a C program?
Just use the time, ctime, and/or localtime functions. (These routines have been around for years,
and are in the ANSI standard.) Here is a simple example:
#include <stdio.h>
#include <time.h>

int main(void)
{
        time_t now;
        time(&now);
        printf("It's %.24s.\n", ctime(&now));
        return 0;
}
References: K&R2 Sec. B10 pp. 255-7
ANSI Sec. 4.12
ISO Sec. 7.12
H&S Sec. 18
Question 13.11
How can I sort more data than will fit in memory?
You want an ``external sort,'' which you can read about in Knuth, Volume 3. The basic idea is to sort the
data in chunks (as much as will fit in memory at one time), write each sorted chunk to a temporary file,
and then merge the files. Your operating system may provide a general-purpose sort utility, and if so, you
can try invoking it from within your program: see questions 19.27 and 19.30.
References: Knuth Sec. 5.4 pp. 247-378
Sedgewick Sec. 13 pp. 177-187
Question 13.10
How can I sort a linked list?
Sometimes it's easier to keep the list in order as you build it (or perhaps to use a tree instead). Algorithms
like insertion sort and merge sort lend themselves ideally to use with linked lists. If you want to use a
standard library function, you can allocate a temporary array of pointers, fill it in with pointers to all your
list nodes, call qsort, and finally rebuild the list pointers based on the sorted array.
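The qsort approach can be sketched as follows. The node structure, its int value member, and the function names are all hypothetical; a real list would substitute its own node type and comparison.

```c
#include <stdlib.h>

struct node {               /* hypothetical list node for this sketch */
    int value;
    struct node *next;
};

/* qsort comparison function: the array elements are struct node *,
 * so each argument is really a struct node * const *. */
static int nodecmp(const void *p1, const void *p2)
{
    const struct node *n1 = *(struct node * const *)p1;
    const struct node *n2 = *(struct node * const *)p2;
    return (n1->value > n2->value) - (n1->value < n2->value);
}

/* Sorts the list and returns its new head.  If the temporary array
 * cannot be allocated, the list is returned unchanged. */
struct node *sort_list(struct node *head)
{
    struct node **array, *np;
    size_t n = 0, i;

    for(np = head; np != NULL; np = np->next)
        n++;                            /* count the nodes */
    if(n < 2)
        return head;
    array = malloc(n * sizeof *array);
    if(array == NULL)
        return head;
    for(i = 0, np = head; np != NULL; np = np->next)
        array[i++] = np;                /* fill in pointers to the nodes */
    qsort(array, n, sizeof *array, nodecmp);
    for(i = 0; i + 1 < n; i++)
        array[i]->next = array[i + 1];  /* rebuild the links in order */
    array[n - 1]->next = NULL;
    head = array[0];
    free(array);
    return head;
}
```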
References: Knuth Sec. 5.2.1 pp. 80-102, Sec. 5.2.4 pp. 159-168
Sedgewick Sec. 8 pp. 98-100, Sec. 12 pp. 163-175
Question 13.9
Now I'm trying to sort an array of structures with qsort. My comparison function takes pointers to
structures, but the compiler complains that the function is of the wrong type for qsort. How can I cast
the function pointer to shut off the warning?
The conversions must be in the comparison function, which must be declared as accepting ``generic
pointers'' (const void *) as discussed in question 13.8. The comparison function might look
like
int mystructcmp(const void *p1, const void *p2)
{
const struct mystruct *sp1 = p1;
const struct mystruct *sp2 = p2;
/* now compare sp1->whatever and sp2-> ... */
(The conversions from generic pointers to struct mystruct pointers happen in the initializations
sp1 = p1 and sp2 = p2; the compiler performs the conversions implicitly since p1 and p2 are
void pointers. Explicit casts, and char * pointers, would be required under a pre-ANSI compiler. See
also question 7.7.)
If, on the other hand, you're sorting pointers to structures, you'll need indirection, as in question 13.8:
sp1 = *(struct mystruct **)p1 .
In general, it is a bad idea to insert casts just to ``shut the compiler up.'' Compiler warnings are usually
trying to tell you something, and unless you really know what you're doing, you ignore or muzzle them at
your peril. See also question 4.9.
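A complete version of such a comparison function, together with the qsort call, might be sketched like this. The members of struct mystruct and the ordering (by id, then by name) are assumptions made up for the example.

```c
#include <stdlib.h>
#include <string.h>

struct mystruct {           /* hypothetical members for this sketch */
    char name[20];
    int id;
};

/* Orders by id, then by name.  No casts are needed: the void
 * pointers convert implicitly in the initializations. */
int mystructcmp(const void *p1, const void *p2)
{
    const struct mystruct *sp1 = p1;
    const struct mystruct *sp2 = p2;

    if(sp1->id != sp2->id)
        return sp1->id < sp2->id ? -1 : 1;
    return strcmp(sp1->name, sp2->name);
}

/* sort_structs is a hypothetical wrapper: sorts an array of n
 * structures in place. */
void sort_structs(struct mystruct *array, size_t n)
{
    qsort(array, n, sizeof array[0], mystructcmp);
}
```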
ISO Sec. 7.10.5.2
H&S Sec. 20.5 p. 419

Question 13.8
I'm trying to sort an array of strings with qsort, using strcmp as the comparison function, but it's not
working.
By ``array of strings'' you probably mean ``array of pointers to char.'' The arguments to qsort's
comparison function are pointers to the objects being sorted, in this case, pointers to pointers to char.
strcmp, however, accepts simple pointers to char. Therefore, strcmp can't be used directly. Write an
intermediate comparison function like this:
/* compare strings via pointers */
int pstrcmp(const void *p1, const void *p2)
{
return strcmp(*(char * const *)p1, *(char * const *)p2);
}
The comparison function's arguments are expressed as ``generic pointers,'' const void *. They are
converted back to what they ``really are'' (char **) and dereferenced, yielding char *'s which can be
passed to strcmp. (Under a pre-ANSI compiler, declare the pointer parameters as char * instead of
void *, and drop the consts.)
(Don't be misled by the discussion in K&R2 Sec. 5.11 pp. 119-20, which is not discussing the Standard
library's qsort).
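Put together with the qsort call itself, the technique can be sketched as follows. The wrapper name sort_strings is hypothetical; note that the element size passed to qsort is the size of a pointer, not of a string.

```c
#include <stdlib.h>
#include <string.h>

/* compare strings via pointers */
static int pstrcmp(const void *p1, const void *p2)
{
    return strcmp(*(char * const *)p1, *(char * const *)p2);
}

/* sort_strings is a hypothetical wrapper: sorts an array of n
 * pointers to char into strcmp order.  Only the pointers are
 * rearranged; the strings themselves are not moved. */
void sort_strings(char **strings, size_t n)
{
    qsort(strings, n, sizeof strings[0], pstrcmp);
}
```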
ISO Sec. 7.10.5.2
H&S Sec. 20.5 p. 419
Question 13.7
I need some code to do regular expression and wildcard matching.
Make sure you recognize the difference between classic regular expressions (variants of which are used
in such Unix utilities as ed and grep), and filename wildcards (variants of which are used by most
operating systems).
There are a number of packages available for matching regular expressions. Most packages use a pair of
functions, one for ``compiling'' the regular expression, and one for ``executing'' it (i.e. matching strings
against it). Look for header files named <regex.h> or <regexp.h>, and functions called
regcmp/regex, regcomp/regexec, or re_comp/re_exec. (These functions may exist in a
separate regexp library.) A popular, freely-redistributable regexp package by Henry Spencer is available
from ftp.cs.toronto.edu in pub/regexp.shar.Z or in several other archives. The GNU project has a package
called rx. See also question 18.16.
Filename wildcard matching (sometimes called ``globbing'') is done in a variety of ways on different
systems. On Unix, wildcards are automatically expanded by the shell before a process is invoked, so
programs rarely have to worry about them explicitly. Under MS-DOS compilers, there is often a special
object file which can be linked in to a program to expand wildcards while argv is being built. Several
systems (including MS-DOS and VMS) provide system services for listing or opening files specified by
wildcards. Check your compiler/library documentation.
Question 13.6
How can I split up a string into whitespace-separated fields?
How can I duplicate the process by which main() is handed argc and argv?
The only Standard routine available for this kind of ``tokenizing'' is strtok, although it can be tricky to
use and it may not do everything you want it to. (For instance, it does not handle quoting.)
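A minimal sketch of strtok-based splitting follows. The name makeargv and its interface are hypothetical. Two of the tricky points show up here: strtok modifies the string it is given (so a string literal must not be passed), and it keeps hidden state between calls (so two tokenizations cannot be interleaved).

```c
#include <string.h>

/* makeargv is a hypothetical helper.  It splits string in place
 * (strtok overwrites separators with '\0') and stores up to maxargs
 * pointers to the fields in argv.  Returns the number of fields
 * stored. */
int makeargv(char *string, char *argv[], int maxargs)
{
    int argc = 0;
    char *p;

    for(p = strtok(string, " \t\n"); p != NULL && argc < maxargs;
                                     p = strtok(NULL, " \t\n"))
        argv[argc++] = p;
    return argc;
}
```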
ANSI Sec. 4.11.5.8
ISO Sec. 7.11.5.8
H&S Sec. 13.7 pp. 333-4
PCS p. 178
Question 13.5
Why do some versions of toupper act strangely if given an upper-case letter?
Why does some code call islower before toupper?
Older versions of toupper and tolower did not always work correctly on arguments which did not
need converting (i.e. on digits or punctuation or letters already of the desired case). In ANSI/ISO
Standard C, these functions are guaranteed to work appropriately on all character arguments.
ISO Sec. 7.3.2
H&S Sec. 12.9 pp. 320-1
PCS p. 182
Question 13.2
Why does strncpy not always place a '\0' terminator in the destination string?
strncpy was first designed to handle a now-obsolete data structure, the fixed-length,
not-necessarily-\0-terminated ``string.'' (A related quirk of strncpy's is that it pads short strings with multiple \0's,
out to the specified length.) strncpy is admittedly a bit cumbersome to use in other contexts, since you
must often append a '\0' to the destination string by hand. You can get around the problem by using
strncat instead of strncpy: if the destination string starts out empty, strncat does what you
probably wanted strncpy to do. Another possibility is sprintf(dest, "%.*s", n, source)
.
When arbitrary bytes (as opposed to strings) are being copied, memcpy is usually a more appropriate
routine to use than strncpy.
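The two workarounds might look like the sketch below. The function names are made up for this example, and both assume dest points to an array of size bytes with size greater than zero.

```c
#include <string.h>

/* strncpy plus a terminator appended by hand. */
void copy_strncpy(char *dest, const char *src, size_t size)
{
    strncpy(dest, src, size - 1);
    dest[size - 1] = '\0';          /* strncpy may not have written one */
}

/* The strncat alternative: start with an empty destination string. */
void copy_strncat(char *dest, const char *src, size_t size)
{
    dest[0] = '\0';
    strncat(dest, src, size - 1);   /* strncat always appends a '\0' */
}
```

Either way, a source string longer than size - 1 characters is silently truncated; whether that is acceptable depends on the program.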
Question 13.1
How can I convert numbers to strings (the opposite of atoi)? Is there an itoa function?
Just use sprintf. (Don't worry that sprintf may be overkill, potentially wasting run time or code
space; it works well in practice.) See the examples in the answer to question 7.5; see also question 12.21.
You can obviously use sprintf to convert long or floating-point numbers to strings as well (using
%ld or %f).
K&R2 Sec. 3.6 p. 64
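A minimal sketch of the sprintf approach follows. The helper name long_to_string is hypothetical, and the caller is assumed to supply a buffer big enough for the result (25 bytes is ample for a 64-bit long with sign).

```c
#include <stdio.h>

/* Hypothetical helper: write the decimal text of n into buf and
 * return buf.  The caller must make buf large enough. */
char *long_to_string(long n, char *buf)
{
    sprintf(buf, "%ld", n);
    return buf;
}
```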
Stdio
12. Stdio
12.1 What's wrong with the code "char c; while((c = getchar()) != EOF) ..."?
12.2 Why won't the code `` while(!feof(infp)) { fgets(buf, MAXLINE, infp);
fputs(buf, outfp); } '' work?
12.4 My program's prompts and intermediate output don't always show up on the screen.
12.5 How can I read one character at a time, without waiting for the RETURN key?
12.6 How can I print a '%' character with printf?
12.9 How can printf use %f for type double, if scanf requires %lf?
12.10 How can I implement a variable field width with printf?
12.11 How can I print numbers with commas separating the thousands?
12.12 Why doesn't the call scanf("%d", i) work?
12.13 Why doesn't the code "double d; scanf("%f", &d);" work?
12.15 How can I specify a variable width in a scanf format string?
12.17 When I read numbers from the keyboard with scanf "%d\n", it seems to hang until I type one
extra line of input.
12.18 I'm reading a number with scanf %d and then a string with gets(), but the compiler seems to
be skipping the call to gets()!
12.19 I'm re-prompting the user if scanf fails, but sometimes it seems to go into an infinite loop.
12.20 Why does everyone say not to use scanf? What should I use instead?
12.21 How can I tell how much destination buffer space I'll need for an arbitrary sprintf call? How
can I avoid overflowing the destination buffer with sprintf?
12.23 Why does everyone say not to use gets()?

12.24 Why does errno contain ENOTTY after a call to printf?
12.25 What's the difference between fgetpos/fsetpos and ftell/fseek?
12.26 Will fflush(stdin) flush unread characters from the standard input stream?
12.30 I'm trying to update a file in place, by using fopen mode "r+", but it's not working.
12.33 How can I redirect stdin or stdout from within a program?
12.34 Once I've used freopen, how can I get the original stream back?
12.38 How can I read a binary data file properly?
Question 12.5
How can I read one character at a time, without waiting for the RETURN key?
See question 19.1.
Question 12.6
How can I print a '%' character in a printf format string? I tried \%, but it didn't work.
Simply double the percent sign: %% .

\% can't work, because the backslash \ is the compiler's escape character, while here our problem is that
the % is printf's escape character.
K&R2 Sec. 7.2 p. 154
ANSI Sec. 4.9.6.1
ISO Sec. 7.9.6.1
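For instance, a format string containing a literal percent sign could be written as in this small sketch (the function name format_percent is invented for the example; sprintf is used so the result can be inspected, but printf treats %% the same way):

```c
#include <stdio.h>

/* Format a message such as "75% done" into buf.
 * Returns the number of characters written, as sprintf does. */
int format_percent(char *buf, int pct)
{
    return sprintf(buf, "%d%% done", pct);   /* %% yields a single % */
}
```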
Question 19.17
Why can't I open a file by its explicit path? The call
fopen("c:\newdir\file.dat", "r")
is failing.
The file you actually requested--with the characters \n and \f in its name--probably doesn't exist, and
isn't what you thought you were trying to open.
In character constants and string literals, the backslash \ is an escape character, giving special meaning
to the character following it. In order for literal backslashes in a pathname to be passed through to
fopen (or any other routine) correctly, they have to be doubled, so that the first backslash in each pair
quotes the second one:
fopen("c:\\newdir\\file.dat", "r");
Alternatively, under MS-DOS, it turns out that forward slashes are also accepted as directory separators,
so you could use
fopen("c:/newdir/file.dat", "r");
(Note, by the way, that header file names mentioned in preprocessor #include directives are not string
literals, so you may not have to worry about backslashes there.)
Question 19.16
How can I delete a file?
The Standard C Library function is remove. (This is therefore one of the few questions in this section
for which the answer is not ``It's system-dependent.'') On older, pre-ANSI Unix systems, remove may
not exist, in which case you can try unlink.
References: K&R2 Sec. B1.1 p. 242
ANSI Sec. 4.9.4.1
ISO Sec. 7.9.4.1
H&S Sec. 15.15 p. 382
PCS Sec. 12 pp. 208,220-221
POSIX Sec. 5.5.1, Sec. 8.2.4
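A short sketch of calling remove and checking its result is shown below; the wrapper name delete_file is an assumption for this example.

```c
#include <stdio.h>

/* Delete the named file.  Returns 0 on success and nonzero on
 * failure (the same convention as remove), printing a diagnostic
 * on stderr when the deletion fails. */
int delete_file(const char *name)
{
    int r = remove(name);

    if (r != 0)
        perror(name);
    return r;
}
```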
Question 19.15
How can I recover the file name given an open stream or file descriptor?
This problem is, in general, insoluble. Under Unix, for instance, a scan of the entire disk (perhaps
involving special permissions) would theoretically be required, and would fail if the descriptor were
connected to a pipe or referred to a deleted file (and could give a misleading answer for a file with
multiple links). It is best to remember the names of files yourself when you open them (perhaps with a
wrapper function around fopen).
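One way such a wrapper might be arranged is sketched here. Everything in it (the names named_fopen, stream_name, and named_fclose, and the fixed table sizes) is hypothetical; a real program might prefer a dynamically allocated table.

```c
#include <stdio.h>
#include <string.h>

#define MAXOPEN 20      /* arbitrary limits chosen for this sketch */
#define MAXNAME 256

static struct {
    FILE *fp;
    char name[MAXNAME];
} opened[MAXOPEN];

/* Like fopen, but also records the name under which fp was opened. */
FILE *named_fopen(const char *name, const char *mode)
{
    FILE *fp = fopen(name, mode);
    int i;

    if (fp == NULL)
        return NULL;
    for (i = 0; i < MAXOPEN; i++) {
        if (opened[i].fp == NULL) {
            opened[i].fp = fp;
            strncpy(opened[i].name, name, MAXNAME - 1);
            opened[i].name[MAXNAME - 1] = '\0';
            break;
        }
    }
    return fp;  /* if the table is full, the name is simply not recorded */
}

/* Returns the recorded name for fp, or NULL if none was recorded. */
const char *stream_name(FILE *fp)
{
    int i;

    for (i = 0; i < MAXOPEN; i++)
        if (fp != NULL && opened[i].fp == fp)
            return opened[i].name;
    return NULL;
}

/* Like fclose, but also forgets the recorded name. */
int named_fclose(FILE *fp)
{
    int i;

    for (i = 0; i < MAXOPEN; i++)
        if (opened[i].fp == fp)
            opened[i].fp = NULL;
    return fclose(fp);
}
```

The recorded name is only what the program passed to fopen; it says nothing about later renames or links.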
Question 2.24
Is there an easy way to print enumeration values symbolically?
No. You can write a little function to map an enumeration constant to a string. (If all you're worried
about is debugging, a good debugger should automatically print enumeration constants symbolically.)
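Such a ``little function'' could look like the following sketch; the enumeration color and the function color_name are invented for illustration.

```c
enum color { RED, GREEN, BLUE };

/* Map an enumeration constant to its name as a string. */
const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "RED";
    case GREEN: return "GREEN";
    case BLUE:  return "BLUE";
    }
    return "unknown";   /* value not among the declared constants */
}
```

The obvious drawback is that the function has to be kept in step with the enum declaration by hand.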
Question 2.22
What is the difference between an enumeration and a set of preprocessor #defines?
At the present time, there is little difference. Although many people might have wished otherwise, the C
Standard says that enumerations may be freely intermixed with other integral types, without errors. (If
such intermixing were disallowed without explicit casts, judicious use of enumerations could catch
certain programming errors.)
Some advantages of enumerations are that the numeric values are automatically assigned, that a debugger
may be able to display the symbolic values when enumeration variables are examined, and that they obey
block scope. (A compiler may also generate nonfatal warnings when enumerations and integers are
indiscriminately mixed, since doing so can still be considered bad style even though it is not strictly
illegal.) A disadvantage is that the programmer has little control over those nonfatal warnings; some
programmers also resent not having control over the sizes of enumeration variables.
References: K&R2 Sec. 2.3 p. 39, Sec. A4.2 p. 196
ANSI Sec. 3.1.2.5, Sec. 3.5.2, Sec. 3.5.2.2, Appendix E
ISO Sec. 6.1.2.5, Sec. 6.5.2, Sec. 6.5.2.2, Annex F
H&S Sec. 5.5 pp. 127-9, Sec. 5.11.2 p. 153
Question 2.20
Can I initialize unions?
ANSI Standard C allows an initializer for the first member of a union. There is no standard way of
initializing any other member (nor, under a pre-ANSI compiler, is there generally any way of initializing
a union at all).
ANSI Sec. 3.5.7
ISO Sec. 6.5.7
H&S Sec. 4.6.7 p. 100
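As a small sketch of the ANSI rule (the union and variable names are made up for this example), the braces hold an initializer for the first member only:

```c
union value {
    int    i;   /* first member: the one an ANSI initializer sets */
    double d;
};

union value v = { 42 };     /* initializes v.i, not v.d */
```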
Question 10.8
Where are header (``#include'') files searched for?
The exact behavior is implementation-defined (which means that it is supposed to be documented; see
question 11.33). Typically, headers named with <> syntax are searched for in one or more standard
places. Header files named with "" syntax are first searched for in the ``current directory,'' then (if not
found) in the same standard places.
Traditionally (especially under Unix compilers), the current directory is taken to be the directory
containing the file containing the #include directive. Under other compilers, however, the current
directory (if any) is the directory in which the compiler was initially invoked. Check your compiler
documentation.
ANSI Sec. 3.8.2
ISO Sec. 6.8.2
H&S Sec. 3.4 p. 55
Question 17.8
What is ``Hungarian Notation''? Is it worthwhile?
Hungarian Notation is a naming convention, invented by Charles Simonyi, which encodes things about a
variable's type (and perhaps its intended use) in its name. It is well-loved in some circles and roundly
castigated in others. Its chief advantage is that it makes a variable's type or intended use obvious from its
name; its chief disadvantage is that type information is not necessarily a worthwhile thing to carry around
in the name of a variable.
References: Simonyi and Heller, ``The Hungarian Revolution''
Question 17.5
I came across some code that puts a (void) cast before each call to printf. Why?
printf does return a value, though few programs bother to check the return values from each call.
Since some compilers (and lint) will warn about discarded return values, an explicit cast to (void) is
a way of saying ``Yes, I've decided to ignore the return value from this call, but please continue to warn
me about other (perhaps inadvertently) ignored return values.'' It's also common to use void casts on
calls to strcpy and strcat, since the return value is never surprising.
H&S Sec. 6.2.9 p. 172, Sec. 7.13 pp. 229-30
Question 17.4
Why do some people write if(0 == x) instead of if(x == 0)?
It's a trick to guard against the common error of writing

if(x = 0)
If you're in the habit of writing the constant before the ==, the compiler will complain if you accidentally
type
if(0 = x)
Evidently it can be easier to remember to reverse the test than it is to remember to type the doubled =
sign.
Question 17.3
Here's a neat trick for checking whether two strings are equal:
if(!strcmp(s1, s2))
Is this good style?
It is not particularly good style, although it is a popular idiom. The test succeeds if the two strings are
equal, but the use of ! (``not'') suggests that it tests for inequality.
A better option is to use a macro:
#define Streq(s1, s2) (strcmp((s1), (s2)) == 0)
Opinions on code style, like those on religion, can be debated endlessly. Though good style is a worthy
goal, and can usually be recognized, it cannot be rigorously codified. See also question 17.10.
Question 17.1
What's the best style for code layout in C?
K&R, while providing the example most often copied, also supply a good excuse for disregarding it:
The position of braces is less important, although people hold passionate beliefs. We have
chosen one of several popular styles. Pick a style that suits you, then use it consistently.
It is more important that the layout chosen be consistent (with itself, and with nearby or common code)
than that it be ``perfect.'' If your coding environment (i.e. local custom or company policy) does not
suggest a style, and you don't feel like inventing your own, just copy K&R. (The tradeoffs between
various indenting and brace placement options can be exhaustively and minutely examined, but don't
warrant repetition here. See also the Indian Hill Style Guide.)
The elusive quality of ``good style'' involves much more than mere code layout details; don't spend time
on formatting to the exclusion of more substantive code quality issues.
K&R2 Sec. 1.2 p. 10
Question 7.19
My program is crashing, apparently somewhere down inside malloc, but I can't see anything wrong
with it.
It is unfortunately very easy to corrupt malloc's internal data structures, and the resulting problems can
be stubborn. The most common source of problems is writing more to a malloc'ed region than it was
allocated to hold; a particularly common bug is to malloc(strlen(s)) instead of strlen(s) +
1. Other problems may involve using pointers to freed storage, freeing pointers twice, freeing
pointers not obtained from malloc, or trying to realloc a null pointer (see question 7.30).
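To make the strlen(s) + 1 point concrete, here is a minimal sketch of copying a string into malloc'ed memory; the helper name copy_string is an assumption for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of s, or NULL if
 * malloc fails.  The caller is responsible for freeing the result. */
char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);    /* + 1 for the terminating '\0' */

    if (p != NULL)
        strcpy(p, s);
    return p;
}
```

With malloc(strlen(s)) instead, the strcpy would write one byte past the end of the block, which is exactly the kind of overrun that can corrupt malloc's data structures.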
Memory Allocation
7.1 Why doesn't the code ``char *answer; gets(answer);'' work?
7.2 I can't get strcat to work. I tried ``char *s3 = strcat(s1, s2);'' but I got strange results.
7.3 But the man page for strcat says that it takes two char *'s as arguments. How am I supposed to
know to allocate things?
7.5 I have a function that is supposed to return a string, but when it returns to its caller, the returned
string is garbage.
7.6 Why am I getting ``warning: assignment of pointer from integer lacks a cast'' for calls to malloc?
7.7 Why does some code carefully cast the values returned by malloc to the pointer type being
allocated?
7.8 Why does so much code leave out the multiplication by sizeof(char) when allocating strings?
7.14 I've heard that some operating systems don't actually allocate malloc'ed memory until the program
tries to use it. Is this legal?
7.16 I'm allocating a large array for some numeric work, but malloc is acting strangely.
7.17 I've got 8 meg of memory in my PC. Why can I only seem to malloc 640K or so?
7.19 My program is crashing, apparently somewhere down inside malloc.
7.20 You can't use dynamically-allocated memory after you free it, can you?
7.21 Why isn't a pointer null after calling free?
7.22 When I call malloc to allocate memory for a local pointer, do I have to explicitly free it?
7.23 When I free a dynamically-allocated structure containing pointers, do I have to free each subsidiary
pointer first?
7.24 Must I free allocated memory before the program exits?

7.25 Why doesn't my program's memory usage go down when I free memory?
7.26 How does free know how many bytes to free?
7.27 So can I query the malloc package to find out how big an allocated block is?
7.30 Is it legal to pass a null pointer as the first argument to realloc?
7.31 What's the difference between calloc and malloc?
7.32 What is alloca and why is its use discouraged?
Question 7.3
But the man page for strcat says that it takes two char *'s as arguments. How am I supposed to
know to allocate things?
In general, when using pointers you always have to consider memory allocation, if only to make sure that
the compiler is doing it for you. If a library function's documentation does not explicitly mention
allocation, it is usually the caller's problem.
The Synopsis section at the top of a Unix-style man page or in the ANSI C standard can be misleading.
The code fragments presented there are closer to the function definitions used by an implementor than
the invocations used by the caller. In particular, many functions which accept pointers (e.g. to structures
or strings) are usually called with the address of some object (a structure, or an array--see questions 6.3
and 6.4). Other common examples are time (see question 13.12) and stat.
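The difference between a synopsis and an actual call can be sketched as follows. The function names demo_strcat and demo_time are invented for this example; the point is that the caller supplies real objects and passes their addresses.

```c
#include <string.h>
#include <time.h>

/* strcat's synopsis shows two char * parameters, but the caller
 * passes an array it has allocated, with room for the result. */
size_t demo_strcat(void)
{
    char s[20] = "Hello, ";     /* caller-allocated, with room to grow */

    strcat(s, "world!");        /* the array name yields a char * */
    return strlen(s);
}

/* time's synopsis shows a time_t * parameter; the caller declares a
 * time_t object and passes its address, not an uninitialized pointer. */
time_t demo_time(void)
{
    time_t now;

    time(&now);
    return now;
}
```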
Question 12.38
How can I read a binary data file properly? I'm occasionally seeing 0x0a and 0x0d values getting
garbled, and it seems to hit EOF prematurely if the data contains the value 0x1a.
When you're reading a binary data file, you should specify "rb" mode when calling fopen, to make
sure that text file translations do not occur. Similarly, when writing binary data files, use "wb".
Note that the text/binary distinction is made when you open the file: once a file is open, it doesn't matter
which I/O calls you use on it. See also question 20.5.
ISO Sec. 7.9.5.3
H&S Sec. 15.2.1 p. 348
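A minimal sketch of reading a binary file in "rb" mode follows; the helper name read_binary and its return convention are assumptions for this example.

```c
#include <stdio.h>

/* Read up to max bytes from the named binary file into buf.
 * Returns the number of bytes read, or -1 if the file can't be opened. */
long read_binary(const char *name, unsigned char *buf, size_t max)
{
    FILE *fp = fopen(name, "rb");   /* "rb": no text translations */
    size_t n;

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, max, fp);
    fclose(fp);
    return (long)n;
}
```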
Question 20.5
How can I write data files which can be read on other machines with different word size, byte order, or
floating point formats?
The most portable solution is to use text files (usually ASCII), written with fprintf and read with
fscanf or the like. (Similar advice also applies to network protocols.) Be skeptical of arguments which
imply that text files are too big, or that reading and writing them is too slow. Not only is their efficiency
frequently acceptable in practice, but the advantages of being able to interchange them easily between
machines, and manipulate them with standard tools, can be overwhelming.
If you must use a binary format, you can improve portability, and perhaps take advantage of prewritten
I/O libraries, by making use of standardized formats such as Sun's XDR (RFC 1014), OSI's ASN.1
(referenced in CCITT X.409 and ISO 8825 ``Basic Encoding Rules''), CDF, netCDF, or HDF. See also
questions 2.12 and 12.38.
References: PCS Sec. 6 pp. 86,88
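The text-file approach might be sketched like this. The structure reading and both function names are hypothetical; the %.17g format is an assumption that is enough to round-trip typical IEEE double values, and other floating-point formats might call for a different precision.

```c
#include <stdio.h>

struct reading {
    long   id;
    double value;
};

/* Write one record as a line of text.  Returns 0 on success, -1 on error. */
int write_reading(FILE *fp, const struct reading *r)
{
    return fprintf(fp, "%ld %.17g\n", r->id, r->value) < 0 ? -1 : 0;
}

/* Read one record written by write_reading.  Returns 0 on success. */
int read_reading(FILE *fp, struct reading *r)
{
    return fscanf(fp, "%ld %lf", &r->id, &r->value) == 2 ? 0 : -1;
}
```

Because the file contains only characters, word size and byte order never enter into it.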
Question 20.3
How do I access command-line arguments?
They are pointed to by the argv array with which main() is called.
K&R2 Sec. 5.10 pp. 114-118
ANSI Sec. 2.1.2.2.1
ISO Sec. 5.1.2.2.1
H&S Sec. 20.1 p. 416
PCS Sec. 5.6 pp. 81-2, Sec. 11 p. 159, pp. 339-40 Appendix F
Schumacher, ed., Software Solutions in C Sec. 4 pp. 75-85
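As a brief sketch, a function that steps through the arguments might look like this. The name print_args is invented for the example; it is meant to be called from main as print_args(argc, argv).

```c
#include <stdio.h>

/* Print each command-line argument after the program name.
 * Returns the number of arguments printed. */
int print_args(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++)      /* argv[0] is the program name */
        printf("argv[%d] = %s\n", i, argv[i]);
    return argc > 0 ? argc - 1 : 0;
}
```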
Question 19.41
But I can't use all these nonstandard, system-dependent functions, because my program has to be ANSI
compatible!
You're out of luck. Either you misunderstood your requirement, or it's an impossible one to meet.
ANSI/ISO Standard C simply does not define ways of doing these things. (POSIX defines a few.) It is
possible, and desirable, for most of a program to be ANSI-compatible, deferring the system-dependent
functionality to a few routines in a few files which are rewritten for each system ported to.
Question 19.40b
How do I use BIOS calls? How can I write ISR's? How can I create TSR's?
These are very particular to specific systems (PC compatibles running MS-DOS, most likely). You'll get
much better information in a specific newsgroup such as comp.os.msdos.programmer or its FAQ list;
another excellent resource is Ralf Brown's interrupt list.
Question 19.40
How do I... Use sockets? Do networking? Write client/server applications?
All of these questions are outside of the scope of this list and have much more to do with the networking
facilities which you have available than they do with C. Good books on the subject are Douglas Comer's
three-volume Internetworking with TCP/IP and W. R. Stevens's UNIX Network Programming. (There is
also plenty of information out on the net itself.)
Question 19.39
How can I handle floating-point exceptions gracefully?
On many systems, you can define a routine matherr which will be called when there are certain
floating-point errors, such as errors in the math routines in <math.h>. You may also be able to use
signal (see question 19.38) to catch SIGFPE. See also question 14.9.
Question 19.38
How can I trap or ignore keyboard interrupts like control-C?
The basic step is to call signal, either as

#include <signal.h>
signal(SIGINT, SIG_IGN);
to ignore the interrupt signal, or as
extern void func(int);
signal(SIGINT, func);
to cause control to transfer to function func on receipt of an interrupt signal.
On a multi-tasking system such as Unix, it's best to use a slightly more involved technique:
extern void func(int);
if(signal(SIGINT, SIG_IGN) != SIG_IGN)
    signal(SIGINT, func);
The test and extra call ensure that a keyboard interrupt typed in the foreground won't inadvertently
interrupt a program running in the background (and it doesn't hurt to code calls to signal this way on
any system).
On some systems, keyboard interrupt handling is also a function of the mode of the terminal-input
subsystem; see question 19.1. On some systems, checking for keyboard interrupts is only performed
when the program is reading input, and keyboard interrupt handling may therefore depend on which
input routines are being called (and whether any input routines are active at all). On MS-DOS systems,
setcbrk or ctrlbrk functions may also be involved.
References: ANSI Secs. 4.7,4.7.1
ISO Secs. 7.7,7.7.1
H&S Sec. 19.6 pp. 411-3
PCS Sec. 12 pp. 210-2
POSIX Secs. 3.3.1,3.3.4
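Put together, the ``slightly more involved technique'' might be packaged as in the sketch below. The names on_interrupt, catch_interrupts, and was_interrupted are assumptions for this example; the handler does nothing but set a flag of type volatile sig_atomic_t, which is about all a handler can portably do.

```c
#include <signal.h>

static volatile sig_atomic_t interrupted = 0;

/* Signal handler: just note that an interrupt arrived. */
static void on_interrupt(int sig)
{
    (void)sig;
    interrupted = 1;
}

/* Install the handler, unless SIGINT was already being ignored
 * (as it may be for a program running in the background). */
void catch_interrupts(void)
{
    if (signal(SIGINT, SIG_IGN) != SIG_IGN)
        signal(SIGINT, on_interrupt);
}

/* Nonzero if an interrupt has been received since startup. */
int was_interrupted(void)
{
    return interrupted;
}
```

The main program would call catch_interrupts() once and then poll was_interrupted() at convenient points.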
Question 19.37
How can I implement a delay, or time a user's response, with sub-second resolution?
Unfortunately, there is no portable way. V7 Unix, and derived systems, provided a fairly useful ftime
routine with resolution up to a millisecond, but it has disappeared from System V and POSIX. Other
routines you might look for on your system include clock, delay, gettimeofday, msleep, nap,
napms, setitimer, sleep, times, and usleep. (A routine called wait, however, is, at least
under Unix, not what you want.) The select and poll calls (if available) can be pressed into service to
implement simple delays. On MS-DOS machines, it is possible to reprogram the system timer and timer
interrupts.
Of these, only clock is part of the ANSI Standard. The difference between two calls to clock gives
elapsed execution time, and if CLOCKS_PER_SEC is greater than 1, the difference will have subsecond
resolution. However, clock gives elapsed processor time used by the current program, which on a
multitasking system may differ considerably from real time.
If you're trying to implement a delay and all you have available is a time-reporting function, you can
implement a CPU-intensive busy-wait, but this is only an option on a single-user, single-tasking machine
as it is terribly antisocial to any other processes. Under a multitasking operating system, be sure to use a
call which puts your process to sleep for the duration, such as sleep or select, or pause in
conjunction with alarm or setitimer.
For really brief delays, it's tempting to use a do-nothing loop like
long int i;
for(i = 0; i < 1000000; i++)
    ;
but resist this temptation if at all possible! For one thing, your carefully-calculated delay loops will stop
working next month when a faster processor comes out. Perhaps worse, a clever compiler may notice that
the loop does nothing and optimize it away completely.
References: H&S Sec. 18.1 pp. 398-9
PCS Sec. 12 pp. 197-8,215-6
POSIX Sec. 4.5.2
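Timing with the one Standard routine, clock, can be sketched as follows. The function names timer_start and timer_elapsed are invented for this example, and (as noted above) the result is processor time used by the program, not wall-clock time.

```c
#include <time.h>

static clock_t started;

/* Remember the processor time used so far. */
void timer_start(void)
{
    started = clock();
}

/* Processor time, in seconds, consumed since timer_start was called. */
double timer_elapsed(void)
{
    return (double)(clock() - started) / CLOCKS_PER_SEC;
}
```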
Question 19.36
How can I read in an object file and jump to routines in it?
You want a dynamic linker or loader. It may be possible to malloc some space and read in object files,
but you have to know an awful lot about object file formats, relocation, etc. Under BSD Unix, you could
use system and ld -A to do the linking for you. Many versions of SunOS and System V have the -ldl
library which allows object files to be dynamically loaded. Under VMS, use
LIB$FIND_IMAGE_SYMBOL. GNU has a package called ``dld''. See also question 15.13.
Question 15.13
How can I call a function with an argument list built up at run time?
There is no guaranteed or portable way to do this. If you're curious, however, this list's editor has a few
wacky ideas you could try...
Instead of an actual argument list, you might consider passing an array of generic (void *) pointers.
The called function can then step through the array, much like main() might step through argv.
(Obviously this works only if you have control over all the called functions.)
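The array-of-pointers idea might look like this minimal sketch. The function sum_ints and the convention of a null pointer marking the end of the array are assumptions for the example; the called function also has to know (or be told) what type each pointer really points to.

```c
#include <stddef.h>

/* Step through a null-terminated array of generic pointers, here
 * all assumed to point to ints, and return the sum of the values. */
int sum_ints(void *args[])
{
    int i, total = 0;

    for (i = 0; args[i] != NULL; i++)
        total += *(int *)args[i];
    return total;
}
```

The caller builds the array at run time, for example args[0] = &a; args[1] = &b; args[2] = NULL; and then calls sum_ints(args).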
"wacky ideas" (on the "inverse varargs problem")
Somehow I've never gotten around to writing the definitive essay on those "wacky ideas". Here are
several messages I've written at various times on the subject of what I call the "inverse varargs problem";
they summarize most of the ideas, and should at least get you started. (There is some overlap, because
some of the later ones were written when I didn't have copies of the earlier ones handy.)
article posted to comp.unix.wizards and comp.lang.c 1989-06-04
article posted to comp.lang.c 1992-07-14
mail message sent 1993-03-07 to someone asking about the "wacky ideas"
more recent ideas (1997-06-28)
most recent ideas (2001-05-27)
"inverse varargs problem", take 1
[This article was originally posted on June 4, 1989. I have altered the presentation slightly for this web page.]
From: scs@adam.pika.mit.edu (Steve Summit)
Newsgroups: comp.unix.wizards,comp.lang.c
Subject: Re: Needed: A (Portable) way of setting up the arg stack
Keywords: 1/varargs, callg
Message-ID: <11830@bloom-beacon.MIT.EDU>
Date: 4 Jun 89 16:43:52 GMT
References: <708@mitisft.Convergent.COM> <32208@apple.Apple.COM> <10354@smoke.BRL.MIL>
In article <708@mitisft.Convergent.COM> Gregory Kemnitz writes:
>I need to know how (or if) *NIX (System V.3) has the ability to let
>a stack of arguments be set for a function before it is called. I
>have several hundred pointers to functions which are called from one
>place, and each function has different numbers of arguments.
A nice problem. Doug Gwyn's suggestion is the right one, for maximum portability, but constrains the form of the
called subroutines and also any calls that do not go through the "inverse varargs mechanism." (That is, you can't
really call the subroutines in question without building the little argument vector.)
For transparency (at some expense in portability) I use a routine I call "callg," named after the VAX instruction of
the same name. (This is equivalent to Peter Desnoyers' "va_call" routine; in retrospect, I like his name better.)
va_call can be implemented in one line of assembly language on the VAX; it typically requires ten or twenty
lines on other machines, to copy the arguments from the vector to the real stack (or wherever arguments are really
passed). I have implementations for the PDP11, NS32000, 68000, and 80x86. (This is a machine specific problem,
not an operating system specific problem.) A routine such as va_call must be written in assembly language; it is
one of the handful of functions I know of that cannot possibly be written in C.
Not all machines use a stack; some use register passing or other conventions. For maximum portability, then, the
interface to a routine like va_call should allow the type of each argument to be explicitly specified, as well as
hiding the details of the argument vector construction. I have been contemplating an interface similar to that
illustrated by the following example:
#include "varargs2.h"
extern printf();
main()
{
    va_stack(stack, 10);    /* declare vector which holds up to 10 args */
    va_push(stack, "%d %f %s\n", char *);
    va_push(stack, 12, int);
    va_push(stack, 3.14, double);
    va_push(stack, "Hello, world!", char *);
    va_call(printf, stack);
}
Note that this calls the standard printf; printf need take no special precautions, and indeed cannot necessarily
tell that it has not been called normally. (This is what I meant by "transparency.")
On a "conventional," stack-based machine, va_stack would declare an array of 10 ints (assuming that int is
the machine's natural word size) and va_push would copy words to it using pointer manipulations analogous to
those used by the va_arg macro in the current varargs and stdarg implementations. (Note that "declare
vector which holds up to 10 args" is therefore misleading; the vector holds up to 10 words, and it is up to the
programmer to leave enough slop for multi-word types such as long and double. The distinction between a
"word" and an "argument" is the one that always comes up when someone suggests supplying a va_nargs()
macro; let's not start that discussion again.)
For a register-passing machine, the choice of registers may depend on the types of the arguments. For this reason,
the interface must allow the type information to be retained in the argument vector for inspection by the va_call
routine. This would be easier to implement if manifest constants were used, instead of C type names:
va_push(stack, 12, VA_INT);
va_push(stack, 3.14, VA_DOUBLE);
va_push(stack, "Hello, world!", VA_POINTER);
Since it would be tricky to "switch" on these constants inside the va_push macro to decide how many words of the
vector to set aside, separate push macros might be preferable:
va_push_int(stack, 12);
va_push_double(stack, 3.14);
va_push_pointer(stack, "Hello, world!");
(This option has the additional advantage over the single va_push in that it does not require that the second macro
argument be of variable type.) There is still a major difficulty here, however, in that one cannot assume the existence
of a single kind of pointer.
For the "worst" machines, the full generality of C type names (as in the first example) would probably be required.
Unfortunately, to do everything with type names you might want to do, you have to handle them specially in the
compiler. (On the other hand, the machines that would have trouble with va_push are probably the same ones that
already have to have the varargs or stdarg mechanisms recognized by the compiler.)
Lest you think that va_call, if implementable, solves the whole problem, don't let your breath out yet: what
should the return value be? In the most general case, the routines being indirectly called might return different types.
The return value of va_call would not, so to speak, be representable in closed form.
This last wrinkle (variable return type on top of variable arguments passed) is at the heart of a C interpreter that
allows intermixing of interpreted and compiled code. I know how I solved it; I'd be curious to know how Saber C
solves it. (I solved it with two more assembly language routines, also unimplementable in C. A better solution, to
half of the problem, anyway, would be to provide a third argument, a union pointer of some kind, to va_call
for storing the return value.)

I just whipped together an implementation of the first example, which I have appended for your edification and
amusement, as long as you have a VAX.
Steve Summit
scs@eskimo.com
[The "implementation" consists of three files:
pf.c varargs2.h va_call.s]
http://www.eskimo.com/~scs/C-faq/varargs/pf.c
#include "varargs2.h"
extern printf();
main()
{
    va_stack(stack, 10);    /* declare vector which holds up to 10 args */
    va_stack_init(stack);
    va_push(stack, "%d %f %s\n", char *);
    va_push(stack, 12, int);
    va_push(stack, 3.14, double);
    va_push(stack, "Hello, world!", char *);
    va_call(printf, stack);
}
http://www.eskimo.com/~scs/C-faq/varargs/varargs2.h
#ifndef VARARGS2_H
#define VARARGS2_H
#define va_stack(name, max) int name[max+1]
#define va_stack_init(stack) stack[0] = 0
#define va_push(stack, arg, type) *(type *)(&stack[stack[0]+1]) = (arg), \
stack[0] += (sizeof(type) + sizeof(int) - 1) / sizeof(int)
#ifdef __STDC__
extern int va_call(int (*)(), int *);
#endif
#endif
"inverse varargs problem", takes 2 and 3
[This article was originally posted on July 14, 1992. I have edited the text slightly for this web page.]
From: scs@adam.mit.edu (Steve Summit)
Subject: Re: varargs fn calling varargs fn?
Message-ID: <1992Jul14.214625.15684@athena.mit.edu>
References: <1992Jul9.171606.24625@shearson.com> <1992Jul10.015616.12264@infodev.cam.ac.uk>
Date: Tue, 14 Jul 1992 21:46:25 GMT
In article <1992Jul9.171606.24625@shearson.com>, Frank Greco (fgreco@shearson.com) writes:
> Does anyone have any examples of a C function that uses varargs
> that calls another C function that uses the same args?
> ...
> The man page for varargs(3) sez:
>
> The argument list (or its remainder) can be passed to
> another function using a pointer to a variable of type
> va_list()...
>
> What do they mean "pointer to a variable of type va_list()? I've tried
> several combinations and bus-erroring away...
And in article <1992Jul10.015616.12264@infodev.cam.ac.uk>, Colin Bell (crb11@cus.cam.ac.uk)
writes:
> I've solved this one, but have a related one of my own. I have a function which handles
> message handling for a system: ie takes a list of arguments as one would send to
> fprintf prefixed by two or three telling the function what to do with the resulting
> string. What I would like to do is strip these arguments off the front and then pass
> the entire remainder to sprintf, and deal with the resulting string.
Preliminary answers to both of these questions can be found in the comp.lang.c frequently-asked
questions (FAQ) list:
6.2:
How can I write a function that takes a format string and a variable number of
arguments, like printf, and passes them to printf to do most of the work?
A:
Use vprintf, vfprintf, or vsprintf.
Here is an "error" routine which prints an error message, preceded by the string "error: "
and terminated with a newline:
[sample code deleted]

References: K&R II Sec. 8.3 p. 174, Sec. B1.2 p. 245; H&S Sec. 17.12 p. 337; ANSI Secs.
4.9.6.7, 4.9.6.8, 4.9.6.9 .
[current version of this question, now numbered 15.5]
6.4:
How can I write a function which takes a variable number of arguments and passes
them to some other function (which takes a variable number of arguments)?
A:
In general, you cannot. You must provide a version of that other function which
accepts a va_list pointer, as does vfprintf in the example above. If the arguments
must be passed directly as actual arguments (not indirectly through a va_list pointer) to
another function which is itself variadic (for which you do not have the option of creating
an alternate, va_list-accepting version) no portable solution is possible. (The problem
can be solved by resorting to machine-specific assembly language.)
Now, more could of course be said about these issues. In the case of accepting a format string and some
variable arguments and some other arguments, passing the format string and the variable arguments to
vsprintf, and then doing something with the string produced by vsprintf and the other arguments,
there is a real problem in that an appropriate or guaranteed-adequate size for vsprintf's return buffer
cannot readily be determined. (How to attempt to do so is itself a frequently-asked question which the
FAQ list should probably address [and now does]. I'll say no more about it today, except to lament the
fact that there is unfortunately no good, portable solution.)
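Under a C99 or later library (an assumption; the facility postdates this 1992 message), vsnprintf returns the length the output would have had, so the buffer can be sized in two passes. The following is only a sketch: the names format_alloc and vformat_alloc are hypothetical, and it relies on C99's vsnprintf and va_copy.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a freshly malloc'ed string.
 * Assumes a C99 library (vsnprintf, va_copy).  The caller frees the result.
 * Returns NULL on a formatting or allocation failure. */
char *vformat_alloc(const char *fmt, va_list argp)
{
    va_list copy;
    char *buf;
    int len;

    assert(fmt != NULL);

    /* First pass: measure only.  A va_list cannot be reused once
     * vsnprintf has walked it, so measure with a copy. */
    va_copy(copy, argp);
    len = vsnprintf(NULL, 0, fmt, copy);
    va_end(copy);
    if (len < 0)
        return NULL;

    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;

    /* Second pass: format into the exactly-sized buffer. */
    vsnprintf(buf, (size_t)len + 1, fmt, argp);
    return buf;
}

/* Variadic front end, implemented in terms of the va_list version. */
char *format_alloc(const char *fmt, ...)
{
    va_list argp;
    char *result;

    va_start(argp, fmt);
    result = vformat_alloc(fmt, argp);
    va_end(argp);
    return result;
}
```

A caller would write something like format_alloc("%s-%d", name, n), use the resulting string, and then free it.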
Frank Greco asked, "What do they mean `pointer to a variable of type va_list()'?". There's
apparently a typo in his manual, but the issue of variable-length argument lists versus "pointers" to same
is a potentially confusing one. Here is how I think about them, extracted from an e-mail discussion I had
last Spring with someone about question 6.4 in the FAQ list:
*
...one often speaks of "the variable-length part of a variable-length argument list," which is kind of a silly
mouthful, but it is often important to distinguish the variable-length part from the fixed part, which in
ANSI C must always exist. (For example, printf always takes one char * argument -- the "fixed"
part -- followed by 0 or more arguments of arbitrary types.)
[Passing a va_list pointer along to another routine] is the preferred way of doing things, but note that
a function that accepts a single va_list pointer is not at all variadic. (As far as C is concerned, it
accepts exactly one argument.)
This is the solution alluded to by the (admittedly rather opaque) FAQ list wording
You must provide a version of that other function which accepts a va_list pointer, as
does vfprintf in the example above.
That is, the answer to the question "Can I write a debug_printf() in terms of printf()?" is "No."
printf() is variadic, but vprintf et al take va_list pointers, so you can write a
debug_printf() in terms of one of them.
*
"Variadic" means (at least as I use it) very specifically that a subroutine, as the C compiler sees it,
accepts a variable number of formal parameters. (Sometimes the term "formal parameter" is used when
we must distinguish very carefully between the generic things a subroutine is set up to receive and the
"actual arguments" which are passed to it by a specific subroutine call.)
It is fairly easy to prove that one cannot call a variadic function with an argument list which varies at run
time (which is what you need to do if you're going to use the variable-length argument list that another
variadic function has just received). The only way that a C compiler will generate a subroutine call is
when you write an expression of the form
function ( argument-list )
where function is an expression which evaluates to a function (either the name of a function, or the
"contents" of a pointer-to-function), and argument-list is a list of 0 or more comma-separated
expressions. Obviously, since they appear in the source code, there must be a fixed number of arguments
(in any one call).
*
[When a variadic function tries to pass the variable arguments along to another, it] doesn't matter whether
or not the first function does any additional "processing" or not, and it doesn't matter whether the second
function uses stdarg.h or not (there are other, less portable methods of accessing variable-length
argument lists, and the second function might not even be written in C). The essential question is whether
a variadic function can call another variadic function (with the strict interpretation of "variadic" which I
use), passing to the second function the (indeterminate number of) arguments which it receives (at run
time) in a particular call. [Once again, the answer is that it cannot.]

If and when I fix up this question, I'm going to introduce the following explanation, which should help to
clarify things in your mind if they're not clear already.
Just as we can use both integers and pointers-to-integers in C, there are also two ways of
manipulating variable-length argument lists. A function such as printf accepts a
variable-length argument list -- we cannot see exactly how many arguments printf will
be called with. However, we can also conceive of a "pointer to a variable-length argument
list," and this is exactly what a va_list is.
When one is dealing with pointers, one has to be very careful to distinguish between the
pointer and the thing pointed to. It is invariably incorrect to attempt to use a pointer value
when one intends to use the value pointed to. Explicit syntax, e.g. the unary * operator, is
used to dereference a pointer, i.e. to access the value pointed to.
It is similarly meaningless and/or impossible to use a pointer to a variable-length argument
list in place of an actual variable-length argument list, or vice versa. [For example, you
cannot pass a va_list to printf, nor can you call vprintf with a variable number
of arguments.]
Now, it happens that in C we as programmers do not have as much control over, and
freedom with respect to, variable-length argument lists (or fixed-length argument lists, for
that matter). In particular, we cannot create a variable-length argument list, nor can we
pick an argument out of a variable-length argument list.
(We can do only a little bit more with fixed-length argument lists. We can "create" a
fixed-length argument list only by writing a subroutine-call expression in a piece of source code,
and we can pick an argument out of a fixed-length argument list only by naming it, in the
context of a subroutine definition, and referring to it by name.)
The only things we, as programmers, can do with variable-length argument lists are:
1. Given a variable-length argument list, generate a pointer to it, using the va_start
macro. (Under this analysis, va_start is analogous to the unary & operator.)
2. Given a pointer to a variable-length argument list, (i.e. a va_list), pick an
argument (of a particular type) out of it.
However, we can manipulate a va_list (a pointer to a variable-length argument list)
with almost as much freedom as we manipulate other objects in C. In particular, we can
pass a va_list to another subroutine.
There is, however, neither a syntax by which we could declare an "object" of "type"
"variable-length argument list," nor a mechanism by which we could initialize one with a
particular value (i.e. with a particular variable-length argument list). Therefore, we cannot
accept a variable-length argument list in one variadic function and pass it (as a variable-length
argument list) to another variadic function. What we can do is to take a pointer (a
va_list) to the variable-length argument list (using va_start), and pass that pointer
(as a single argument) to another function; however, that other, va_list-accepting
function is typically not variadic.
Examples of variadic functions, which accept variable-length argument lists, are printf,
fprintf, and sprintf. Examples of non-variadic functions, which accept pointers to
variable-length argument lists as single va_lists, are vprintf, vfprintf, and
vsprintf.
As a point of general advice, whenever one writes a function which accepts a variable
number of arguments, it is a good idea to write a companion routine which accepts a single
va_list. Writing these two routines is no more work than writing just the one, because
the first, variadic function can be implemented very simply in terms of the second -- it just
has to call va_start, then pass the resultant va_list to the second function.
By implementing a pair of functions, you leave the door open for someone (you or anyone
else) later to build additional functionality "on top" of the function(s), by writing another
variadic function (or, better, a variadic and non-variadic pair) which calls the second,
va_list-accepting, lower-level function as well as doing some additional processing.
If you were to write only a single, variadic function, later programmers who tried to
implement additional functionality on top of it would find themselves in the same situation
as did people trying to write things like debug_printf() in the days before the
v*printf family had been invented: they had no portable way to "hook up" to the core
functionality provided by the *printf family; they had to use nonportable kludges to call
printf with (at least some of) the (variable) arguments which the higher-level functions
received, or they had to abandon printf entirely and reimplement some or all of its
functionality.
Obviously, that's far too long-winded and repetetetive for the FAQ list, but it's a good first draft for me to
try to whittle down and distill the essence out of. (There's also a slight imperfection in its conceptual
model -- although it alludes to "objects of type `variable-length argument list,'" there are really no values
of type "variable-length argument list." All actual argument lists obviously have a certain number of
arguments, so an "object of type `variable-length argument list'" is just one that can hold arbitrary
fixed-length argument lists, regardless of how long they are.)
Steve Summit
scs@eskimo.com
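The advice in the message above, to write every variadic function as a thin wrapper around a va_list-accepting companion, might look like this in practice. It is only a sketch: report, vreport, and debug_report are hypothetical names, and the "report: " prefix is purely illustrative.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Lower-level, non-variadic routine: accepts a single va_list.
 * Returns the number of characters written, or -1 on error. */
int vreport(FILE *fp, const char *fmt, va_list argp)
{
    int n1, n2;

    assert(fp != NULL && fmt != NULL);
    n1 = fprintf(fp, "report: ");
    n2 = vfprintf(fp, fmt, argp);
    if (n1 < 0 || n2 < 0)
        return -1;
    putc('\n', fp);
    return n1 + n2 + 1;
}

/* Variadic companion: just va_start, call the v-version, va_end. */
int report(FILE *fp, const char *fmt, ...)
{
    va_list argp;
    int n;

    va_start(argp, fmt);
    n = vreport(fp, fmt, argp);
    va_end(argp);
    return n;
}

/* A later layer built "on top": it can hook up to vreport portably,
 * which would be impossible if only the variadic report() existed. */
int debug_report(int level, const char *fmt, ...)
{
    va_list argp;
    int n = 0;

    if (level > 0) {
        va_start(argp, fmt);
        n = vreport(stderr, fmt, argp);
        va_end(argp);
    }
    return n;
}
```

Note that debug_report never calls report; it reaches the shared functionality only through the va_list interface.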
Question 15.5
How can I write a function that takes a format string and a variable number of arguments, like printf,
and passes them to printf to do most of the work?
Use vprintf, vfprintf, or vsprintf.

Here is an error routine which prints an error message, preceded by the string ``error: '' and terminated
with a newline:
#include <stdio.h>
#include <stdarg.h>
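void error(char *fmt, ...)
{
    va_list argp;
    fprintf(stderr, "error: ");
    va_start(argp, fmt);    /* argp must be initialized before vfprintf uses it */
    vfprintf(stderr, fmt, argp);
    va_end(argp);
    fprintf(stderr, "\n");  /* the terminating newline */
}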
References: K&R2 Sec. 8.3 p. 174, Sec. B1.2 p. 245
ANSI Secs. 4.9.6.7,4.9.6.8,4.9.6.9
ISO Secs. 7.9.6.7,7.9.6.8,7.9.6.9
H&S Sec. 15.12 pp. 379-80
PCS Sec. 11 pp. 186-7
Question 15.7
I have a pre-ANSI compiler, without <stdarg.h>. What can I do?
There's an older header, <varargs.h>, which offers about the same functionality.
CT&P Sec. A.2 pp. 134-139
PCS Sec. 11 pp. 184-5, Sec. 13 p. 250
Question 15.6
How can I write a function analogous to scanf, that calls scanf to do most of the work?
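Unfortunately, vscanf and the like were not part of the original ANSI/ISO C Standard (the 1999 revision
later added vscanf, vfscanf, and vsscanf). With an older library, you're on your own.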
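Where a C99 library is available (an assumption), a scanf-like wrapper can be written in the same style as the error routine of question 15.5. A minimal sketch, in which scan_line is a hypothetical name:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: scan fields out of a line of text.
 * Assumes C99's vsscanf.  Returns the number of successful
 * conversions, exactly as sscanf would. */
int scan_line(const char *line, const char *fmt, ...)
{
    va_list argp;
    int n;

    assert(line != NULL && fmt != NULL);
    va_start(argp, fmt);
    n = vsscanf(line, fmt, argp);
    va_end(argp);
    return n;
}
```

A call such as scan_line("10 20", "%d %d", &a, &b) behaves like the corresponding sscanf call; the wrapper is the place to add logging or error handling.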
Variable-Length Argument Lists

15.1 I heard that you have to #include <stdio.h> before calling printf. Why?
15.2 How can %f be used for both float and double arguments in printf?
15.3 Why don't function prototypes guard against mismatches in printf's arguments?
15.4 How can I write a function that takes a variable number of arguments?
15.5 How can I write a function that takes a format string and a variable number of arguments, like
printf, and passes them to printf to do most of the work?
15.6 How can I write a function analogous to scanf, that calls scanf to do most of the work?
15.7 I have a pre-ANSI compiler, without <stdarg.h>. What can I do?
15.8 How can I discover how many arguments a function was actually called with?
15.9 My compiler isn't letting me declare a function that accepts only variable arguments.
15.10 Why isn't "va_arg(argp, float)" working?
15.11 I can't get va_arg to pull in an argument of type pointer-to-function.
15.12 How can I write a function which takes a variable number of arguments and passes them to some
other function?
15.13 How can I call a function with an argument list built up at run time?
Question 15.2
How can %f be used for both float and double arguments in printf? Aren't they different types?
In the variable-length part of a variable-length argument list, the ``default argument promotions'' apply:
types char and short int are promoted to int, and float is promoted to double. (These are the
same promotions that apply to function calls without a prototype in scope, also known as ``old style''
function calls; see question 11.3.) Therefore, printf's %f format always sees a double. (Similarly,
%c always sees an int, as does %hd.) See also questions 12.9 and 12.13.
ISO Sec. 6.3.2.2
H&S Sec. 6.3.5 p. 177, Sec. 9.4 pp. 272-3
Copyright
This collection of hypertext pages is Copyright 1995 by Steve Summit. Content from the book "C
Programming FAQs: Frequently Asked Questions" (Addison-Wesley, 1995, ISBN 0-201-84519-9) is
Question 15.8
How can I discover how many arguments a function was actually called with?
This information is not available to a portable program. Some old systems provided a nonstandard
nargs function, but its use was always questionable, since it typically returned the number of words
passed, not the number of arguments. (Structures, long ints, and floating point values are usually
passed as several words.)
Any function which takes a variable number of arguments must be able to determine from the arguments
themselves how many of them there are. printf-like functions do this by looking for formatting
specifiers (%d and the like) in the format string (which is why these functions fail badly if the format
string does not match the argument list). Another common technique, applicable when the arguments are
all of the same type, is to use a sentinel value (often 0, -1, or an appropriately-cast null pointer) at the end
of the list (see the execl and vstrcat examples in questions 5.2 and 15.4). Finally, if their types are
predictable, you can pass an explicit count of the number of variable arguments (although it's usually a
nuisance for the caller to generate).
References: PCS Sec. 11 pp. 167-8
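The count and sentinel techniques can be sketched as follows. Both function names are hypothetical; the second follows the same convention as execl, with a null char * marking the end of the list.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Explicit count: the caller says how many ints follow. */
int sum_counted(int count, ...)
{
    va_list argp;
    int i, total = 0;

    assert(count >= 0);
    va_start(argp, count);
    for (i = 0; i < count; i++)
        total += va_arg(argp, int);
    va_end(argp);
    return total;
}

/* Sentinel: a list of strings terminated by (char *)NULL.
 * The cast matters, because a bare NULL in a variable-length
 * argument list is not guaranteed to be passed as a pointer. */
size_t total_length(const char *first, ...)
{
    va_list argp;
    const char *p;
    size_t len = 0;

    va_start(argp, first);
    for (p = first; p != NULL; p = va_arg(argp, char *))
        len += strlen(p);
    va_end(argp);
    return len;
}
```

Neither function can detect a caller who passes the wrong count or forgets the sentinel, which is the weakness the answer above describes.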
Question 15.9
My compiler isn't letting me declare a function
int f(...)
{
}
i.e. with no fixed arguments.
Standard C requires at least one fixed argument, in part so that you can hand it to va_start.
References: ANSI Sec. 3.5.4, Sec. 3.5.4.3, Sec. 4.8.1.1
ISO Sec. 6.5.4, Sec. 6.5.4.3, Sec. 7.8.1.1
H&S Sec. 9.2 p. 263
Question 15.10
I have a varargs function which accepts a float parameter. Why isn't
va_arg(argp, float)
working?
In the variable-length part of variable-length argument lists, the old ``default argument promotions''
apply: arguments of type float are always promoted (widened) to type double, and types char and
short int are promoted to int. Therefore, it is never correct to invoke va_arg(argp, float);
instead you should always use va_arg(argp, double). Similarly, use va_arg(argp, int) to
retrieve arguments which were originally char, short, or int. See also questions 11.3 and 15.2.
ISO Sec. 6.3.2.2
H&S Sec. 11.4 p. 297
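A short illustration of retrieving promoted arguments (average and count_matches are hypothetical names): the caller may pass float, char, or short values, but the function always asks va_arg for the promoted type.

```c
#include <assert.h>
#include <stdarg.h>

/* The caller may pass floats or doubles; every one arrives as a double. */
double average(int count, ...)
{
    va_list argp;
    double total = 0.0;
    int i;

    assert(count > 0);
    va_start(argp, count);
    for (i = 0; i < count; i++)
        total += va_arg(argp, double);  /* never va_arg(argp, float) */
    va_end(argp);
    return total / count;
}

/* The caller may pass chars or shorts; every one arrives as an int. */
int count_matches(int wanted, int count, ...)
{
    va_list argp;
    int i, hits = 0;

    va_start(argp, count);
    for (i = 0; i < count; i++)
        if (va_arg(argp, int) == wanted)    /* never va_arg(argp, char) */
            hits++;
    va_end(argp);
    return hits;
}
```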
Question 15.11
I can't get va_arg to pull in an argument of type pointer-to-function.
The type-rewriting games which the va_arg macro typically plays are stymied by overly-complicated
types such as pointer-to-function. If you use a typedef for the function pointer type, however, all will
be well. See also question 1.21.
ISO Sec. 7.8.1.2
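A sketch of the typedef approach, using hypothetical names (unary_fn, apply_chain, add_one, twice). The variable arguments are function pointers, terminated by a null pointer cast to the typedef'ed type:

```c
#include <stdarg.h>

/* The typedef gives va_arg a simple type name to work with. */
typedef int (*unary_fn)(int);

/* Apply each function in the list to value, in order.
 * The list ends with (unary_fn)0. */
int apply_chain(int value, ...)
{
    va_list argp;
    unary_fn fn;

    va_start(argp, value);
    while ((fn = va_arg(argp, unary_fn)) != 0)
        value = fn(value);
    va_end(argp);
    return value;
}

int add_one(int x) { return x + 1; }
int twice(int x)   { return 2 * x; }
```

For example, apply_chain(3, add_one, twice, (unary_fn)0) first adds one and then doubles.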
Question 15.12
How can I write a function which takes a variable number of arguments and passes them to some other
function (which takes a variable number of arguments)?
In general, you cannot. Ideally, you should provide a version of that other function which accepts a
va_list pointer (analogous to vfprintf; see question 15.5). If the arguments must be passed
directly as actual arguments, or if you do not have the option of rewriting the second function to accept a
va_list (in other words, if the second, called function must accept a variable number of arguments,
not a va_list), no portable solution is possible. (The problem could perhaps be solved by resorting to
machine-specific assembly language; see also question 15.13.)
[This message was originally sent on March 7, 1993, to someone who asked about the "wacky ideas".]
From: scs@adam.mit.edu (Steve Summit)
Subject: Re: dynamic function call
Date: Sun, 7 Mar 93 17:58:52 -0500
Message-Id: <9303072258.AA22973@adam.MIT.EDU>
You wrote:
> I was curious to find out your ideas on the below question appearing
> in the C language FAQ:
>
> 7.5: How can I call a function with an argument list built up at run
> time?
>
> A: There is no guaranteed or portable way to do this. If you're
> curious, ask this list's editor, who has a few wacky ideas you
> could try... (See also question 16.10.)
Believe it or not, you're the first to have asked, so you get to be the first to hear me apologize for the fact
that several elaborate discussions of those "wacky ideas" which I've written at various times are in fact
stuck off in my magtape archives somewhere, not readily accessible.
[The two main ones I was probably thinking of are this one and this one.]
Here is an outline; I'll try to find one of my longer old descriptions and send it later.
The basic idea is to postulate the existence of the following routine:
#include <stdarg.h>
callg(funcp, argp)
int (*funcp)();
magic_arglist_pointer argp;
This routine calls the function pointed to by funcp, passing to it the argument list pointed to by argp.
It returns whatever the pointed-to function returns.
The second question is of course how to construct the argument list pointed to by argp. I've had a
number of ideas over the years on how to do this; perhaps the best (or at least the one I'm currently
happiest with) is to do something like this:
extern func();
va_alloc(arglist, 10);
va_list argp;
va_start2(arglist, argp);
va_arg(argp, int) = 1;
va_arg(argp, double) = 2.3;
va_arg(argp, char *) = "four";
va_finish2(arglist, argp);
callg(func, arglist);
The above is equivalent to the simple call
func(1, 2.3, "four");
Now, the interesting thing is that it's often (perhaps even "usually") possible to construct va_alloc,
va_start2, and va_finish2 macros such as I've illustrated above such that the standard va_arg
macro out of <stdarg.h> or <varargs.h> does the real work. (In other words, the traditional
implementations of va_arg would in fact work in an lvalue context, i.e. on the left hand side of an
assignment.)
I'm fairly sure I've got working versions of macros along the lines of va_alloc, va_start2, and
va_finish2 somewhere, but I can't find them at the moment, either. At the end of this
message I'll append a different set of macros, not predicated on the assumption of being able to re-use the
existing va_arg on the left hand side, which should serve as an example of the essential
implementation ideas.
A third question is what to do if the return type (not just the argument list) of the called function is not
known at compile time. (If you're still with me, we're moving in the direction of doing so much at run
time that we've practically stopped compiling and started interpreting, and in fact many of the ideas I'm
discussing in this note come out of my attempts to write a full-blown C interpreter.) We can answer the
third question by adding a third argument to our hypothetical callg() routine, involving a tag
describing the type of result we expect and a union in which any return value can be stored.
A fourth question is how to write this hypothetical callg() function. It is one of my favorite examples
of a function that essentially must be written in assembler, not for efficiency, but because it's something
you simply can't do in C. It's actually not terribly hard to write; I've got implementations of it for most of
the machines I use. I write it for a new environment by compiling a simple little program fragment such
as
int nargs;
int args[10];
int (*funcp)();
int i;
switch(nargs)
    {
    case 0: (*funcp)();
        break;
    case 1: (*funcp)(args[0]);
        break;
    case 2: (*funcp)(args[0], args[1]);
        break;
    ...
    }
for(i = 0; i < nargs; i++)
    (*funcp)(args[i]);
, and massaging the assembly language output to push a variable number of arguments before performing
the (single) call. It's possible to do this without knowing very much else about the assembly/machine
language in use. (It helps a lot to have a compiler output or listing option which lets you see its
generated assembly language.)
When I say it's generally easy to write callg, I'm thinking of conventional, stack-based machines.
Some modern machines pass some or all arguments in registers, and which registers are used can depend
on the type of the arguments, which makes this sort of thing much harder.
A fifth question is where my name "callg" comes from. The VAX has a single instruction, CALLG,
which does exactly what you want here. (In other words, an assembly-language implementation of this
callg() routine is a single line of assembler on the VAX.) I've also used the name va_call instead
of callg.
If you have questions or comments prompted by any of this, or if you'd like to see more code fragments,
feel free to ask.
Steve Summit
scs@eskimo.com
[The aforementioned "set of macros" is in this file: varargs2.h .]
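For the restricted case where every argument is an int and the count is small, the switch idea from the message above can be used as it stands in portable C, with no assembler. The sketch below is not the callg routine described there, which handles arbitrary argument lists; call_with_ints and generic_fn are hypothetical names, and the caller is assumed to pass a count that matches the function's real parameter list.

```c
#include <assert.h>

/* A generic function-pointer type used only for storage; it is cast
 * back to the function's true type before the call is made. */
typedef void (*generic_fn)(void);

/* Limited stand-in for callg: the target must return int and take
 * exactly nargs int parameters, with 0 <= nargs <= 3. */
int call_with_ints(generic_fn funcp, int nargs, const int *args)
{
    assert(funcp != 0 && nargs >= 0 && nargs <= 3);
    switch (nargs) {
    case 0: return ((int (*)(void))funcp)();
    case 1: return ((int (*)(int))funcp)(args[0]);
    case 2: return ((int (*)(int, int))funcp)(args[0], args[1]);
    case 3: return ((int (*)(int, int, int))funcp)(args[0], args[1], args[2]);
    }
    return -1;  /* not reached when the assertion holds */
}

int add2(int a, int b)        { return a + b; }
int add3(int a, int b, int c) { return a + b + c; }
```

Each case is a separate call expression with a fixed number of arguments, which is why the approach cannot scale to arbitrary counts and types without help from below the C level.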
[This message was originally sent on June 28, 1997, when I was temporarily acting as a moderator for
comp.lang.c.moderated. It mentions, without going into too much detail, two more-recent ideas I've
received on the subject.]
Subject: Re: Interpreter -- Calling functions
Date: Sat, 28 Jun 1997 10:44:03 -0700 (PDT)
Message-Id: <199706281744.KAA07952@solutions.solon.com>
X-scs-References: <199612161422.OAA02621@goday.ac.upc.es>
<9701171704.ZM25160@collie.hpl.hp.com>
Besides the ideas I passed along in that other message, here are two I've received more recently from
other netters. Fermin Javier Reig had a variation on the idea of using an interface like
Push(arg1);
Push(arg2);
...
Call(func);
, although his idea ran into several problems of its own, and anyway it addresses the part of the problem
that you've already solved, so I won't say more about it here. Jonathan Thompson had the idea of
"creating a very large structure of chars", filling in that structure with an image of the argument list (i.e.
the stack frame) to be built, and then passing what looks like a single argument, i.e. that one structure, to
the called function, which could -- with luck -- interpret it as multiple arguments. I think this trick could
be made to work on some machines, although I haven't experimented with it yet, and it obviously
assumes a pure stack calling model, i.e. it wouldn't work at all on a register-passing machine.
Steve Summit
comp.lang.c.moderated co-moderator
scs@eskimo.com
[This message was originally sent on May 27, 2001, to someone who was asking about constructing
variable-length argument lists. I have edited the text slightly for this web page.]
Date: Sun, 27 May 2001 10:49:32 -0500
Message-Id: <2001May27.1049.scs.009@aeroroot.scs.ndip.eskimo.net>
Subject: Re: Variable Argument lists in C
You wrote:
> Hi, I was searching on Google for a way to build a variable of type
> va_list. My task is to accept a variable number of parameters with
> one function, remove a few arguments from that list, and then pass the
> va_list to another function, probably vprintf or vsprintf...
> Do you know how this could be accomplished? I either need
> to simply remove the args from the va_list that I obtain from the
> first function call, or build a new va_list and omit the unwanted args.
Well, there are two very different answers to this question. The first is that there is no portable way of
doing this sort of thing at all; the "solutions" all involve either assembly language or ghastly kludges, or
both. If at all possible, you should find some other way of accomplishing your higher task that doesn't
involve trying to build or manipulate argument lists on the fly. Getting involved with dynamic argument
lists is a lot like getting involved with absinthe: darkly exotic and stimulating at first, but destructive and
mind-rotting in the end.
That's the sober, responsible answer. The second answer is that dynamic function calls are exotic and
stimulating and tons of fun. But they're not easy; they do require either assembly language or ghastly
kludges (or both). This is what question 15.13 in the comp.lang.c FAQ list is about; I'm appending my
collection of "wacky ideas" at the end of this message. Besides the ideas there, lately I've been having
mostly good luck with a set of gcc extensions: __builtin_apply_args, __builtin_apply, and
__builtin_return. (I can't tell you how to use these, because I barely understand them myself, but
if you're using gcc, they might be an option for you. The only documentation I know of for them is the
gcc Info node "Constructing Calls".)
Steve Summit
scs@eskimo.com
Question 19.33
How can a process change an environment variable in its caller?
It may or may not be possible to do so at all. Different operating systems implement global name/value
functionality similar to the Unix environment in different ways. Whether the ``environment'' can be
usefully altered by a running program, and if so, how, is system-dependent.
Under Unix, a process can modify its own environment (some systems provide setenv or putenv
functions for the purpose), and the modified environment is generally passed on to child processes, but it
is not propagated back to the parent process.
Question 19.32
How can I automatically locate a program's configuration files in the same directory as the executable?
It's hard; see also question 19.31. Even if you can figure out a workable way to do it, you might want to
consider making the program's auxiliary (library) directory configurable, perhaps with an environment
variable. (It's especially important to allow variable placement of a program's configuration files when
the program will be used by several people, e.g. on a multiuser system.)
Question 19.31
How can my program discover the complete pathname to the executable from which it was invoked?
argv[0] may contain all or part of the pathname, or it may contain nothing. You may be able to
duplicate the command language interpreter's search path logic to locate the executable if the name in
argv[0] is present but incomplete. However, there is no guaranteed solution.
K&R2 Sec. 5.10 p. 115
ANSI Sec. 2.1.2.2.1
ISO Sec. 5.1.2.2.1
H&S Sec. 20.1 p. 416
Question 14.9
How do I test for IEEE NaN and other special values?
Many systems with high-quality IEEE floating-point implementations provide facilities (e.g. predefined
constants, and functions like isnan(), either as nonstandard extensions in <math.h> or perhaps in
<ieee.h> or <nan.h>) to deal with these values cleanly, and work is being done to formally
standardize such facilities. A crude but usually effective test for NaN is exemplified by
#define isnan(x) ((x) != (x))
although non-IEEE-aware compilers may optimize the test away.
Another possibility is to format the value in question using sprintf: on many systems it generates
strings like "NaN" and "Inf" which you could compare for in a pinch.
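The self-comparison test can also be written as a function, which avoids evaluating a macro argument twice. This sketch assumes IEEE arithmetic and a compiler not run with value-unsafe floating-point optimizations; is_nan and make_nan are hypothetical names, chosen so as not to collide with any isnan the system may provide.

```c
/* Crude NaN test: a NaN is the only value that compares unequal to
 * itself.  The volatile copy discourages a compiler from folding the
 * comparison away. */
int is_nan(double x)
{
    volatile double v = x;
    return v != v;
}

/* Produce a NaN at run time as 0.0/0.0 (assumes IEEE semantics). */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}
```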
Question 14.8
The pre-#defined constant M_PI seems to be missing from my machine's copy of <math.h>.
That constant (which is apparently supposed to be the value of pi, accurate to the machine's precision), is
not standard. If you need pi, you'll have to #define it yourself.
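One way to do so is to guard the definition, so that a system-supplied M_PI is used when it exists. The digits shown are the usual 21-significant-digit value of pi; circle_area is a hypothetical example function.

```c
#include <math.h>

/* Define M_PI only if <math.h> did not already supply it. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

double circle_area(double radius)
{
    return M_PI * radius * radius;
}
```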
Question 14.7
Why doesn't C have an exponentiation operator?
Because few processors have an exponentiation instruction. C has a pow function, declared in
<math.h>, although explicit multiplication is often better for small positive integral exponents.
ISO Sec. 7.5.5.1
H&S Sec. 17.6 p. 393
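For small integral exponents, explicit multiplication looks like the following sketch. The names square and ipow are hypothetical; ipow uses repeated squaring and makes no attempt to detect overflow.

```c
/* x squared by one explicit multiplication. */
double square(double x)
{
    return x * x;
}

/* Integer power by repeated squaring.  No overflow checking:
 * the caller must keep the result within the range of long. */
long ipow(long base, unsigned int exp)
{
    long result = 1;

    while (exp > 0) {
        if (exp & 1u)
            result *= base;
        exp >>= 1;
        if (exp > 0)        /* skip the final, unneeded squaring */
            base *= base;
    }
    return result;
}
```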
Question 13.26
I'm still getting errors due to library functions being undefined, even though I'm explicitly requesting the
right libraries while linking.
Many linkers make one pass over the list of object files and libraries you specify, and extract from
libraries only those modules which satisfy references which have so far come up as undefined. Therefore,
the order in which libraries are listed with respect to object files (and each other) is significant; usually,
you want to search the libraries last. (For example, under Unix, put any -l options towards the end of
the command line.) See also question 13.28.
Question 13.28
What does it mean when the linker says that _end is undefined?
That message is a quirk of the old Unix linkers. You get an error about _end being undefined only when
other things are undefined, too--fix the others, and the error about _end will disappear. (See also
questions 13.25 and 13.26.)
Library Functions

13.1 How can I convert numbers to strings?
13.2 Why does strncpy not always write a '\0'?
13.5 Why do some versions of toupper act strangely if given an upper-case letter?
13.6 How can I split up a string into whitespace-separated fields?
13.7 I need some code to do regular expression and wildcard matching.
13.8 I'm trying to sort an array of strings with qsort, using strcmp as the comparison function, but it's
not working.
13.9 Now I'm trying to sort an array of structures, but the compiler is complaining that the function is of
the wrong type for qsort.
13.10 How can I sort a linked list?
13.11 How can I sort more data than will fit in memory?
13.12 How can I get the time of day in a C program?
13.13 How can I convert a struct tm or a string into a time_t?
13.14 How can I perform calendar manipulations?
13.15 I need a random number generator.
13.16 How can I get random integers in a certain range?
13.17 Each time I run my program, I get the same sequence of numbers back from rand().
13.18 I need a random true/false value, so I'm just taking rand() % 2, but it's alternating 0, 1, 0, 1, 0...
13.20 How can I generate random numbers with a normal or Gaussian distribution?
13.24 I'm trying to port this old program. Why do I get ``undefined external'' errors for some library
functions?
13.25 I get errors due to library functions being undefined even though I #include the right header
files.
13.26 I'm still getting errors due to library functions being undefined, even though I'm requesting the
right libraries.
13.28 What does it mean when the linker says that _end is undefined?
Question 13.14
How can I add n days to a date? How can I find the difference between two dates?
The ANSI/ISO Standard C mktime and difftime functions provide some support for both problems.
mktime accepts non-normalized dates, so it is straightforward to take a filled-in struct tm, add or
subtract from the tm_mday field, and call mktime to normalize the year, month, and day fields (and
incidentally convert to a time_t value). difftime computes the difference, in seconds, between two
time_t values; mktime can be used to compute time_t values for two dates to be subtracted.
These solutions are only guaranteed to work correctly for dates in the range which can be represented as
time_t's. The tm_mday field is an int, so day offsets of more than 32,736 or so may cause overflow.
Note also that at daylight saving time changeovers, local days are not 24 hours long.
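Both techniques can be sketched as follows. The function names add_days and days_between are made up for this illustration; setting the time of day to noon is an assumption made here so that a one-hour daylight saving shift cannot push a date across midnight.

```c
#include <time.h>

/* Build a time_t for local noon on the given calendar date.
 * mday may be out of range; mktime normalizes it. */
static time_t noon_of(int year, int month, int mday, struct tm *out)
{
    struct tm tm = {0};

    tm.tm_year  = year - 1900;   /* struct tm counts years from 1900 */
    tm.tm_mon   = month - 1;     /* and months from 0 */
    tm.tm_mday  = mday;
    tm.tm_hour  = 12;            /* noon stays clear of DST changeovers */
    tm.tm_isdst = -1;            /* let mktime decide whether DST applies */
    *out = tm;
    return mktime(out);          /* normalizes *out as a side effect */
}

/* Add ndays (possibly negative) to a date, in place.
 * Returns 0 on success, -1 if the result is not representable. */
int add_days(int *year, int *month, int *day, int ndays)
{
    struct tm tm;

    if (noon_of(*year, *month, *day + ndays, &tm) == (time_t)-1)
        return -1;
    *year  = tm.tm_year + 1900;
    *month = tm.tm_mon + 1;
    *day   = tm.tm_mday;
    return 0;
}

/* Store in *ndays the number of days from date 1 to date 2.
 * Returns 0 on success, -1 if either date is not representable. */
int days_between(int y1, int m1, int d1, int y2, int m2, int d2, long *ndays)
{
    struct tm tm;
    time_t t1 = noon_of(y1, m1, d1, &tm);
    time_t t2 = noon_of(y2, m2, d2, &tm);
    double secs;

    if (t1 == (time_t)-1 || t2 == (time_t)-1)
        return -1;
    secs = difftime(t2, t1);
    /* Round to the nearest whole day; local days are not always 24 hours. */
    *ndays = (long)(secs / 86400.0 + (secs >= 0 ? 0.5 : -0.5));
    return 0;
}
```

For example, adding 30 days to January 15, 2000 yields February 14, 2000, and days_between reports 29 days from February 1 to March 1 of that year.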
Another approach to both problems is to use ``Julian day'' numbers. Implementations of Julian day
routines can be found in the file JULCAL10.ZIP from the Simtel/Oakland archives (see question 18.16)
and the ``Date conversions'' article mentioned in the References.
References: ANSI Secs. 4.12.2.2, 4.12.2.3
ISO Secs. 7.12.2.2,7.12.2.3
H&S Secs. 18.4,18.5 pp. 401-2
David Burki, ``Date Conversions''
20. Miscellaneous
20.1 How can I return multiple values from a function?
20.3 How do I access command-line arguments?
20.5 How can I write data files which can be read on other machines with different data formats?
20.6 How can I call a function, given its name as a string?
20.8 How can I implement sets or arrays of bits?
20.9 How can I determine whether a machine's byte order is big-endian or little-endian?
20.10 How can I convert integers to binary or hexadecimal?
20.11 Can I use base-2 constants (something like 0b101010)?
Is there a printf format for binary?
20.12 What is the most efficient way to count the number of bits which are set in a value?
20.13 How can I make my code more efficient?
20.14 Are pointers really faster than arrays? How much do function calls slow things down?
20.17 Is there a way to switch on strings?
20.18 Is there a way to have non-constant case labels (i.e. ranges or arbitrary expressions)?
20.19 Are the outer parentheses in return statements really optional?
20.20 Why don't C comments nest? Are they legal inside quoted strings?
20.24 Why doesn't C have nested functions?
20.25 How can I call FORTRAN (C++, BASIC, Pascal, Ada, LISP) functions from C?
20.26 Does anyone know of a program for converting Pascal or FORTRAN to C?

20.27 Can I use a C++ compiler to compile C code?
20.28 I need to compare two strings for close, but not necessarily exact, equality.
20.29 What is hashing?
20.31 How can I find the day of the week given the date?
20.32 Will 2000 be a leap year?
20.34 How do you write a program which produces its own source code as its output?
20.35 What is ``Duff's Device''?
20.36 When will the next Obfuscated C Code Contest be held? How can I get a copy of previous winning
entries?
20.37 What was the entry keyword mentioned in K&R1?
20.38 Where does the name ``C'' come from, anyway?
20.39 How do you pronounce ``char''?
20.40 Where can I get extra copies of this list?
Question 20.10
How can I convert integers to binary or hexadecimal?
Make sure you really know what you're asking. Integers are stored internally in binary, although for most
purposes it is not incorrect to think of them as being in octal, decimal, or hexadecimal, whichever is
convenient. The base in which a number is expressed matters only when that number is read in from or
written out to the outside world.
In source code, a non-decimal base is indicated by a leading 0 or 0x (for octal or hexadecimal,
respectively). During I/O, the base of a formatted number is controlled in the printf and scanf
family of functions by the choice of format specifier (%d, %o, %x, etc.) and in the strtol and
strtoul functions by the third argument. During binary I/O, however, the base again becomes
immaterial.
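Since printf has no standard format for base 2, a small conversion routine is sometimes wanted. The following is a minimal sketch; to_binary and parse_in_base are names invented for this example, and parse_in_base is only a thin wrapper showing the third argument of strtol.

```c
#include <limits.h>
#include <stdlib.h>

/* Write the base-2 digits of value into buf (of size bufsize).
 * Returns buf, or NULL if the buffer is too small. */
char *to_binary(unsigned long value, char *buf, size_t bufsize)
{
    char tmp[sizeof value * CHAR_BIT];   /* one digit per bit at most */
    size_t n = 0, i;

    do {                                 /* digits come out low bit first */
        tmp[n++] = (char)('0' + (value & 1));
        value >>= 1;
    } while (value != 0);

    if (n + 1 > bufsize)
        return NULL;
    for (i = 0; i < n; i++)              /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}

/* Read digits expressed in the given base (2 through 36). */
long parse_in_base(const char *digits, int base)
{
    return strtol(digits, NULL, base);
}
```

With these, to_binary(42UL, buf, sizeof buf) fills buf with "101010", and parse_in_base("101010", 2) gives back 42.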
For more information about ``binary'' I/O, see question 2.11. See also questions 8.6 and 13.1.
References: ANSI Secs. 4.10.1.5,4.10.1.6
ISO Secs. 7.10.1.5,7.10.1.6
Question 2.11
How can I read/write structures from/to data files?
It is relatively straightforward to write a structure out using fwrite:

fwrite(&somestruct, sizeof somestruct, 1, fp);
and a corresponding fread invocation can read it back in. (Under pre-ANSI C, a (char *) cast on
the first argument is required. What's important is that fwrite receive a byte pointer, not a structure
pointer.) However, data files so written will not be portable (see questions 2.12 and 20.5). Note also that
if the structure contains any pointers, only the pointer values will be written, and they are most unlikely
to be valid when read back in. Finally, note that for widespread portability you must use the "b" flag
when fopening the files; see question 12.38.
A more portable solution, though it's a bit more work initially, is to write a pair of functions for writing
and reading a structure, field-by-field, in a portable (perhaps even human-readable) way.
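A pair of such functions might look like the sketch below. The struct point type and the one-record-per-line text format are assumptions made for this example, as is the restriction that the label is non-empty and contains no whitespace.

```c
#include <stdio.h>

struct point {              /* hypothetical example structure */
    long x, y;
    char label[16];
};

/* Write one record as a line of text: "x y label".
 * Returns 0 on success, -1 on a write error. */
int write_point(FILE *fp, const struct point *p)
{
    return fprintf(fp, "%ld %ld %s\n", p->x, p->y, p->label) < 0 ? -1 : 0;
}

/* Read one record written by write_point.
 * Returns 0 on success, -1 on end-of-file or malformed input. */
int read_point(FILE *fp, struct point *p)
{
    /* %15s leaves room for the terminating '\0' in label[16] */
    return fscanf(fp, "%ld %ld %15s", &p->x, &p->y, p->label) == 3 ? 0 : -1;
}
```

Because the file holds decimal text rather than raw bytes, it does not depend on the writing machine's byte order, type sizes, or structure padding.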
References: H&S Sec. 15.13 p. 381
Question 8.6
How can I get the numeric (character set) value corresponding to a character, or vice versa?
In C, characters are represented by small integers corresponding to their values (in the machine's
character set), so you don't need a conversion routine: if you have the character, you have its value.
Question 8.3
If I can say
char a[] = "Hello, world!";
why can't I say
char a[14];
a = "Hello, world!";
Strings are arrays, and you can't assign arrays directly. Use strcpy instead:
strcpy(a, "Hello, world!");
Question 4.2
I'm trying to declare a pointer and allocate some space for it, but it's not working. What's wrong with this
code?
char *p;
*p = malloc(10);
The pointer you declared is p, not *p. To make a pointer point somewhere, you just use the name of the
pointer:
p = malloc(10);
It's when you're manipulating the pointed-to memory that you use * as an indirection operator:
*p = 'H';
References: CT&P Sec. 3.1 p. 28
Question 3.16
I have a complicated expression which I have to assign to one of two variables, depending on a
condition. Can I use code like this?
((condition) ? a : b) = complicated_expression;
No. The ?: operator, like most operators, yields a value, and you can't assign to a value. (In other words,
?: does not yield an lvalue.) If you really want to, you can try something like
*((condition) ? &a : &b) = complicated_expression;
although this is admittedly not as pretty.
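As a small illustration of the pointer form, here is a sketch wrapped in a function; store_in_one is a made-up name, and a and b are passed as pointers so that the conditional operator selects between two pointer values.

```c
/* Store value in *a if cond is nonzero, otherwise in *b.
 * The ?: operator picks a pointer (an ordinary value); the
 * leading * then turns that pointer back into an lvalue. */
void store_in_one(int cond, int *a, int *b, int value)
{
    *(cond ? a : b) = value;
}
```

An ordinary if/else with two assignments does the same job and is usually easier to read.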
References: ANSI Sec. 3.3.15 esp. footnote 50
ISO Sec. 6.3.15
H&S Sec. 7.1 pp. 179-180
Question 3.14
Why doesn't the code
int a = 1000, b = 1000;
long int c = a * b;
work?
Under C's integral promotion rules, the multiplication is carried out using int arithmetic, and the result
may overflow or be truncated before being promoted and assigned to the long int left-hand side. Use
an explicit cast to force long arithmetic:
long int c = (long int)a * b;
Note that (long int)(a * b) would not have the desired effect.
A similar problem can arise when two integers are divided, with the result assigned to a floating-point
variable.
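Both cases can be shown side by side. In this sketch (the function names are invented for the example) the cast is applied to one operand before the operation, which is what forces the wider arithmetic.

```c
/* Multiply in long arithmetic: a is converted to long first, so b is
 * converted as well and the product is computed as a long. */
long product(int a, int b)
{
    return (long)a * b;
}

/* Divide in floating point: without the cast, a / b would be a
 * truncating integer division, converted to double only afterwards. */
double ratio(int a, int b)
{
    return (double)a / b;
}
```

Here ratio(1, 2) is 0.5, whereas (double)(1 / 2) is 0.0, for the same reason that (long int)(a * b) comes too late.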
References: K&R2 Sec. 2.7 p. 44
ANSI Sec. 3.2.1.5
ISO Sec. 6.2.1.5
H&S Sec. 6.3.4 p. 176
CT&P Sec. 3.9 pp. 49-50
Question 3.12
If I'm not using the value of the expression, should I use i++ or ++i to increment a variable?
Since the two forms differ only in the value yielded, they are entirely equivalent when only their side
effect is needed.
References: K&R2 Sec. 2.8 p. 47
ANSI Sec. 3.3.2.4, Sec. 3.3.3.1
ISO Sec. 6.3.2.4, Sec. 6.3.3.1
H&S Sec. 7.4.4 pp. 192-3, Sec. 7.5.8 pp. 199-200
Question 3.4
Can I use explicit parentheses to force the order of evaluation I want? Even if I don't, doesn't precedence
dictate it?
Not in general.
Operator precedence and explicit parentheses impose only a partial ordering on the evaluation of an
expression. In the expression
f() + g() * h()
although we know that the multiplication will happen before the addition, there is no telling which of the
three functions will be called first.
When you need to ensure the order of subexpression evaluation, you may need to use explicit temporary
variables and separate statements.
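Here is one way the temporary-variable approach might look. The bodies of f, g, and h are stand-ins invented for this sketch; they record the order in which they are called so that the ordering can be observed.

```c
#include <stddef.h>

static char call_log[4];    /* records the order in which f, g, h run */
static size_t ncalls;

static int f(void) { call_log[ncalls++] = 'f'; return 1; }
static int g(void) { call_log[ncalls++] = 'g'; return 2; }
static int h(void) { call_log[ncalls++] = 'h'; return 3; }

/* Computes f() + g() * h() with the calls pinned to the order f, g, h. */
int ordered_sum(void)
{
    int a, b, c;

    ncalls = 0;
    a = f();        /* each full statement ends at a sequence point, */
    b = g();        /* so these calls happen in exactly this order   */
    c = h();
    return a + b * c;
}
```

The arithmetic is unchanged (the result is 1 + 2 * 3); only the order of the function calls has been fixed.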
References: K&R1 Sec. 2.12 p. 49, Sec. A.7 p. 185
K&R2 Sec. 2.12 pp. 52-3, Sec. A.7 p. 200
Question 3.5
But what about the && and || operators?
I see code like ``while((c = getchar()) != EOF && c != '\n')'' ...
There is a special exception for those operators (as well as the ?: operator): left-to-right evaluation is
guaranteed (as is an intermediate sequence point, see question 3.8). Any book on C should make this
clear.
References: K&R1 Sec. 2.6 p. 38, Secs. A7.11-12 pp. 190-1
K&R2 Sec. 2.6 p. 41, Secs. A7.14-15 pp. 207-8
ANSI Sec. 3.3.13, Sec. 3.3.14, Sec. 3.3.15
ISO Sec. 6.3.13, Sec. 6.3.14, Sec. 6.3.15
H&S Sec. 7.7 pp. 217-8, Sec. 7.8 pp. 218-20, Sec. 7.12.1 p. 229
CT&P Sec. 3.7 pp. 46-7
Question 4.3
Does *p++ increment p, or what it points to?
Unary operators like *, ++, and -- all associate (group) from right to left. Therefore, *p++ increments
p (and returns the value pointed to by p before the increment). To increment the value pointed to by p,
use (*p)++ (or perhaps ++*p, if the order of the side effect doesn't matter).
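The two forms are contrasted in the short sketch below; sum and bump_all are hypothetical helper names chosen for the example.

```c
/* Sum n ints: *p++ fetches the int p points to, then advances p. */
int sum(const int *p, int n)
{
    int total = 0;

    while (n-- > 0)
        total += *p++;      /* parsed as *(p++): the pointer moves */
    return total;
}

/* Add 1 to each of n ints: (*p)++ increments the pointed-to int. */
void bump_all(int *p, int n)
{
    for (; n-- > 0; p++)
        (*p)++;             /* the int changes; p is stepped separately */
}
```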
References: K&R2 Sec. 5.1 p. 95
ANSI Sec. 3.3.2, Sec. 3.3.3
ISO Sec. 6.3.2, Sec. 6.3.3
H&S Sec. 7.4.4 pp. 192-3, Sec. 7.5 p. 193, Secs. 7.5.7,7.5.8 pp. 199-200
Question 4.5
I have a char * pointer that happens to point to some ints, and I want to step it over them. Why
doesn't
((int *)p)++;
work?
In C, a cast operator does not mean ``pretend these bits have a different type, and treat them
accordingly''; it is a conversion operator, and by definition it yields an rvalue, which cannot be assigned
to, or incremented with ++. (It is an anomaly in pcc-derived compilers, and an extension in gcc, that
expressions such as the above are ever accepted.) Say what you mean: use
p = (char *)((int *)p + 1);
or (since p is a char *) simply
p += sizeof(int);
Whenever possible, you should choose appropriate pointer types in the first place, instead of trying to
treat one type as another.
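If a char * really must be stepped over ints, a loop along the following lines is one possibility. The function name is invented for this sketch, and the use of memcpy to fetch each value is an added precaution (not part of the answer above) so that the buffer need not be suitably aligned for int access.

```c
#include <string.h>

/* Sum n ints stored back to back in a byte buffer, stepping a char *
 * over them one int at a time. */
long sum_packed_ints(const char *p, size_t n)
{
    long total = 0;

    while (n-- > 0) {
        int v;

        memcpy(&v, p, sizeof v);    /* fetch without assuming alignment */
        total += v;
        p += sizeof(int);           /* step over one int */
    }
    return total;
}
```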
References: ANSI Sec. 3.3.4 (esp. footnote 14)
ISO Sec. 6.3.4
H&S Sec. 7.1 pp. 179-80
4. Pointers
4.2 What's wrong with "char *p; *p = malloc(10);"?
4.3 Does *p++ increment p, or what it points to?
4.5 I want to use a char * pointer to step over some ints. Why doesn't "((int *)p)++;" work?
4.8 I have a function which accepts, and is supposed to initialize, a pointer, but the pointer in the caller
remains unchanged.
4.9 Can I use a void ** pointer to pass a generic pointer to a function by reference?
4.10 I have a function which accepts a pointer to an int. How can I pass a constant like 5 to it?
4.11 Does C even have ``pass by reference''?
4.12 I've seen different methods used for calling functions via pointers.
Question 8.2
I'm checking a string to see if it matches a particular value. Why isn't this code working?
char *string;
...
if(string == "value") {
/* string matches "value" */
...
}
Strings in C are represented as arrays of characters, and C never manipulates (assigns, compares, etc.)
arrays as a whole. The == operator in the code fragment above compares two pointers--the value of the
pointer variable string and a pointer to the string literal "value"--to see if they are equal, that is, if
they point to the same place. They probably don't, so the comparison never succeeds.
To compare two strings, you generally use the library function strcmp:
if(strcmp(string, "value") == 0) {
/* string matches "value" */
...
}
Question 8.1
Why doesn't
strcat(string, '!');
work?
There is a very real difference between characters and strings, and strcat concatenates strings.
Characters in C are represented by small integers corresponding to their character set values (see also
question 8.6). Strings are represented by arrays of characters; you usually manipulate a pointer to the first
character of the array. It is never correct to use one when the other is expected. To append a ! to a string,
use
strcat(string, "!");
Question 8.9
I think something's wrong with my compiler: I just noticed that sizeof('a') is 2, not 1 (i.e. not
sizeof(char)).
Perhaps surprisingly, character constants in C are of type int, so sizeof('a') is sizeof(int)
(though it's different in C++). See also question 7.8.
References: ISO Sec. 6.1.3.4
H&S Sec. 2.7.3 p. 29
Question 7.8
I see code like
char *p = malloc(strlen(s) + 1);
strcpy(p, s);
Shouldn't that be malloc((strlen(s) + 1) * sizeof(char))?
It's never necessary to multiply by sizeof(char), since sizeof(char) is, by definition, exactly 1.
(On the other hand, multiplying by sizeof(char) doesn't hurt, and may help by introducing a
size_t into the expression.) See also question 8.9.
References: ISO Sec. 6.3.3.4
H&S Sec. 7.5.2 p. 195
Question 7.7
Why does some code carefully cast the values returned by malloc to the pointer type being allocated?
Before ANSI/ISO Standard C introduced the void * generic pointer type, these casts were typically
required to silence warnings (and perhaps induce conversions) when assigning between incompatible
pointer types. (Under ANSI/ISO Standard C, these casts are no longer necessary.)
Question 7.6
Why am I getting ``warning: assignment of pointer from integer lacks a cast'' for calls to malloc?
Have you #included <stdlib.h>, or otherwise arranged for malloc to be declared properly?
References: H&S Sec. 4.7 p. 101
Question 7.14
I've heard that some operating systems don't actually allocate malloc'ed memory until the program tries
to use it. Is this legal?
It's hard to say. The Standard doesn't say that systems can act this way, but it doesn't explicitly say that
they can't, either.
References: ISO Sec. 7.10.3
Question 7.16
I'm allocating a large array for some numeric work, using the line
double *array = malloc(256 * 256 * sizeof(double));
malloc isn't returning null, but the program is acting strangely, as if it's overwriting memory, or
malloc isn't allocating as much as I asked for, or something.
Notice that 256 x 256 is 65,536, which will not fit in a 16-bit int, even before you multiply it by
sizeof(double). If you need to allocate this much memory, you'll have to be careful. If size_t
(the type accepted by malloc) is a 32-bit type on your machine, but int is 16 bits, you might be able
to get away with writing 256 * (256 * sizeof(double)) (see question 3.14). Otherwise, you'll
have to break your data structure up into smaller chunks, or use a 32-bit machine, or use some
nonstandard memory allocation routines. See also question 19.23.
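One defensive way to write such an allocation is to do all of the size arithmetic in size_t and to check for wraparound before multiplying. The following is a sketch only; alloc_matrix is a hypothetical helper, not a standard routine.

```c
#include <stdlib.h>

/* Allocate rows * cols doubles.  Returns NULL if either dimension is 0,
 * if the byte count would not fit in a size_t, or if malloc fails. */
double *alloc_matrix(size_t rows, size_t cols)
{
    const size_t max = (size_t)-1;      /* largest size_t value */

    if (rows == 0 || cols == 0)
        return NULL;
    if (rows > max / sizeof(double) / cols)
        return NULL;                    /* the product would wrap around */
    return malloc(rows * cols * sizeof(double));
}
```

Because rows and cols are already of type size_t, the multiplication is never carried out in int, whatever the width of int on the machine at hand.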
Question 19.22
How can I find out how much memory is available?
Your operating system may provide a routine which returns this information, but it's quite
system-dependent.
Question 19.20
How can I read a directory in a C program?
See if you can use the opendir and readdir routines, which are part of the POSIX standard and are
available on most Unix variants. Implementations also exist for MS-DOS, VMS, and other systems. (MS-DOS
also has FINDFIRST and FINDNEXT routines which do essentially the same thing.) readdir
only returns file names; if you need more information about the file, try calling stat. To match
filenames to some wildcard pattern, see question 13.7.
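On a system which supplies the POSIX routines, a directory listing can be sketched as below. The function name list_directory is made up for this example, and <dirent.h> is a POSIX header, not part of Standard C.

```c
#include <dirent.h>     /* POSIX, not Standard C */
#include <stdio.h>

/* Print the names found in the directory path, one per line, to out.
 * Returns the number of entries seen, or -1 if the directory
 * cannot be opened. */
long list_directory(const char *path, FILE *out)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    long count = 0;

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        fprintf(out, "%s\n", entry->d_name);    /* names only */
        count++;
    }
    closedir(dir);
    return count;
}
```

A call such as list_directory(".", stdout) lists the current directory; on most systems the entries "." and ".." are included.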
References: PCS Sec. 13 pp. 230-1
POSIX Sec. 5.1
Schumacher, ed., Software Solutions in C Sec. 8
Question 19.18
I'm getting an error, ``Too many open files''. How can I increase the allowable number of simultaneously
open files?
There are actually at least two resource limitations on the number of simultaneously open files: the
number of low-level ``file descriptors'' or ``file handles'' available in the operating system, and the
number of FILE structures available in the stdio library. Both must be sufficient. Under MS-DOS
systems, you can control the number of operating system file handles with a line in CONFIG.SYS. Some
compilers come with instructions (and perhaps a source file or two) for increasing the number of stdio
FILE structures.
Question 7.17
I've got 8 meg of memory in my PC. Why can I only seem to malloc 640K or so?
Under the segmented architecture of PC compatibles, it can be difficult to use more than 640K with any
degree of transparency. See also question 19.23.
Question 7.20
You can't use dynamically-allocated memory after you free it, can you?
No. Some early documentation for malloc stated that the contents of freed memory were ``left
undisturbed,'' but this ill-advised guarantee was never universal and is not required by the C Standard.
Few programmers would use the contents of freed memory deliberately, but it is easy to do so
accidentally. Consider the following (correct) code for freeing a singly-linked list:
struct list *listp, *nextp;
for(listp = base; listp != NULL; listp = nextp) {
nextp = listp->next;
free((void *)listp);
}
and notice what would happen if the more-obvious loop iteration expression listp = listp->next
were used, without the temporary nextp pointer.
References: ANSI Sec. 4.10.3
ISO Sec. 7.10.3
H&S Sec. 16.2 p. 387
CT&P Sec. 7.10 p. 95
Question 7.32
What is alloca and why is its use discouraged?
alloca allocates memory which is automatically freed when the function which called alloca
returns. That is, memory allocated with alloca is local to a particular function's ``stack frame'' or
context.
alloca cannot be written portably, and is difficult to implement on machines without a conventional
stack. Its use is problematical (and the obvious implementation on a stack-based machine fails) when its
return value is passed directly to another function, as in fgets(alloca(100), 100, stdin).
For these reasons, alloca is not Standard and cannot be used in programs which must be widely
portable, no matter how useful it might be.
Question 16.6
Why does this code:
char *p = "hello, world!";
p[0] = 'H';
crash?
String literals are not necessarily modifiable, except (in effect) when they are used as array initializers.
Try
char a[] = "hello, world!";
References: ISO Sec. 6.1.4
H&S Sec. 2.7.4 pp. 31-2
Question 16.5
This program runs perfectly on one machine, but I get weird results on another. Stranger still, adding or
removing debugging printouts changes the symptoms...
Lots of things could be going wrong; here are a few of the more common things to check:
- uninitialized local variables (see also question 7.1)
- integer overflow, especially on 16-bit machines, especially of an intermediate result when doing
  things like a * b / c (see also question 3.14)
- undefined evaluation order (see questions 3.1 through 3.4)
- omitted declaration of external functions, especially those which return something other than int
  (see questions 1.25 and 14.2)
- dereferenced null pointers (see section 5)
- improper malloc/free use: assuming malloced memory contains 0, assuming freed storage
  persists, freeing something twice (see also questions 7.20 and 7.19)
- pointer problems in general (see also question 16.8)
- mismatch between printf format and arguments, especially trying to print long ints using
  %d (see question 12.9)
- trying to malloc(256 * 256 * sizeof(double)), especially on machines with limited
  memory (see also questions 7.16 and 19.23)
- array bounds problems, especially of small, temporary buffers, perhaps used for constructing
  strings with sprintf (see also questions 7.1 and 12.21)
- invalid assumptions about the mapping of typedefs, especially size_t
- floating point problems (see questions 14.1 and 14.4)
- anything you thought was a clever exploitation of the way you believe code is generated for your
  specific system
Proper use of function prototypes can catch several of these problems; lint would catch several more.

Question 16.3
This program crashes before it even runs! (When single-stepping with a debugger, it dies before the first
statement in main.)
You probably have one or more very large (kilobyte or more) local arrays. Many systems have fixed-size
stacks, and those which perform dynamic stack allocation automatically (e.g. Unix) can be confused
when the stack tries to grow by a huge chunk all at once. It is often better to declare large arrays with
static duration (unless of course you need a fresh set with each recursive call, in which case you could
dynamically allocate them with malloc; see also question 1.31).
(See also questions 11.12, 16.4, 16.5, and 18.4.)
Question 11.12
Can I declare main as void, to shut off these annoying ``main returns no value'' messages?
No. main must be declared as returning an int, and as taking either zero or two arguments, of the
appropriate types. If you're calling exit() but still getting warnings, you may have to insert a
redundant return statement (or use some kind of ``not reached'' directive, if available).
Declaring a function as void does not merely shut off or rearrange warnings: it may also result in a
different function call/return sequence, incompatible with what the caller (in main's case, the C run-time
startup code) expects.
(Note that this discussion of main pertains only to ``hosted'' implementations; none of it applies to
``freestanding'' implementations, which may not even have main. However, freestanding
implementations are comparatively rare, and if you're using one, you probably know it. If you've never
heard of the distinction, you're probably using a hosted implementation, and the above rules apply.)
References: ANSI Sec. 2.1.2.2.1, Sec. F.5.1
ISO Sec. 5.1.2.2.1, Sec. G.5.1
H&S Sec. 20.1 p. 416
CT&P Sec. 3.10 pp. 50-51
Question 11.10
Why can't I pass a char ** to a function which expects a const char **?
You can use a pointer-to-T (for any type T) where a pointer-to-const-T is expected. However, the rule
(an explicit exception) which permits slight mismatches in qualified pointer types is not applied
recursively, but only at the top level.
You must use explicit casts (e.g. (const char **) in this case) when assigning (or passing) pointers
which have qualifier mismatches at other than the first level of indirection.
References: ANSI Sec. 3.1.2.6, Sec. 3.3.16.1, Sec. 3.5.3
ISO Sec. 6.1.2.6, Sec. 6.3.16.1, Sec. 6.5.3
H&S Sec. 7.9.1 pp. 221-2
Question 11.13
But what about main's third argument, envp?
It's a non-standard (though common) extension. If you really need to access the environment in ways
beyond what the standard getenv function provides, though, the global variable environ is probably a
better avenue (though it's equally non-standard).
References: ANSI Sec. F.5.1
ISO Sec. G.5.1
H&S Sec. 20.1 pp. 416-7
Question 11.14
I believe that declaring void main() can't fail, since I'm calling exit instead of returning, and
anyway my operating system ignores a program's exit/return status.
It doesn't matter whether main returns or not, or whether anyone looks at the status; the problem is that
when main is misdeclared, its caller (the runtime startup code) may not even be able to call it correctly
(due to the potential clash of calling conventions; see question 11.12). Your operating system may ignore
the exit status, and void main() may work for you, but it is not portable and not correct.
Question 11.15
The book I've been using, C Programing for the Compleat Idiot, always uses void main().
Perhaps its author counts himself among the target audience. Many books unaccountably use void
main() in examples. They're wrong.
Question 11.16
Is exit(status) truly equivalent to returning the same status from main?
Yes and no. The Standard says that they are equivalent. However, a few older, nonconforming systems
may have problems with one or the other form. Also, a return from main cannot be expected to work
if data local to main might be needed during cleanup; see also question 16.4. (Finally, the two forms are
obviously not equivalent in a recursive call to main.)
References: ANSI Sec. 2.1.2.2.3
ISO Sec. 5.1.2.2.3
Question 16.4
I have a program that seems to run correctly, but it crashes as it's exiting, after the last statement in
main(). What could be causing this?
Look for a misdeclared main() (see questions 2.18 and 10.9), or local buffers passed to setbuf or
setvbuf, or problems in cleanup functions registered by atexit. See also questions 7.5 and 11.16.
Question 18.4
I just typed in this program, and it's acting strangely. Can you see anything wrong with it?
See if you can run lint first (perhaps with the -a, -c, -h, -p or other options). Many C compilers are
really only half-compilers, electing not to diagnose numerous source code difficulties which would not
actively preclude code generation.
References: Ian Darwin, Checking C Programs with lint
Question 18.5
How can I shut off the ``warning: possible pointer alignment problem'' message which lint gives me
for each call to malloc?
The problem is that traditional versions of lint do not know, and cannot be told, that malloc ``returns
a pointer to space suitably aligned for storage of any type of object.'' It is possible to provide a
pseudo-implementation of malloc, using a #define inside of #ifdef lint, which effectively shuts
this warning off, but a simpleminded definition will also suppress meaningful messages about truly
incorrect invocations. It may be easier simply to ignore the message, perhaps in an automated way with
grep -v. (But don't get in the habit of ignoring too many lint messages, otherwise one day you'll
overlook a significant one.)
Question 18.7
Where can I get an ANSI-compatible lint?
Products called PC-Lint and FlexeLint (in ``shrouded source form,'' for compilation on 'most any system)
are available from
Gimpel Software
3207 Hogarth Lane
Collegeville, PA 19426
USA
(+1) 610 584 4261
gimpel@netaxs.com
The Unix System V release 4 lint is ANSI-compatible, and is available separately (bundled with other
C tools) from UNIX Support Labs or from System V resellers.
Another ANSI-compatible lint (which can also perform higher-level formal verification) is LCLint,
available via anonymous ftp from larch.lcs.mit.edu in pub/Larch/lclint/.
In the absence of lint, many modern compilers do attempt to diagnose almost as many problems as
lint does.
Question 18.8
Don't ANSI function prototypes render lint obsolete?
Not really. First of all, prototypes work only if they are present and correct; an inadvertently incorrect
prototype is worse than useless. Secondly, lint checks consistency across multiple source files, and
checks data declarations as well as functions. Finally, an independent program like lint will probably
always be more scrupulous at enforcing compatible, portable coding practices than will any particular,
implementation-specific, feature- and extension-laden compiler.
If you do want to use function prototypes instead of lint for cross-file consistency checking, make sure
that you set the prototypes up correctly in header files. See questions 1.7 and 10.6.
Tools and Resources

18.1 I'm looking for C development tools (cross-reference generators, code beautifiers, etc.).
18.2 How can I track down these pesky malloc problems?
18.3 What's a free or cheap C compiler I can use?
18.4 I just typed in this program, and it's acting strangely. Can you see anything wrong with it?
18.5 How can I shut off the ``warning: possible pointer alignment problem'' message which lint gives
me for each call to malloc?
18.7 Where can I get an ANSI-compatible lint?
18.8 Don't ANSI function prototypes render lint obsolete?
18.9 Are there any C tutorials or other resources on the net?
18.10 What's a good book for learning C?
18.13 Where can I find the sources of the standard C libraries?
18.14 I need code to parse and evaluate expressions.
18.15 Where can I get a BNF or YACC grammar for C?
18.15a Does anyone have a C compiler test suite I can use?
18.16 Where and how can I get copies of all these freely distributable programs?
Question 11.17
I'm trying to use the ANSI ``stringizing'' preprocessing operator `#' to insert the value of a symbolic
constant into a message, but it keeps stringizing the macro's name rather than its value.
You can use something like the following two-step procedure to force a macro to be expanded as well as
stringized:
#define Str(x) #x
#define Xstr(x) Str(x)
#define OP plus
char *opname = Xstr(OP);
This code sets opname to "plus" rather than "OP".
An equivalent circumlocution is necessary with the token-pasting operator ## when the values (rather
than the names) of two macros are to be concatenated.
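The token-pasting version of the same two-step trick might look like this. All of the macro and variable names here (Cat, Xcat, PREFIX, SUFFIX, buf1) are invented for the illustration.

```c
#define Cat(a, b)  a ## b       /* pastes the arguments exactly as written */
#define Xcat(a, b) Cat(a, b)    /* expands the arguments first, then pastes */

#define PREFIX buf              /* hypothetical macros whose values are pasted */
#define SUFFIX 1

int buf1 = 42;

/* Xcat(PREFIX, SUFFIX) expands to buf1.  Cat(PREFIX, SUFFIX) would
 * instead produce the single identifier PREFIXSUFFIX. */
int read_pasted(void)
{
    return Xcat(PREFIX, SUFFIX);
}
```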
References: ANSI Sec. 3.8.3.2, Sec. 3.8.3.5 example
ISO Sec. 6.8.3.2, Sec. 6.8.3.5
Question 11.18
What does the message ``warning: macro replacement within a string literal'' mean?
Some pre-ANSI compilers/preprocessors interpreted macro definitions like

#define TRACE(var, fmt) printf("TRACE: var = fmt\n", var)
such that invocations like
TRACE(i, %d);
were expanded as
printf("TRACE: i = %d\n", i);
In other words, macro parameters were expanded even inside string literals and character constants.
Macro expansion is not defined in this way by K&R or by Standard C. When you do want to turn macro
arguments into strings, you can use the new # preprocessing operator, along with string literal
concatenation (another new ANSI feature):
#define TRACE(var, fmt) \
printf("TRACE: " #var " = " #fmt "\n", var)
Question 11.22
Is char a[3] = "abc"; legal? What does it mean?
It is legal in ANSI C (and perhaps in a few pre-ANSI systems), though useful only in rare circumstances.
It declares an array of size three, initialized with the three characters 'a', 'b', and 'c', without the
usual terminating '\0' character. The array is therefore not a true C string and cannot be used with
strcpy, printf %s, etc.
Most of the time, you should let the compiler count the initializers when initializing arrays (in the case of
the initializer "abc", of course, the computed size will be 4).
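One of the rare circumstances in which the unterminated form is handy is a fixed-width tag or signature which is only ever handled with an explicit length. A sketch, with tag and has_tag as made-up names:

```c
#include <string.h>

static const char tag[3] = "abc";   /* exactly three bytes, no '\0' */

/* Nonzero if data (at least three bytes long) begins with the tag.
 * memcmp takes an explicit length, so the missing terminator is harmless. */
int has_tag(const char *data)
{
    return memcmp(data, tag, sizeof tag) == 0;
}
```

Passing tag itself to strlen or to printf's %s would run off the end of the array.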
References: ISO Sec. 6.5.7
H&S Sec. 4.6.4 p. 98
Question 11.24
Why can't I perform arithmetic on a void * pointer?
The compiler doesn't know the size of the pointed-to objects. Before performing arithmetic, convert the
pointer either to char * or to the pointer type you're trying to manipulate (but see also question 4.5).
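For instance, a generic routine which is told the element size can do its arithmetic through a char *. In this sketch element_at is a hypothetical helper name.

```c
#include <stddef.h>

/* Address of element i in an array whose elements are size bytes each. */
void *element_at(void *base, size_t i, size_t size)
{
    return (char *)base + i * size;     /* char * arithmetic counts bytes */
}
```

Given double a[4], the call element_at(a, 2, sizeof a[0]) yields the same address as &a[2].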
References: ISO Sec. 6.1.2.5, Sec. 6.3.6
H&S Sec. 7.6.2 p. 204
Question 11.25
What's the difference between memcpy and memmove?
memmove offers guaranteed behavior if the source and destination arguments overlap. memcpy makes
no such guarantee, and may therefore be more efficiently implementable. When in doubt, it's safer to use
memmove.
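A typical overlapping case is shifting the contents of a buffer within itself. The following sketch (push_front is an invented name) inserts a character at the front of a string, which moves every byte one place to the right:

```c
#include <string.h>

/* Insert c at the front of the string in buf, which must have room
 * for one more byte.  Source and destination overlap, so memmove
 * (not memcpy) is required. */
void push_front(char *buf, char c)
{
    memmove(buf + 1, buf, strlen(buf) + 1);   /* shift text and its '\0' */
    buf[0] = c;
}
```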
References: ANSI Sec. 4.11.2.1, Sec. 4.11.2.2
ISO Sec. 7.11.2.1, Sec. 7.11.2.2
H&S Sec. 14.3 pp. 341-2
PCS Sec. 11 pp. 165-6
Question 11.26
What should malloc(0) do? Return a null pointer or a pointer to 0 bytes?
The ANSI/ISO Standard says that it may do either; the behavior is implementation-defined (see question
11.33).
References: ISO Sec. 7.10.3
PCS Sec. 16.1 p. 386
Question 11.27
Why does the ANSI Standard not guarantee more than six case-insensitive characters of external
identifier significance?
The problem is older linkers which are under the control of neither the ANSI/ISO Standard nor the C
compiler developers on the systems which have them. The limitation is only that identifiers be significant
in the first six characters, not that they be restricted to six characters in length. This limitation is
annoying, but certainly not unbearable, and is marked in the Standard as ``obsolescent,'' i.e. a future
revision will likely relax it.
This concession to current, restrictive linkers really had to be made, no matter how vehemently some
people oppose it. (The Rationale notes that its retention was ``most painful.'') If you disagree, or have
thought of a trick by which a compiler burdened with a restrictive linker could present the C programmer
with the appearance of more significance in external identifiers, read the excellently-worded section 3.1.2
in the X3.159 Rationale (see question 11.1), which discusses several such schemes and explains why
they could not be mandated.
References: ANSI Sec. 3.1.2, Sec. 3.9.1
ISO Sec. 6.1.2, Sec. 6.9.1
H&S Sec. 2.5 pp. 22-3
Question 18.15a
Does anyone have a C compiler test suite I can use?
Plum Hall (formerly in Cardiff, NJ; now in Hawaii) sells one; another package is Ronald Guilmette's
RoadTest(tm) Compiler Test Suites (ftp to netcom.com, pub/rfg/roadtest/announce.txt for information).
The FSF's GNU C (gcc) distribution includes a c-torture-test which checks a number of common
problems with compilers. Kahan's paranoia test, found in netlib/paranoia on netlib.att.com, strenuously
tests a C implementation's floating point capabilities.
Strange Problems

16.3 This program crashes before it even runs!
16.4 I have a program that seems to run correctly, but then crashes as it's exiting.
16.5 This program runs perfectly on one machine, but I get weird results on another.
16.6 Why does the code "char *p = "hello, world!"; p[0] = 'H';" crash?
16.8 What does ``Segmentation violation'' mean?
Characters and Strings

8.1 Why doesn't "strcat(string, '!');" work?
8.2 Why won't the test if(string == "value") correctly compare string against the value?
8.3 Why can't I assign strings to character arrays?
8.6 How can I get the numeric (character set) value corresponding to a character?
8.9 Why is sizeof('a') not 1?
Question 9.1
What is the right type to use for Boolean values in C? Why isn't it a standard type? Should I use
#defines or enums for the true and false values?
C does not provide a standard Boolean type, in part because picking one involves a space/time tradeoff
which can best be decided by the programmer. (Using an int may be faster, while using char may
save data space. Smaller types may make the generated code bigger or slower, though, if they require lots
of conversions to and from int.)
The choice between #defines and enumeration constants for the true/false values is arbitrary and not
terribly interesting (see also questions 2.22 and 17.10). Use any of
#define TRUE 1
#define FALSE 0

#define YES 1
#define NO 0

enum bool {false, true};

enum bool {no, yes};
or use raw 1 and 0, as long as you are consistent within one program
or project. (An enumeration may be preferable if your debugger shows
the names of enumeration constants when examining variables.)
Some people prefer variants like
#define TRUE (1==1)
#define FALSE (!TRUE)
or define ``helper'' macros such as
#define Istrue(e) ((e) != 0)
These don't buy anything (see question 9.2; see also questions 5.12 and 10.2).
Question 5.12
I use the preprocessor macro
#define Nullptr(type) (type *)0
to help me build null pointers of the correct type.
This trick, though popular and superficially attractive, does not buy much. It is not needed in assignments
and comparisons; see question 5.2. It does not even save keystrokes. Its use may suggest to the reader
that the program's author is shaky on the subject of null pointers, requiring that the #definition of the
macro, its invocations, and all other pointer usages be checked. See also questions 9.1 and 10.2.
Question 10.2
Here are some cute preprocessor macros:
#define begin {
#define end }
What do y'all think?
Bleah. See also section 17.
Style
17. Style
17.1 What's the best style for code layout in C?
17.3 Is the code "if(!strcmp(s1, s2))" good style?
17.4 Why do some people write if(0 == x) instead of if(x == 0)?
17.5 I came across some code that puts a (void) cast before each call to printf. Why?
17.8 What is ``Hungarian Notation''? Is it worthwhile?
17.9 Where can I get the ``Indian Hill Style Guide'' and other coding standards?
17.10 Some people say that goto's are evil and that I should never use them. Isn't that a bit extreme?
Question 9.3
Is if(p), where p is a pointer, a valid conditional?
Yes. See question 5.3.
Boolean Expressions and Variables

9.1 What is the right type to use for Boolean values in C?
9.2 What if a built-in logical or relational operator ``returns'' something other than 1?
9.3 Is if(p), where p is a pointer, valid?
Question 10.3
How can I write a generic macro to swap two values?
There is no good answer to this question. If the values are integers, a well-known trick using exclusive-OR
could perhaps be used, but it will not work for floating-point values or pointers, or if the two values
are the same variable (and the ``obvious'' supercompressed implementation for integral types
a^=b^=a^=b is illegal due to multiple side-effects; see question 3.2). If the macro is intended to be
used on values of arbitrary type (the usual goal), it cannot use a temporary, since it does not know what
type of temporary it needs (and would have a hard time naming it if it did), and standard C does not
provide a typeof operator.
The best all-around solution is probably to forget about using a macro, unless you're willing to pass in the
type as a third argument.
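If passing the type is acceptable, one possible shape is sketched below. The names SWAP, swap_tmp_, and order_pair are illustrative only, and the arguments must be lvalues whose evaluation has no side effects.

```c
/* Sketch of a swap macro that is told the type as its third argument.
   The temporary's name is chosen to be unlikely to clash with the
   caller's variables. */
#define SWAP(a, b, type) do { \
        type swap_tmp_ = (a); \
        (a) = (b);            \
        (b) = swap_tmp_;      \
    } while(0)

/* Example use: put two ints in ascending order. */
void order_pair(int *lo, int *hi)
{
    if(*lo > *hi)
        SWAP(*lo, *hi, int);
}
```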
Question 10.4
What's the best way to write a multi-statement macro?
The usual goal is to write a macro that can be invoked as if it were a statement consisting of a single
function call. This means that the ``caller'' will be supplying the final semicolon, so the macro body
should not. The macro body cannot therefore be a simple brace-enclosed compound statement, because
syntax errors would result if it were invoked (apparently as a single statement, but with a resultant extra
semicolon) as the if branch of an if/else statement with an explicit else clause.
The traditional solution, therefore, is to use
#define MACRO(arg1, arg2) do { \
        /* declarations */     \
        stmt1;                 \
        stmt2;                 \
        /* ... */              \
        } while(0)      /* (no trailing ; ) */
When the caller appends a semicolon, this expansion becomes a single statement regardless of context.
(An optimizing compiler will remove any ``dead'' tests or branches on the constant condition 0, although
lint may complain.)
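To make the if/else case concrete, here is a small sketch; the macro COUNT_AND_SUM and the function tally are invented for the illustration.

```c
/* Illustrative macro: two statements wrapped so that the invocation plus
   the caller's semicolon forms exactly one statement. */
#define COUNT_AND_SUM(count, sum, value) do { \
        (count)++;                            \
        (sum) += (value);                     \
    } while(0)

/* Tallies non-negative values and counts the rest as rejected.  With a
   bare { ... } macro body, the semicolon after the invocation would end
   the if statement and leave the else without a matching if. */
void tally(int value, int *count, long *sum, int *rejected)
{
    if(value >= 0)
        COUNT_AND_SUM(*count, *sum, value);
    else
        (*rejected)++;
}
```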
If all of the statements in the intended macro are simple expressions, with no declarations or loops,
another technique is to write a single, parenthesized expression using one or more comma operators. (For
an example, see the first DEBUG() macro in question 10.26.) This technique also allows a value to be
``returned.''
References: CT&P Sec. 6.3 pp. 82-3
C Preprocessor
10. C Preprocessor
10.2 I've got some cute preprocessor macros that let me write C code that looks more like Pascal. What
do y'all think?
10.3 How can I write a generic macro to swap two values?
10.4 What's the best way to write a multi-statement macro?
10.6 What are .h files and what should I put in them?
10.7 Is it acceptable for one header file to #include another?
10.8 Where are header (``#include'') files searched for?
10.9 I'm getting strange syntax errors on the very first declaration in a file, but it looks fine.
10.11 Where can I get a copy of a missing header file?
10.12 How can I construct preprocessor #if expressions which compare strings?
10.13 Does the sizeof operator work in preprocessor #if directives?
10.14 Can I use an #ifdef in a #define line, to define something two different ways?
10.15 Is there anything like an #ifdef for typedefs?
10.16 How can I use a preprocessor #if expression to detect endianness?
10.18 How can I preprocess some code to remove selected conditional compilations, without
preprocessing everything?
10.19 How can I list all of the pre#defined identifiers?
10.20 I have some old code that tries to construct identifiers with a macro like "#define Paste(a,
b) a/**/b", but it doesn't work any more.
10.22 What does the message ``warning: macro replacement within a string literal'' mean?
10.23 How can I use a macro argument inside a string literal in the macro expansion?
10.25 I've got this tricky preprocessing I want to do and I can't figure out a way to do it.
10.26 How can I write a macro which takes a variable number of arguments?
Question 10.13
Does the sizeof operator work in preprocessor #if directives?
No. Preprocessing happens during an earlier phase of compilation, before type names have been parsed.
Instead of sizeof, consider using the predefined constants in ANSI's <limits.h>, if applicable, or
perhaps a ``configure'' script. (Better yet, try to write code which is inherently insensitive to type sizes.)
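For instance, a type with at least 32 bits can be selected by testing the <limits.h> constants rather than sizeof. This is only a sketch, and the names u32_sketch and u32_sketch_max are hypothetical.

```c
#include <limits.h>

/* Pick an unsigned type of at least 32 bits without using sizeof in #if. */
#if UINT_MAX >= 0xFFFFFFFF
typedef unsigned int u32_sketch;
#else
typedef unsigned long u32_sketch;
#endif

/* Largest value of the chosen type, for illustration. */
u32_sketch u32_sketch_max(void)
{
    return (u32_sketch)-1;
}
```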
References: ANSI Sec. 2.1.1.2, Sec. 3.8.1 footnote 83
ISO Sec. 5.1.1.2, Sec. 6.8.1
H&S Sec. 7.11.1 p. 225
Question 10.14
Can I use an #ifdef in a #define line, to define something two different ways?
No. You can't ``run the preprocessor on itself,'' so to speak. What you can do is use one of two
completely separate #define lines, depending on the #ifdef setting.
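A sketch of the two-separate-lines arrangement, using the made-up names SKETCH_VERBOSE and SKETCH_LOG_LEVEL:

```c
/* The #ifdef chooses which of two complete #define lines is seen. */
#ifdef SKETCH_VERBOSE
#define SKETCH_LOG_LEVEL 2
#else
#define SKETCH_LOG_LEVEL 0
#endif

int sketch_log_level(void)
{
    return SKETCH_LOG_LEVEL;
}
```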
References: ANSI Sec. 3.8.3, Sec. 3.8.3.4
ISO Sec. 6.8.3, Sec. 6.8.3.4
H&S Sec. 3.2 pp. 40-1
Question 10.15
Is there anything like an #ifdef for typedefs?
Unfortunately, no. (See also question 10.13.)

References: ANSI Sec. 2.1.1.2, Sec. 3.8.1 footnote 83
ISO Sec. 5.1.1.2, Sec. 6.8.1
H&S Sec. 7.11.1 p. 225
Question 10.19
How can I list all of the pre#defined identifiers?
There's no standard way, although it is a common need. If the compiler documentation is unhelpful, the
most expedient way is probably to extract printable strings from the compiler or preprocessor executable
with something like the Unix strings utility. Beware that many traditional system-specific
pre#defined identifiers (e.g. ``unix'') are non-Standard (because they clash with the user's
namespace) and are being removed or renamed.
Question 10.20
I have some old code that tries to construct identifiers with a macro like
#define Paste(a, b) a/**/b
but it doesn't work any more.
It was an undocumented feature of some early preprocessor implementations (notably John Reiser's) that
comments disappeared entirely and could therefore be used for token pasting. ANSI affirms (as did
K&R1) that comments are replaced with white space. However, since the need for pasting tokens was
demonstrated and real, ANSI introduced a well-defined token-pasting operator, ##, which can be used
like this:
#define Paste(a, b) a##b
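A short sketch of the macro in use; the identifiers it builds here (linecount, get_linecount) are made up for the example.

```c
#define Paste(a, b) a##b

int Paste(line, count) = 3;          /* declares: int linecount = 3; */

int Paste(get_, linecount)(void)     /* defines:  int get_linecount(void) */
{
    return linecount;
}
```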
References: ISO Sec. 6.8.3.3
H&S Sec. 3.3.9 p. 52
Question 10.22
Why is the macro
#define TRACE(n) printf("TRACE: %d\n", n)
giving me the warning ``macro replacement within a string literal''? It seems to be expanding
TRACE(count);
as
printf("TRACE: %d\count", count);
See question 11.18.
Question 10.23
How can I use a macro argument inside a string literal in the macro expansion?
See question 11.18.
Question 10.25
I've got this tricky preprocessing I want to do and I can't figure out a way to do it.
C's preprocessor is not intended as a general-purpose tool. (Note also that it is not guaranteed to be
available as a separate program.) Rather than forcing it to do something inappropriate, consider writing
your own little special-purpose preprocessing tool, instead. You can easily get a utility like make(1) to
run it for you automatically.
If you are trying to preprocess something other than C, consider using a general-purpose preprocessor.
(One older one available on most Unix systems is m4.)
Question 5.13
This is strange. NULL is guaranteed to be 0, but the null pointer is not?
When the term ``null'' or ``NULL'' is casually used, one of several things may be meant:
1. The conceptual null pointer, the abstract language concept defined in question 5.1. It is
implemented with...
2. The internal (or run-time) representation of a null pointer, which may or may not be all-bits-0
and which may be different for different pointer types. The actual values should be of concern
only to compiler writers. Authors of C programs never see them, since they use...
3. The null pointer constant, which is a constant integer 0 (see question 5.2). It is often hidden
behind...
4. The NULL macro, which is #defined to be 0 or ((void *)0) (see question 5.4). Finally,
as red herrings, we have...
5. The ASCII null character (NUL), which does have all bits zero, but has no necessary relation to
the null pointer except in name; and...
6. The ``null string,'' which is another name for the empty string (""). Using the term ``null
string'' can be confusing in C, because an empty string involves a null ('\0') character, but not a
null pointer, which brings us full circle...
This article uses the phrase ``null pointer'' (in lower case) for sense 1, the character ``0'' or the phrase
``null pointer constant'' for sense 3, and the capitalized word ``NULL'' for sense 4.
Question 5.14
Why is there so much confusion surrounding null pointers? Why do these questions come up so often?
C programmers traditionally like to know more than they need to about the underlying machine
implementation. The fact that null pointers are represented both in source code, and internally to most
machines, as zero invites unwarranted assumptions. The use of a preprocessor macro (NULL) may seem
to suggest that the value could change some day, or on some weird machine. The construct ``if(p ==
0)'' is easily misread as calling for conversion of p to an integral type, rather than 0 to a pointer type,
before the comparison. Finally, the distinction between the several uses of the term ``null'' (listed in
question 5.13) is often overlooked.
One good way to wade out of the confusion is to imagine that C used a keyword (perhaps nil, like
Pascal) as a null pointer constant. The compiler could either turn nil into the correct type of null pointer
when it could determine the type from the source code, or complain when it could not. Now in fact, in C
the keyword for a null pointer constant is not nil but 0, which works almost as well, except that an
uncast 0 in a non-pointer context generates an integer zero instead of an error message, and if that uncast
0 was supposed to be a null pointer constant, the code may not work.
Question 5.15
I'm confused. I just can't understand all this null pointer stuff.
Follow these two simple rules:

1. When you want a null pointer constant in source code, use ``0'' or ``NULL''.
2. If the usage of ``0'' or ``NULL'' is an argument in a function call, cast it to the pointer type
expected by the function being called.
The rest of the discussion has to do with other people's misunderstandings, with the internal
representation of null pointers (which you shouldn't need to know), and with ANSI C refinements.
Understand questions 5.1, 5.2, and 5.4, and consider 5.3, 5.9, 5.13, and 5.14, and you'll do fine.
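Rule 2 matters most for functions with variable-length argument lists, where the compiler has no prototype information for the trailing arguments. The function below is a hypothetical illustration, not a library routine:

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical variadic function: counts its string arguments up to a
   terminating null pointer.  The compiler cannot tell that the variable
   arguments are pointers, so the caller must write the terminator as
   (char *)0 rather than a bare 0. */
int count_strings(const char *first, ...)
{
    va_list ap;
    const char *s;
    int n = 0;

    va_start(ap, first);
    for(s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

A call would then look like count_strings("sh", "-c", "ls", (char *)0).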
Question 20.11
Can I use base-2 constants (something like 0b101010)?
Is there a printf format for binary?
No, on both counts. You can convert base-2 string representations to integers with strtol.
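A sketch of both directions: strtol with a base of 2 for input, and a hand-written loop for output. The helper names parse_binary and format_binary are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Convert a string of '0' and '1' digits to a long using strtol. */
long parse_binary(const char *digits)
{
    return strtol(digits, NULL, 2);
}

/* Write the binary digits of value into buf, most significant first and
   without leading zeros, and return buf.  The caller must supply at
   least sizeof(unsigned int) * CHAR_BIT + 1 bytes. */
char *format_binary(unsigned int value, char *buf)
{
    char tmp[sizeof(unsigned int) * CHAR_BIT];
    int n = 0, i = 0;

    do {                              /* collect digits, lowest bit first */
        tmp[n++] = (char)('0' + (value & 1));
        value >>= 1;
    } while(value != 0);

    while(n > 0)                      /* copy them out in reverse order */
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return buf;
}
```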
Question 20.12
What is the most efficient way to count the number of bits which are set in a value?
Many ``bit-fiddling'' problems like this one can be sped up and streamlined using lookup tables (but see
question 20.13).
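As one possible sketch of the lookup-table approach, the function below counts bits four at a time using a 16-entry table; the names nibble_bits and count_bits are made up here.

```c
/* Number of set bits in each 4-bit value 0..15. */
static const unsigned char nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Count the set bits in value, one table lookup per nibble. */
int count_bits(unsigned long value)
{
    int n = 0;

    while(value != 0) {
        n += nibble_bits[value & 0xF];
        value >>= 4;
    }
    return n;
}
```

A 256-entry table indexed by whole bytes trades more table space for fewer lookups.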
Question 20.13
How can I make my code more efficient?
Efficiency, though a favorite comp.lang.c topic, is not important nearly as often as people tend to think it
is. Most of the code in most programs is not time-critical. When code is not time-critical, it is far more
important that it be written clearly and portably than that it be written maximally efficiently. (Remember
that computers are very, very fast, and that even ``inefficient'' code can run without apparent delay.)
It is notoriously difficult to predict what the ``hot spots'' in a program will be. When efficiency is a
concern, it is important to use profiling software to determine which parts of the program deserve
attention. Often, actual computation time is swamped by peripheral tasks such as I/O and memory
allocation, which can be sped up by using buffering and caching techniques.
Even for code that is time-critical, it is not as important to ``microoptimize'' the coding details. Many of
the ``efficient coding tricks'' which are frequently suggested (e.g. substituting shift operators for
multiplication by powers of two) are performed automatically by even simpleminded compilers.
Heavyhanded optimization attempts can make code so bulky that performance is actually degraded, and
are rarely portable (i.e. they may speed things up on one machine but slow them down on another). In
any case, tweaking the coding usually results in at best linear performance improvements; the big payoffs
are in better algorithms.
For more discussion of efficiency tradeoffs, as well as good advice on how to improve efficiency when it
is important, see chapter 7 of Kernighan and Plauger's The Elements of Programming Style, and Jon
Bentley's Writing Efficient Programs.
Question 20.14
Are pointers really faster than arrays? How much do function calls slow things down? Is ++i faster than
i = i + 1?
Precise answers to these and many similar questions depend of course on the processor and compiler in
use. If you simply must know, you'll have to time test programs carefully. (Often the differences are so
slight that hundreds of thousands of iterations are required even to see them. Check the compiler's
assembly language output, if available, to see if two purported alternatives aren't compiled identically.)
It is ``usually'' faster to march through large arrays with pointers rather than array subscripts, but for
some processors the reverse is true.
Function calls, though obviously incrementally slower than in-line code, contribute so much to
modularity and code clarity that there is rarely good reason to avoid them.
Before rearranging expressions such as i = i + 1, remember that you are dealing with a compiler,
not a keystroke-programmable calculator. Any decent compiler will generate identical code for ++i, i
+= 1, and i = i + 1. The reasons for using ++i or i += 1 over i = i + 1 have to do with
style, not efficiency. (See also question 3.12.)
Question 20.24
Why doesn't C have nested functions?
It's not trivial to implement nested functions such that they have the proper access to local variables in
the containing function(s), so they were deliberately left out of C as a simplification. (gcc does allow
them, as an extension.) For many potential uses of nested functions (e.g. qsort comparison functions),
an adequate if slightly cumbersome solution is to use an adjacent function with static declaration,
communicating if necessary via a few static variables. (A cleaner solution when such functions must
communicate is to pass around a pointer to a structure containing the necessary context.)
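The first workaround might look like the sketch below: an adjacent static comparison function for qsort, told which order to use through a static variable. All of the names are hypothetical.

```c
#include <stdlib.h>

/* Context shared between sort_ints and its comparison function; both
   the variable and the function are private to this file. */
static int sort_descending = 0;

static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    int r = (x > y) - (x < y);        /* -1, 0, or 1 without overflow */

    return sort_descending ? -r : r;
}

void sort_ints(int *v, size_t n, int descending)
{
    sort_descending = descending;
    qsort(v, n, sizeof v[0], compare_ints);
}
```

The static variable makes sort_ints non-reentrant, which is part of why the answer calls this solution cumbersome.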
Question 20.25
How can I call FORTRAN (C++, BASIC, Pascal, Ada, LISP) functions from C? (And vice versa?)
The answer is entirely dependent on the machine and the specific calling sequences of the various
compilers in use, and may not be possible at all. Read your compiler documentation very carefully;
sometimes there is a ``mixed-language programming guide,'' although the techniques for passing
arguments and ensuring correct run-time startup are often arcane. More information may be found in
FORT.gz by Glenn Geers, available via anonymous ftp from suphys.physics.su.oz.au in the src directory.
cfortran.h, a C header file, simplifies C/FORTRAN interfacing on many popular machines. It is available
via anonymous ftp from zebra.desy.de (131.169.2.244).
In C++, a "C" modifier in an external function declaration indicates that the function is to be called
using C calling conventions.
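Seen from the C side, a common arrangement is a header that both compilers can read, with the "C" linkage wrapper guarded by the __cplusplus macro that C++ compilers predefine. This is a sketch only, and the function name c_side_add is hypothetical.

```c
/* Declarations that may be included from both C and C++ source files. */
#ifdef __cplusplus
extern "C" {
#endif

int c_side_add(int a, int b);

#ifdef __cplusplus
}
#endif

/* The definition itself lives in a file compiled as C. */
int c_side_add(int a, int b)
{
    return a + b;
}
```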
Question 20.26
Does anyone know of a program for converting Pascal or FORTRAN (or LISP, Ada, awk, ``Old'' C, ...)
to C?
Several freely distributable programs are available:
p2c   A Pascal to C converter written by Dave Gillespie, posted to comp.sources.unix in March,
      1990 (Volume 21); also available by anonymous ftp from csvax.cs.caltech.edu, file pub/p2c1.20.tar.Z .
ptoc  Another Pascal to C converter, this one written in Pascal (comp.sources.unix, Volume 10,
      also patches in Volume 13?).
f2c   A FORTRAN to C converter jointly developed by people from Bell Labs, Bellcore, and
      Carnegie Mellon. To find out more about f2c, send the mail message ``send index from f2c'' to
      netlib@research.att.com or research!netlib. (It is also available via anonymous ftp on
      netlib.att.com, in directory netlib/f2c.)
This FAQ list's maintainer also has available a list of a few other commercial translation products, and
some for more obscure languages.
Question 20.27
Is C++ a superset of C? Can I use a C++ compiler to compile C code?
C++ was derived from C, and is largely based on it, but there are some legal C constructs which are not
legal C++. Conversely, ANSI C inherited several features from C++, including prototypes and const,
so neither language is really a subset or superset of the other. In spite of the differences, many C
programs will compile correctly in a C++ environment, and many recent compilers offer both C and C++
compilation modes.
References: H&S p. xviii, Sec. 1.1.5 p. 6, Sec. 2.8 pp. 36-7, Sec. 4.9 pp. 104-107
Question 20.28
I need a sort of an ``approximate'' strcmp routine, for comparing two strings for close, but not necessarily
exact, equality.
Some nice information and algorithms having to do with approximate string matching, as well as a useful
bibliography, can be found in Sun Wu and Udi Manber's paper ``AGREP--A Fast Approximate Pattern-Matching Tool.''
Another approach involves the ``soundex'' algorithm, which maps similar-sounding words to the same
codes. Soundex was designed for discovering similar-sounding names (for telephone directory
assistance, as it happens), but it can be pressed into service for processing arbitrary words.
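Here is a sketch of one common variant of soundex (published descriptions differ in small details). It keeps the first letter, codes later consonants as digits, skips vowels, collapses adjacent letters with the same code, and pads to four characters. The function and table names are made up, and the code assumes a character set such as ASCII in which the letters are contiguous.

```c
#include <ctype.h>
#include <string.h>

/* Digit for each letter A..Z; '0' marks letters that are not coded
   (the vowels, H, W, and Y). */
static const char soundex_codes[] = "01230120022455012623010202";

/* Store the four-character soundex code of word in out. */
void soundex(const char *word, char out[5])
{
    char prev = '0';
    int n = 0;

    strcpy(out, "0000");              /* pad short results with zeros */
    for(; *word != '\0' && n < 4; word++) {
        int c = toupper((unsigned char)*word);
        char code;

        if(c < 'A' || c > 'Z')
            continue;                 /* ignore anything but letters */
        code = soundex_codes[c - 'A'];
        if(n == 0)
            out[n++] = (char)c;       /* first letter is kept as is */
        else if(code != '0' && code != prev)
            out[n++] = code;
        if(c != 'H' && c != 'W')
            prev = code;              /* H and W do not separate equal codes */
    }
}
```

With this variant, ``Robert'' and ``Rupert'' both map to R163.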
References: Knuth Sec. 6 pp. 391-2 Volume 3
Wu and Manber, ``AGREP--A Fast Approximate Pattern-Matching Tool''
Question 20.29
What is hashing?
Hashing is the process of mapping strings to integers, usually in a relatively small range. A ``hash
function'' maps a string (or some other data structure) to a bounded number (the ``hash bucket'') which
can more easily be used as an index in an array, or for performing repeated comparisons. (Obviously, a
mapping from a potentially huge set of strings to a small set of integers will not be unique. Any
algorithm using hashing therefore has to deal with the possibility of ``collisions.'') Many hashing
functions and related algorithms have been developed; a full treatment is beyond the scope of this list.
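Purely as an illustration, here is one very simple string hash; the function name and the multiplier 31 are choices made for this sketch, not a recommendation over the functions treated in the references.

```c
/* Fold the characters of s into an unsigned value, then reduce it to a
   bucket number in the range 0..nbuckets-1.  nbuckets must be nonzero.
   Different strings can land in the same bucket (a collision). */
unsigned int hash_string(const char *s, unsigned int nbuckets)
{
    unsigned int h = 0;

    for(; *s != '\0'; s++)
        h = 31 * h + (unsigned char)*s;
    return h % nbuckets;
}
```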
References: Knuth Sec. 6.4 pp. 506-549 Volume 3
Sedgewick Sec. 16 pp. 231-244
Question 20.38
Where does the name ``C'' come from, anyway?
C was derived from Ken Thompson's experimental language B, which was inspired by Martin Richards's
BCPL (Basic Combined Programming Language), which was a simplification of CPL (Cambridge
Programming Language). For a while, there was speculation that C's successor might be named P (the
third letter in BCPL) instead of D, but of course the most visible descendant language today is C++.
Question 20.39
How do you pronounce ``char''?
You can pronounce the C keyword ``char'' in at least three ways: like the English words ``char,'' ``care,''
or ``car;'' the choice is arbitrary.
Question 13.15
I need a random number generator.
The Standard C library has one: rand. The implementation on your system may not be perfect, but
writing a better one isn't necessarily easy, either.
If you do find yourself needing to implement your own random number generator, there is plenty of
literature out there; see the References. There are also any number of packages on the net: look for r250,
RANLIB, and FSULTRA (see question 18.16).
References: K&R2 Sec. 2.7 p. 46, Sec. 7.8.7 p. 168
ANSI Sec. 4.10.2.1
ISO Sec. 7.10.2.1
H&S Sec. 17.7 p. 393
PCS Sec. 11 p. 172
Knuth Vol. 2 Chap. 3 pp. 1-177
Park and Miller, ``Random Number Generators: Good Ones are hard to Find''
Question 13.16
How can I get random integers in a certain range?
The obvious way,

rand() % N /* POOR */
(which tries to return numbers from 0 to N-1) is poor, because the low-order bits of many random
number generators are distressingly non-random. (See question 13.18.) A better method is something like
(int)((double)rand() / ((double)RAND_MAX + 1) * N)
If you're worried about using floating point, you could use
rand() / (RAND_MAX / N + 1)
Both methods obviously require knowing RAND_MAX (which ANSI #defines in <stdlib.h>), and
assume that N is much less than RAND_MAX.
(Note, by the way, that RAND_MAX is a constant telling you what the fixed range of the C library rand
function is. You cannot set RAND_MAX to some other value, and there is no way of requesting that rand
return numbers in some other range.)
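Wrapped up as functions, the two better methods above might look like this sketch (the function names are hypothetical; n is assumed to be positive and much less than RAND_MAX):

```c
#include <stdlib.h>

/* Floating-point method: scale rand() into the range 0..n-1. */
int random_below_fp(int n)
{
    return (int)((double)rand() / ((double)RAND_MAX + 1) * n);
}

/* Integer-only method: divide the range of rand() into n slices. */
int random_below_int(int n)
{
    return rand() / (RAND_MAX / n + 1);
}
```

Both take their result from the high-order part of the value returned by rand, rather than from its low-order bits.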
If you're starting with a random number generator which returns floating-point values between 0 and 1,
all you have to do to get integers from 0 to N-1 is multiply the output of that generator by N.
References: PCS Sec. 11 p. 172
Question 13.18
I need a random true/false value, so I'm just taking rand() % 2, but it's alternating 0, 1, 0, 1, 0...
Poor pseudorandom number generators (such as the ones unfortunately supplied with some systems) are
not very random in the low-order bits. Try using the higher-order bits: see question 13.16.
References: Knuth Sec. 3.2.1.1 pp. 12-14
Question 13.17
Each time I run my program, I get the same sequence of numbers back from rand().
You can call srand to seed the pseudo-random number generator with an initial value that varies from run to run.
Popular seed values are the time of day, or the elapsed time before the user presses a key (although
keypress times are hard to determine portably; see question 19.37). (Note also that it's rarely useful to
call srand more than once during a run of a program; in particular, don't try calling srand before each
call to rand, in an attempt to get ``really random'' numbers.)
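A sketch of seeding from the time of day, arranged so that repeated calls cannot reseed the generator (the helper name seed_once is hypothetical):

```c
#include <stdlib.h>
#include <time.h>

/* Seed rand() from the clock, at most once per run of the program. */
void seed_once(void)
{
    static int seeded = 0;

    if(!seeded) {
        srand((unsigned int)time(NULL));
        seeded = 1;
    }
}
```

Since time typically has one-second resolution, two runs started within the same second will usually get the same seed.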
References: ANSI Sec. 4.10.2.2
ISO Sec. 7.10.2.2
H&S Sec. 17.7 p. 393
Question 13.20
How can I generate random numbers with a normal or Gaussian distribution?
Here is one method, by Box and Muller, and recommended by Knuth:

#include <stdlib.h>
#include <math.h>

double gaussrand()
{
    static double V1, V2, S;
    static int phase = 0;
    double X;

    if(phase == 0) {
        do {
            double U1 = (double)rand() / RAND_MAX;
            double U2 = (double)rand() / RAND_MAX;

            V1 = 2 * U1 - 1;
            V2 = 2 * U2 - 1;
            S = V1 * V1 + V2 * V2;
        } while(S >= 1 || S == 0);

        X = V1 * sqrt(-2 * log(S) / S);
    } else
        X = V2 * sqrt(-2 * log(S) / S);

    phase = 1 - phase;

    return X;
}
See the extended versions of this list (see question 20.40) for other ideas.
References: Knuth Sec. 3.4.1 p. 117
Box and Muller, ``A Note on the Generation of Random Normal Deviates''
Press et al., Numerical Recipes in C Sec. 7.2 pp. 288-290
Question 13.24
I'm trying to port this old program. Why do I get ``undefined external'' errors for some library functions?
Some old or semistandard functions have been renamed or replaced over the years;
if you need:    you should instead:

index           use strchr.
rindex          use strrchr.
bcopy           use memmove, after interchanging the first and second arguments (see also question 11.25).
bcmp            use memcmp.
bzero           use memset, with a second argument of 0.
Contrariwise, if you're using an older system which is missing the functions in the second column, you
may be able to implement them in terms of, or substitute, the functions in the first.
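For the memory routines, the substitutions can be sketched as thin wrappers. The compat_ names are hypothetical, chosen so as not to collide with a system that still supplies the old functions.

```c
#include <string.h>

/* bcopy takes (source, destination, length); memmove takes the
   destination first, so the pointer arguments are interchanged. */
void compat_bcopy(const void *src, void *dst, size_t n)
{
    memmove(dst, src, n);
}

/* bcmp only reports equal (0) or different (nonzero). */
int compat_bcmp(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) != 0;
}

void compat_bzero(void *p, size_t n)
{
    memset(p, 0, n);
}
```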
References: PCS Sec. 11
Question 12.34
Once I've used freopen, how can I get the original stdout (or stdin) back?
There isn't a good way. If you need to switch back, the best solution is not to have used freopen in the
first place. Try using your own explicit output (or input) stream variable, which you can reassign at will,
while leaving the original stdout (or stdin) undisturbed.
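One way to arrange this is sketched below; out_stream, set_output, and report are names invented for the example.

```c
#include <stdio.h>

/* The program's own output stream variable.  A null pointer here
   means "use stdout", which itself is never reopened. */
static FILE *out_stream = NULL;

/* Send later output to fp, or back to stdout if fp is a null pointer. */
void set_output(FILE *fp)
{
    out_stream = fp;
}

/* Write one line to the current output stream. */
int report(const char *msg)
{
    FILE *fp = (out_stream != NULL) ? out_stream : stdout;

    return fprintf(fp, "%s\n", msg);
}
```

Switching back is then just set_output(NULL), with no need to recover anything.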
Question 12.33
How can I redirect stdin or stdout to a file from within a program?
Use freopen (but see question 12.34).

References: ISO Sec. 7.9.5.4
H&S Sec. 15.2
undefined behavior
[This article was originally posted on July 28, 1996, in the midst of a thread on precedence and order of evaluation. I have
edited the text slightly for this web page.]
Subject: Re: precedence vs. order of evaluation
Message-ID: <Dv9v5H.EGu@eskimo.com>
X-scs-References: <199607252021.NAA15633@mail.eskimo.com>
References: <01bb798a$11913f80$87ee6fce@timpent.airshields.com>
<1996Jul25.1157.scs.0001@eskimo.com> <kluev-2607962112500001@kluev.macsimum.gamma.ru>
Date: Sun, 28 Jul 1996 21:17:40 GMT
In article <kluev-2607962112500001@kluev.macsimum.gamma.ru>, kluev@macsimum.gamma.ru (Michael Kluev) writes:
> I do not get one thing: Why there is so fundamental difference
> between the statements and expressions? Why the evaluation order of
> sub-expressions is undefined while the evaluation order of statements
> within the statement sequence is defined?
I always have a hard time with questions like these. Sometimes, they hardly make sense: after all, the language is what it is,
and we simply have to...
> Sure, I know the answer: "this is C standard" or "this is how
> language is defined", but I'm not looking for such an answer.
Oh. Then I guess I'm not allowed to use that answer, then.
> What I am looking for is the answer of the following question:
> "Why the language was defined such a way".
The usual answer is that it's to avoid unnecessarily constraining the compiler's ability to...
> If your answer is a form of: "This way compilers could do the better
> job of optimising expressions", then remember, that compiler must
> optimise not only expressions, but statements also.
Oh. So I guess I'm not allowed to use that answer, either. (Now it seems as if you're really constraining me!) So I'm afraid the
only truly factual answer I'm left with is "I don't know."
I posted a long article a year or two ago exploring some reasons why C might leave certain aspects of evaluation order
undefined. I'd re-post it now, but it would take me too long to find it in my magtape archives, and anyway the oxide is starting
to flake off (I'm not sure if it's the fault of the drive or the tapes), and I'm thinking that the next time I spin those tapes should
really be to transfer them to some new media, which I haven't selected yet. [I did eventually recover a copy.]
Instead, I'll explore a couple of different reasons. Bear in mind that these are only my own speculations, so they absolutely will
not answer your question. If you simply must know why C is defined the way it is on this point, you'll have to ask Dennis
Ritchie. The speculations I'll give you are some of the ones which might now lead me to design a language the same way, but I
freely admit that my thinking on language design has been very heavily influenced by C, which I somehow find pretty
congenial and easy to get along with.
A language specification is a huge set of tradeoffs. No, don't worry, I won't talk about the forbidden tradeoff between the
programmer's freedom of expression and the compiler's license to optimize. The tradeoff I'm invoking now is one of
documentation: how much time and how many words can we afford to spend defining the language? As in so many other
areas, the law of diminishing returns sets in here, too. We have to ask ourselves whether an attempt to nail down some aspect
of the language more tightly is worth it in terms of the number of programs (or programmers) that absolutely need the extra
precision.
A language specification (like any detailed specification of a complex system) is also staggeringly difficult to write. The harder
you try, the more questions you end up inviting from devious folk who take your previous round of ever-more-detailed
specifications as a grand puzzle, the challenge being to find some loophole or ambiguous case or fascinating question which
remains unanswerable in the system as constructed.
Therefore, the standard makes certain simplifications. It makes these not just to make compilers easier to write, not just to
make the standard easier to write, but to make it easier for the rest of us to read, to wrap our brains around it and understand it.
The more exceptions it contains, or overly complex explanations of overly complex devices inserted just to placate the
nitpickers and puzzlemongers, the more likely it is to be unreadable by mere mortals, or unimplementable by mere mortals, and
so ultimately to fail.
A largely forgotten aspect of the "Unix philosophy," and an aspect which like many others is equally responsible for the design
of C as Unix, and an aspect which is responsible both for the success of the operating system and the language and for the
heaping truckloads of acrimonious criticism which both incessantly receive, is that neither system was ever intended to satisfy
everybody. The designers were shooting for about a 90% solution, and were unapologetically willing to call that "good
enough." They knew, if they tried to satisfy everybody, that the first 90% of the requirements would take 90% of their time,
and that 9% of the remaining requirements would take the other 90% of their time, and that 0.9% of the remaining
requirements would take another 90% of their time, and so on [footnote 1]. Discretion being the better part of valor, they
decided -- with remarkable restraint, which I for one could never manage -- to nip that infinite regression in the bud. And in
one of X3J11's admirable successes at preserving the "spirit of C," that minimalist attitude largely pervades the C standard, as
well.
Returning to the subject of expression evaluation, the simplification of interest to us here is, in a nutshell, that how you specify
the order that things happen in is with statements, and how you compute values where the order doesn't matter much is with
expressions. If you care about the order, you need separate statements. (There were always a few exceptions to this simplistic
rule, of course; ANSI added a few more). The modern, more precise statement is that if you need to be sure that side effects
have taken effect, you need to have a sequence point, but to keep everybody's life simple, there are still relatively few defined
sequence points. If you have a complex expression with complicated sequencing requirements, you may simply have to break
it up into several statements, and maybe use a temporary variable. That was true in K&R (I've already quoted the relevant
sentences from section 2.12), and it's true today. It's a simple rule to state, it's a simple rule to implement, and at least 90% of
expressions (and in fact far more, probably closer to 99.9%) can in fact be written as single, simple expressions, because they
are simple and don't have complicated sequencing requirements.
I honestly believe that the existing rules are good enough, and that the excruciating discussions which we have about the issue
here from time to time tend to overstate its importance. I probably break an expression up into two statements to keep its
sequence correct about twice a year [footnote 2]. The vast majority of the time, you don't have to worry about the order of
multiple side effects (most of the time, because there aren't multiple side effects), and when you do, I claim (though I realize
how patronizing this sounds to the people I most wish would think about it) that the expression is probably too complicated
and that it should probably be simplified if for no other reason than so that people could understand it, and incidentally so
that it would be well-defined to the compiler.
In closing, let me offer another way of thinking about order of evaluation, which I came up with a day or two ago and refined
during a brief e-mail exchange with James Robinson. As I mentioned, compilers tend to build parse trees, and precedence has
some influence on the shape of parse trees which has some influence on the order of evaluation. What's a good way of thinking
about the ways in which the shape of a parse tree does and doesn't necessarily affect order of evaluation?
If it weren't for side effects -- assignment operators, ++, --, function calls, and (for those of you writing device drivers)
fetches from volatile locations -- [footnote 3], the entire meaning of a parse tree would be the computation of a single value.
Each node computes a new value from the values of its children, so values percolate up the tree. We have to begin evaluation
at the bottom, of course, and we can therefore say that there's some ordering of the evaluation from bottom to top, but we don't
care about the relative ordering of parallel branches of the tree; in fact for all we care they could be evaluated in parallel. The
reason we don't care is precisely that (for the moment) we are thinking about pure expression evaluation, without side effects.
If, as I claim here, the primary purpose of a parse tree is to generate the single value that pops out of the top of it, then a good
way to think about an expression (which is the basis for a parse tree) is that its primary purpose is to generate a single value,
too. We should think about side effects as in some sense secondary, because it turns out that the Standard (and, hence, the
compiler) accords them a good deal less respect, at least with respect to their scheduling. We should imagine that the compiler
goes about the business of evaluating an expression by working its way through the parse tree, applying no more ordering
constraints than the blatantly obvious one that you can't (in general) evaluate a node until you've evaluated its children. We
should imagine that, whenever the compiler encounters a node within the tree mandating a side effect, which would require it
to store some value in some location, it makes a little note to itself: "I must remember to write that value to that location,
sometime", and then goes back to its true love, which is evaluating away on that parse tree. Finally, we should imagine that
when the compiler reaches a point which the Standard labels as a sequence point, the compiler says to itself: "Oh, well, I guess
I can't play all day, now I'll have to get down to business and attack that `to do' list."
Of course, I'm not saying that you have to think about the situation in this way. But if you're looking for a model which will let
you think about expression evaluation in a way that matches the Standard's, I think this is a pretty good one. Even though we
only evaluate expressions for their side effects (that is, the statement
i + 1;
does nothing), the right way to think about expression evaluation is that we are, after all, evaluating an expression, or figuring
out what its value is. Its one value, singular. The only ordering dependencies are those which must apply in order to ensure that
we compute the correct value. If there are any intermediate values that we care about, because we expect them to be stashed
away via side effects, we must not care what order they occur in. Therefore, if there are multiple side effects, all of them had
better write to different locations. Also (again because we can't be sure when they'll happen) none of them had better write to
locations which we might later try to read from within the same expression.
This may have seemed like a roundabout set of explanations, and I'm sure it's still unsatisfying to those who insist on knowing
why C is as it is. To summarize the arguments I've tried to present here, C specifies expression evaluation as it does because it's
simple and good enough for most purposes while still allowing (taboo answer alert!) for decently optimized code generation,
and it does not provide more guarantees about expression evaluation because few expressions would need them.
Steve Summit
scs@eskimo.com
Footnote 1. Melanie has dubbed this "Zeno's 90/90 rule."
Footnote 2. The number of times I've had to break expressions up into separate statements decreased by about 2/3 when I
realized that the compiler does not have (and has probably never had) license to rearrange
int i, isc, fac1, fac2;
...
isc = (long)i * fac1 / fac2;
which at one time I wrote as

long tmp = (long)i * fac1;
isc = tmp / fac2;
because I was worried about underflow in case the compiler evaluated it as (long)i / fac2 * fac1 .
Footnote 3. I apologize for the two entirely different uses of -- in this sentence.
undefined behavior
[This article was originally posted on March 21, 1995, in the midst of a thread on undefined behavior. I have edited
the text slightly for this web page.]
Subject: Re: Quick C test - whats the correct answer???
Message-ID: <D5swGz.4Cv@eskimo.com>
Summary: why is undefined behavior undefined?
References: <fjm.58.000B63C1@ti.com> <D5JLGy.A63@tigadmin.ml.com>
Date: Tue, 21 Mar 1995 17:26:59 GMT
In article <D5JLGy.A63@tigadmin.ml.com>, Jim Frohnhofer (jimf@nottingham.ml.com) writes:
> In article 000B63C1@ti.com, fjm@ti.com (Fred J. McCall) writes:
>> That's right, and that's why you should listen to him (and to me, and to
>> everyone else telling you this). The REASON it is undefined is BECAUSE THE
>> STANDARD SAYS SO.
>
> I may regret this, but I'll jump in anyway. I accept that the behaviour
> of such a construct is undefined simply because the Standard says so. But
> isn't it legitimate for me to ask why the Standard leaves it undefined?
It is, but you should probably be careful how you ask it. If you're not careful (you were careful, but most people
aren't), it ends up sounding like you don't believe that something is undefined, or that you believe -- and your
compiler happens to agree -- that it has a sensible meaning. But as long as we're very clear in our heads that we're
asking the abstract, intellectual question "Why are these things undefined?", and not anything at all practical like
"How will these things behave in practice?", we may proceed.
The first few answers I give won't be the ones you're looking for, but towards the end I may present one which you'll
find more satisfying.
First of all, you might want to take a look at who's saying what. I'll grant that statements such as Fred's above can be
annoying. I agree completely that it's usually very good to know the reasons behind things. But if the people who
have been posting to comp.lang.c the longest, and who have been programming in C for the longest and with the
most success, keep saying "it's undefined, it's undefined, quit worrying about why, just don't do it, it's undefined",
they might have a good reason, even if they don't -- or can't -- say what it is, and that might be a good enough reason
for you, too.
I've been programming in C for, oh, 15 years now. For at least 10 of those years, I've been regarded as somewhat of
an expert. For going on 5 of those years, I've been maintaining the comp.lang.c FAQ list. I am someone who is
usually incapable of learning abstract facts unless I understand the reasons behind them. When I used to study for
math tests, I wouldn't memorize formulas, I'd just make sure that I could rederive them if I needed them. Yet for
most of the 15 years I've been programming in C, I simply could not have told you why i=i++ is undefined. It's an
ugly expression; it's meaningless to me; I don't know what it does; I don't want to know what it does; I'd never write
it; I don't understand why anyone would write it; it's a mystery to me why anyone cares what it does; if I ever
encountered it in code I was maintaining, I'd rewrite it. When I was learning C from K&R1 in 1980 or whenever it
was, one of their nice little sentences, which they only say once, and which if you miss you're sunk, leaped up off
http://www.eskimo.com/~scs/readings/undef.950321.html (1 of 6) [22/07/2003 5:53:11 PM]
the page and wrapped itself around my brain and has never let go:
Naturally, it is necessary to know what things to avoid, but if you don't know how they are done on
various machines, that innocence may help to protect you.
As it happens, it's possible to read K&R too carefully here: the discussion at the end of chapter 2 about a[i] =
i++ suggests that it's implementation-defined, and the word "unspecified" appears in K&R2, while ANSI says that
the behavior is undefined. The sentence I've quoted above suggests not knowing how things are done on various
machines, while in fact what we really want to know is that maybe they can't be done on various machines.
Nevertheless, the message -- that a bit of innocence may do you good -- is in this case a good one.
That's my first answer. I realize that it's wholly unsatisfying to anyone who's still interested in this question. On to
answer number two.
> As far as I know, it was created by a committee not brought down by Moses
> from the mountain top. If I want to become a better C programmer, won't
> it help me to know why the committee acted as it did.
Perhaps, but again, only if you're very careful.
Let's say you're wondering why i++ * i++ is undefined. Someone explains that it's because no one felt like
defining which order the side effects happen in. That's a nice answer: it's a bit incomplete and perhaps a bit
incorrect, but it's certainly easy to understand, and since you insisted on an answer you could understand (as
opposed to something inscrutable like "it's just undefined; don't do it"), it's the kind of answer you're likely to get.
So next week, you find yourself writing something like
i = 7;
printf("%d\n", i++ * i++);
and you say to yourself, "Whoa, that might be undefined. How did that explanation go again? It's undefined...
because nobody felt like saying... which order the side effects happened in. So either the first i gets incremented
first, and it's 7 times 8 is 56, or the second i gets incremented first, and it's 8 times 7 is 56. Hey! It may be
undefined, but I get a predictable answer, so I can use the code after all!"
So in this case, knowing a reason why has not made you a better programmer, it has made you a worse programmer,
because some day when you're not around to defend it (or when you are around, but you don't have time to debug it),
that code is going to print 49, or maybe 42, or maybe "Floating point exception -- core dumped".
If, on the other hand, you didn't know why it was undefined, just that it was undefined, you would have instead
written
printf("%d\n", i * (i+1));
i += 2;
(or whatever it was that you meant), and your code would have been portable, robust, and well-defined.
Now, some of you may be thinking that

printf("%d\n", i++ * i++);
is a ridiculous example which no one would ever write. That may be true, but it's the example I use in the FAQ list
(and it's the oldest of the undefined-evaluation-order questions in the FAQ list) because it illustrates, I hope better
than a "real" example would, the kind of contortions that people do get into when they start thinking too hard (but
not too carefully) about undefined expressions. There was an actual question posted to comp.lang.c years ago (which
I could probably find in my archives if I looked hard enough) in which the poster used essentially the same
argument: "the evaluation order may be undefined, but no matter which order the compiler picks, I'll get the answer
I want", and that was the inspiration for the i++ * i++ question (4.2 on the current list). (Remember, it's not the
evaluation order that's undefined, it's the entire expression that's undefined.)
Two paragraphs back, I suggested that the programmer who did not know why something was undefined might have
been better off than the programmer who did. I am not stating (not in the current climate, anyway) that you should
not know why things are undefined. But if you insist on knowing, you are going to have to get the full story, not just
some simplistic justifications. And you are going to have to be excruciatingly careful that you don't use your
knowledge of why something is undefined to try to second-guess how a certain undefined construct (which you've
decided you simply have to use in your program) is going to behave. Undefined behavior is slippery stuff: it really,
truly is undefined; it really, truly can do anything at all; yet some people (particularly those who are always
clamoring to know why things are undefined) are always trying to salvage some remnants of definedness out of an
undefined situation, and are always getting themselves in trouble, and end up convincing the people who have to
come along and pick up the mess that yes, it really was a mistake trying to explain why it was undefined, and we
probably should have just said it was undefined because we said it was, after all.
This has been answer number two, and I realize it's still not satisfying, because I'm still suggesting that maybe you
don't want to know why.
For answer number three, I'll quit beating around the bush and try to explain why, although I have to admit that, for
me, the answers from here on are going to get less satisfying, and less nicely worded, because we get down into
some realms that I don't usually think about (because I really have taken to heart K&R's advice about maintaining
some innocence).
An international Standard like X3.159/ISO-9899 is a tremendously difficult document to write, even for a relatively
simple programming language like C. When a Standard says something, it must say it definitively and
unambiguously. (When it is inaccurate, it must be definitively inaccurate, and when it contains any areas of doubt or
uncertainty, they must be rigidly defined areas of doubt or uncertainty.) The Standard must withstand intense
scrutiny, for many years, from all sorts of observers, including language lawyers, professional nitpickers, grudging
ex-users of other languages, semicompetent implementors of spiffola new compilers for scruffola old computers,
and xenophobic adherents of other sociopolitical systems. (If you think that abstract constructions like programming
languages are inherently removed from sociopolitical concerns, you haven't followed the crafting of any
international Standards.)
Since it's so hard to specify things precisely enough for a Standard like this, the wise Standard-writer (especially for
a Standard that hews to existing practice) won't specify anything more than is necessary. Features that everyone will
use (or that someone might be reasonably expected to use) must be specified precisely, but features that no sane
person could be expected to use in 6,000 years might be relegated to the dustbin, instead. (Naturally, since we're
being precise, we'll have to precisely define the boundaries of the parts we've decided to be imprecise about; the
paraphrase above of Vroomfondel's demand from The Hitchhiker's Guide to the Galaxy is not facetious.)
In practice, you can only tell what a computer program has done when it causes a side effect. In a nutshell, then,
what a Standard for a programming language does is to define the mapping between the programs you write and the
side effects they produce. Consequently, we are very concerned with side effects: how they're expressed in our
programs, and how the Standard defines them to behave.
The fragment
int i;
i = 7;
printf("%d\n", i);
contains two obvious side effects, and it is blatantly obvious what their effect should be, and what the Standard says
agrees with what you think they should do.
The fragment
int a[10] = {0}, i = 1;
i = i++;
a[i] = i++;
printf("%d %d %d %d\n", i, a[1], a[2], a[3]);
contains a few more side effects, but (I assert) it is not obvious how they should behave. What does the second line
do? In the third line, do we decide which cell of a[] we assign to before or after we increment i again? Plenty of
people can probably come up with plenty of opinions of how this fragment ought to work, but they probably won't
all agree with each other, as the situation is not nearly so clear-cut.
Furthermore, a Standard obviously cannot talk about individual program fragments like i = i++ and a[i] =
i++, because there are an infinite number of those. It must make general prescriptions about how the various
primitive elements of the language behave, which we then use in combination to determine how real programs
behave. If you think you know what a[i] = i++ should do, you can't just say that; instead you have to come up
with a general statement which says how all expressions which modify and then use the same object should act, and
you have to convince yourself that this rule is appropriate not only for the a[i] = i++ case you thought about but
also for all the other expressions (infinitely many of them, remember) that you did not think about, and you have to
convince a bunch of other people of your conviction.
The people writing the C Standard decided that they could not do that. Instead, they decided that expressions such as
a[i] = i++ and i++ * i++ and i = i++ would be undefined. They decided this because these expressions
are too hard to define and too stupid to be worth defining. They came up with some definitions (no small feat in
itself) of which expressions this undefinedness applies to. You've seen these definitions; they're the ones that say
Between the previous and next sequence point an object shall have its stored value modified at most
once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to
determine the value to be stored.

Now, I'll grant that these definitions are at least as hard to understand as expressions such as a[i] = i++. But
there are plenty of more complicated expressions which we can't begin to make sense of (or even decide if they
make any sense) without these definitions. Perhaps I'll say more about them later, but for the moment we're still
trying to answer the question of why some things (such as the expressions not defined by the two definitions quoted
above) are undefined, and my opinion is, as I stated above: because they're too hard to define, and too stupid to be
worth defining. (If you're still counting, call this answer number three.)
Now, you may think that I'm being overly pessimistic. You may think that it's easy to define what a[i] = i++ or
i++ * i++ or i = i++ should do. Perhaps you think they should be evaluated in exactly the order suggested by
precedence and associativity. Perhaps you think that i++ should increment i immediately after giving up i's value
and before evaluating any of the rest of the expression. (These rules would say that the incremented value of i is
used to decide which cell of a[] to assign to, and that the left-hand i++ in i++ * i++ happens first, and that i
= i++ ends up -- here's a surprise -- being exactly equivalent to i = i, unless perhaps if i is volatile.) But -- and
you're going to have to take my word for it here, because I'm taking other people's word for it -- these hypothetical
"well-defined" expression rules, though they're easy enough to state and probably precise enough and probably
comprehensive for the cases we're interested in, would result in significantly poorer performance than pre-ANSI C
traditionally had and that we're used to. Optimizing compilers would have to generate code which evaluated
expressions in little itsy bitsy steps, in lockstep with the precedence- and associativity-based parse. They would not
be able to rearrange parts of the expression to make the best use of the target machine's instruction repertoire or
available registers or pipelining or parallelization or whatnot.
When people talk about why it's good that an expression like i++ * i++ is undefined, they usually speak of
modern, parallel machines which might get really confused if they try to do, not the left-hand i++ first or the right-hand
one first, but instead both at once somehow. Instead of that example, let's imagine a very simple CPU,
analogous to a four-function calculator with some memory registers. Let's play compiler, and imagine what buttons
on the calculator we'll push, operating under the "well-defined" rules of the previous paragraph.
Since the (hypothetical) "well-defined" rules tell us exactly which order to evaluate the expression in, our task is
straightforward. First, the left-hand side of the multiplication operator: the value we want to multiply is i's previous
value. Whoops, before we can think about doing the multiplication, the rules say we have to do the increment.
Whoops, once we do the increment, we'll have lost the previous value. So we recall from i, store the value in a
temporary register, add one to the value, and store it back in i. Now we can recall from the temporary, and do the
multiplication... no. Because first we have to do the same thing on the right-hand side: recall from i, store it in a
second temporary, add one to it, and store it back in i. Now, finally, we can recall the two values from the two
temporaries and do the multiplication.
If we remove the (still hypothetical, just for the purposes of this article, not part of Standard C) "well-defined" rules,
and revert to ANSI's rules, under which we only have to make sure that the increments to i happen sometime before
the next sequence point, look how much easier things become: Recall from i. Make a note to increment its value
later. Multiply it by: recall from i again, make a note to increment it later. Now we've got the product, do whatever
we have to with it. We've still got these two notes to increment i, so handle them, and we're done.
I'd never actually gone through the analysis in this level of detail until just now, but this is exactly how the example
int i = 7;
printf("%d\n", i++ * i++);

from question 4.2 of the FAQ list could print 49 instead of 56. (No, I'm not going to come up with a scenario under
which it could print 42.)
So, if you're still with me (and after 300 lines, I can see how you might not be, but if you're one of the people who's
been insisting on a reason why, you better be :-) ), here is my fourth, last answer: these things are undefined
because if you made them defined, compilers would have to be paranoid and generate lockstep code which would be
bulkier or slower or both.
Having come this far, I have to repeat that this last answer isn't one I'm particularly pleased with, partly because I'm
not a code generation expert, and partly because I don't usually worry about efficiency very much. (On the other
hand, it doesn't matter whether I'm pleased with it, or whether you're pleased with it either; it is one of the reasons.)
You're probably also still harboring some doubts; you may be thinking that a compiler would only have to use the
slow, bulky, lockstep code for expressions with multiple side effects, which many people (even some of the
Doubting Thomases) agree that we shouldn't be writing anyway. Perhaps you're right. Perhaps we could have our
cake and eat it too; perhaps we could have blazingly efficient code generated for polite little expressions and still
have defined behavior for rogue ones. Perhaps we could, but not under the current Standard: it still does say that the
rogue expressions are undefined. But the C Standard is under revision: perhaps, if this is important enough to you,
you can convince the committee to pronounce defined behavior upon the rogue expressions, too. Good luck.
Steve Summit
scs@eskimo.com
Precedence vs. Order of Evaluation
[This article was originally posted on May 12, 2001.]

Subject: Re: *p++
Date: 12 May 2001 16:15:23 GMT
Message-ID: <9djnir$gcb$1@eskinews.eskimo.com>
References: <9dho8u$ieenp$1@ID-84876.news.dfncis.de>
In article <9dho8u$ieenp$1@ID-84876.news.dfncis.de>, dac evidently wrote:
> This is supposed to return the value pointed by p, and then increment p,
> right?
That's essentially correct.
> My problem is that, looking at the precedence table, I can't seem to
> understand it.
> ++ has greater precedence than *, and both evaluate from right to left.
> So, in *p++ shouldn't ++ be evaluated first, and then dereferenced?
I think what you've managed to do here is confuse precedence with order of evaluation. But don't worry:
it's extremely easy to confuse precedence with order of evaluation, so easy that on the other hand it can
be very hard -- sometimes I find it essentially impossible -- to explain why precedence is not the same
thing as order of evaluation. But your example, and more importantly, your misinterpretation of this
example, provides an excellent arena for explaining this subtle point. So thanks for asking.
Yes, the precedence of postfix ++ is higher than unary *. Now, the important thing about precedence is
not that it tells us what order things will be evaluated in. The important thing about precedence is that it
tells us which operators are matched up with which operands.
When we look at the expression *p++, before we can even start worrying about what order things will
be evaluated in, we have to figure out what the expression is supposed to mean. The * says that we're
taking the contents of a pointer, and the ++ says that we're incrementing something. But what are we
incrementing, the pointer or the value pointed to?
Precedence tells us. If the precedence of unary * were higher than postfix ++ (which it is not), then
*p++ would mean "take the contents of the object pointed to by p, and increment it in-place". However,
since it's actually the precedence of postfix ++ that's higher, the correct interpretation is that what we're
incrementing is the pointer, and what we're taking the contents of is the pointer as operated on by the
postfix ++ operator.
Notice that I did not say, "the interpretation is that first we increment the pointer, and then we take the
contents of the incremented pointer". When we use words like "first" and "and then", we're talking about
order of evaluation, but order of evaluation can be a very slippery thing to talk about. Here, by talking
about it too soon, we'd confuse ourselves into getting the wrong answer, because as we'll see, although in
*p++ we do in fact take "the contents of the pointer as operated on by the postfix ++ operator", by the
definition of postfix ++, the pointer value we'll take the contents of is the prior, unincremented value.
So let's look at *p++ very carefully. Once more, since ++ has higher precedence, we're (a) applying ++
to p, and (b) applying * to the result. In other words, the correct interpretation is as if we had written
*(p++)
So we're taking the contents of the subexpression p++. Now, what is the value of the subexpression
p++? The definition of postfix ++ is that it increments a variable, but yields as its value the prior,
unincremented value. Remember, when you say
int i = 5;
int j = i++;
you end up with the value 5 in j, even though i ends up as 6. It's the same with pointer values: when you
evaluate *p++, you end up with the value that p used to point to, even though p ends up pointing one
beyond it. Try it -- compile and run this little program:
#include <stdio.h>
int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
int main()
{
    int *p = &a[5];
    int i = *p++;
    printf("%d %d\n", i, *p);
    return 0;
}
The program prints "5 6" -- i receives 5, the value p started out pointing to, but p ends up pointing at 6.
Now, what if you don't want the value pointed to by the old value of p, but the new one? That is, what if
you want to increment the pointer, and then take the value pointed to? This is now an order of evaluation
question, and it turns out that one of the first things to think of whenever you have an order of evaluation
question is that there's a good chance that the answer does not involve precedence.
We want an expression like *p++, in which a pointer is incremented and its pointed-to value is fetched,
except that it's the new, incremented value that's fetched. One way, of course, would be to do it as two
separate statements: first evaluate p = p + 1, then apply *p. But if we want to be Real C
Programmers and use the slick shortcut ++ operator, all we have to do is remember that the distinction
between using the old or the new, the unincremented or the incremented value, is precisely the distinction
between the postfix and prefix forms of ++. So to increment p and fetch the incremented value, we'd
write *++p.
(Notice, by the way, that if the explanation of *p++ were "first we increment the pointer, then we take
the contents of the incremented pointer" -- which, again, it is not -- then there would be no difference
between *p++ and *++p. But of course there is -- and should be -- a very significant difference between
these two, different expressions.)
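To see the two forms side by side, here is a minimal sketch. The helper names fetch_then_advance and advance_then_fetch are my own inventions for illustration; each one applies one form of ++ to a pointer owned by the caller, so that the effect on the pointer can be inspected afterwards.

```c
/* Hypothetical helper: applies *p++ to the caller's pointer.
   The value comes from the old position; the pointer moves on by one. */
int fetch_then_advance(int **pp)
{
    int *p = *pp;
    int value = *p++;   /* postfix: fetch through the old p */
    *pp = p;            /* hand the advanced pointer back */
    return value;
}

/* Hypothetical helper: applies *++p to the caller's pointer.
   The pointer moves on by one; the value comes from the new position. */
int advance_then_fetch(int **pp)
{
    int *p = *pp;
    int value = *++p;   /* prefix: fetch through the new p */
    *pp = p;
    return value;
}
```

Starting from p = &a[5] in the array from the little program above, the first helper yields 5 and the second yields 6; in both cases the pointer ends up at a[6].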
While we're here, let's explain a few more things. Suppose we were using *p++, and we noticed that it
was the old value of the pointer that was being used by the * operator, and we wanted it to be the new
value, and we said to ourselves, "I want it to first increment the pointer, then take the contents of the
incremented pointer", and suppose further that we thought that the way to get the order of evaluation we
wanted was to use explicit parentheses, leading us to write *(p++). What happens next?
Well, it doesn't work, of course. *(p++) acts precisely like plain *p++, and the reason is that the real
purpose of parentheses is not to force an order of evaluation, but rather to override the default
precedence. So when you write *(p++), all that the parentheses say is, "++ is applied to p, and * is
applied to the result", and that's the interpretation that the higher precedence of ++ implied already,
which is why the parentheses don't make any difference. In other words, when you've got an order of
evaluation problem, not only is the answer probably not going to involve precedence, it's probably not
going to involve parentheses, either.
What are parentheses good for, then? Well, let's go back to the earlier question of whether *p++
increments the pointer or the thing pointed to. We've seen that it increments the pointer, but what if we
want to increment the thing pointed to? That's a precedence problem, so the appropriate answer is, "use
explicit parentheses", and the result is (*p)++. This says, * is applied to p to fetch a value, and ++ is
applied to the fetched value.
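The contrast between parentheses that change nothing and parentheses that change the grouping can be sketched like this. The function names bump_pointee and bump_pointer, and the p_after out-parameter, are assumptions of mine, there only so the result can be inspected.

```c
/* Hypothetical demo of (*p)++: the parentheses override precedence,
   so ++ applies to the thing pointed to and p itself does not move. */
int bump_pointee(int *p, int **p_after)
{
    int old = (*p)++;   /* fetches the old pointed-to value, then increments it */
    *p_after = p;       /* still the pointer we were given */
    return old;
}

/* Hypothetical demo of *(p++): the parentheses restate the default
   grouping, so this behaves exactly like plain *p++. */
int bump_pointer(int *p, int **p_after)
{
    int old = *(p++);   /* fetches through the old p, then advances p */
    *p_after = p;       /* one element beyond where we started */
    return old;
}
```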
Exercise for the reader: (*p)++ increments the pointed-to value, but (again, by the definition of postfix
++) returns the old, unincremented pointed-to value. What if you wanted to increment the pointed-to
value and return the new, incremented, pointed-to value? (Do you need explicit parentheses in this case?)
If I haven't put you to sleep by now, I'll launch into one more little digression concerning the use of
time-related concepts to explain precedence. Ideally, to avoid confusion between precedence and order of
evaluation, we'd use time-related language to explain only order of evaluation, not precedence. But when
your C teacher was first explaining precedence, the explanation probably involved an expression along
the lines of
1 + 2 * 3
and the statement was probably made that "the multiplication happens first, before the addition, because
multiplication has higher precedence than addition". And having heard an explanation like that, it's all
too easy to come away with the impression that precedence controls order of evaluation, and it's also all
too easy to get into trouble later when considering an expression like *p++, where the precedence does
not control, doesn't even come close to controlling, the aspect of evaluation order that you're interested
in.
If we look at the expression 1 + 2 * 3 very carefully, the key aspect explained by precedence is that
the multiplication operator is applied to the operands 2 and 3, and the addition operator is applied to 1
and the result of the multiplication. (Notice that I said "and", not "and then"). Here, although there's
definitely an order of evaluation issue, and although the order of evaluation is apparently influenced by
the precedence somehow, the influence is actually not a direct one. The real constraint on the order of
evaluation is that we obviously can't complete an addition which involves the result of the multiplication
until we've got the result of that multiplication. So we're probably going to have to do the multiplication
first and the addition second, but this is mostly a consequence of causality, not precedence.
But if your C instructor misled you into thinking that precedence had more to do with order of evaluation
than it does, have pity, because I don't know of a good way of explaining precedence that doesn't involve
time-related concepts, either. My students used to have to watch me do battle with myself, lips flapping
like a fish but with no words coming out, as one part of my brain, paranoid about the possibility of later
misunderstandings, desperately tried to keep another part from blurting out the dreaded "the
multiplication happens first, before the addition, because multiplication has higher precedence than
addition". (And at least half the time, that's probably what I ended up saying anyway.)
> So, for example, having
> char *p = "Hello";
> printf("%c", *p++);
> shouldn't return 'e' ?
Nope. By the arguments above, it prints 'H'. To print 'e', you'd want *++p.
See also the comp.lang.c FAQ list, question 4.3. See also
http://www.eskimo.com/~scs/readings/precvsooe.960725.html.
Steve Summit
scs@eskimo.com
undefined behavior
[This article was originally posted on March 11, 1995, in the midst of a thread on undefined behavior. I am
indebted to Michael Hodous for having saved a copy of this article, and locating it and forwarding it to me over a
year later, after I'd lost mine.]
Subject: undefined behavior and Doubting Thomases (was: Quick C test...)
Message-ID: <D5Aqno.II8@eskimo.com>
References: <danpop.794784912@rscernix> <3jq1mn$63g@news.sas.ab.ca>
<danpop.794862450@rscernix>
Date: Sat, 11 Mar 1995 22:04:35 GMT
Lines: 114
In article <danpop.794862450@rscernix>, Dan Pop (danpop@cernapo.cern.ch) writes:
> In <3jq1mn$63g@news.sas.ab.ca> dmartin@freenet.edmonton.ab.ca () writes:
>> You can say "THE STANDARD SAYS IT'S UNDEFINED BEHAVIOUR!!!" until your
>> keyboard wears out and you still won't have answered the basic question of
>> WHY?
>
> I did, because this is the right answer. Your "answer" doesn't explain
> why the standard chose "undefined behaviour" instead of "unspecified
> behaviour" or "implementation defined behaviour". It won't convince
> anybody who can think for himself.
Actually, Douglas and Dan are both right.
Posters to comp.lang.c do assert, over and over and over again, that certain behavior is undefined. It doesn't help.
Other posters keep asking "Why?" or saying "Oh, but if I don't care which outcome I get, I'm okay."
Posters to comp.lang.c do assert, over and over and over again, that "No, it's UNDEFINED! Undefined behavior
means that ANYTHING can happen! If you write i=i++, your computer can teleport itself to 16th century
Wittenberg and change Western history by nailing sections 1.6 and 3.3 of ANSI X3.159 to the church door!". It
still doesn't help. Hyperbolic scenarios like that are fun to construct, but if someone has heard but not accepted one
sober definition of undefined behavior, and persists in their dogged belief that i=i++ is meaningful to them and
their compiler, those increasingly fantastic explanations don't usually penetrate, either.
If we want to educate, if we want people to stop believing that they can write i=i++ or predict what it should do,
we can't just keep repeating "It's undefined". We can't say anything that sounds like "It's undefined because we and
the Standard say it is, so just accept it." If it sounds like that, if it can possibly be construed to sound like that, then
people with a hyperactive (but perhaps beneficial, in other situations) skepticism gland will construe it like that,
and will cross their arms, state that they're from Missouri, and demand that we show them why. We can't say
"because the Standard says so" for the nth time -- they remain unconvinced. We can't emulate an inane beer
commercial and say "Why ask why?" -- their arms remain crossed. We have to give them a reason, in language
they can understand, and if we think that the best reason really is "because the Standard says so," or that it's not
really sensible to ask why or meaningful to explain why, then we had better -- again, if we're interested in
educating and in keeping the noise level down -- keep those opinions to ourselves.
I keep trying to find ways of saying these things that really work. (Two other FAQ list issues that seem to defy
being put to rest are whether arrays are pointers, and whether it's possible to write a generic swap macro.) You may
think that the current FAQ list wording is adequate (if so, thank you! :-) ), and that people, having read it, shouldn't
keep wondering why things are the way they are, but FAQ lists are by their very nature pragmatic: if people
manage not to understand them, then they (the lists) are not doing their job. If an FAQ list which merely stated the
facts were adequate, if it were a reader's fault for misunderstanding it, then we wouldn't need FAQ lists at all; there
are already plenty of documents out there which merely state the facts, and which readers manage to
misunderstand.
Roger Miller came up with a very nice explanation of how behavior which the pedants claim is undefined can
seem to work "correctly" on the skeptic's computer:
Somebody once told me that in basketball you can't hold the ball and run. I got a basketball and tried
it and it worked just fine. He obviously didn't understand basketball.
(This observation can also be used to make Dan's point: about the only answer to the question "Why can't you hold
the ball and run in basketball?" is "because it's against the rules" or "because that's not the way the game is
played.")
At one time I was toying with statements of the form
Undefined means that, notwithstanding question 8.2, the code
printf("%d", j++ <= j);
can print 42, or "forty-two".
(For those who don't have a copy of the FAQ list in front of you, question 8.2 is the one that says that boolean
operators like <= always return 0 or 1. I confess that by mentioning the number 42, I've started down the road
towards hyperbole.) I think I even once posted here something along the lines of
Boolean operators like <= always return 0 or 1.
The code
printf("%d", j++ <= j);
can print 42, or "forty-two".
If you can hold these two thoughts simultaneously, you will have achieved Enlightenment.
These arguments do have a certain subtle appeal, but when you're trying to pierce those curtains of skepticism, it
seems that neither subtlety nor hit-em-over-the-head, nasal demons arguments are sufficient.
The massive FAQ list revision I'm currently immersed in (and which I shouldn't even be taking time out from to
post this article) has a bunch more to say about undefined behavior; we'll see if any of it helps.
Steve Summit
scs@eskimo.com
undefined behavior
[This article was originally posted on November 5, 1998.]

Subject: Re: Interesting question in C
Date: 5 Nov 1998 17:41:02 GMT
Lines: 257
Message-ID: <71snve$3cg$1@eskinews.eskimo.com>
References: <01be08ba$4b7439c0$ec19ea93@lalex.ecitele.com>
<3641B492.7AC8@gec.nospam.com>
In article <3641B492.7AC8@gec.nospam.com>, Edward Hill writes:

>Alex Lemelev wrote:
>> int i , j = 3;
>> i = ++j * ++j;
>
> But well ++ has higher precedence than *
> so I'd imagine j is incremented to 4 and then incremented
> to 5 and then multiplied by itself to give 25.
> So what's the answer?
The answer, of course, is that there is no answer.
Attempting to use precedence rules to analyze malformed expressions like these is a popular pastime, but
it turns out not to help. It's true that the "outputs" of the two ++'s are to be multiplied together, but what
are those outputs?
Consider a related expression:
i = f1() * f2();
Function calls have higher precedence than multiplication, so the two functions will be called before the
multiplication can happen. But which function is called first, f1 or f2? No one can say, and different
compilers do it differently.
(I suppose you might try to invoke the left-to-right associativity of multiplication to claim that f1 would
be called first, but associativity doesn't help here, either. Associativity says that in f1() * f2() *
f3(), the results of f1 and f2 are multiplied together before being multiplied by the result of f3, but
we still don't know which order the functions were called in. f3 might be called first, and its result saved
for last.)
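If you want to watch this on your own compiler, here is a rough sketch. The names f1, f2 and f3 come from the discussion above; everything else (note, call_log, ncalls, product, last_call_order, and the values 2, 3 and 5) is made up for the illustration.

```c
static char call_log[8];   /* records the order in which f1..f3 were called */
static int ncalls = 0;

/* Hypothetical helper: logs a one-character tag and passes a value through. */
static int note(char tag, int value)
{
    call_log[ncalls++] = tag;
    call_log[ncalls] = '\0';
    return value;
}

int f1(void) { return note('1', 2); }
int f2(void) { return note('2', 3); }
int f3(void) { return note('3', 5); }

/* The grouping is fixed by precedence and associativity: (f1() * f2()) * f3().
   The result is therefore always 30, but the order of the three calls,
   and so the contents of call_log, is up to the compiler. */
int product(void)
{
    ncalls = 0;
    return f1() * f2() * f3();
}

const char *last_call_order(void)
{
    return call_log;
}
```

After a call to product(), printing last_call_order() shows which order your compiler happened to choose. Any of the six permutations of "123" is a legitimate answer, and it may change with the compiler or its optimization settings.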
The situation with ++j * ++j is actually even worse than with f1() * f2(). In the function call
case, it's probably true that either f1 is called first and then f2, or f2 is called first and then f1.
Applying the same logic to ++j * ++j, we might imagine that either the left-hand ++j gets evaluated
first and then the right-hand one, yielding 4 * 5 = 20, or else the right-hand one gets evaluated first and
then the left-hand one, yielding 5 * 4 = 20. But it turns out that this analysis doesn't work, either.
The problem is that ++j does two things. It delivers the value j+1 to the surrounding expression, and it
stores the value j+1 back into j, thus incrementing it. But we don't know (because the C Standard
doesn't tell us) precisely when the updated value of j gets written back to j.
In some expressions, it doesn't matter. If we wrote
i = ++j * k;
the compiler could emit code which essentially either interpreted the expression as
j = j + 1;
i = j * k;
or
i = (j+1) * k;
j = j + 1;
If we wrote
i = ++j * ++k;
there would be at least six possible interpretations:
j = j + 1;
k = k + 1;
i = j * k;

k = k + 1;
j = j + 1;
i = j * k;

i = (j+1) * (k+1);
j = j + 1;
k = k + 1;

i = (j+1) * (k+1);
k = k + 1;
j = j + 1;

j = j + 1;
i = j * (k+1);
k = k + 1;

k = k + 1;
i = (j+1) * k;
j = j + 1;
Now, all six of those interpretations yield the same answer (16, if j and k both start out as 3). But if you
replace k in each of them by j, resulting in six possible interpretations of the expression ++j * ++j,
you can get at least three different answers (16, 20, and 25, by my reckoning).
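That reckoning can be checked without ever writing the undefined expression itself. The sketch below spells out the interpretations, with k replaced by j, as ordinary, well-defined statements. Once k becomes j the six collapse into three distinct cases, so there are three functions; their names are my own.

```c
/* Interpretations 1 and 2: both stores happen before the multiplication. */
int stores_first(int *j)
{
    *j = *j + 1;
    *j = *j + 1;
    return *j * *j;
}

/* Interpretations 3 and 4: both stores happen after the multiplication. */
int stores_last(int *j)
{
    int i = (*j + 1) * (*j + 1);
    *j = *j + 1;
    *j = *j + 1;
    return i;
}

/* Interpretations 5 and 6: one store before the multiplication, one after. */
int store_between(int *j)
{
    int i;
    *j = *j + 1;
    i = *j * (*j + 1);
    *j = *j + 1;
    return i;
}
```

With j starting at 3, these return 25, 16 and 20 respectively, and j ends up as 5 every time. Remember that these are merely the "plausible" readings; a compiler handed the real ++j * ++j is not obliged to pick any of them.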
But it gets worse. The official wording in the ANSI/ISO C Standard says that if you attempt to modify
the same object at one spot within an expression and refer to that same object at another spot (as in the
expression ++j * j), or if you attempt to modify the same object twice within an expression (as in the
expression ++j * ++j), the results are undefined [footnote 1]. What this means is that the compiler
isn't even required to choose among the two (or six, or however many you can come up with) "plausible"
interpretations of the expression; it can do any random thing it wants to, and it won't be in violation of
the Standard. (In practice, it's not so much that compilers do "any random thing they want to" in these
situations, but what does happen is that they do any random thing that falls out of their algorithms for
handling well-formed expressions, algorithms which are not designed -- because they do not have to be --
to do anything reasonable with undefined expressions.)
Now, you might say (and I'd agree with you) that the expression i = ++j * ++j would be so
unlikely to come up in a real program that it hardly matters what it does. But if you think you can analyze
that expression and figure out what it does, then there's a flaw either in your analysis or in your
understanding of C. And I would urge you to repair that flaw -- to teach yourself that you cannot analyze
the behavior of expressions like these -- because that way you won't falsely imagine that you can predict
the behavior of other, more insidious undefined expressions which do occasionally crop up in real
programs, expressions which will cause real problems if they are retained, expressions which it will do
nobody any positive good to be able to analyze or justify, expressions which should ideally be
recognized as the undefined expressions they are and banished from the program as soon as possible.
As an example of how a "more insidious undefined expression" can in fact crop up in a real program,
consider the following decidedly non-hypothetical scenario. (The reason I know it's not hypothetical is
that I know the programmer it happened to.) The program is the interpreter for a stack machine, and it
contains code like
switch(op)
{
case ADD:
push(pop() + pop());
break;
case SUBTRACT:
tmp = pop();
push(pop() - tmp);
break;
case MULTIPLY:
push(pop() * pop());
break;
case DIVIDE:
tmp = pop();
if(tmp == 0) {error("divide by 0"); break; }
push(pop() / tmp);
break;
...
In the original implementation, push() and pop() are function calls. The original programmer,
knowing that the order of function calls within an expression is unpredictable (see our discussion of f1
and f2 above) has taken care to use a temporary variable to ensure that the operands are popped in the
right order for subtraction and division, where it matters.
But the interpreter is too slow, and a later programmer reasons that part of the problem is function call
overhead. (Most of the time, it's a good idea not to worry about function call overhead too much, but in
some situations, such as the inner loop of an interpreter which might be interpreting huge amounts of
code, it can indeed be significant.)
I don't remember if the push and pop functions were
void push(val)
{
stack[stackp++] = val;
}
pop()
{
return stack[--stackp];
}
or
void push(val)
{
*stackp++ = val;
}
pop()
{
return *--stackp;
}
(That is, although the stack and the stack pointer were both global, I don't remember whether the stack
pointer was an index or a pointer into the stack.) But, in either case, it's easy to see how we might replace
the push and pop functions with macros, thus removing the function call overhead in the critical inner
loop, without rewriting the entire interpreter. (In fact, this interpreter had hundreds of operators, some
much more complicated than simple arithmetic, so there were hundreds of calls to push() and pop()
spread across many source files.)
The macros would, of course, either be
#define push(val) (stack[stackp++] = (val))
#define pop() stack[--stackp]
or
#define push(val) (*stackp++ = (val))
#define pop() *--stackp
and this seems like a very simple way to dramatically increase the efficiency of the interpreter. (It did,
too, by 20 or 30 percent.) The problem was that the interpreter also began delivering wildly erroneous
answers.
The problem turned out to be in those innocuous-looking lines
push(pop() + pop());
and
push(pop() * pop());
for addition and multiplication. When push() and pop() are macros, these lines expand to either
(stack[stackp++] = (stack[--stackp] + stack[--stackp]));
(stack[stackp++] = (stack[--stackp] * stack[--stackp]));
or
(*stackp++ = (*--stackp + *--stackp));
(*stackp++ = (*--stackp * *--stackp));
And these expressions bear an eerie similarity to the one that began this thread. (They're complicated by
an added level of indirection, and they each attempt to modify stackp three times within one expression.)
And yes, under a popular, real-world compiler, namely gcc, these expressions did produce some crazy
result which was not at all the expected one but which, by a strict and proper interpretation of the C
Standard, was not incorrect. (That is, it was not incorrect for the compiler and the emitted code to have
produced the undesired result.) The compiler had not done anything wrong; the error was on the part of
the programmer who attempted to replace the function calls with macros, macros which proved to be
dangerous because of their embedded side effects.
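One way to keep the macros and still stay within the rules is to give every pop() a statement of its own, just as the original programmer had already done for subtraction and division. What follows is a sketch of that idea, not a record of how the interpreter was actually repaired. The names STACKSIZE, do_add and do_multiply are hypothetical, the stack holds plain ints, and the stack pointer is assumed to be an index.

```c
#define STACKSIZE 100               /* assumed size, for illustration only */

static int stack[STACKSIZE];
static int stackp = 0;              /* index of the next free slot */

/* No overflow or underflow checks, to keep the sketch short. */
#define push(val) (stack[stackp++] = (val))
#define pop()     (stack[--stackp])

/* Each pop() sits in its own full expression, so stackp is modified
   only once between sequence points. */
void do_add(void)
{
    int a = pop();
    int b = pop();
    push(a + b);
}

void do_multiply(void)
{
    int a = pop();
    int b = pop();
    push(a * b);
}
```

The cost is two temporaries per operator, which a compiler will normally keep in registers, so most of the speedup from the macros should survive.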
And finally, in case anyone is doubting the truth of this story, I can tell you that the maintenance
programmer who introduced the undefined behavior was (as others of you have probably already
guessed) none other than the maintainer of the comp.lang.c FAQ list, Steve Summit, who presumably
ought to have known better.
Steve Summit
scs@eskimo.com
Footnote 1: The real rule on undefined expressions is that you can't modify an object twice, or modify
and then inspect it, between sequence points. There are a few operators -- including function calls --
which do introduce sequence points, so it's possible to come up with expressions which contain multiple
side effects but which are not undefined.
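A few expressions of that last kind, as a sketch (the function names are made up for the illustration). Each one modifies the same object twice, but a sequence point separates the two modifications.

```c
/* Comma operator: the left operand, side effect included, is finished
   before the right operand is evaluated. */
int comma_demo(int *ip)
{
    return ((*ip)++, (*ip)++);
}

/* Logical AND: there is a sequence point after the left operand. */
int and_demo(int *ip)
{
    return (*ip)++ && (*ip)++;
}

/* Function calls: each increment happens inside a call, and the two
   calls cannot overlap. Which call runs first is unspecified, but here
   the sum comes out the same either way. */
static int next_value(int *ip)
{
    return (*ip)++;
}

int sum_of_two(int *ip)
{
    return next_value(ip) + next_value(ip);
}
```

With the pointed-to int starting at 0, comma_demo returns 1 and leaves 2 behind. Starting at 1, and_demo returns 1 and leaves 3. Starting at 5, sum_of_two returns 11 and leaves 7.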
C strings: pointers vs. arrays
[This article was originally posted on September 26, 1998.]

Subject: Re: newbie question: stings, pointers and arrays
Date: 26 Sep 1998 16:25:24 GMT
Lines: 234
Message-ID: <6uj4hk$ts2$1@eskinews.eskimo.com>
References: <slrn70lima.8b4.edward@portaloo.hairnet> <6uhpkp$len$1@nnrp1.dejanews.com>
In article <6uhpkp$len$1@nnrp1.dejanews.com>, bbarnette@my-dejanews.com writes:
>In article <slrn70lima.8b4.edward@portaloo.hairnet>,
>edward@hairnet.demon.co.uk wrote:
>> but if I do this it seg faults
>> char *play[2]={"1","2"};
>> strcpy(play[0]="3");
>> strcpy(play[1]="4");
>> What am I doing wrong?
>
> Your problem is in strcpy(). It is strcpy(destination, source);
> There is no need for the = here.
True, but that was only one of his problems, and here, as so often happens (to borrow a line from
Douglas Adams), the superficial design flaw has hidden a more fundamental design flaw.
> Try this:
> char *play[2]={"1","2"};
> strcpy(play[0],"3");
> strcpy(play[1],"4");
This is also likely to seg fault or otherwise abort. The problem is that the two pointers play[0] and
play[1] point to constant strings, as requested by the string literals "1" and "2". Those constant
strings are not necessarily writable, and on more and more systems, attempts to write to them will fail. Yet
modifying those strings is exactly what the calls
strcpy(play[0], "3"); strcpy(play[1], "4");
attempt to do -- the "3", for example, is supposed to be copied into the (possibly read-only) memory
occupied by the constant string "1".
The situation is different if play[0] and play[1] are arrays rather than pointers. If the code had been
char playa[2][2] = {"1", "2"};

then the strcpy calls would have been correct and guaranteed to work. Here, playa[0] is not a
pointer to a char, it is an array of char, and a writable one at that. In this case, it doesn't matter
whether the string literal "1" is writable or not, because when a string literal is used as an array
initializer, the string is copied into the array, after which it's the array you might be trying to write to, not
the string literal.
There's a second potential problem with those strcpy calls, of course. If the replacement strings were
any longer, e.g. if we were to call
strcpy(play[0], "345"); strcpy(play[1], "678");
we would exceed the storage allocated for the two initial strings, and this would be a problem regardless
of whether we were using pointers or arrays. (In the array case, we could preallocate the arrays larger
than required, by declaring, say, char playa[2][10], where we'd be assuming that no string longer
than 9 characters would ever be copied into either of the arrays.)
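Putting the array version together with the size concern, a minimal sketch might look like this. The array playa and its size of 10 come from the discussion above; the helpers set_play and get_play are hypothetical.

```c
#include <string.h>

/* Arrays of char initialized from string literals: the literals are
   copied into the arrays, so the arrays themselves are writable.
   Ten bytes per row holds up to 9 characters plus the '\0'. */
static char playa[2][10] = {"1", "2"};

/* Hypothetical helper: copies s into row i if it fits.
   Returns 1 on success, 0 if i is out of range or s is too long. */
int set_play(int i, const char *s)
{
    if (i < 0 || i > 1 || strlen(s) >= sizeof playa[i])
        return 0;
    strcpy(playa[i], s);
    return 1;
}

const char *get_play(int i)
{
    return playa[i];
}
```

The length check is what turns "assuming that no string longer than 9 characters would ever be copied" from a hope into something the code enforces.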
There's actually a very interesting sort of paradox underlying this whole issue. It's true that strings in C
are arrays, and it's true that you can't assign arrays, and it's true that you therefore often need to call
strcpy when you want to "assign" a string. But this does not mean that whenever you have the notions
of "string" and "assign" in your head, you should reflexively call strcpy! The reason is that strings are
often referred to by pointers, and strcpy can be dangerous when used with unknown pointers (that is,
when you're not sure how much memory they point to, or whether the memory is writable).
The paradox is that although your C teacher told you that since arrays aren't assignable you have to call
strcpy to "assign" a string, pointers are assignable, and it's often much safer to use simple assignment
when working with pointers to strings! (Besides safety, if you're at all worried about efficiency, assigning
pointers can be significantly more efficient than copying strings around all the time.)
For example, the code
char *play[2] = {"1", "2"};
strcpy(play[0], "3");
is wrong, because it attempts to overwrite constant strings. Furthermore, the code
char *play[2] = {"1", "2"};
strcpy(play[0], "345");
is doubly wrong, because it attempts to overwrite constant strings and overflows them if it succeeds.
BUT, the code
char *play[2] = {"1", "2"};
play[0] = "3";
play[1] = "4";
is perfectly legal and correct, as is
char *play[2] = {"1", "2"};
play[0] = "345";
play[1] = "678";
In these cases, we're reassigning the pointers to point to different string constants, and although those
string constants might not be writable, either, we've successfully accomplished our goal of assigning new
string values to play[0] and play[1].
Here's another example. When attempting to write a function which returns a string, beginners frequently
write code like this:
char *digitvalue(int d)
{
    char result[10];
    switch(d) {
        case 0: strcpy(result, "zero"); break;
        case 1: strcpy(result, "one"); break;
        case 2: strcpy(result, "two"); break;
        ...
        case 9: strcpy(result, "nine"); break;
    }
    return result;
}
This code has a serious and fatal bug, of course: the local result array will not exist by the time the
function returns to its caller, so the pointer that the caller receives will be useless. (See the comp.lang.c
FAQ list, question 7.5.) One recommended fix for this problem is to declare the result array as
static, but taking the larger view, it turns out that might not be the best fix in this situation, after all!
Much better, I think, is to rewrite the function completely, as
const char *digitvalue(int d)
{
    switch(d) {
        case 0: return "zero";
        case 1: return "one";
        case 2: return "two";
        ...
        case 9: return "nine";
    }
}
Now we've gotten rid of the result array completely; this has the additional benefit that besides not
having to worry about its allocation, we also don't have to worry about making it big enough. (In other
words, the arbitrary value 10 from the previous implementation has disappeared.) Since the returned
pointers now point directly to string literals, the pointed-to storage must not be written to, and I've
documented this fact by declaring the function as returning const char *. (This may seem like an
unfortunate and perhaps unacceptable additional restriction over the initial implementation, but since the
initial implementation returned a pointer to storage which was neither readable nor writable, I'm perfectly
comfortable with the function as modified.) If your programming style disallows functions with multiple
exit points, you can use a temporary internal return value which is a pointer rather than an array:
const char *digitvalue(int d)
{
    const char *result;
    switch(d) {
        case 0: result = "zero"; break;
        case 1: result = "one"; break;
        case 2: result = "two"; break;
        ...
        case 9: result = "nine"; break;
    }
    return result;
}
Finally, as an aside, this particular switch statement is somewhat unnecessary -- since the case values are
dense and 0-based, it would suffice to pluck the digit names out of an array:
const char *digitnames[] = {"zero", "one", "two", "three",
    "four", "five", "six", "seven", "eight", "nine"};

const char *digitvalue(int d)
{
    if(d >= 0 && d <= 9)
        return digitnames[d];
    else
        return "error";
}
(We can immediately tell that this last implementation is superior, because I've been able to present a
complete, equally-readable example, with error-checking, in less space than I was formerly using to
present abbreviated examples. This last example, too, could be written with one exit point by assigning
one of the digitname strings to a local temporary return value pointer, but the point is that it's superior to
one that used strcpy to copy one of the digitname strings to a local temporary return value array.)
To summarize the argument I've been making, IF all the strings you're working with are constant, it's
arguably safer to use pointer assignment all of the time, and never to call strcpy.
BUT, that's a mighty big "if", of course. If you're constructing any strings on the fly, perhaps by calling
strcat, you have to be more careful, because neither a rule like "always use arrays and call strcpy
for simple assignments" nor "always use pointers and use assignment for simple assignments" will
suffice. If you're constructing strings on the fly (or reading them in from the outside world), you have to
ensure that the storage you write them into is both writable and big enough. If you're using arrays, this
will probably mean declaring arrays having some explicit, maximum size (i.e. the maximum size of the
strings you'll ever be able to construct or read in), and if you're using pointers, it will probably mean
calling malloc or realloc. (And, if you've settled on arrays, you may end up going back to strcpy
for all string "assignments".)
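For the pointer case, "writable and big enough" usually comes down to computing the size first and then asking malloc for exactly that much. Here is a minimal sketch; the function name concat is my own, and the caller is assumed to take responsibility for freeing the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding a
   followed by b, or NULL if the memory cannot be obtained.
   The caller must free() the result. */
char *concat(const char *a, const char *b)
{
    size_t len = strlen(a) + strlen(b) + 1;   /* +1 for the '\0' */
    char *result = malloc(len);
    if (result == NULL)
        return NULL;
    strcpy(result, a);     /* safe: result is writable and has room for a */
    strcat(result, b);     /* safe: len also accounted for b */
    return result;
}
```

Here strcpy and strcat are perfectly appropriate, because we know exactly how much memory result points to and that it is ours to write. The returned pointer can then be passed around by simple assignment, as with the constant strings earlier.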
The bottom line is: if you have a program that manipulates a lot of strings, what should you choose as
your basic "string" type? If you choose "array of char", then yes, you will have to call strcpy all the
time. But if you choose "pointer to char", you probably do not want to call strcpy; most of the time
you will want to use pointer assignment instead. (And if you want to get the best of both worlds, by using
some arrays and some pointers, you'll have to be more careful still.)
Steve Summit
scs@eskimo.com
a++ versus ++a
[This article was originally posted on January 18, 1999; I have touched it up lightly since then.]
Subject: Re: a++ versus ++a
Date: 18 Jan 1999 16:31:45 GMT
Lines: 457
Message-ID: <77vnlh$8qq$1@eskinews.eskimo.com>
References: <36A2B0E3.95E2A918@newsandmail.com>
In article <36A2B0E3.95E2A918@newsandmail.com>, dan.johnson@newsandmail.com writes:
> Here's my understanding Incrimenting and Pre-Incrimenting....
> a++ means the same as a + 1
> ++a means the same as 1 + a
Well, no. That wouldn't be much of a difference, would it?
> What I don't understand is what the benefit of one versus the other
> is inside of code. I have used a++ many times in my for-loops
> [ie, for (a = 0; a > 10; a++) ... etc]
Part of the trick of understanding the difference between a++ and ++a is to realize that sometimes, there is no
difference. The difference between a++ and ++a is in what value is "returned" to the surrounding expression,
but this obviously makes a difference only when there is a surrounding expression, when the a++ or ++a
appears as a subexpression within a larger expression. When the a++ or ++a stands alone, as it does in the loop
header
for(a = 0; a < 10; a++)
there is no effective difference at all; that loop is 100% equivalent to one that begins
for(a = 0; a < 10; ++a)
When the a++ or ++a does appear as a subexpression within a larger expression, the difference is that a++
"returns" the old, unincremented value of a to the surrounding expression, while for ++a you get the new,
incremented value.
> I learn best by simple example, so if you have some code you can show
> off the best use and explanation of the differences between these, then
> by all means supply it in a reply.
Here's some code. (The example is going to seem excessively long, but underlying that length is the simplicity
you asked for, so please bear with me. It's long because I'm going to try to make it realistic; I'm going to
motivate the choice between prefix and postfix ++ with a convincing example, rather than an arbitrary fragment
like b = a++, and I'm going to provide a fair amount of commentary along the way.)
Suppose we have an array of integers
int numbers[10];
which we're going to fill in with numbers read from the user. We're not sure how many numbers the user is
going to type in -- sometimes it will be less than 10 -- so we'll keep a second variable,
int nnumbers;
to keep track of how many numbers the user did type in. Our very first cut at writing the code to read in the
numbers might look something like this:
int i, x;
for(i = 0; i < 10; i++)
{
    x = getnumberfromuser();
    if(x <= 0)
        break;
    numbers[i] = x;
}
nnumbers = i;
Here we've deferred the question of how exactly we're going to accept numbers from the user by moving that
functionality off into a function, getnumberfromuser(). (I'll provide a sample implementation below, for
completeness.) We've hypothesized that getnumberfromuser() will return 0 or a negative number to
indicate that the user has finished typing in numbers (perhaps because we'll simply ask the user to type in 0 or
-1 to indicate the end of the list). Since the numbers[] array is size 10, we start off imagining that we'll make
10 trips through the loop to fill in up to 10 numbers. If the user indicates the end of the list (that is, types 0 or -1)
after typing fewer than 10 numbers, we won't be getting 10 numbers, after all, so we take an early exit from the
loop with the break statement.
The code above will work, but it has a few deficiencies. The most serious (though the least obvious) is that if
the getnumberfromuser() function explicitly prompts the user to enter a number (which is quite a likely
implementation), the user experience will be significantly different depending on whether the user types 10
numbers, or fewer than 10 numbers. If the user types, say, 7 numbers, another prompt will appear (which would
be for the 8th number), and the user will type 0 or -1 to indicate completion. If the user types 10 numbers,
however, the loop will take its normal exit (because the condition i < 10 in the for loop will no longer be
true), and getnumberfromuser() will not be called at all, and no 11th prompt will appear, and the user
won't have to type (or be given the opportunity to type) 0 or -1 at all.
Other deficiencies (though they're more minor stylistic quibbles) concern the loop used and the number of
variables used. The loop
for(i = 0; i < 10; i++)

suggests that we expect the loop to be traversed 10 times most of the time, except under exceptional
circumstances, but in fact if it's normal for the user to type fewer than 10 numbers, the loop will exit abnormally
(via the break statement) most of the time. (And if the user does frequently type all 10 numbers, it's possible
that the constant 10 in the program is too small, that the user would like to be typing more numbers, if given the
chance.) There's nothing wrong with using a break statement, especially when the condition it represents is
truly exceptional, but this particular use isn't the best.
Finally, there's at least one more variable in this first code fragment than we really need. Within the loop, the
variable i keeps track of how many trips we've made through the loop and hence how many numbers the user
has typed so far. But at the end of the loop, we copy i out into our other variable, nnumbers, which we had
specified would contain the final number of numbers the user eventually typed. It turns out we can get by with
just i or just nnumbers; we don't need both. (There's nothing intrinsically wrong with using more variables
than are strictly required -- an overzealous attempt to re-use variables can lead to buggy or
difficult-to-understand programs -- but sometimes, as here, extra variables indicate that the flow of the code is not as clean
as it could be.)
A better loop would emphasize that we're going to accept numbers as long as the user keeps volunteering them,
and would relegate the i < 10 test to the exceptional condition, since that test is more of an accident of the
(rather arbitrary) choice of 10 as the maximum number of numbers the program can support. Writing a loop that
accepts numbers as long as the user keeps typing them sounds much more like a while loop than a for loop,
and indeed the kind of loop we want to write, in pseudo-code, will look something like
while(there's another number)
{
    store it in numbers[];
    increment nnumbers;
}
One way to implement the loop "while(there's another number)" would be
while(getnumberfromuser() > 0)
But that won't work for this example, because when a number came back from the getnumberfromuser()
function, we'd (properly) compare it with zero to determine whether it was a positive number or a
zero-or-negative sentinel indicating that the user was finished, but then we'd lose track of the just-returned value and
wouldn't be able to store it in the numbers[] array, which was supposed to be our primary job. What we
really want to do, of course, is save the returned value away in a variable so that we can both test it against 0
and (if necessary) use it to fill in one of the elements of the array. The obvious way to do that (which is about
what we used in our first attempt) is
x = getnumberfromuser();
if(x > 0)
    store x in numbers[] and continue reading;
else
    we're done reading numbers;
The slightly less-than-obvious way, but which is ideal for while loops like the one we're trying to write, is to
bury the assignment inside the conditional:
(x = getnumberfromuser()) > 0
Here we store getnumberfromuser's return value into the variable x, and then use the "return" value of the
assignment operator as the left-hand operand of the > operator. The "return" value of an assignment operator is
simply the value assigned, so the net result is that we both store getnumberfromuser's return value away in
x for future reference, and also immediately compare it against 0 to see if it's a number we want, or not. Placed
into a while loop (but still using some pseudocode), the expression is used like this:
while((x = getnumberfromuser()) > 0)
{
    store x in numbers[];
}
we're done reading numbers;
If you don't mind the embedded assignment-and-test epitomized by the expression (x =
getnumberfromuser()) > 0, this is a splendid way to write the loop, and it corrects the first stylistic
deficiency. (Actually, I hope you don't mind the embedded assignment-and-test, because it's quite a popular
idiom in C!) Now I'll move on to the second deficiency, because even though it was quite minor, it's going to be
at the center of our decision whether to use prefix or postfix ++ (which is, believe it or not, still what I'm
ultimately trying to demonstrate here).
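For readers who would like to see the embedded assignment-and-test in a complete, compilable form, here is a minimal sketch. It assumes a hypothetical stand-in for getnumberfromuser() that hands back values from a canned array instead of prompting at the keyboard; the names canned and sum_until_sentinel are invented for this illustration. The same shape appears in the familiar character-reading loop while((c = getchar()) != EOF).

```c
/* Hypothetical scripted input: points at the next value to hand back.
   The caller must aim it at an array that ends with 0 or a negative number. */
const int *canned;

/* Stand-in for the real getnumberfromuser(); no prompting, no keyboard. */
int getnumberfromuser(void)
{
    return *canned++;
}

/* Add up the numbers "typed" until the zero-or-negative sentinel arrives. */
int sum_until_sentinel(void)
{
    int x;
    int total = 0;

    /* Store the returned value in x, then compare that same value with 0. */
    while((x = getnumberfromuser()) > 0)
        total += x;

    return total;
}
```

The parentheses around the assignment matter: without them, > binds more tightly than =, and x would receive the result of the comparison rather than the number itself.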
I want to remove the extra variable involved in the first example. When we're done reading numbers, the
variable nnumbers is supposed to contain the number of numbers the user has entered. I'm going to assert that,
within the loop, nnumbers is always going to contain the number of numbers the user has entered so far. (The
notion that "nnumbers always contains the number of numbers the user has entered so far" is an example of a
"loop invariant", and picking clean, simple loop invariants and sticking to them is an excellent technique when
we want to write clean, readable code.)
Let's think about the values that nnumbers should contain as the program fragment proceeds. Before we make
any trips through the loop, the user won't have typed any numbers, so nnumbers should (rather obviously)
start out as 0. At the end of the first trip through the loop (assuming the user doesn't type 0 or -1 right away),
nnumbers should contain 1. At the end of the second complete trip through the loop, nnumbers should
contain 2, etc.
If we're going to stick with using the nnumbers variable, and eliminate i as redundant, we're going to have to
use nnumbers to decide which element of the numbers[] array to store each new number in. Since arrays in
C are 0-based, the first number read will be stored in numbers[0], the second number will be stored in
numbers[1], etc. So it looks like we can store each new number into numbers[nnumbers], as long as
we're careful to increment nnumbers after we use it to decide which element of the array to store in. Our code
(which is no longer pseudocode) therefore looks like this:
nnumbers = 0;
while((x = getnumberfromuser()) > 0)
{
    numbers[nnumbers] = x;
    nnumbers = nnumbers + 1;
}
We've met our goals of abandoning the for loop in favor of the (for this problem) more-natural while loop,
and of consolidating our use of the former variable i into the nnumbers variable. Our loop invariant is
(mostly) met: at the end of each trip through the loop, after incrementing nnumbers, nnumbers does indeed
contain the number of numbers read so far. (We've also lost some functionality, and introduced a bug: we're no
longer checking to make sure we store no more than 10 numbers in the array, so it could overflow. We'll fix this
up in a minute.)
Our purpose here (I still haven't forgotten) is to demonstrate the choice between prefix and postfix ++. If we
want to replace the "longhand" expression nnumbers = nnumbers + 1 with the shorthand ++ form, we
could (I suppose) write the body of the loop as
{
    numbers[nnumbers] = x;
    nnumbers++;
}
but this doesn't buy us much (nor is it an effective demonstration of the prefix vs. postfix decision). Here, the
expression nnumbers++ stands alone, with no surrounding expression, so it doesn't matter whether it "returns"
the old or the new value. We could just as well have written
{
    numbers[nnumbers] = x;
    ++nnumbers;
}
We can, however, collapse the two statements in the loop together, with a result that will look like (reverting to
pseudocode again for a moment):
numbers[use nnumbers and also increment it] = x;
There are two ways to look at this process of collapsing: we can say that it's just another example of C
programmers trying to show off and be obfuscated by cramming everything possible into one line, or we can
say that it's a good way of expressing an idiom and strengthening a loop invariant. (The tack this article is
taking will seem to favor the latter interpretation, but I'm not going to try to jam it down your throat; I concede
that the C programmer proclivity to cram everything into one line can be annoying even when there are good
reasons for it.)

How do we decide whether to replace the pseudoexpression "use nnumbers and also increment it" with
++nnumbers or nnumbers++? Remember, we wanted to store the first value read into numbers[0], or
stated another way, to use nnumbers as a subscript before incrementing it. Prefix ++ means to increment the
variable and "return" the new incremented value, while postfix ++ yields the old, unincremented value. So
postfix ++ is exactly what we want here; the old body of the loop collapses down to the single expression
numbers[nnumbers++] = x;
When we put everything together, we'll also take the opportunity to restore the missing overflow check:
nnumbers = 0;
while((x = getnumberfromuser()) > 0)
{
    if(nnumbers >= 10)
    {
        printf("too many numbers!\n");
        break;
    }
    numbers[nnumbers++] = x;
}
The nice thing about this code, besides the fact that it restores the missing overflow check, is that its loop
invariant -- "nnumbers always contains the number of numbers the user has entered so far" -- is even stronger.
When we wrote
numbers[nnumbers] = x;
nnumbers = nnumbers + 1;
there was one spot in the code, namely in the middle of the loop between the two statements, where the
invariant wasn't true, after all. The reason for collapsing those two lines down to a single expression is not to
show off how much we can accomplish in a single expression, it's because the tasks of "use nnumbers to
decide which element of the array to use" and "increment nnumbers" are (or should be) very tightly linked. It
would be a serious error to perform one task but not the other. By expressing the two tasks within one
expression in one statement, we reinforce the important connection between them. (I should point out, though,
that this is still only a stylistic consideration. There will be little if any difference in performance between the
longhand and shorthand forms of the loop, and if we were worried about more esoteric issues such as the truth
of the loop invariant in the face of interrupts, there would be no significant difference between them in that
respect, either.)
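Here is the 0-based loop wrapped up as a small self-contained sketch that can be compiled and exercised without a keyboard. The names read_numbers, MAXNUMBERS and canned are assumptions made for this illustration, and getnumberfromuser() is replaced by a hypothetical stand-in that returns values from a canned array.

```c
#include <stdio.h>

#define MAXNUMBERS 10   /* hypothetical name for the constant 10 used in the text */

/* Hypothetical scripted input; must end with 0 or a negative number. */
const int *canned;

/* Stand-in for the real getnumberfromuser(). */
int getnumberfromuser(void)
{
    return *canned++;
}

/* Fill numbers[0] .. numbers[n-1] and return n, the count of numbers stored.
   numbers[] must have room for MAXNUMBERS elements. */
int read_numbers(int numbers[])
{
    int x;
    int nnumbers = 0;   /* invariant: how many numbers have been stored so far */

    while((x = getnumberfromuser()) > 0)
    {
        if(nnumbers >= MAXNUMBERS)
        {
            printf("too many numbers!\n");
            break;
        }
        /* Postfix ++: subscript with the old value, then increment. */
        numbers[nnumbers++] = x;
    }
    return nnumbers;
}
```

With the canned input 7, 3, 9, -1 the function returns 3 and leaves the values in numbers[0] through numbers[2]. Note that when the array is already full, the loop still reads one more number before giving up on it.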
If you trace back through the discussion so far, you'll find that our choice of postfix ++ was rather closely tied
to the fact that arrays are 0-based in C. Let's suppose, for a moment, that for some reason you wanted to use a
1-based array for numbers[]. In other words, you want the first value read to go into numbers[1], the second
into numbers[2], etc. If you don't mind wasting one element per array, it's simple enough to simulate a
1-based array in C -- just declare the array one bigger than it has to be:
int numbers[11];
Now you can use numbers[1] through numbers[10], and just kind of pretend that numbers[0] isn't
there.
In languages that support 1-based arrays by default, it's unfortunately rather common to see code which (to
mimic our ongoing example) initializes nnumbers to 1, since that's the first subscript value that will be used.
In other words, one way (though a poor way) to write the loop would be
nnumbers = 1;
while((x = getnumberfromuser()) > 0)
{
    numbers[nnumbers] = x;
    nnumbers = nnumbers + 1;
}
nnumbers = nnumbers - 1;    /* correct for going one too far */
There's an obvious problem with this code: if its loop invariant was supposed to be "nnumbers always
contains the number of numbers the user has entered so far", it's now false most of the time; it's true only in the
middle of the loop, between the two statements. In fact, a more accurate loop invariant for the code as written
here would be that "nnumbers always contains one more than the number of numbers the user has entered so
far", which is obviously a much less useful statement. A poor loop invariant like this always has to be propped
up with fudge factors here and there, indicated by apologetic comments such as "correct for going one too far".
But 1-based arrays don't have to lead to convoluted code. Look how much simpler the code becomes if we
interchange the increment and the array assignment, and change the initial value of nnumbers back to 0:
nnumbers = 0;
while((x = getnumberfromuser()) > 0)
{
    nnumbers = nnumbers + 1;
    numbers[nnumbers] = x;
}
Now, it's again mostly true that "nnumbers always contains the number of numbers the user has entered so
far", but since we increment nnumbers before using it, the first number read goes into numbers[1].
You can probably see where this is going. Replacing the longhand form nnumbers = nnumbers + 1 with
the autoincrement operator ++, and restoring the overflow test, we end up with this 1-based version:
nnumbers = 0;
while((x = getnumberfromuser()) > 0)
{
    if(nnumbers >= 10)
    {
        printf("too many numbers!\n");
        break;
    }
    numbers[++nnumbers] = x;
}
If you compare this code to the previous 0-based version, you'll see that the only difference is that it uses prefix
++ instead of postfix.
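The 1-based variant can be sketched the same way. As before, read_numbers_1based, MAXNUMBERS and canned are names invented for this illustration, and getnumberfromuser() is a hypothetical stand-in that returns canned values. The caller's array is assumed to have MAXNUMBERS + 1 elements so that numbers[1] through numbers[MAXNUMBERS] are all usable.

```c
#include <stdio.h>

#define MAXNUMBERS 10   /* hypothetical name for the constant 10 used in the text */

/* Hypothetical scripted input; must end with 0 or a negative number. */
const int *canned;

/* Stand-in for the real getnumberfromuser(). */
int getnumberfromuser(void)
{
    return *canned++;
}

/* Fill numbers[1] .. numbers[n] and return n, the count of numbers stored.
   numbers[] must have room for MAXNUMBERS + 1 elements; numbers[0] is unused. */
int read_numbers_1based(int numbers[])
{
    int x;
    int nnumbers = 0;   /* invariant: how many numbers have been stored so far */

    while((x = getnumberfromuser()) > 0)
    {
        if(nnumbers >= MAXNUMBERS)
        {
            printf("too many numbers!\n");
            break;
        }
        /* Prefix ++: increment first, then subscript with the new value. */
        numbers[++nnumbers] = x;
    }
    return nnumbers;
}
```

With the canned input 7, 3, 9, -1 the function again returns 3, but this time the values land in numbers[1] through numbers[3].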
***
To summarize, then: the difference between prefix and postfix ++ is that prefix "returns" to the surrounding
expression the new, incremented value of the variable, while postfix yields the old, unincremented value. This
distinction makes no difference if the prefix or postfix expression stands alone as a full expression; it only
makes a difference when the prefix or postfix expression sits as a subexpression within a larger expression
which cares about the value "returned." It may seem alarming that it's taken me 457 lines to describe this
distinction, and it's true that for most of the issues I've elaborated on here so agonizingly, experienced C
programmers make their decisions without thinking about each tiny detail so acutely. But if you're trying to
decide whether to use prefix or postfix autoincrement in some code, code which you've gotten as far as the
pseudocode
numbers[use nnumbers and also increment it] = x;
in defining, you do need to think about the fact that arrays are 0-based in C, and about your loop invariant,
before making your decision. When you do, you'll find that the prefix and postfix forms are not interchangeable,
that the distinction between them is significant and real, and that (for any given situation in which the value is
significant) one of them does just what you want while the other would be quite wrong.
Steve Summit
scs@eskimo.com
P.S. Here's the promised implementation of getnumberfromuser(), for completeness:
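#include <stdio.h>
#include <stdlib.h>    /* for atoi() */
int getnumberfromuser()
{
    char tmpbuf[25];
    printf("enter a number [-1 to exit]: ");
    fflush(stdout);    /* make sure the prompt appears before we wait for input */
    if(fgets(tmpbuf, sizeof(tmpbuf), stdin) == NULL)
        return -1;     /* end of input: treat it as the end of the list */
    return atoi(tmpbuf);
}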
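The implementation above relies on atoi(), which gives no indication when the input isn't a number at all. If that matters, one possible variation (a sketch only, not part of the original example) is to do the conversion with strtol(). The helper name parsenumber is invented here, and the choice to return -1 for an unusable line, which the reading loop then treats as the end of the list, is an assumption of this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: convert one line of input to an int.
   Returns -1 if the line holds no digits or the value won't fit in an int. */
int parsenumber(const char *line)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(line, &end, 10);
    if(end == line)                     /* nothing that looks like a number */
        return -1;
    if(errno == ERANGE || val > INT_MAX || val < INT_MIN)
        return -1;                      /* out of range */
    return (int)val;
}

int getnumberfromuser(void)
{
    char tmpbuf[25];

    printf("enter a number [-1 to exit]: ");
    fflush(stdout);
    if(fgets(tmpbuf, sizeof(tmpbuf), stdin) == NULL)
        return -1;
    return parsenumber(tmpbuf);
}
```

Like atoi(), this sketch ignores anything that follows the digits, so a line such as "12abc" still yields 12.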
int main() vs. void main()
[This article was originally posted on August 23, 1996, and again with annotations on January 14, 1997
and March 20, 1998. I have edited the text further for this web page.]
[A few of the points I make in this article are becoming dated. It is becoming less and less kosher to
assume that a function declared without an explicit return type will be implicitly declared as returning
int; or to fail to return a value from a non-void function. In fact, the new revision of the ANSI/ISO C
Standard ("C99") actively disallows the first of these, and at least partially the second.]
Subject: Re: int main() ???? wrong?
Message-ID: <DwLyC9.5w5@eskimo.com>
References: <321b9857.75314178@news2.microserve.net>
<321cc6d4.152763064@news2.microserve.net>
Date: Fri, 23 Aug 1996 20:31:21 GMT
In article <321b9857.75314178@news2.microserve.net>, sidlip@lip.microserve.com writes:
> I see alot of reference to int main () being incorrect syntax..
> Ive gotten into the habit of doing it and would like to know what
> errors could pop up in my code from not voiding main.
I assume you mean, "what errors could pop up from not declaring main correctly." Speaking generally,
the same sorts of errors could pop up as occur any time a global variable or function is misdeclared.
Turning for a moment from main() and void to ints and doubles, let's consider the (incorrect)
code
#include <stdio.h>
extern int sqrt();
main()
{
    printf("%d\n", sqrt(144.));
    return 0;
}
This code is clearly trying to compute and print the square root of 144, but it almost as clearly will not
work. It declares that the library function sqrt() returns an int, and it tries to print sqrt's return
value using %d which expects an int. Yet, in actuality, sqrt() of course returns a double. The fact
that sqrt() returns a double, while the caller acts as if it returns an int, will almost certainly
prevent the program from working.
Why won't it work, exactly? It depends on the details of the particular machine's function call and return
mechanisms. We haven't said which machine we're using, and we shouldn't have to know the details of
these mechanisms (they're the compiler's worry, and to worry about them for us is one of the reasons
we're using a higher-level language like C in the first place), so we can speak only in general terms.
Perhaps the machine has one general-purpose register which is designated as the location where
functions return values of integer types, and one floating-point register which is designated as the
location where functions return values of floating-point types. If so, sqrt() will write its return value
to the floating-point return register, and the calling code will read a garbage int value from the
general-purpose return register. Or, perhaps integer and floating-point returns use the same locations. sqrt()
will write a floating-point value to the location, but the calling code will incorrectly interpret it as an
integer value. You'd then get approximately the same result as you'd get from executing
double d = 12.;
int *ip = (int *)&d;
printf("%d\n", *ip);
In both cases, the bit pattern of the floating-point number is interpreted as if it were an integer. Since the
bit-level representations of integer and floating-point data are invariably different, confusing and
inaccurate answers result. (Note that in neither case -- the incorrect declaration of sqrt(), nor the
int/double pointer game -- does the compiler emit any code to do double-to-int conversion. Any
such conversions have been effectively circumvented by the peculiarities of these two code fragments. In
the case of calling sqrt(), you'd get a proper double-to-int conversion only if sqrt() were
properly declared as returning double, and the return value were assigned or cast to an int.) Finally,
suppose that values are returned on the stack, but that sizeof(double) is greater than
sizeof(int) (as is usually the case). sqrt() will push a double-sized result on the stack, but the
caller will pop an int-sized one. Not only will the caller print a garbage answer (interpreting those bits
it did pop as if they represented an int), but the shards of the double remaining on the stack could
screw later uses of the stack up enough that the program could crash.
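The difference between reinterpreting a bit pattern and converting a value can be sketched as follows. The function names double_bits and double_to_int are invented for this illustration, and the sketch assumes a 64-bit double (checked with an assert). It copies the bytes with memcpy() rather than going through an int pointer as in the fragment above, since reading a double through an int * is itself undefined behavior in Standard C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpretation: copy the raw bytes of d into an integer.
   No conversion code is involved; the bits are simply relabeled. */
uint64_t double_bits(double d)
{
    uint64_t bits;

    assert(sizeof bits == sizeof d);    /* assumption: double is 64 bits wide */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Conversion: with the types correctly declared, the cast makes the
   compiler emit a genuine double-to-int conversion. */
int double_to_int(double d)
{
    return (int)d;
}
```

On a machine using IEEE 754 doubles, double_bits(12.0) is 0x4028000000000000, which has nothing to do with the integer 12 that double_to_int(12.0) returns.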
In the preceding example, the calling code was incorrect, because it misdeclared the return value of the
sqrt() function. We're not allowed to choose what we want the return value of sqrt() to be, because
neither the defined return type of sqrt() (as fixed by the Standard and long practice) nor the actual
return type of sqrt() (as implemented in the library provided with our compiler) is under our control.
Issuing an external declaration for sqrt() declaring that it returns an int does not make it return an
int. (Nor does the misdeclaration instruct the compiler to convert sqrt's return value from double to
int; if the declaration is wrong, how could the compiler even know that the type to be converted from
was double?) Abraham Lincoln used to ask, "If we call a tail a leg, how many legs does a dog have?"
His answer was "Four -- calling it a leg doesn't make it one."
When we write an implementation of main() with a return type of void, the situation is similar, but
the roles are reversed. Now, it is the caller that is fixed and beyond our control, and that caller is
assuming that main() returns an int. You may imagine that somewhere (it's actually in the compiler
vendor's source for the C run-time library) there is some code which looks something like this:
extern int main(int, char *[]);

int argc;
char *argv[MAXARGS];
int status;
...
status = main(argc, argv);
exit(status);
If the caller declares main() as returning int, and you define main() as returning void, the
declarations are mismatched, just as the declarations of sqrt() were in the previous example. In
theory, the resulting program can fail in just the same sorts of ways. But, to reiterate, here we can't fix the
problem by fixing the caller (because the caller is, er, fixed). Instead, we have to fix main's
declaration, which is under our control, to match the caller's expectation.
Even if the program with the misdeclared main() "works" (that is, compiles without error, and runs
without crashing), it does result in a garbage (random) exit status being returned to the calling
environment. You or your command invocation environment may not be noticing that particular glitch
right now, but it is a glitch, and it may bite you later.
At one time, I had not gotten into the habit of making sure that main() called exit() or returned an
explicit value. (I wasn't declaring main() as void, but I was getting the same sorts of random exit
status values.) Whenever I wrote a little program (usually a special-purpose preprocessor) to automate
some step in the building of a large program, and stuck the program into one of the productions in a
Makefile, make would randomly abort the build, saying that there had been an error, whenever the
random status returned by the little program was nonzero (i.e. not EXIT_SUCCESS). Eventually, fixing
enough of those drove home to me the importance of always exiting with an appropriate, explicit status,
even in seemingly trivial programs.
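As a minimal sketch of exiting with an explicit status, here is the body of a hypothetical build-step tool. The name tool_main and its behavior (echoing its arguments, one per line) are invented for this illustration; the real main() is assumed to do nothing but hand back tool_main's result, as the comment shows.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tool body. The program's main() is assumed to be just
 *
 *     int main(int argc, char *argv[])
 *     {
 *         return tool_main(argc, argv);
 *     }
 *
 * so that every path out of the program carries a deliberate status.
 */
int tool_main(int argc, char *argv[])
{
    int i;

    if(argc < 2)
    {
        fprintf(stderr, "usage: %s name...\n", argc > 0 ? argv[0] : "tool");
        return EXIT_FAILURE;    /* tell make (or the shell) that we failed */
    }

    for(i = 1; i < argc; i++)
        printf("%s\n", argv[i]);

    return EXIT_SUCCESS;        /* explicit success, never a leftover value */
}
```

A Makefile rule that runs such a tool stops the build only when the tool really reports failure, rather than whenever a leftover value happens to be nonzero.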
Another place you'll notice the effect of a random exit status is if you run a program with one in the
background using a job-control shell under Unix; when it finishes, you'll get a message like "Done(1)" or
"Exit 1" which, if you've gotten used to the presence of the number indicating the presence of an error,
will mislead you into thinking that the command failed.
Yet other ways to see the results of particular exit statuses (that is, to depend on their being deterministic)
are when using errorlevel in DOS batch files, or when using DCL in VMS.
> several books use int main() as examples and others use void main(void)
Indeed.
> main returns an int i assume?
Yes. More precisely, main is supposed to return an int; what it actually returns is what you declare it
to return (and what your return statement(s), if any, say it returns). But if you implement main as
returning something other than what its caller expects it to return, you run the risk of its not working
correctly.
In article <321cc6d4.152763064@news2.microserve.net>, sidlip@lip.microserve.com goes on:
> Im curious why do i see void main(void) so much?
I honestly don't know.
The first few times I ever saw void main(), it was clear that the programmer was trying to avoid
warnings. The program
int main()
{
    exit(0);
}
does not return a value from main(), but it doesn't matter, because control flow never falls off the end
of main(), because exit() never returns. But the compiler usually doesn't know that exit() doesn't
return, and may warn that "control reaches end of int-valued function without returning a value." In the
program
int main()
{
}
the compiler is likely to issue the same warning, and here it's perfectly appropriate, but the programmer
may not care, if the program isn't ever to be executed in an environment where the exit status matters. In
either case, declaring main() as void effectively shuts off that particular warning message, but at the
cost of making the program incorrect. (Indeed, some compilers will issue other warnings when main()
is misdeclared.)
It's because of this particular argument that the question currently known as 11.12 in the FAQ list is
worded the way it is, but this line of reasoning is not sufficient to explain void main()'s
unaccountable popularity today. It's become some kind of a meme plague: the "popular" textbooks all use
it, so hordes of unsuspecting C programmers learn it, and some of them go on to teach classes or write
more books which then spread the virus. In fact, my initial supposition (outlined above) of why people
might write void main() is now not only insufficient, it's almost completely forgotten! Today, when
people who use void main() are asked why they do, none of them ever mention shutting off
warnings about main() not returning a value. They all mumble something like "but everyone else uses
it" or "but it works on my compiler" or (particularly in the case of misguided teachers and textbook
authors) "that way we don't have to introduce the full complexity of return types right at first." (This last
justification is particularly specious, as we'll see.)
> what exactly is it saying? and why does it work ?
It tends to work (even though it doesn't have to and is not guaranteed to) for a combination of three
reasons:
1. Once upon a time, C did not have the void type. Functions which did not return values were
declared (usually implicitly) as returning int, but did not actually return values, and their callers
did not use the (nonexistent) return values. That's why, even today, the ANSI Standard does not
flatly prohibit falling off the end of a non-void function without executing a return statement,
or issuing an expressionless return statement within such a function. (The behavior is undefined
only if the caller tries to use the value, and never requires a diagnostic. Good compilers will emit
warnings.) In the old days, it was up to the programmer to remember which functions returned
values and which didn't, although this was of course one of the things which lint checked (and,
for those few who use it, still checks).
2. Partly because of #1, the actual low-level calling mechanisms for void-valued and int-valued
functions are usually not different. A void-valued function simply omits to place a value
in the designated return value location, and the caller doesn't try to fetch a value from that
location. In these cases, the worst thing that happens when a function is incorrectly declared as
void is that the caller gets a garbage return value. Misdeclaring main (or any function) as void
will typically only result in catastrophic failures if values are returned on the stack: if the caller
pops a value but the callee didn't push one, the caller may instead have popped something vital
from the stack, such as a frame pointer or return address, such that when the caller later returns,
chaos results.
3. Since so much code (including the examples in some compiler manuals!) does misdeclare
main() as void, compiler vendors pretty much have to arrange that it works. (Alas, this
strategy is an "enabling" one; it allows the users of void main() to perpetuate their
rationalizations.) Posters to comp.lang.c can arrogantly state that "Your code is broken. Fix it.",
but compiler vendors who believe that "the customer is always right" cannot (particularly if the
vendor's offerings to date have permitted void main()).
(As an aside, these three reasons conspire to require nonstandard "pascal" keywords in compilers for
environments where code written in C must call libraries written in Pascal, or vice versa. I gather that
many Pascal compilers do use different calling conventions for procedures and functions. If C compilers
didn't have to coddle broken code by making void-valued and int-valued functions compatible, they
could arrange that void-valued functions were completely compatible with Pascal procedures. Instead, a
C program that calls a Pascal procedure must typically declare it as

extern pascal void f();
or some such. Similar arguments apply to "fortran" and "basic" keywords, and to fixed versus
variable-length argument lists.)
> also if type int is implied and () is implied as an empty delaration (in ANSI)
> why not main() (although ive never seen it)...
"Plain" main() is in fact pretty common. When I taught introductory C programming, I used to leave
the return type off of main() in examples, because it was a distraction to have to explain what a
function's return type is at all at first, particularly in a function like main() where it isn't obvious who
the caller is or what would happen to the return value. (Alas, a certain amount of handwaving is still
required if the main() function includes an explicit return statement, as it must to keep all compilers
quiet.)
We can see, then, that an instructor or textbook author who doesn't want to "introduce the full complexity
of return types right at first" could and should demonstrate main() without an explicit return type, and
that using void instead is doubly counterproductive.
The situation is completely different, by the way, with respect to a function's declared parameters. The
declaration
extern main();
is an old-style declaration saying that main is a function taking unspecified (but fixed) arguments. The
definition
main()
{
}
, on the other hand, is an old-style definition saying that main takes zero arguments. Since the caller (as
we saw earlier) is going to be passing two arguments to main(), how can we get away with this
misdeclaration?
The answer is that main's parameter list is a very definite wart in Standard C. Anywhere else, if a
function is called with a number of arguments not matching its definition (and if the function does not
accept a variable-length argument list), the behavior is undefined. In this one case, however, main() is
allowed to be defined as accepting either zero arguments or two, of type int and char **. The reason
for the wart is of course to support old code, but a wart it is, and it may require some compilers to treat a
function named main specially.

> is return; implied return 0; ?
No. A return statement without an expression returns no defined value, and is valid only in
void-valued functions or in functions where the caller doesn't use the return value.
Steve Summit
scs@eskimo.com
[ Addendum: Despite the length of this article, it only barely gets around to explaining how, precisely,
declaring main() as void could fail. Besides the problems of a garbage exit status being received by
the calling environment, a more serious failure could occur if the compiler used different calling
conventions for void- and int-valued functions. If so, then for main() to push no return value on the
stack, and for the caller to pop an int value that main() hadn't pushed, could conceivably "screw later
uses of the stack up enough that the program could crash."]
[This article was originally posted on March 1, 1999. I have edited the text slightly for this web page.]
Subject: Re: "void main" is a lifestyle choice
Date: 1 Mar 1999 17:16:39 GMT
Message-ID: <7bei1n$tsl$1@eskinews.eskimo.com>
References: <36D9A98B.56C1@customized.removethis.com>
<36D9B89F.287C@customized.removethis.com> <7bco1u$cjd$1@eskinews.eskimo.com>
<36DA149D.757D@customized.removethis.com>
In article <36DA149D.757D@customized.removethis.com>, Sven writes:
>Steve Summit wrote:
>> Ben went a bit too far. Yes, to keep your compiler or lint
>> from complaining, you will probably need a "return 0" statement.
>> Yes, this is a statement that may never be executed, inserted
>> just to keep everyone happy. Yes, inserting it will bloat your
>> executable by a few bytes. All of that is okay in this case --
>> really. If nothing else, it'll most certainly cost you less time
>> than arguing about it here.
>
> I made a point and I think you agree with it. I am not worried
> about code bloat or lost time for argument. Can't I raise an
> objection to an ANSI decision?
As a practical matter, no, you can't. That part of the Standard is very solidly set in stone. Furthermore,
like almost all aspects of the Standard, it codifies existing practice, which has always been that main has
been presumed to return int.
> Please keep this discussion in the academic/language-theory realm
> it was intended. If you disagree with me, then I expect a cogent
> argument in favour of "int main" and why it was chosen.
Unfortunately I don't have time for a full-blown argument this morning; perhaps another day. But
consider this: any interface specification must be adhered to by both sides unless renegotiated. You have
no more freedom to choose your own return type for main than you do for matherr, nor for the
functions passed as the last argument to qsort or bsearch, nor for the functions passed to atexit,
nor for the functions passed as the second argument to signal. Somebody else's code, code over which
you have no control, will be calling all of these functions, and unless you are prepared to get all compiler
and RTL authors to simultaneously rewrite their code to support your revised interface specification, and
to get all authors of C code in the world to simultaneously rewrite their code to work with the revised
compilers and RTLs, you're simply going to have to write this one off as one of life's little miseries, and
move on to fight other battles.

My (sincere, I am not being sarcastic here) recommendation to you is to write
return 0;       /* This statement serves no purpose except   */
                /* to coddle misguided and narrow-minded     */
                /* specifications made by an ANSI committee  */
                /* which didn't realize that I might write   */
                /* a main() which never returns.             */
at the end of main(). That's the sort of thing I do whenever I'm forced to comply with an inept interface
specification which I don't have time to fix. This accomplishes your purpose: you have written code
which is correct and which compiles without warning or error, and you have stated for the record your
objection to the way you were forced to write it. Then, as I said, move on.
I see that I haven't quite answered your question. Why was int chosen instead of, say, void? There are
two explanations. main has to have a single return type; there is no overloading in C. Some programs do
want to return an exit status from main and some don't, so one group or the other is sure to be
disappointed.
I was about to claim that it's easier for programs which don't want to return to comply with the dictum
that main must return int than it would be for programs that do want to return to deal with a world in
which ANSI specified that main was declared void, but I guess I might have gotten that backwards. In
a hypothetical world where main had to be declared void, programs which wanted to return an exit
status at the top level could simply call exit instead of returning. (This would wreak havoc with
programs which call main recursively, of course, but they're so rare that they can almost be ignored.) So I
guess I have to fall back on the second explanation.
Once upon a time, C had no void type. Functions which didn't want to return anything just left out the
expressions in the return statements and left the return type out of function declarations. But the
default for a function with an omitted return type was always (until C99 came out) int. Ergo, in the
days before void, programmers who didn't want to return a value from main (and who thought they
were declaring main as returning nothing) were declaring main as int. Pre-ANSI, pre-void practice
was unanimously for main to be declared as returning int, so there was no other possible choice for
ANSI to standardize.
>> Other options -- depending on your compiler or lint -- are to
>> tell your compiler or lint that your program's main function
>> (that is, the one called from main that does all the work and has
>> a call to exit buried somewhere down within it) never returns,
>> or to insert some kind of a "notreached" directive right after
>> the call to that function in main(). Either of these options
>> (if available) may serve to suppress the "control reaches end of
>> int-valued function without return" message, without polluting
>> your code with a statement which will never be executed.
>
> Now we're talking about non-portable work-arounds to legitimately
> generated warnings and deliberately inserted unreachable code.
> Are you serious?
Yes.
Let me ask you this: what do you do when you have a function with an imposed, unchangeable signature
but which doesn't happen to need one of its parameters? Many compilers will warn you that "argument x
is unused in function f". Many times, this is an appropriate warning. Sometimes it is not. How do you
cleanly shut it off? (I honestly don't know. Right now, I'm ignoring it in a fair amount of production code
I'm compiling with gcc. Ten years ago, I would have inserted an /* ARGSUSED */ comment to
silence lint, but that's a nonstandard extension. If I finally get so sick of the warnings that I decide to
do something about it, I'll probably insert some dummy code which seems to do something with x, but
then that's wasted code which pollutes my clean program and accomplishes nothing at run time and
exists only to coddle various compiler warnings and interface specifications at compile time.)
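One common form of the "dummy code" mentioned above is a cast of the unused parameter to void. The sketch below assumes a handler for signal(), whose one-int-parameter signature is imposed from outside; the names on_signal and got_signal are invented for this illustration, and whether the cast actually silences the warning depends on the compiler.

```c
#include <signal.h>

/* Hypothetical flag, set when the handler runs. */
volatile sig_atomic_t got_signal = 0;

/* The signature void(int) is dictated by signal(); we cannot drop the
 * parameter even though this particular handler has no use for it. */
void on_signal(int signum)
{
    (void)signum;   /* "uses" the parameter; exists only to quiet the warning */
    got_signal = 1;
}
```

The handler would be installed with something like signal(SIGINT, on_signal). The cast typically generates no code, but it is still a statement that exists only to placate a diagnostic.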
>>> Remember that lint or compilers set to their highest warning levels
>>> don't understand program flow.
>>
>> Actually, many of them do.
>
> They can not. Program flow is not deterministic. A compiler can not
> determine ahead of time whether a function called in main will ever
> return control to main.
Yes, Sven, I've heard of the halting problem, too.
Many compilers know, semantically, that a call to a function named exit() never returns. Based on
this information, when faced with the code
int f()
{
exit(0);
}
they will suppress their normal warning about "control reaches end of non-void function without
return". Furthermore, some of these compilers allow you to specify (via some #pragma or
nonstandard declaration) that certain of your functions do not return, either.
My point is that these compilers do have a notion of control flow, a notion which is based not on
exhaustive analysis but on hints which the compiler has been given. Furthermore, since it's this very
same, weak form of control flow "analysis" that is driving the compiler's determinations that a "statement
is not reached" or a "variable is set but not used" or that "control reaches end of non-void function
without return", it is not inappropriate to be tying in with that weak control-flow analysis mechanism
when you're trying to suppress one of the warnings based on information which you have but which the
compiler cannot otherwise know. (But no, since all of this is compiler-specific, there is no Standard or
portable way to supply the hints or otherwise tie in with the mechanisms.)
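Since this article was posted, C11 has added a standard spelling for one such hint, the _Noreturn function specifier. The following is a minimal sketch, assuming a C11 compiler; fatal and checked_div are hypothetical names, not anything from the thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* _Noreturn (C11) tells the compiler that control never comes back
 * from this function. */
_Noreturn void fatal(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(EXIT_FAILURE);
}

/* Hypothetical caller: the path through fatal() has no return statement. */
int checked_div(int a, int b)
{
    if (b != 0)
        return a / b;
    fatal("division by zero");
    /* No dummy return here; the hint on fatal() is what lets a compiler
     * skip its "control reaches end of non-void function" warning. */
}
```

This is still a hint rather than analysis: the compiler believes the declaration, and a function marked _Noreturn that does return produces undefined behavior.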
>> "They" could not have left it as void, because it never was void.
>> The defined return type of main has always been int. main does
>> have to be a function returning int, because that's what the
>> calling code (remember, there is some code, namely in the
>> program's run-time start-up, that is going to call main) expects
>> it to be.
>
> Perhaps what I'm asking for is bona fide support for ANSI convention in
> the form of a prototype for "main" - maybe it should be in <stdlib.h>.
That would be wonderful, and it might help to put this eternal debate to rest. Alas, it's impossible,
because there is no one prototype for main -- it's either
int main(void);
or
int main(int, char **);
This is one of the larger warts on the ANSI/ISO C Standard, and it's there (like all the others) to codify
existing practice.
> Until then I can code my main function anyway I want to, despite
> ANSI wishes or the yelling to the contrary from this newsgroup.
Sure. And you can write a[i] = i++, or cross the street after the DON'T WALK sign has started
flashing, or take 13 items through the "12 items or less" aisle at the supermarket, or do any of a million
other things that sit squarely in the grey area between legality and illegality and which you might or
might not get called on.
Look, don't get me wrong. I salute your wish to write clean programs which express the programmer's
intent. I salute your wish not to pollute your code with nonstandard directives or meaningless or
unreachable code. Both of these are goals which I have for my own work as well. But I urge you to pick
your battles. (Reinhold Niebuhr's serenity prayer is applicable, too.) The situation with respect to main's
return status is not perfect, but it is a comparatively minor issue, and you will not have forsaken your
quest to write good, clean code by conceding defeat on this one. When you design your own, perfect
language (no sarcasm here, either; this is what I'm doing, too) you can fix the situation with respect to
main's return value once and for all.
(Also, I might explain or apologize for some of the other responses you've gotten in this thread. We're
not as stupid as you think we are, but you're new here, I think, and you've taken a fairly argumentative
and intolerant tone. It looks like some of your respondents have dealt with you rather summarily, not
noticing where you're coming from underneath.)
Steve Summit
scs@eskimo.com
P.S. No, matherr isn't standard, either.
Elegant
[In January, 1998, in my ISP's general discussion newsgroup, we were discussing software and the
marketplace, and I mentioned that all too many people don't even know what they're missing. Someone
asked what I meant, and this was my response. I have added to it slightly since then.]
Newsgroups: lobby
Subject: Re: software elegance (was: Root for the Indians this time?)
Date: 17 Jan 1998 17:56:38 GMT
Lines: 173
Message-ID: <69qrcm$ben$1@eskinews.eskimo.com>
References: <34b7e300.18628677@news.eskimo.com> <69bncf$e0v$1@eskinews.eskimo.com>
<69o1ib$qjj$1@eskinews.eskimo.com> <34c057b5.9765222@news.eskimo.com>
In article <34c057b5.9765222@news.eskimo.com>, wayneld@eskimo.com writes:
> On 16 Jan 1998 16:23:39 GMT, scs@eskimo.com (Steve Summit) wrote:
> > -- know what they're missing in terms of software elegance.
>
> What exactly is the defintion here of "software elegance?"
Why, vague and unstated, of course; otherwise how could we have eternal pointless arguments about it? :)
Elegant software is compact and efficient. Mundane software is bloated and slow.
Elegant software has features that people need and use. Mundane software has features that may look
good on paper and impress magazine editors, but which few people use in practice and which serve only
to add to the bloat (and the bug count).
Elegant software lets you do what you want to do. Mundane software lets you do what you want to do, as
long as you do it its way.
Elegant software does one job and does it well. Mundane software tries to do two or more jobs, and/or
does its job(s) clumsily (or sometimes does no job at all).
Elegant software has a clean model for the data it works with and the operations it performs on it; this
model is easy to understand and when you do, you find that it matches the way you think about your data.
Mundane software either has no well-defined model at all, or has one so bizarre that only its programmers
comprehend it, leaving you largely in the dark, forced to perform operations by rote, without
understanding them and powerless to adapt (to new situations or in the face of simple mistakes, either
yours or the software's).
Elegant software lets you do things its designers never imagined. Mundane software lets you do only
those things its designers imagined.

Elegant software is configurable where it needs to be, but works well out of the box. Mundane software is
either not configurable at all, or so overly configurable that no mere user dares delve into its labyrinthine
configuration subsystem, and/or installs itself initially with defaults which no sane person would want to
use.
Elegant software lets you grow with it, providing shortcuts and advanced features which you can use
when you're ready to but which don't get in your way or bewilder you when you're not. Mundane
software is targeted at exactly one audience, and is either droolingly user obsequious but maddeningly
frustrating for an experienced user to use, or expert friendly but impossible for a beginner to use.
Elegant software lets you explore its features with confidence, and protects you from mistakes. Mundane
software punishes you sternly for the slightest misstep.
Elegant software comes with a clearly-written manual, indexed by topic, written from your point of view,
which you'll rarely need to refer to because the software is so intuitive and natural to use. Mundane
software comes with an enormous and incomprehensible manual (probably not printed but on-line, using
a new clumsy interface completely different from any other on-line help system you've ever used),
indexed by command, written from the technical writer's best approximation of the programmer's point of
view, supplemented after the fact by a cottage industry of third-party books which you can buy to really
help you use your new software.
Elegant software is self-contained and easy to install and uninstall. Mundane software requires 73
ancillary components before it will run; and if it is able to install them itself, does so in a way which
blows away newer versions of those components if you happen to have them; and if it is able to uninstall
itself, deletes those components even if other programs are relying on them.
Elegant software works with your machine and other programs on it. Mundane software takes over your
machine.
Elegant software adapts to its environment. Mundane software requires you to configure it or its
environment explicitly, and if some aspect of the environment prevents it from running correctly, either
crashes or exits with an error message which doesn't tell you what the problem is.
Elegant software can work with as much data as you have memory or disk space for. Mundane software
is riddled with arbitrary, fixed limits on the number of pieces of data you can work with and their sizes.
When it has restrictions, elegant software imposes them in such a way that they can be gracefully relaxed
in future versions. Mundane software imposes limits which immediately become locked in, and which
can never be removed, or can be worked around only with gross kludges, or can be abandoned only by
abandoning the old software and everything that it works with, and starting from scratch.
Elegant software is programmable (when you're ready for it), and can be used by other programs.
Mundane software (as already mentioned) can perform only those functions which its designers explicitly
thought of, and none of those functions can ever be automated, because they depend heavily on visual,
graphical, user interactions.
Elegant software conforms (where applicable) to open, industry standards for file formats, networking
protocols, and the like. Mundane software invents its own formats and protocols, and depending on the
nerve of its vendor, attempts to foist them upon the industry as closed standards.
Elegant software lets you "pop the hood" and inspect your data or work with it in ways that the program
can't, i.e. makes it possible for you to access your data with external tools. Mundane software stores your
data in inscrutable formats which you can't use standard tools to access, and may actively prevent you
from even trying, in part by imposing a software license which forbids reverse engineering.
Elegant software is backwards and forwards compatible with past and future versions of itself. Mundane
software might be able to read one or two of its previous version's formats, but it can't write them, and it
crashes if it tries to read one of its future version's formats. (For that matter, elegant software uses
open-ended data formats which rarely if ever need to be supplanted, while mundane software gratuitously
changes its data formats with every release whether it needs to or not.)
Elegant software is relatively free of bugs, and those that it has are minor, or off in corners where hardly
anyone notices them, and are usually easy enough to work around. Mundane software is full of glaring
bugs, many of which cause unrecoverable crashes if the software is used to do anything its designer didn't
imagine, or is ever used outside the one narrow usage pattern which the designer had in mind and tested for.
When it happens to contain a bug serious enough to cause a crash, elegant software takes care to save
your data even as it's crashing. When mundane software crashes, it not only loses (or subtly garbles) your
data, it may also (if running under a mundane operating system) take other applications down with it, or
crash the operating system itself for good measure.
Elegant software works automatically on all machines you would expect it to, with all peripherals you
would expect it to, including those which haven't been built yet. Mundane software either works only
with the one specific hardware configuration that its designer used and tested for, or has been laboriously
tuned to work with a list of 437 existing hardware configurations which does not include yours.
Elegant software is general-purpose. Mundane software is unnecessarily specialized.
Elegant software assumes that you are intelligent and can learn, and that you are interested in using a
computer to save time, automate repetitive tasks, and eliminate busywork. Mundane software tries to do
everything for you (presuming that you can't), but ends up requiring you to specify everything you want
to do in minute, tedious, time-consuming, mind-numbing detail.
Elegant software is written by programmers who have their users in mind, who listen to user feedback,
who are rewarded for their creativity, who code with an eye to demonstrable, lasting correctness, and who
are constantly searching for new and better ways to write interesting programs with fewer bugs. Mundane
software is written by programmers who have themselves in mind as users, who try to tell other users
how to think like they do, who are rewarded for the number of lines of code they write per day or for how
well they're able to implement specs delivered from on high, who are happy with a program if it seems to
work today, who would like best to keep writing programs exactly as they wrote them yesterday, and who
regard bugs as inevitable.
Elegant software is a joy to use. Mundane software is a pain in the ass.
Steve Summit
scs@eskimo.com
Index of /~scs/readings

Name                      Last modified       Size
Parent Directory          19-Jun-2003 11:00
autoincrement.990118..>   11-Nov-1999 16:56   21k
correctness.950817.html   08-Apr-1997 11:48   4k
index2.html               27-May-2001 12:53   2k
precvsooe.20010512.html   27-May-2001 13:06   11k
precvsooe.960725.html     15-Jul-2003 07:22   21k
(name missing)            12-May-2001 08:10   13k
recursion.990209.html     09-Nov-2001 07:35   15k
software_elegance.html    03-Sep-2000 10:20   9k
strings.980926.html       05-Dec-1999 10:00   10k
undef.950311.html         03-Jan-1997 22:57   6k
undef.950321.html         03-Jan-1997 21:49   20k
undef.981105.html         26-Mar-1999 08:40   11k
voidmain.960823.html      30-May-2000 21:12   18k
(name missing)            30-May-2000 21:10   12k
Recursion
[The text here is extracted from a pair of articles originally posted on February 9, 1999.]
Subject: Re: Other examples of recursion?
Date: 10 Feb 1999 04:52:03 GMT
Message-ID: <79r39j$4vs$1@eskinews.eskimo.com>
Here is one way to think about recursion.
Think about where you live, and think about the last time it was really messy, and you asked your
roommate or spouse or significant other to help clean up. If you're unlucky, they picked up one totally
obvious but well-constrained and easy-to-pick-up piece of trash, and then decided that they'd done
enough, and curled up on the couch with a beer to watch a football game while you did all the rest of the
hard work. (Or, if your roommate/spouse/significant other is the unlucky one, perhaps you pulled this
stunt on them. It doesn't matter what the roles were, the point is that there was a totally unfair division of
labor.)
Shifting from housecleaning to computer programming, let's imagine it's our job to take a number (in
binary, say) and print out the digits of its base-ten representation in conventional left-to-right order. If
you've never attacked it before, this is an at least slightly tricky problem. You can divide by 10 to get
individual digits, but of course dividing by 10 gives you the rightmost digit first, so if you're not careful,
you end up with the digits in the wrong order.
But let's stick with this approach a bit longer, and see if we can get anywhere. We can start with the
original number, take the remainder when dividing it by 10, and print that digit. Then all we have to do is
figure out how to print the remaining digits, first. In pseudocode, we have something like
printdigits(int n)
{
    int d;
    print all but the last digit of n;
    d = n % 10;
    print digit d;
}
Printing one digit is easy -- the digit characters '0' to '9' are certain to have consecutive values in any
sane character set (and the ANSI/ISO C Standard mandates the choice of a character set that is sane in
this regard), so converting an integer d which is known to be in the range 0 to 9 simply requires adding
that integer as an offset to the character constant '0'. Applying that technique, and deferring the other
vaguely-defined part of the problem to a subfunction, we have a function which is no longer pseudocode,
but C:
void printdigits(int n)
{
    int d;
    print_all_but_the_last_digit_of(n);
    d = n % 10;
    putchar('0' + d);
}
Notice how much like the lazy roommate this code is. It does the falling-off-a-log easy part, and lets
somebody else (namely, the hypothetical function print_all_but_the_last_digit_of) do all
the hard work. Moreover, it successfully pulls off an additional bit of brazen audacity which not even the
laziest roommate could get away with: it lets the somebody else do the hard part first, then waltzes back
from the couch and does the easy bit last, just before the final curtain falls and the applause begins. What
cheek!
Actually, we've got one piece of cheekiness left. We've still got to figure out how to print all but the last
digit of n, and of course we've got to print it as a string of left-to-right digits. So it looks as if we haven't
gotten anywhere -- (no surprise, right, with a roommate this lazy and unhelpful!) -- the
print_all_but_the_last_digit_of() function is going to be exactly as hard to write as
printdigits() was; the "work" that's been done (or that will have been done) so far hasn't helped us
out at all.
But wait -- printing out all but the last digit of n, in left-to-right-order, is exactly the same problem as
printing out the digits of the quantity n/10 in left-to-right order, where we use integer arithmetic to
compute n/10 and throw away the remainder. So we now have a glimmering idea of a staggeringly
audacious move which will make our earlier sloth seem virtuous by comparison: what if we call
ourselves to do the hard part, then finish up and be done with it? That is, what if we write
void printdigits(int n)
{
    int d;
    printdigits(n/10);
    d = n % 10;
    putchar('0' + d);
}
where we've replaced the call to the unwritten function print_all_but_the_last_digit_of()
with a call to the function printdigits(), which is the function we were supposed to write in the
first place?? (A function which in fact we've already "written", except that our attempt so far seemed
laughably incomplete and not at all functional...) We can't possibly get away with this, can we? This is
like trying to become rich by borrowing a dollar from yourself one million times so that you have
$1,000,000, or inventing a time machine by inventing a time machine and then going back in time and
giving it to yourself, or attempting to fly by pulling yourself up by your own bootstraps, or trying to
clean your room by asking your little brother to do it, except that you're an only child, and the "little
brother" you're asking is just your reflection in a mirror you're holding behind you, in which the
ephemeral little brother you're asking to do the work is simultaneously asking an even-more-ephemeral
little brother standing behind him, who is asking the reflection behind him, ad infinitum...
Remarkably enough, we can do this, as long as we back off from our stance of utter sloth by one little
notch. In the case when the number we're printing has only one digit, we don't need to delegate any of the
work. (Perhaps this decision only cements our reputation as a glory-hogging layabout: for one-digit
numbers, we take all the credit.) At any rate, the final implementation looks like this:
void printdigits(int n)
{
    int d;
    if(n > 9)
        printdigits(n/10);
    d = n % 10;
    putchar('0' + d);
}
The additional test breaks the infinite recursion; the function won't call itself forever. It will call itself as
many times as there are digits in the number, and the deepest recursive call will print the leftmost digit
first, and on the way back, each successive invocation will append another to the right. If you're having
any trouble seeing this, try rewriting the function as
void printdigits(int n)
{
    int d;
    printf("printdigits(%d)\n", n);
    if(n > 9)
        printdigits(n/10);
    d = n % 10;
    printf("printing digit %c\n", '0' + d);
}
When called with a number such as 12345, you'll first see a flurry of recursive calls, as each invocation
madly tries to pawn the job off on somebody else, until finally there's only one digit left, and the
innermost recursive call condescends to do some actual work, and the logjam is broken, and the
containing calls chip in their pieces in succession, and the job is done, in a symphony of concerted
laziness.
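To check the order of the digits without reading a trace, the same recursion can write into a buffer instead of calling putchar. This is only a sketch: digits_to_buf is a made-up helper, it takes an unsigned argument to sidestep the question of negative numbers (which the function above does not handle), and the caller is assumed to supply a buffer big enough for the digits plus a terminating '\0'.

```c
#include <stddef.h>

/* Hypothetical helper: append the decimal digits of n to buf starting at
 * index len, most significant digit first, and return the new length.
 * buf is kept '\0'-terminated after every digit. */
size_t digits_to_buf(unsigned int n, char *buf, size_t len)
{
    if (n > 9)
        len = digits_to_buf(n / 10, buf, len);  /* the hard part, done first */
    buf[len++] = (char)('0' + n % 10);          /* the easy part, done last */
    buf[len] = '\0';
    return len;
}
```

A call such as digits_to_buf(12345, buf, 0) with a 16-byte buf leaves the digits in left-to-right order, because each invocation stores its own digit only after the recursive call for the leading digits has returned.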
[The next section is adapted from the web page http://www.eskimo.com/~scs/cclass/krnotes/sx9f.html.]

An even cleaner demonstration of the power of recursion concerns the traversal of binary trees. A binary
tree is a linked data structure in which each node has up to two subtrees, with all the items in a node's left
subtree being (by definition) less than the item contained in the node, and all the items in the right
subtree being greater. For example, here is a binary tree containing the letters A through K:
[Figure: a binary tree containing the letters A through K, with F at the root and B at the top of F's left subtree.]
This is not the only binary tree containing the letters A through K; there's quite a number of different
ways of arranging it. But note that, for any letter, all the letters in the subtree below and to the left of it
come before it in the alphabet, and all the letters in the right subtree come after. (The top item in a node's
left subtree is not necessarily immediately less than the item at that node or anything; the
immediately-preceding item is merely down in the left subtree somewhere, along with all the rest of the preceding
items. In the tree I've just shown, for example, the letter F appears at the top of the tree although it's
obviously neither the first nor the last letter in the list. The letter B which is at the top of F's left subtree
does not immediately precede F; the letter immediately preceding F is E, which is indeed down in F's left
subtree somewhere.)
We could represent one node in such a tree with an instance of the structure
struct tnode
{
	char letter;
	struct tnode *left;
	struct tnode *right;
};
To be clear: that's one structure describing one node; for a tree with many nodes we'd have many
instances of the structure. (We'd probably allocate them dynamically using malloc.)
This structure may look dangerous -- if we're trying to describe a struct tnode, what are those little
struct tnodes doing inside it? They're pointers to the node's left and right subtrees, and the reason
they're okay is precisely that they are merely pointers; they are not complete struct tnodes. (If
every struct tnode tried to contain one or more complete struct tnodes, the structure would
obviously need to be infinitely big.) It's perfectly acceptable (and quite common) for a structure to
contain pointers to other instances of itself; this is precisely how lists, trees, and all sorts of other linked
data structures are commonly implemented in C.
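As a rough sketch of how such nodes might be allocated and linked together, the two helpers below build a tree one letter at a time. The names newnode and treeinsert, and the decision to exit on allocation failure, are assumptions made for this illustration; they do not come from the notes.

```c
#include <stdio.h>
#include <stdlib.h>

struct tnode
{
    char letter;
    struct tnode *left;
    struct tnode *right;
};

/* Hypothetical helper: allocate one node holding the given letter. */
struct tnode *newnode(char letter)
{
    struct tnode *p = malloc(sizeof(struct tnode));
    if(p == NULL)
        {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
        }
    p->letter = letter;
    p->left = p->right = NULL;      /* no subtrees yet */
    return p;
}

/* Hypothetical helper: insert a letter into the tree rooted at p,
   returning the (possibly new) root.  Smaller letters go left,
   larger ones go right, and duplicates are ignored. */
struct tnode *treeinsert(struct tnode *p, char letter)
{
    if(p == NULL)
        return newnode(letter);
    if(letter < p->letter)
        p->left = treeinsert(p->left, letter);
    else if(letter > p->letter)
        p->right = treeinsert(p->right, letter);
    return p;
}
```

A caller would start with a null root pointer and write root = treeinsert(root, 'F'); for each letter in turn. Whichever letter is inserted first ends up at the root.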
Suppose it's our job to print one of these binary trees. We've just been handed a pointer to the base (root)
of the tree. What do we do? The only node we've got easy access to is the root node, but as we saw, it's
not a very interesting node (i.e. not the first or the last element or anything); it's generally a random node
somewhere in the middle of the eventual sorted list (distinguished only by the fact that it happened to be
inserted first). The node that needs to be printed first is buried somewhere down in the left subtree, and
the node to print just before the node we've got easy access to is buried somewhere else down in the left
subtree, and the node to print next (after the one we've got) is buried somewhere down in the right
subtree. In fact, everything down in the left subtree is to be printed before the node we've got, and
everything down in the right subtree is to be printed after. A pseudocode description of our task,
therefore, might be
print the left subtree (in order)
print the node we're at
print the right subtree (in order)
How can we print the left subtree, in order? The left subtree is, in general, another tree, so printing it out
sounds about as hard as printing an entire tree, which is what we were supposed to do in the first place.
In fact, it's exactly as hard: it's the same problem. Are we going in circles? Are we getting anywhere?
Yes, we are: the left subtree, even though it is still a tree, is at least smaller than the full tree we started
with. The same is true of the right subtree. Therefore, we can use a recursive call to do the hard work of
printing the subtrees, and all we have to do is the easy part: print the node we're at. The function might
look like this:
void treeprint(struct tnode *p)
{
	if(p->left != NULL)
		treeprint(p->left);
	printf("%c\n", p->letter);
	if(p->right != NULL)
		treeprint(p->right);
}
In any recursive function, it is (obviously) important to terminate the recursion, that is, to make sure that
the function doesn't recursively call itself forever. In the case of binary trees, the fact that the left and
right subtrees are smaller (than the tree containing them, that is) gives us the leverage we need to make a
recursive algorithm work. When we reach a "leaf" of the tree (more precisely, when the left or right
subtree is a null pointer), there's nothing more to visit, so the recursion can stop. It turns out that we can
test for this in two different ways, either before or after we make the "last" recursive call. In the code just
above, we tested before the call. Here's the other way to do it:
void treeprint(struct tnode *p)
{
	if(p == NULL)
		return;
	treeprint(p->left);
	printf("%c\n", p->letter);
	treeprint(p->right);
}
Sometimes, there's little difference between one approach and the other. Here, though, the second
approach has a distinct advantage: it will work even if the very first call is on an empty tree. It's
extremely nice if programs work well at their boundary conditions, even if we don't think those
conditions are likely to occur.
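The same test-for-null-first pattern carries over to other tree functions. As a small sketch (the function treecount is an illustration added here, not something from the notes), counting the nodes in a tree comes out right for the empty tree with no special case at all:

```c
#include <stddef.h>

struct tnode
{
    char letter;
    struct tnode *left;
    struct tnode *right;
};

/* Hypothetical example: return the number of nodes in the tree.
   An empty tree (null pointer) has zero nodes. */
int treecount(struct tnode *p)
{
    if(p == NULL)
        return 0;
    return treecount(p->left) + 1 + treecount(p->right);
}
```

Calling treecount(NULL) simply returns 0, whereas a version that tested p->left and p->right before recursing would dereference a null pointer on that first call.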
Another impressive thing about a recursive treeprint function is that it's not just a way of writing it,
or a nifty way of writing it; it's really the only way of writing it. You might try to figure out how to write
a nonrecursive version. Once you've printed something down in the left subtree, how do you know where
to go back up to? Our struct tnode only has pointers down the tree; there aren't any pointers back to
the "parent" of each node. If you write a nonrecursive version, you have to keep track of how you got to
where you are, and it's not enough to keep track of the parent of the node you're at; you have to keep a
stack of all the nodes you've passed down through. (Even if we did have back pointers, it's not clear that
they'd give us enough extra information to work our way through the tree in the correct order.) When you
write a recursive version, on the other hand, the normal function-call stack essentially keeps track of all
this for you.
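For comparison, here is one way a nonrecursive in-order traversal with an explicit stack might look. It is only a sketch: the name treeletters, the fixed MAXDEPTH limit, and the choice to store the letters in a caller-supplied buffer rather than print them are all assumptions made for this example.

```c
#include <stddef.h>

#define MAXDEPTH 100    /* assumed limit on tree depth, for this sketch only */

struct tnode
{
    char letter;
    struct tnode *left;
    struct tnode *right;
};

/* Hypothetical nonrecursive in-order traversal.  Stores the letters,
   in order, in buf (at most max-1 of them), terminates buf with '\0',
   and returns the number stored, or -1 if the tree is deeper than
   MAXDEPTH.  max must be at least 1. */
int treeletters(struct tnode *root, char *buf, int max)
{
    struct tnode *stack[MAXDEPTH];  /* nodes we've passed down through */
    int sp = 0;
    int n = 0;
    struct tnode *p = root;

    while(p != NULL || sp > 0)
        {
        while(p != NULL)            /* go as far left as possible */
            {
            if(sp >= MAXDEPTH)
                {
                buf[n] = '\0';
                return -1;
                }
            stack[sp++] = p;
            p = p->left;
            }
        p = stack[--sp];            /* back up to the most recent node */
        if(n < max - 1)
            buf[n++] = p->letter;   /* "visit" it */
        p = p->right;               /* then do its right subtree */
        }
    buf[n] = '\0';
    return n;
}
```

The array stack does by hand what the function-call stack does automatically in the recursive version, and the code has to worry about a depth limit that the recursive version never mentions.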
Subject: Re: Other examples of recursion?

Date: 10 Feb 1999 05:52:21 GMT
Message-ID: <79r6ql$72q$1@eskinews.eskimo.com>
Here are two more examples, one for computing the greatest common divisor by Euclid's method,
and one for computing Fibonacci numbers:
int gcd(int a, int b)
{
	if(b == 0)
		return a;
	else
		return gcd(b, a % b);
}

int fib(int n)
{
	static int fibs[24] = {0, 1};
	if(n < 0 || n >= 24)
		return -1;
	if(fibs[n] == 0 && n > 0)
		fibs[n] = fib(n-1) + fib(n-2);
	return fibs[n];
}
[Both of these examples are lifted from Volume II, Chapter 3 of Macmillan's Handbook of Programming
Languages.]
The fib() implementation uses "memoization", involving static data, to avoid the exponential recursive
explosion which otherwise dooms Fibonacci numbers to being the classic example of when not to use
recursion.
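To make the difference concrete, the sketch below puts a call counter on a plain doubly-recursive version and on the memoized version. The names fib_naive, fib_memo, naive_calls, and memo_calls are inventions for this comparison, and fib_naive assumes n is nonnegative.

```c
static long naive_calls = 0;    /* call counters, added for illustration */
static long memo_calls = 0;

/* Plain doubly-recursive version: recomputes the same values repeatedly. */
int fib_naive(int n)
{
    naive_calls++;
    if(n < 2)
        return n;
    return fib_naive(n-1) + fib_naive(n-2);
}

/* The memoized version from above, with a counter added. */
int fib_memo(int n)
{
    static int fibs[24] = {0, 1};
    memo_calls++;
    if(n < 0 || n >= 24)
        return -1;
    if(fibs[n] == 0 && n > 0)
        fibs[n] = fib_memo(n-1) + fib_memo(n-2);
    return fibs[n];
}
```

Both functions return 6765 for an argument of 20, but fib_naive makes 21891 calls to get there, while fib_memo, starting from a fresh table, makes only a few dozen.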
Steve Summit
scs@eskimo.com
``Useful'' Links
Search engines
Dave Walker's Eskimo Weather Station
UW weather info
UW earthquake info
U.S. Patent & Trademark Office
U.S. Geological Survey (including EROS Data Center, USGS map data online!)
Northwest Folklife festival
The RISKS archive
DeLorme map company
U.S. Library of Congress (search)
Monty Python's Flying Circus
Unicode
NOAA Coastal & Estuarine Oceanography Branch, including tide predictions
U.S. Naval Observatory Directorate of Time
Transit and Transportation info
data file formats
cryptography, authentication, privacy, and anonymity links
http://www.eskimo.com/~scs/links.html [22/07/2003 5:54:23 PM]
Search Engines
Digital's Alta Vista (web and Usenet)

Northern Light
Google
Yahoo!
"Google groups" (Usenet, formerly Dejanews) (also advanced search)
Excite
Infoseek
http://www.eskimo.com/~scs/searchengines.html [22/07/2003 5:54:25 PM]
Transportation info
Seattle & Pacific Northwest
Metro Transit ``Riderlink'' (Seattle, including bus timetables)

Washington State Department of Transportation (DOT)
Washington State Ferries
Trains Serving the Pacific Northwest
British Columbia Ferry Corporation
Northeast
Massachusetts Bay Transportation Authority (Boston's "T", esp. subway)

New York City subways:
official NYC transit

www.nycsubway.org
Joseph Brennan
Other
Bi-State Development Agency (St. Louis Metro, etc.)

American Public Transit Association
Amtrak (including timetables)
Travelocity
http://www.eskimo.com/~scs/transit.html [22/07/2003 5:54:28 PM]
Data File Formats

Generic Data File Formats
Hierarchical Data Format (HDF)

Common Data Format (CDF)
NetCDF
BER ("Basic Encoding Rules", associated with ASN.1, X.409, and ISO8825)
XDR (the eXternal Data Representation) is described in RFC 1014 and RFC 1832 (Even though
RFC's never change, they seem to move around a lot; if those links don't work try RFC1014 and
RFC1832)
Resources on Specific Data File Formats
Wotsit
James D. Murray's Graphics File Formats FAQ list (also published as the Encyclopedia of
Graphics File Formats)
The Audio File Formats FAQ (currently maintained by Chris Bagwell; originally by Guido van
Rossum)
.dbf files
Related Links
Ilana Stern's old sci.data.formats FAQ list

(this link is somewhat stale; if it doesn't work, try here or here or here or here)
Jean-loup Gailly's compression FAQ
Chapter 17 in my C Programming Notes is all about Data Files
There is a little bit of information about data files in question 20.5 in the comp.lang.c FAQ list
http://www.eskimo.com/~scs/datafiles.html [22/07/2003 5:54:29 PM]
Cryptography Resources and References
PGP attack FAQ

MIT PGP info
Jeff Licquia's original PGP FAQ list
Arnoud Engelfriet's updated PGP FAQ list
an MIT PGP FAQ list
Tom Knig's SSH FAQ
SSH home page
Kerberos FAQ list
RSA
Randall Williams's passphrase security FAQ list
Galactus's Anonymity and privacy page
http://www.eskimo.com/~scs/crypto.html [22/07/2003 5:54:31 PM]
Home Pages
Personal:
Jutta Degener (maintainer of the Lysator C archive)

Pete Keleher (author of Alpha)
Don Knuth (author of The Art of Computer Programming)
Ron Newman, keeper of the Good Netkeeping Seal for newsreaders and a history of Scientology
vs. the Net
Eric Raymond, current editor of the Jargon File
Dennis Ritchie
Clive Feather
Other:
Browser-specific home pages (the ones you get with default-configured browsers)
Eskimo North
RSA
Federal Express
U.S. Naval Observatory Directorate of Time
(back to) scs
http://www.eskimo.com/~scs/homepages.html [22/07/2003 5:54:33 PM]
Browser Home Pages
Lynx home page?

NCSA Mosaic home page
Netscape home page
http://www.eskimo.com/~scs/browserhome.html [22/07/2003 5:54:35 PM]
Linux ISP Eskimo North 56k, ISDN, Shell Access, Lists
Full featured Linux friendly Internet dial and ISDN access, shell accounts, web hosting, usenet news and mail, with human tech
support as low as $12.00 per month (5-year dial plan L)
Unix Shell, 56k, Dual 56k, and ISDN!

So many features you won't know where to begin!
Web Hosting, game hosting, virtual domains too!
We go the extra mile to provide service with a smile!
Click here for a FREE two-week trial!
Services & Rates

Internet Access
* Two-Week Trial
* 56k Dial Access
* 112k Dial Access
* ISDN Dial Access
* UUCP Dialup
Internet Hosting
* Game Hosting
* NNTP Read/Post
* Shell Access
* Virtual Domains
* Web Hosting
Western WA Only
* Dedicated PPP
* Frame Relay
Support
Dial Access Settings
* FreeBSD (console)
* Linux (console and X)
* Macintosh
* OS/2
* Sega Dreamcast
* Windows
Application Settings
* Email Settings
* PostgreSQL Database
* SSH Port Forwarding
* Usenet News Settings
* Website Creation
Other Support
Linux User Group Members

10% Discount On Eskimo
North Services
Linux User Group members can get a 10%
discount on Eskimo North's retail services. To
qualify for this discount your user group must
have a link on their website to our home page
and must list Eskimo North as a "Linux Friendly
ISP". You must provide the URL of the page
containing the link and evidence of membership
in the Linux User Group in question.
Download a file of the access numbers:

http://www.eskimo.com/locations.txt
Canada 56k, 112K, and ISDN Dial Access Numbers
Alberta
British Columbia
Manitoba
New Brunswick
Newfoundland
Nova Scotia
Ontario
Prince Edward Is.
Quebec
Saskatchewan
United States 56k, 112k, and ISDN Dial Access Numbers
Alabama
Arizona
Arkansas
California
Colorado
http://www.eskimo.com/ (1 of 2) [22/07/2003 5:54:43 PM]
Linux Friendly
System Information
Announcements
Eskimo Login Message *
General Announcements *
System Outages *
Miscellaneous
Equipment Details *
Eskimo North History *
Job Opportunities *
Links
Internal
Link to Us *
Random User Homepage *
User Pages Directory *
Business/Organizations
Animals *
Automotive *
Clothing / Accessories *
Computers / Internet *
Entertainment *
Food / Drink *
Gift Ideas *
Health / Medicine *
Home / Garden *
Media *
Recreation *
Science / Technology *
Society / Culture *
External
* Contact Us
* Find Your IP Address
* Swish Web-Indexing
* UnixHelp (local mirror)
* Virus Alerts and Filters
Search Dialup Pool
Search
States / Provinces
Phone Numbers
Cities/Regions
Connecticut
Delaware
District of Columbia
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming
Eskimo North, Inc.

Postal Mail: P.O. Box 55816
Shoreline, WA 98155-0816
Telephone: (206) 812-0051
Toll Free: (800) 246-6874
Fax: (206) 812-0054
E-Mail: support@eskimo.com
Support hours
Weekdays: 7:00am - 10:45pm Pacific (US)
Weekends: 11:00am - 6:45pm Pacific (US)
Copyright 1998-2003
Eskimo North
http://www.eskimo.com/ (2 of 2) [22/07/2003 5:54:43 PM]
Linux User Groups *

Miscellaneous Links *
Powered by Apache *
Powered by Linux *
Search Engines *
Search Our Site
Search
System Pages
User Pages (not yet)
56K Dial Access Linux Friendly! ISDN, Shells, USA & Canada
Linux Friendly 56k V.90 and Dual-56k (112k MLP) Dial Access
Included Services
56K or 112K dial access includes shell access, e-mail, two mailing lists, Usenet news, web hosting with CGI and
SSL, FTP hosting, and two IRC bots.
Apply for a FREE two week 56k dial access trial.

56k/112k V.90 Dial Access Rates
Dial Access Subscription Term          Monthly    Session    Chans  Speed  Month    Quarter  Half-Year  One Year  Two Year  Five Year
                                       Hours      Limit
Disk Space Included                                                        20MB     20MB     20MB       50MB      110MB     300MB

56k Plan L, Eskimo, Not 24x7 *         Unlimited  12 Hours   1      56k    US $22   US $60   US $108    US $180   US $336   US $720
112k Plan L, Eskimo, Not 24x7 *        Unlimited  12 Hours   2      112k   US $44   US $120  US $216    US $360   US $672   US $1440
56k Plan L 24x7/Servers, Eskimo 24x7   Unlimited  Unlimited  1      56k    US $40   US $117  US $228    US $420   US $792   US $1920
112k Plan L 24x7/Servers, Eskimo 24x7  Unlimited  Unlimited  2      112k   US $80   US $234  US $456    US $840   US $1584  US $3840

Other Dial Plans
56k Plan W, WorldCom/UUNet             150/month  4 hours    1      56k    US $22   US $60   US $108    US $180   US $336   N/A
56k Plan Y - YNP, Not 24x7             Unlimited  4 Hours    1      56k    US $22   US $60   US $108    US $180   US $336   N/A

Dial plan L is a combination of our own 56k/ISDN dialup pools in CA, IL, OR, NV, and WA, plus multiple national
networks.
Single or dual channel service, analog or ISDN, is available in most locations.
24x7 service, which disables idle and session timeouts and allows running servers, is available only on Eskimo
POPs.
Longer period subscriptions cost less and reduce time that must be spent doing accounting
each day allowing more time to be spent improving services and addressing customer concerns.
56k and 112k Dial Access Plan Differences
http://www.eskimo.com/services/dial.html (1 of 3) [22/07/2003 5:54:48 PM]
Dial access plan L is a combination of our own modem pools plus multiple national networks. Plan L provides
unlimited monthly usage, the longest session times, and the most access numbers. It is available as single or
dual channel, analog or ISDN, in most locations. 24x7 nailed use is available on Eskimo POPs only. We
recommend using our dial plan L modem pool. We can provide better service with our own POPs because we
have more control over factors that affect service. In areas where we don't have our own POPs, dial plan L has
multiple networks for redundancy.
The 12 hour session limit is only guaranteed on our own POPs. We pass this value in Radius to other networks
we use but some of them override it with shorter values.
24x7 service is available on Eskimo POPs only (the numbers in teal). 24x7 service disables idle and session
timeouts so your session will remain up indefinitely, and with this service it is permissible to run servers over
your connection.
Canada Dial Access Numbers

Alberta
Nova Scotia
British Columbia
Ontario
Manitoba
Prince Edward Is.
New Brunswick
Quebec
Newfoundland
Saskatchewan
United States Dial Access Numbers

Alabama
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
District of Columbia
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming
Dynamic DNS - Alternative to Static IP

Dynamic DNS services which map a dynamic IP to a static hostname provide an alternative to a static IP
address for many functions such as ETRN polling, Net Meeting, and some VPN applications. You register a
hostname with these services and download a small client which runs when you establish your dialup
connection. This contacts their host and then they map a static hostname to the IP address you were assigned.
Most of these services are free of charge. A partial list of dynamic DNS services: d2g.com, dns2go.com,
dnsq.org, dyndns.org, dynip.de, dynodns.net, dynup.net, hn.org, myip.org, and orgdns.org.
Setup
You will be called to confirm the information on your free trial account application. A 56k or dual-56k dial access
Username and Password will be e-mailed to you at the address specified on your application after successful
confirmation. The dial access Username and Password is used only for establishing your dial-up connection.
The login and password you supplied will be used for all host related functions such as pop-3 e-mail, NNTP
read/post access, shell access, ftp access to your account, etc.
For help with specific operating system setup, click on one of the following links:
Linux
Windows 95/98
MacOS
Acceptable Use
The following activities are expressly prohibited and will result in account termination with no refund.
Spamming, sending mass unsolicited e-mails or mass news group cross-posting.

Attempting to bypass system security of our hosts or using our hosts as a point of attack on other
Internet hosts.
Denial of service attacks such as packet flooding and mail-bombs.
Originating or participating in chain letter pyramid schemes.
Any illegal or unethical activity such as exchanging child pornography, exchanging copyrighted software
without the copyright holder's permission, offering controlled substances for sale without a
prescription, Internet gambling, etc.
Excepting dial plan L 24x7 service, defeating idle timeouts to hold a dialup connection open when it is
not actively being used.
Excepting dial plan L 24x7 service, dial access is not intended for running servers. We bundle hosting
services free with your dial-up account for those needs. Client-server applications like Napster, WinMX,
and Net Meeting are permissible so long as you only leave them up while you are actively using them.
Since a free trial period is offered and lower rates are given for long term accounts, we do not provide refunds.
This is true even if your account is terminated for abuse (activities listed above).
Eskimo North: Shells
Shells!
We sell C-shells by the seashore!
Ok, not quite by the Seashore, about four miles from Puget Sound shore.
Apply for a FREE no-risk, no-obligation, two-week trial by filling out the New User Application.
Shell Account Rates
Shell Subscription Term  Month   Quarter  Half-Year  One Year  Two Year  Five Year
Disk Space Included      20MB    20MB     20MB       50MB      110MB     300MB

Primary shell accounts allow you to use our host services from another access provider.
Primary Shell            US $12  US $30   US $54     US $96    US $180   US $420

Secondary shell accounts are available only as an adjunct to dial access subscriptions.
Secondary Shell          US $6   US $15   US $27     US $32    US $60    US $140
20MB of disk space is included (50MB with yearly payments, 110MB with two-year
payments). Additional disk quota is available for $1 per ten megabytes per month.
Primary shell accounts allow you to use our host services from another access
provider. If you have a cable modem, DSL, satellite connection, or dial access
through another provider but wish to utilize our host services, such as shell access,
e-mail, news, and web hosting, then you need a primary shell account.
Secondary shell accounts are available as an adjunct to dial access subscriptions.
They provide an additional pop3 e-mail box and web space for additional family
members or employees in the same physical location using the same computer or
router and associated dial access connection.
http://www.eskimo.com/services/shells.html (1 of 3) [22/07/2003 5:54:52 PM]
Shell Services
One shell account is included with each of
the following account types:
dial access
UUCP news/e-mail
frame relay
NNTP read/post access.
dedicated dial-up PPP
network game hosting
Web Hosting.
For an additional fee, Virtual
Domains.
An anonymous FTP directory
Two Smartlist mailing lists *
Hosting for two IRC bots *
X applications
E-mail
Usenet News
Menu Interface (optional)
Emulated SliP/PPP
Smartlist lists are used for discussion groups, subscriptions lists (like newsletters,
etc.) and the like. These are not extra POP accounts. Contact support@eskimo.com
to set these up with a requested list name and admin password.
IRC bots are allowed only on a dedicated bot server, "chat.eskimo.com". If
compiling an eggdrop, you currently need to compile version 1.1.15 or earlier on
the main shell server ("eskimo.com") and connect to the "chat" server to run it.
Acceptable Use
The following activities are expressly prohibited and will result in account termination with no
refund.

Attempting to bypass system security of our hosts or using our hosts as a point of attack
on other Internet hosts.
Any illegal or unethical activity such as exchanging child pornography, exchanging
copyrighted software without the copyright holder's permission, offering controlled
substances for sale without a prescription, Internet gambling, etc.
Excepting dial plan L 24x7 service, defeating idle timeouts to hold a dialup connection
open when it is not actively being used.
Excepting dial plan L 24x7 service, dial access is not intended for running servers. We
bundle hosting services free with your dial-up account for those needs. Client-server
applications like Napster, WinMX, and Net Meeting are permissible so long as you only
leave them up while you are actively using them.
Since a free trial period is offered and lower rates are given for long term accounts, we do not
provide refunds. This is true even if your account is terminated for abuse (activities listed
above).
Eskimo North Trial Account Application Form
Eskimo North Free Two Week Trial

Apply for our FREE two week trial. We absolutely will not send spam or sell your
information!
Information provided will be used to establish your trial account and communicate setup
instructions.
We will contact you at the number you leave for verification. Your application will be discarded if
it can not be verified within two weeks.
Your account will be created the same or the next day after verification. Your trial account ends
when either two weeks has elapsed or a subscription payment is received turning your account
into a paid subscription. Send a check or money order to continue beyond the two week trial.
We do NOT bill you.
All fields are required by the script that processes this form. Notes about needed formats (such
as telephone numbers) are listed as well.
First and Last Name:
City, State and Country:
If you are dialing from within the United States, Canada or other country using the North
American dialing plan, please use the format XXX-XXX-XXXX. We can not process an
application which is missing the area code.
If you are dialing internationally please use the format CC-CTY-LOCAL where CC is your
country code, CTY is your city code (does not have to be three digits) and LOCAL is your local
telephone number. We can not process an application which is missing the country code
or city code.
Please separate the fields with dashes. We need to be able to tell which of the digits belong to
the various parts of the telephone number in order to dial it properly.
Telephone Number:
http://www.eskimo.com/newuser.html (1 of 3) [22/07/2003 5:54:57 PM]
The form requires a valid-format email address. If you're applying from a public terminal, library
computer, etc., and you do not have an existing email address yet, you can put
"none@eskimo.com".
Your current email address:
Please select a login name to use on Eskimo North. Your login:
May contain only the following:

Lower case alphabetic characters [a-z]
Numeric characters [0-9]
Dash [-]
Must NOT START with a dash.
Must contain at least one non-numeric character.
Must not exceed eight characters.
Is your e-mail address here at Eskimo North.
Remains your login name after the trial as well.
Login:
Please select a temporary password to use on Eskimo North. Your password should be:
Different from your login.

At least five characters long.
May contain any printable character. UPPER case is distinct from lower case.
Can be changed at a later time by you at the shell prompt.
Password:
Which type of account are you applying for?
(All dial-access accounts include one shell/POP3 account with the requested login.)
Dial plan L provides unlimited monthly hours, the longest session times, and the most access
numbers.
Select Plan
Dialup Availability
Remote access via ssh, telnet, rlogin
None
1CH ISDN / 56K PPP Dial Access - Plan L
Eskimo Canada/USA
Worldwide
2CH ISDN / Dual-56k PPP Dial Access - Plan L Eskimo Canada/USA
How did you hear about Eskimo North?
Clear Application
(restart the application)

Submit Application
(send the application)
Eskimo North, Inc.

P.O. Box 55816
USA
Voice Phone: (206) 812-0051
Toll Free: 1-800-246-6874
Fax: (206) 812-0054
EMail: support@eskimo.com
Content:
Copyright 1998, 1999
Eskimo North
Eskimo North: Web Site Hosting
World Wide Web Home Pages

Web hosting is included with all dsl, dial-up or remote shell accounts using
Apache. We offer virtual domain service for those who wish to receive mail and
make their web pages and ftp files visible under their own domain.
Content Restrictions
We have two major web sites that all users may place material on. One is intended
for general viewing and the other is intended specifically for adult content. Material
placed in the directory "public_html" will be visible at the URL
http://www.eskimo.com/~login where login is your account login. Material placed in
this directory should be suited to viewing by all audiences. Nudity, sexual content,
graphic violence, or anything else not suitable to audiences of all ages should not
be placed in this directory. Material placed in the directory "adult_html" will be
visible at the URL http://adult.eskimo.com/~login. Material intended for an adult
audience should be placed in this directory. Material governed by the Child Protection
Act must have access restricted to adults only. It is suggested that you use adult
check or similar adult validation services to restrict access.
Do not use this site in any manner that is not consistent with all applicable local,
state, and federal laws. Doing so will be considered abuse and grounds for
account termination.
SSL Web Server

An SSL (secure socket layer) web server is available to allow secure transmission
and reception of sensitive data such as credit card numbers, passwords, etc.
Disk Quotas
There is no limit to the number of pages you can place on the web except for that
imposed by your disk quota. The standard disk quota for a single user account is
20-110MB, depending on the payment plan selected. Additional quota is available
http://www.eskimo.com/services/hosting.html (1 of 2) [22/07/2003 5:54:59 PM]
for $1/10mb/month and must be purchased in 10mb increments for the term of the
subscription period.
CGI Scripts
CGI scripts are allowed and can be executed under your own private "cgi-bin"
directory. There is also a system "cgi-bin" directory and an assortment of CGI
programs already installed there for your use.
For more information on creating or uploading your web site to Eskimo North,
please see http://www.eskimo.com/support/www.html.
Web Traffic Information

There are no "hit" charges, however we may ask you to remove material or
negotiate additional compensation to pay for resources if something you provide
swamps our server. We understand that sometimes loads may be heavy
temporarily.
http://www.eskimo.com/services/hosting.html (2 of 2) [22/07/2003 5:54:59 PM]
Eskimo North Linux Friendly Digital Subscriber Loop DSL
Digital Subscriber Loop (DSL)

We used a DSL wholesaler, MegaPOP, a service of StarNet Inc, to provide DSL service to our subscribers.
A while ago, MegaPOP informed us that they were getting out of the DSL business but would transfer our DSL
accounts to another wholesaler. Not great news as it was bound to result in some disruption but not the end of
the world.
On November 30, 2000, they informed us that they were discontinuing service by December 30, 2000 and that
it was OUR responsibility to find another wholesale provider, leaving us and our customers in a serious lurch.
We are looking for an alternate solution to MegaPOP and when we have something solid in place we will again
sell DSL service. In the meantime we are scrambling to find ways to service our existing customers.
If you know of any DSL wholesalers that use Covad and have a national presence and are affordable on a
small scale, please contact me. nanook@eskimo.com.
http://www.eskimo.com/services/dsl.html [22/07/2003 5:55:00 PM]
Eskimo North: Domain Hosting
Virtual Domains
Virtual domain service provides web, ftp, and mail service under your own domain
name(s) hosted on our facilities. This service is available as an adjunct to dial
access or remote shell accounts.
Virtual Domain Rates
Discount  Month   Quarter  Half-Year  One Year  Two Year  Five Year
None      $15     $30      $54        $96       $180      $420
10%       $13.50  $27      $48.60     $86.40    $162      $378
20%       $12     $24      $43.20     $76.80    $144      $336
A 10% discount is given in exchange for promotional consideration if your

domain contains a visible link to our home page at http://www.eskimo.com
somewhere on your site.
A 20% discount is given in exchange for promotional consideration if your
domain contains a visible link to our home page at http://www.eskimo.com
on your home page.
Buttons and banners are available at:
http://www.eskimo.com/LinkToUs.html
Help with creating and uploading your web site:
http://www.eskimo.com/support/www.html
Domain Registration Information

In order to register a domain, you will need to go to the web site of a domain
registrar, such as Gandi.net (12 euros ($10-$12) per year), Register.com
($35/year), or NetworkSolutions.com ($35/year, $70/first-two-years). There, you
will enter your desired domain. After a search to determine availability, you will be
given the opportunity to register or reserve the domain.
If using Network Solutions registration service, please select the Tech-Savvy (also
http://www.eskimo.com/services/domains.html (1 of 3) [22/07/2003 5:55:04 PM]
called ISP-Hosted in some sections of the site) option. The "Fast-Track" option
will map the domain to their name servers and we will not be able to service it.
Additionally the "Fast-Track" and "Essentials" options cost more.
There are other competing registries. If you register your domain through a
competing registery and are successful, please let us know. I would be happy to
provide links to any competing registeries.
Important: When you register your domain please specify the
following two name servers:
NS1.ESKIMO.COM    204.122.16.8
NS2.ESKIMO.COM    204.122.16.9
If you are registering a domain to be hosted by us, and registering it through
networksolutions.com, please use your own contact information for Admin and
Billing contact, and specify the following as the Technical contact (varies
depending on the registrar used):
NetworkSolutions:  "RD160"
Gandi:             "RD249-GANDI"
Required Information
Please include the following information with your payment for your virtual domain
when establishing a new domain or moving a domain from another provider:
Domain Name
    Must be no more than 67 characters including the top-level domain.
    Must include the top-level domain: ".com", ".org", or ".net".
        To register a domain in a national top-level domain, check with us
        first. Different nations place different restrictions on domain
        registrations. For example, Hong Kong requires a business presence
        in a Hong Kong business registry for a ".hk" domain.
    Must include only letters, numbers, and dashes.
    Must not start or end with a dash.
Organization Name
Organization Mailing Address
    The Internic makes it very difficult to change your organization name
    or address. They require a signed, notarized letter of authorization on
    company letterhead or their special form, signed and notarized.
Organization Administrative Contact Person
    Admin contact person's first and last name
    Admin contact person's organization
    Admin contact person's mail address
    Admin contact person's telephone number
    Optional: Admin contact person's fax number
    Admin contact person's e-mail address
Organization Billing Contact Person
    Billing contact person's first and last name
    Billing contact person's organization
    Billing contact person's mail address
    Billing contact person's telephone number
    Optional: Billing contact person's fax number
    Billing contact person's e-mail address
The above information must be included with your payment in order for us to
process your virtual domain request.
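The domain name rules in the list above are mechanical enough to check before you send in a request. Here is a minimal sketch of such a check. The function name is hypothetical, and it assumes the name is a single label followed by the top-level domain (so "www.example.com" would be rejected) and that the dash rule applies to the part before the top-level domain; neither assumption is spelled out in the rules themselves.

```python
import re

ALLOWED_TLDS = (".com", ".org", ".net")
MAX_LENGTH = 67  # the limit includes the top-level domain


def check_domain(name):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 67 characters")
    # Find which allowed top-level domain, if any, the name ends with.
    tld = next((t for t in ALLOWED_TLDS if name.lower().endswith(t)), None)
    if tld is None:
        problems.append("missing .com, .org, or .net")
        label = name
    else:
        label = name[: -len(tld)]
    if not re.fullmatch(r"[A-Za-z0-9-]+", label):
        problems.append("contains characters other than letters, numbers, and dashes")
    if label.startswith("-") or label.endswith("-"):
        problems.append("starts or ends with a dash")
    return problems
```

Calling check_domain("example.com") returns an empty list, while check_domain("-bad.com") reports the leading dash.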
Virtual domains are mapped to either a dial access or remote shell account. We
need to know the login of that account in order to build the virtual domain. Also, if
your shell account expires, the files will be removed and the virtual domain will no
longer function.
By default, the domain will be mapped to "public_html" in your directory. If the
domain has adult content, it will be mapped to "adult_html". Please let us know in
advance if your domain will contain adult content.
Your domain may be mapped to a subdirectory under "public_html" or "adult_html"
in order to allow multiple virtual domains to be mapped to a single account or to
allow your personal web page to have content that is different from your virtual
domain. If you wish your virtual domain to be mapped to a subdirectory please be
sure to include that directory name.
Normally we will process your virtual domain request the day that payment is
received.
Eskimo North: Please Add a Link to Our Site!
Please use these graphics to link to our site from yours.
If you would like us to add a link to your site, please contact the site mangler.
Be sure to include the Title, URL, and a short description of your site.
http://www.eskimo.com/LinkToUs.html [22/07/2003 5:55:12 PM]
http://www.eskimo.com/img/link/eskiplain.gif
http://www.eskimo.com/img/link/eskiborder.gif
http://www.eskimo.com/img/link/eskilittle2.gif
http://www.eskimo.com/img/link/eskilittle.gif
http://www.eskimo.com/img/link/eskitiny.gif
http://www.eskimo.com/img/link/eskiwhite.gif
http://www.eskimo.com/img/link/eskitiny2.gif
http://www.eskimo.com/img/link/eskiname.gif
http://www.eskimo.com/img/link/eskiname2.gif
Eskimo North Technical Support: WWW
Your World Wide Web Site on Eskimo North

Contents:
Intro & File Locations
CGI Scripting
SSL (Secure Socket Layer)
SSI (Server Side Includes)
Password Protection
Blocking Access by IP
Alternate Error Pages
Hosting a Domain
Other Resources
Every user account, whether it be a remote shell account or a dial-up subscription,
comes with the ability to host a web site. Please do not use this site for any illegal
activities. Adult material is permitted; however, we have a separate server name for
adult material and it must be confined to that server.
In your home directory there are two places you can put web files:
public_html   Only web pages suitable for all ages may be placed in this directory.
              This directory maps to www.eskimo.com/~login.
adult_html    Web pages containing adult-oriented material should be placed in this
              directory. This directory maps to adult.eskimo.com/~login. In addition,
              please use some method, such as Adult Check or password protection,
              to restrict access to adults only.
If you have a virtual domain, which allows your web files and ftp files to exist under
your own domain name, the root of the virtual domain would normally be mapped
to public_html or adult_html in your directory. If you have multiple domains, or
wish to have your virtual domain separate from your personal web site, it is
possible for us to map your virtual domain to a sub-directory of one of those
directories.
http://www.eskimo.com/support/www.html (1 of 11) [22/07/2003 5:55:42 PM]
You can create your web pages on-line using any of the available Unix editors and
various image manipulation tools, or you can create your files off-line and then
upload them to our site. Either way, your main web page must be named
index.html, index.htm, or index.shtml (if using Server Side Includes, described
below) and placed in a directory called public_html if it is for a general audience
or adult_html if it is material intended for an adult audience.
To upload web files created off-line, ftp to ftp.eskimo.com. Be sure to use
ftp.eskimo.com as the ftp host, not eskimo.com, or you will be unable to connect.
Another thing that will prevent you from connecting to our ftp server is
incorrect DNS for your originating IP address. Your IP address must have
forward and inverse DNS, and they must agree with each other. Log in to ftp using
your login and password rather than anonymous. This will place you in your home
directory here.
If the public_html or adult_html directory does not already exist, you will need to
create it before you can upload or create web pages on-line. You can do this by
typing:
mkdir public_html
-- or --
mkdir adult_html
The mkdir command in ftp will create a directory with read, write, and search
permissions for the owner, and read and search (but not write) permissions for
everyone else. There is no need to change the permissions when you create the
directory in ftp. If you create the directory from the Unix shell, the permissions the
directory is created with will be influenced by your umask value and may not
be initially correct. If you created the directory from the Unix shell, set the
permissions with the following command:
chmod 755 public_html
-- or --
chmod 755 adult_html
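To see why the umask matters, it helps to work through the bit arithmetic. This is a sketch only: the function name is made up for illustration, and it assumes the usual Unix behavior in which mkdir requests mode 777, file creation requests mode 666, and the umask clears whichever bits it has set.

```python
def mode_after_umask(requested, umask):
    """Return the permission bits that remain once the umask clears its bits."""
    return requested & ~umask & 0o777


# Assumed defaults: directories are requested as 0o777, files as 0o666.
for umask in (0o022, 0o077):
    print(
        "umask", oct(umask),
        "directory", oct(mode_after_umask(0o777, umask)),
        "file", oct(mode_after_umask(0o666, umask)),
    )
```

With a umask of 022 the directory comes out as 755, which is what the web server needs. With a stricter umask such as 077 it comes out as 700, and the chmod 755 command above is required.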
Now you are ready to create or upload your web files. Move to the directory you
just created, using the following command:
cd public_html
-- or --
cd adult_html
If you are uploading your web files, you can now use the ftp put command to send
each file:
put index.html
put otherfiles.html
The ftp put command will create the files with read and write permissions for the
owner and read permissions for everyone else. These are the permissions you
want so it is not necessary to change the permissions if you uploaded your web
files with ftp.
If you are creating your files on-line, you can use the text editor of your choice to
create the files now. The permissions on the files will be affected by your umask
value, so they may not be initially correct. To set the file permissions correctly, use
the following command for each file that you create:
chmod 644 index.html
chmod 644 otherfiles.html
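If you have many files, running chmod by hand for each one gets tedious. The following sketch walks a directory tree and applies the modes described above (755 for directories, 644 for files). The function names are hypothetical. It deliberately skips any directory named cgi-bin, since the CGI section of this page asks for different modes there.

```python
import os
import stat

DIR_MODE = 0o755   # rwxr-xr-x
FILE_MODE = 0o644  # rw-r--r--


def fix_web_permissions(root):
    """Set directories under root to 755 and regular files to 644."""
    os.chmod(root, DIR_MODE)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune cgi-bin so os.walk does not descend into it or touch it.
        dirnames[:] = [d for d in dirnames if d != "cgi-bin"]
        for name in dirnames:
            os.chmod(os.path.join(dirpath, name), DIR_MODE)
        for name in filenames:
            os.chmod(os.path.join(dirpath, name), FILE_MODE)


def mode_of(path):
    """Return only the permission bits of path, for example 0o644."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

A typical call would be fix_web_permissions(os.path.expanduser("~/public_html")), run from a script in your shell account.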
Under your public_html or adult_html directories you can create any number of
sub-directories and web pages, subject only to the limits of your disk quota. You
can use sub-directories to organize your web files. For example, you may wish to
create a sub-directory called images and place your images there. You might want
to create additional subdirectories under images to further organize the image
files. The directory separator used by Unix is the forward slash '/', rather than the
backslash '\' used by Windows.
CGI on Your Web Site
You can use CGI programs on your web site. To do so you will need to create a
cgi-bin directory under your public_html directory and you will need to make it
mode 711:
cd ~/public_html
mkdir cgi-bin
chmod 711 cgi-bin

Then place your CGI programs in that directory with an extension of ".cgi". The
extension needs to be ".cgi" regardless of what type of CGI program it is: perl,
perl5, python, shell scripts, and compiled programs of any sort all need
an extension of ".cgi". The mode of your CGI programs should be 500. Set the
mode with the following command:
chmod 500 program.cgi
Perl is the most frequently used CGI programming language. To use a perl CGI,
the top of your script has to properly specify the perl interpreter. The first line in a
perl4 CGI script should have:
#!/usr/local/bin/perl
The first line of a perl5 script should have:
#!/usr/local/bin/perl5
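The same pattern applies to CGI programs written in other languages: an interpreter line first, then output consisting of a header, a blank line, and the page. Here is a minimal sketch in Python. The interpreter path on the first line is an assumption for illustration only (this page documents only the perl paths), and the function name is hypothetical.

```python
#!/usr/bin/env python3
# The interpreter line above is a placeholder; the real path depends on the server.
import html
import os
import sys


def build_response(query_string):
    """Return a complete CGI response: header, blank line, then an HTML body."""
    # Escape the query string so user input cannot inject markup into the page.
    safe = html.escape(query_string)
    body = "<html><body><p>Query string: %s</p></body></html>\n" % safe
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The web server passes the part of the URL after '?' in QUERY_STRING.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

Saved as program.cgi in cgi-bin with mode 500, a script of this shape would be requested the same way as any other CGI program.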
The web servers here are Sparc-based systems running SparcLinux. You will
need access to a SparcLinux platform, or a system with a cross-compiler, in order
to compile C CGI programs for use here, or you will need to ask us to compile them
for you. Our shell servers are running SunOS 4.1.4, so programs compiled on
them will not run on the web servers. There is a SunOS compatibility capability in
SparcLinux, but it will not work with more than 256 descriptors and, unfortunately,
the web server requires a higher number.
If you have a virtual domain, your CGI scripts would be called as follows:
http://www.your-virtual-domain.com/~username/cgi-bin/program.cgi
Programs in the system cgi-bin directory are called this way:
http://www.your-virtual-domain.com/cgi-bin/program.cgi
Presently, you cannot have a cgi-bin directory under adult_html. This is because
the wrapper only allows them to exist under one directory. So if you have an adult
site, it will be necessary to put CGIs under your public_html directory and call
them from your adult web page. If you have permissions set correctly, nobody will
be able to see them from the main web server, so this should not be a problem.
WARNING!
CGI programs execute with your user ID and your group ID. Evil people can
assume your identity and do evil things if you let them.
Poorly designed programs can result in damage to your files not limited to
your web site. When you put a CGI program on your web site you are
allowing people all over the Internet to execute programs with your
permissions.
Be very sure that your CGI programs do not allow a remote user to execute
arbitrary commands or read or write arbitrary filenames. Programs should
be very careful to eliminate any "../" back references, wild card or regular
expression references to filenames or commands, references to files or
commands starting at "/", and potential buffer overflow conditions which
can allow someone to cause arbitrary commands to be executed with your
permissions.
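One way to follow this advice is to accept only file names that match a strict allow-list, rather than trying to strip out every dangerous sequence. The sketch below is an illustration under that assumption, not a complete defense; the function name, the permitted character set, and the length limit are my own choices.

```python
import re

# Allow-list: letters, digits, underscore, and dash, plus dots after the first
# character. No slashes, wildcards, or spaces, and no leading dot.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9._-]*")


def is_safe_filename(name, max_length=64):
    """Return True only for a plain file name with no path or pattern characters."""
    if not name or len(name) > max_length:
        return False
    # fullmatch requires the whole string to match, so "../x" and "/x" fail.
    return _SAFE_NAME.fullmatch(name) is not None
```

A CGI program would call is_safe_filename() on any name supplied by the remote user and refuse the request if it returns False, before opening anything.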
Secure/Encrypted Server
Presently secure browsing is available on the "commerce.eskimo.com" server
only, due to the way domain certificates are obtained. Using this, you can use a
CGI script in your directory to send secure data, such as preparing credit card
transactions (we're currently working on that aspect; until we have that up and
running, you'll need to use another server for the actual processing of such card
data). Use the following type of URL for secure browsing:
https://commerce.eskimo.com/~username/filename.html
-- or --
https://commerce.eskimo.com/~username/cgi-bin/program.cgi
Server Side Includes

Server side includes are allowed with the exception of Exec which is disallowed.
Although it is slightly more work (you have to spit out a header), just about
anything you could do with Exec can be done with a CGI script.
Normally, ".html" files are not parsed for SSI directives. There are two ways to tell
the server to parse a file for SSI directives. If you give the file an extension of
".shtml" the server will parse the file for SSI directives. Also, if you set the user
execute bit the file will be parsed for SSI directives. To do this type:
chmod u+x file.html
If you set the group execute bit, it will allow the content of the file to be cached in
proxy servers. Do this for STATIC content (pages kept in a file). Do not do this for
dynamic content such as a CGI script which produces different output each time it
is run. Do not do it in situations where you need to prevent caching, such as for a
counter or a banner advertisement where a different advertisement is displayed for
each hit.
Password Protecting a Portion of Your Web Site
Another reason that you may wish to use sub-directories is so that you can
password protect a portion of your web site. You may wish to do this in order to
provide information only to internal members of an organization or to members of
a subscription service. The standard htaccess method of password protection is
supported. To use this method you need to create two files.
The following assumes that you are logged into the Unix shell. This is necessary
because some of the required commands, like "encrypt", are not available from
ftp. There may be a Windows program that can generate encrypted password data,
but I do not know of any offhand.
One file called ".htpasswd" contains a list of users and encrypted passwords. This
file should be placed in your home directory tree but it should NOT be placed
under either your public_html or adult_html directory. The reason for this is that
this file must have public read permission in order for the web server to be able to
access it but you do not want it available for download via the web because
people can run dictionary attack programs in an attempt to discover the passwords
that are encrypted within. The format of this file is as follows:
alice:kwzwe.fMQXsCU
bob:k2ZAZbMvR/eOM
charlie:keZqlq1fzdLxY
To create the encrypted passwords, use the encrypt command from the Unix shell:
encrypt password1
You get a response that looks like this:
"password1" = "kwzwe.fMQXsCU"
Once you have that you just need to paste the password into the file or copy it by
hand if you do not have cut-and-paste capability.
Another method of creating the ".htpasswd" file is with the shell command
"htpasswd":
Usage: htpasswd [-c] passwordfile username
The -c flag creates a new file.
Enter the password at both given prompts.
htpasswd -c .htpasswd alice
htpasswd .htpasswd bob
...
Once you have created the ".htpasswd" file you should make sure the permissions
are set correctly by using the following command:
chmod 644 .htpasswd
The second file required to password protect a portion of your web site is a file
called ".htaccess". This file must be placed in the directory to be protected and
protects the contents of that directory and any subdirectories nested within that
directory. The contents of this file tell the web server where to find the ".htpasswd"
file and what form of authentication to apply. In this example I am showing you
how to use password authentication to protect a portion of your web space. It is
also possible to limit access using groups, or by domain name or address space.
The format of the ".htaccess" file is:
AuthUserFile /u/l/login/.htpasswd
AuthGroupFile /dev/null
AuthName ByPassword
AuthType Basic
<Limit GET PUT POST>
require user alice
</Limit>
This limits the access to this directory to only 'alice'; although 'bob' and 'charlie'
have passwords, too, they can't access this directory. To allow anyone with a
password to gain entrance you can use:
...
require valid-user
</Limit>
Substitute the path of your home directory for "/u/l/login". Case is important in the
above file, for example the "GET PUT POST" must be UPPER case, the "Limit"
must have an upper case "L" and lower case "imit". Placing all three limiters (GET,
PUT, and POST) there ensures that all access to that directory will be protected,
rather than any one specific access with others unlimited. The keywords in the
above file should be up against the left column with no spaces on the line prior to
the keywords.
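Because the layout rules are strict (exact case, keywords in the left column), it can be safer to generate the file than to type it. The sketch below builds the same ".htaccess" text as the example above. The function name and its parameters are hypothetical; the directive lines themselves are the ones shown on this page.

```python
def build_htaccess(passwd_path, users=None, realm="ByPassword"):
    """Return .htaccess text; with no users given, any valid user may enter."""
    if users:
        require = "require user " + " ".join(users)
    else:
        require = "require valid-user"
    lines = [
        "AuthUserFile " + passwd_path,
        "AuthGroupFile /dev/null",
        "AuthName " + realm,
        "AuthType Basic",
        "<Limit GET PUT POST>",
        require,
        "</Limit>",
    ]
    # Every keyword starts in the left column, with no leading spaces.
    return "\n".join(lines) + "\n"


print(build_htaccess("/u/l/login/.htpasswd", ["alice"]))
```

Write the returned text to a file named ".htaccess" in the directory to be protected, substituting your own home directory for "/u/l/login".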
Set the permissions so that the .htaccess file is publicly readable:
chmod 644 .htaccess
Now when you attempt to access the URL corresponding to the directory the
".htaccess" file is located in, your browser will prompt you for a username and
password. If you supply the correct password you will be permitted to access the
material requested; otherwise access will be denied.
Blocking Access by IP
There are situations where you may want to block (or allow) access based on the
hostname or IP address of the browsing user rather than use a password method.
For instance, if someone out there is flooding your logs with all sorts of errors
trying to find pages without using your links, etc., you can block access from their site
without blocking everyone else. Or, if you want your workstation at work to be able
to access a directory but no one else, you can set the server to allow it from your
machine and deny it for everyone else. The trick lies in the same "<Limit GET PUT
POST>" section, but with different limiting rules.
To deny access to a set of addresses (sample address of "192.168.12.34"), while
allowing access from everywhere else, the pattern is:
order allow,deny
allow from all
deny from 192.168.12.34
</Limit>
Similarly, to allow connections from a limited set of addresses (again with our
sample of "192.168.12.34"); note that the "order" changes as well:
order deny,allow
deny from all
allow from 192.168.12.34
</Limit>
These limits can also be done with hostnames or patterns as well. For instance, a
directory specifically for Boeing employees might use:
order deny,allow
deny from all
allow from .boeing.com
</Limit>
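The effect of the "order" line is easier to see when written out as logic. The sketch below models the three examples above, assuming the server follows the usual Apache-style evaluation: with "allow,deny" a request must match an allow rule and no deny rule, and with "deny,allow" a request is refused only if it matches a deny rule and no allow rule. The function names are hypothetical and the matching is simplified to "all", an exact IP address, or a ".domain" suffix.

```python
def matches(rule, ip, hostname=""):
    """Simplified matching: 'all', an exact IP address, or a '.domain' suffix."""
    if rule == "all":
        return True
    if rule.startswith("."):
        return hostname.lower().endswith(rule.lower())
    return ip == rule


def is_allowed(order, allow, deny, ip, hostname=""):
    """Decide access for one request under the given order and rule lists."""
    allowed = any(matches(r, ip, hostname) for r in allow)
    denied = any(matches(r, ip, hostname) for r in deny)
    if order == "allow,deny":
        # Allow rules first, deny rules second; the default is to deny.
        return allowed and not denied
    if order == "deny,allow":
        # Deny rules first, allow rules second; the default is to allow.
        return allowed or not denied
    raise ValueError("order must be 'allow,deny' or 'deny,allow'")
```

For the first example, is_allowed("allow,deny", ["all"], ["192.168.12.34"], "192.168.12.34") is False, while any other address gets through.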
Alternate Error Pages

Another line you can have in your ".htaccess" file will replace our default "404 Not
Found" page or the empty "401 - Authentication Failed" screen. This can be
done to keep a similar look and format across an entire set of pages. The format
of this line is shown here to replace both error pages (remember to replace "login"
with your own login name):
ErrorDocument 401 /~login/401.html
ErrorDocument 404 /~login/404.html
Only 400-series errors can be redirected in this manner. Substitute your login name
in the lines, as well as the name of the file(s) you've created for the particular error
page. Remember that the error pages will need proper permissions (644)
themselves as well. This will change the referenced error outputs for the directory
containing the ".htaccess" file as well as all of its subdirectories.
Hosting a Domain
Now that you have a website up on our server, you might want to have your own
domain name ("http://www.something-of-your-own.com/") rather than listing it as a
user site on our main server's name ("http://www.eskimo.com/~yourname"). For
information on how to do this (registering a domain, sending in needed payments
to us for domain hosting, etc.), please see our page about "Virtual Domain
Hosting".
Additional Resources
Subject                                              URL
A Beginners Guide to HTML                            http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html
CGI Programming                                      http://www.eskimo.com/cgihelp
HTML quick reference                                 http://www.cc.ukans.edu/~acs/docs/other/HTML_quick.shtml
Starter Home Page                                    http://www.eskimo.com/old/starter.html
Top Ten Ways To Tell If You Have A Sucky Home Page   http://jeffglover.com/sucky.html
CGI Authoring on Eskimo
The Real Stuff!
How to develop cgi scripts on Eskimo North.

Here is a good place to start to look for info on the web, html and cgi.
The list of the cgi scripts and tools available on Eskimo North.
Send questions and comments about this document here.
This page is maintained by ECT, erik@skytouch.com (aka. webmaster@eskimo.com)
http://www.eskimo.com/cgihelp/ [22/07/2003 5:55:47 PM]
Web Authoring References
On this page you'll find a collection of links pointing to material relating to HTML and
CGI authoring, development and programming. If you know of a resource, which is not
currently listed below, please take a second to submit it and I'll add it to the list.
CGI
    CGI Authoring (Eskimo North)
    CGI Devel. Guide (Eskimo North)
    CGI Overview (W3C)
    CGI Overview (NCSA)
    CGI Resource Index
    CGI Scripts Available (Eskimo North)
    CGI Server Support (W3C)
    CGI Specifications
    Server Side Includes
    Yahoo's Index

Gizmos & Tools
    6X6X6 Color Palette Map
    HTML Validation Service
    HTTP Status Codes
    ISO8859-1 (Latin 1) Table
    Weblint

HTML
    A Beginner's Guide to HTML
    Builder.Com
    Carlos' Forms Tutorial
    HTML 3.2
    HTML 4.0
    HTML 4.01
    HTML Bad Style Page
    HTML Converters
    HTML Quick Reference
    HTML-Writers-Guild
    HyperText Markup Language (W3C)
    Technical Specifications
    Yahoo's Index

Mail Filtering
    Filtering Mail FAQ
    Processing Mail with Procmail
    Using Procmail (Eskimo North)

Newsgroups
    lobby (Eskimo North subscribers only)
    alt.hypertext
    comp.infosystems.www.authoring.cgi
    comp.infosystems.www.authoring.html
    comp.infosystems.www.authoring.images
    comp.infosystems.www.authoring.misc
    comp.infosystems.www.announce
    comp.infosystems.www.misc
    comp.infosystems.www.providers
    comp.infosystems.www.users

PERL
    CGI Form Handling in Perl with cgi-lib.pl
    Perl CGI Programming FAQ
    Yahoo's Perl Language Index
    Yahoo's Perl Web Scripts Index

UNIX
    Get UNIX Help
    UNIX Reference Desk
    Yahoo's Index

WWW
    Yahoo's Web Programming Index
    Web Developer's Virtual Library
    Webmaster Reference Library
    WWW OpenFAQs

Copyright 1998-2000 SkyTouch Communications. All rights reserved.
http://www.eskimo.com/cgihelp/cgi-refs.html (2 of 2) [22/07/2003 5:55:53 PM]
Web Authoring References (Submission)
If you have links you'd like to see listed in Web Authoring References page, please fill
in the form below. This form is for submitting links only. Comments and suggestions
should be sent to erik@skytouch.com.
Copyright 1997 SkyTouch Communications. All rights reserved.
http://www.eskimo.com/cgihelp/cgi-refs-submit.html [22/07/2003 5:55:57 PM]
CGI Scripts Available on Eskimo North
Here's a quick rundown of the scripts, utilities and programs installed on Eskimo
North:
AnyURL
    A CGI form processing program which provides a way to use
    forms input to point to URLs without doing any programming.
    Look here for more information.
cgi-lib.pl
    A Perl library to manipulate CGI input. Look here for its
    documentation.
CGI.pm
    This Perl library uses perl5 objects to simplify the creation and
    parsing of fill-out HTML forms. The CGI.pm library can also be
    found in /usr/local/lib/perl5/ on Eskimo. Look here
    for its documentation.
Count.cgi
    A very powerful and fully customizable CGI
    counter. Look here for more information.
    Please limit yourself to one data file per account. All data files
    must be of the form: your_login.dat. For example, if your
    login name is JoeUser, the counter data file will have to be
    defined as follows:
    <IMG SRC="/cgi-bin/Count.cgi?df=JoeUser.dat">
    Data files which do not comply with this format will be
    automatically deleted. (Note: data files cannot be edited directly
    on Eskimo; you'll need to use a form to do so.)
date
    A simple Bourne shell script which returns the current date/time.
    Look here for an example.
email
    A simple email form handler. Look here for an example. Please
    send mail to webmaster with the relevant information (email
    address and return page location) if you wish to be added to the
    email configuration file. The email form, documentation and
    source can be found here.
Imagemap Module
    Our server handles inline clickable images automatically. Look here
    for a tutorial.
lib5-cgi.pl
    A Perl 5 library of CGI functions. (A Perl 4 version is also
    available as lib-cgi.pl.)
mailback
    A script to mail the results of an HTML form to its author. Look
    here or even here for examples of forms using this script.
random
    A Perl script to generate a random URL from a text file "database".
    Look here for an example.
sendmail-cgi.pl
    A Perl/CGI library to send mail via sendmail/smtp.
subscribe
    A Perl script to automate the process of subscribing to
    majordomo, listproc, listserv and smartlist mailing lists. An
    example can be found here.
test.cgi
    A very useful Perl script to check on all arguments and variables
    passed to CGI scripts. For an example look here.
http://www.eskimo.com/cgihelp/cgi-list.html (2 of 2) [22/07/2003 5:56:04 PM]
# Perl Routines to Manipulate CGI input # S.E.Brenner@bioc.cam.ac.uk # $Id: cgi-lib.pl,v 2.14

1996/10/20 12:41:02 brenner Exp $ # # Copyright (c) 1996 Steven E. Brenner # Unpublished work. #
Permission granted to use and modify this library so long as the # copyright above is maintained,
modifications are documented, and # credit is given for any use of the library. # # Thanks are due to
many people for reporting bugs and suggestions # especially Meng Weng Wong, Maki Watanabe, Bo
Frese Rasmussen, # Andrew Dalke, Mark-Jason Dominus, Dave Dittrich, Jason Mathews # For more
information, see: # http://www.bio.cam.ac.uk/cgi-lib/ # Modified by Erik C. Thauvin
(erik@skytouch.com) # Sun Apr 27 01:43:36 PDT 1997 # # Added FILEPATH_INFO support.
$cgi_lib'version = sprintf("%d.%02d", q$Revision: 2.14 $ =~ /(\d+)\.(\d+)/); # Parameters affecting cgilib behavior # User-configurable parameters affecting file upload. $cgi_lib'maxdata = 131072; #
maximum bytes to accept via POST - 2^17 $cgi_lib'writefiles = 0; # directory to which to write files, or #
0 if files should not be written $cgi_lib'filepre = "cgi-lib"; # Prefix of file names, in directory above # Do
not change the following parameters unless you have special reasons $cgi_lib'bufsize = 8192; # default
buffer size when reading multipart
$cgi_lib'maxbound = 100;    # maximum boundary length to be encountered
$cgi_lib'headerout = 0;     # indicates whether the header has been printed


# ReadParse
# Reads in GET or POST data, converts it to unescaped text, and puts
# key/value pairs in %in, using "\0" to separate multiple selections

# Returns >0 if there was input, 0 if there was no input
# undef indicates some failure.

# Now that cgi scripts can be put in the normal file space, it is useful
# to combine both the form and the script in one place.  If no parameters
# are given (i.e., ReadParse returns FALSE), then a form could be output.

# If a reference to a hash is given, then the data will be stored in that
# hash, but the data from $in and @in will become inaccessible.
# If a variable-glob (e.g., *cgi_input) is the first parameter to ReadParse,
# information is stored there, rather than in $in, @in, and %in.
# Second, third, and fourth parameters fill associative arrays analogous to
# %in with data relevant to file uploads.

# If no method is given, the script will process both command-line arguments
# of the form: name=value and any text that is in $ENV{'QUERY_STRING'}
# This is intended to aid debugging and may be changed in future releases

sub ReadParse {
  local (*in) = shift if @_;    # CGI input
  local (*incfn,                # Client's filename (may not be provided)
         *inct,                 # Client's content-type (may not be provided)
         *insfn) = @_;          # Server's filename (for spooled files)
  local ($len, $type, $meth, $errflag, $cmdflag, $perlwarn, $got);

  # Disable warnings as this code deliberately uses local and environment
  # variables which are preset to undef (i.e., not explicitly initialized)
  $perlwarn = $^W;
  $^W = 0;

  binmode(STDIN);   # we need these for DOS-based systems
  binmode(STDOUT);  # and they shouldn't hurt anything else
  binmode(STDERR);

  # Get several useful env variables
  $type = $ENV{'CONTENT_TYPE'};
  $len  = $ENV{'CONTENT_LENGTH'};
  $meth = $ENV{'REQUEST_METHOD'};

  if ($len > $cgi_lib'maxdata) { #'
    &CgiDie("cgi-lib.pl: Request to receive too much data: $len bytes\n");
  }

  if (!defined $meth || $meth eq '' || $meth eq 'GET' ||
      $type eq 'application/x-www-form-urlencoded') {
    local ($key, $val, $i);

    # Read in text
    if (!defined $meth || $meth eq '') {
      $in = $ENV{'QUERY_STRING'};
      $cmdflag = 1;  # also use command-line options
    } elsif($meth eq 'GET' || $meth eq 'HEAD') {
      $in = $ENV{'QUERY_STRING'};
    } elsif ($meth eq 'POST') {
      # Parenthesize the assignment so that $got holds the byte count
      # rather than the result of the comparison.
      if (($got = read(STDIN, $in, $len)) != $len)
        {$errflag="Short Read: wanted $len, got $got\n";};
    } else {
      &CgiDie("cgi-lib.pl: Unknown request method: $meth\n");
    }

    @in = split(/[&;]/,$in);
    push(@in, @ARGV) if $cmdflag; # add command-line parameters

    foreach $i (0 .. $#in) {
      # Convert plus to space
      $in[$i] =~ s/\+/ /g;

      # Split into key and value.
      ($key, $val) = split(/=/,$in[$i],2); # splits on the first =.

      # Convert %XX from hex numbers to alphanumeric
      $key =~ s/%([A-Fa-f0-9]{2})/pack("c",hex($1))/ge;
      $val =~ s/%([A-Fa-f0-9]{2})/pack("c",hex($1))/ge;

      # Associate key and value
      $in{$key} .= "\0" if (defined($in{$key})); # \0 is the multiple separator
      $in{$key} .= $val;
    }

  } elsif
($ENV{'CONTENT_TYPE'} =~ m#^multipart/form-data#) {
    # for efficiency, compile multipart code only if needed
$errflag = !(eval <<'END_MULTIPART');

    local ($buf, $boundary, $head, @heads, $cd, $ct, $fname, $ctype, $blen);
    local ($bpos, $lpos, $left, $amt, $fn, $ser);
    local ($bufsize, $maxbound, $writefiles) =
      ($cgi_lib'bufsize, $cgi_lib'maxbound, $cgi_lib'writefiles);

    # The following lines exist solely to eliminate spurious warning messages
    $buf = '';

    ($boundary) = $type =~ /boundary="([^"]+)"/; #";   # find boundary
    ($boundary) = $type =~ /boundary=(\S+)/ unless $boundary;
    &CgiDie ("Boundary not provided: probably a bug in your server")
      unless $boundary;
    $boundary = "--" . $boundary;
    $blen = length ($boundary);

    if ($ENV{'REQUEST_METHOD'} ne 'POST') {
      &CgiDie("Invalid request method for multipart/form-data: $meth\n");
    }

    if ($writefiles) {
      local($me);
      stat ($writefiles);
      $writefiles = "/tmp" unless -d _ && -r _ && -w _;
      # ($me) = $0 =~ m#([^/]*)$#;
      $writefiles .= "/$cgi_lib'filepre";
    }

    # read in the data and split into parts:
    # put headers in @in and data in %in
    # General algorithm:
    #   There are two dividers: the border and the '\r\n\r\n' between
    # header and body.  Iterate between searching for these
    #   Retain a buffer of size(bufsize+maxbound); the latter part is
    # to ensure that dividers don't get lost by wrapping between two bufs
    #   Look for a divider in the current batch.  If not found, then
    # save all of bufsize, move the maxbound extra buffer to the front of
    # the buffer, and read in a new bufsize bytes.  If a divider is found,
    # save everything up to the divider.  Then empty the buffer of everything
    # up to the end of the divider.  Refill buffer to bufsize+maxbound
    #   Note slightly odd organization.  Code before BODY: really goes with
    # code following HEAD:, but is put first to 'pre-fill' buffers.  BODY:
    # is placed before HEAD: because we first need to discard any 'preface,'
    # which would be analogous to a body without a preceding head.

    $left = $len;
   PART: # find each part of the multi-part while reading data
    while (1) {
      die $@ if $errflag;

      $amt = ($left > $bufsize+$maxbound-length($buf)
              ? $bufsize+$maxbound-length($buf): $left);
      $errflag = (($got = read(STDIN, $buf, $amt, length($buf))) != $amt);
      die "Short Read: wanted $amt, got $got\n" if $errflag;
      $left -= $amt;

      $in{$name} .= "\0" if defined $in{$name};
      $in{$name} .= $fn if $fn;

      $name=~/([-\w]+)/;  # This allows $insfn{$name} to be untainted
      if (defined $1) {
        $insfn{$1} .= "\0" if defined $insfn{$1};
        $insfn{$1} .= $fn if $fn;
      }

     BODY:
      while (($bpos = index($buf, $boundary)) == -1) {
        die $@ if $errflag;
        if ($name) {  # if no $name, then it's the prologue -- discard
          if ($fn) { print FILE substr($buf, 0, $bufsize); }
          else     { $in{$name} .= substr($buf, 0, $bufsize); }
        }
        $buf = substr($buf, $bufsize);
        $amt = ($left > $bufsize ? $bufsize : $left); #$maxbound==length($buf);
        $errflag = (($got = read(STDIN, $buf, $amt, $maxbound)) != $amt);
        die "Short Read: wanted $amt, got $got\n" if $errflag;
        $left -= $amt;
      }
      if (defined $name) {  # if no $name, then it's the prologue -- discard
        if ($fn) { print FILE substr($buf, 0, $bpos-2); }
        else     { $in {$name} .= substr($buf, 0, $bpos-2); } # kill last \r\n
      }
      close (FILE);
      last PART if substr($buf, $bpos + $blen, 4) eq "--\r\n";
      substr($buf, 0, $bpos+$blen+2) = '';
      $amt = ($left > $bufsize+$maxbound-length($buf)
              ? $bufsize+$maxbound-length($buf) : $left);
      $errflag = (($got = read(STDIN, $buf, $amt, length($buf))) != $amt);
      die "Short Read: wanted $amt, got $got\n" if $errflag;
      $left -= $amt;

      undef $head;  undef $fn;
     HEAD:
      while (($lpos = index($buf, "\r\n\r\n")) == -1) {
        die $@ if $errflag;
        $head .= substr($buf, 0, $bufsize);
        $buf = substr($buf, $bufsize);
        $amt = ($left > $bufsize ? $bufsize : $left); #$maxbound==length($buf);
        $errflag = (($got = read(STDIN, $buf, $amt, $maxbound)) != $amt);
        die "Short Read: wanted $amt, got $got\n" if $errflag;
        $left -= $amt;
      }
      $head .= substr($buf, 0, $lpos+2);
      push (@in, $head);
      @heads = split("\r\n", $head);
      ($cd) = grep (/^\s*Content-Disposition:/i, @heads);
      ($ct) = grep (/^\s*Content-Type:/i, @heads);

      ($name) = $cd =~ /\bname="([^"]+)"/i; #";
      ($name) = $cd =~ /\bname=([^\s:;]+)/i unless defined $name;

      ($fname) = $cd =~ /\bfilename="([^"]*)"/i; #"; # filename can be null-str
      ($fname) = $cd =~ /\bfilename=([^\s:;]+)/i unless defined $fname;
      $incfn{$name} .= (defined $in{$name} ? "\0" : "") .
        (defined $fname ? $fname : "");

      ($ctype) = $ct =~ /^\s*Content-type:\s*"([^"]+)"/i;  #";
      ($ctype) = $ct =~ /^\s*Content-Type:\s*([^\s:;]+)/i unless defined $ctype;
      $inct{$name} .= (defined $in{$name} ? "\0" : "") . $ctype;

      if ($writefiles && defined $fname) {
        $ser++;
        $fn = $writefiles . ".$$.$ser";
        open (FILE, ">$fn") || &CgiDie("Couldn't open $fn\n");
        binmode (FILE);  # write files accurately
      }
      substr($buf, 0, $lpos+4) = '';
      undef $fname;
      undef $ctype;
    }

1;
END_MULTIPART
    if ($errflag) {
      local ($errmsg, $value);
      $errmsg = $@ || $errflag;
      foreach $value (values %insfn) {
        unlink(split("\0",$value));
      }
      &CgiDie($errmsg);
    } else {
      # everything's ok.
    }
  } else {
    &CgiDie("cgi-lib.pl: Unknown Content-type: $ENV{'CONTENT_TYPE'}\n");
  }

  # no-ops to avoid warnings
  $insfn = $insfn;
  $incfn = $incfn;
  $inct  = $inct;

  $^W = $perlwarn;

  return ($errflag ?
          undef : scalar(@in));
}


# PrintHeader
# Returns the magic line which tells WWW that we're an HTML document

sub PrintHeader {
  return "Content-type: text/html\n\n";
}


# HtmlTop
# Returns the <head> of a document and the beginning of the body
# with the title and a body <h1> header as specified by the parameter

sub HtmlTop
{
  local ($title) = @_;

  return <<END_OF_TEXT;
<html>
<head>
<title>$title</title>
</head>
<body>
<h1>$title</h1>
END_OF_TEXT
}


# HtmlBot
# Returns the </body>, </html> codes for the bottom of every HTML page

sub HtmlBot
{
  return "</body>\n</html>\n";
}


# SplitParam
# Splits a multivalued parameter into a list of the constituent parameters

sub SplitParam
{
  local ($param) = @_;
  local (@params) = split ("\0", $param);
  return (wantarray ? @params : $params[0]);
}


# MethGet
# Return true if this cgi call was using the GET request, false otherwise

sub MethGet {
  return (defined $ENV{'REQUEST_METHOD'} && $ENV{'REQUEST_METHOD'} eq "GET");
}


# MethPost
# Return true if this cgi call was using the POST request, false otherwise

sub MethPost {
  return (defined $ENV{'REQUEST_METHOD'} && $ENV{'REQUEST_METHOD'} eq "POST");
}


# MyBaseUrl
# Returns the base URL to the script (i.e., no extra path or query string)

sub MyBaseUrl {
  local ($ret, $perlwarn);
  $perlwarn = $^W; $^W = 0;
  $ret = 'http://' . $ENV{'SERVER_NAME'} .
         ($ENV{'SERVER_PORT'} != 80 ? ":$ENV{'SERVER_PORT'}" : '') .
         $ENV{'SCRIPT_NAME'};
  $^W = $perlwarn;
  return $ret;
}


# MyFullUrl
# Returns the full URL to the script (i.e., with extra path or query string)

sub MyFullUrl {
  local ($ret, $perlwarn, $path_info);
  $perlwarn = $^W; $^W = 0;
  $path_info = $ENV{'FILEPATH_INFO'} || $ENV{'PATH_INFO'};
  $ret = 'http://' . $ENV{'SERVER_NAME'} .
         ($ENV{'SERVER_PORT'} != 80 ? ":$ENV{'SERVER_PORT'}" : '') .
         $ENV{'SCRIPT_NAME'} . $path_info .
         (length ($ENV{'QUERY_STRING'}) ? "?$ENV{'QUERY_STRING'}" : '');
  $^W = $perlwarn;
  return $ret;
}


# MyURL
# Returns the base URL to the script (i.e., no extra path or query string)
# This is obsolete and will be removed in later versions

sub MyURL {
  return &MyBaseUrl;
}


# CgiError
# Prints out an
# error message which contains appropriate headers,
# markup, etcetera.
# Parameters:
#  If no parameters, gives a generic error message
#  Otherwise, the first parameter will be the title and the rest will
#  be given as different paragraphs of the body

sub CgiError {
  local (@msg) = @_;
  local ($i,$name);

  if (!@msg) {
    $name = &MyFullUrl;
    @msg = ("Error: script $name encountered fatal error\n");
  };

  if (!$cgi_lib'headerout) { #')
    print &PrintHeader;
    print "<html>\n<head>\n<title>$msg[0]</title>\n</head>\n<body>\n";
  }
  print "<h1>$msg[0]</h1>\n";
  foreach $i (1 .. $#msg) {
    print "<p>$msg[$i]</p>\n";
  }

  $cgi_lib'headerout++;
}


# CgiDie
# Identical to CgiError, but also quits with the passed error message.

sub CgiDie {
  local (@msg) = @_;
  &CgiError (@msg);
  die @msg;
}


# PrintVariables
# Nicely formats variables.  Three calling options:
# A non-null associative array - prints the items in that array
# A type-glob - prints the items in the associated assoc array
# nothing - defaults to use %in
# Typical use: &PrintVariables()

sub PrintVariables {
  local (*in) = @_ if @_ == 1;
  local (%in) = @_ if @_ > 1;
  local ($out, $key, $output);

  $output = "\n<dl compact>\n";
  foreach $key (sort keys(%in)) {
    foreach (split("\0", $in{$key})) {
      ($out = $_) =~ s/\n/<br>\n/g;
      $output .= "<dt><b>$key</b>\n <dd>:<i>$out</i>:<br>\n";
    }
  }
  $output .= "</dl>\n";

  return $output;
}


# PrintEnv
# Nicely formats all environment variables and returns HTML string

sub PrintEnv {
  &PrintVariables(*ENV);
}


# The following lines exist only to avoid warning messages
$cgi_lib'writefiles = $cgi_lib'writefiles;
$cgi_lib'bufsize    = $cgi_lib'bufsize ;
$cgi_lib'maxbound   = $cgi_lib'maxbound;
$cgi_lib'version    = $cgi_lib'version;
$cgi_lib'filepre    = $cgi_lib'filepre;

1; #return true
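
The urlencoded branch of ReadParse in the listing above can be tried out in isolation. What follows is only a minimal sketch and is not part of cgi-lib.pl: decode_query is a hypothetical helper name, it uses lexical variables and chr() where the library uses package globals and pack("c", ...), and it ignores POST input, command-line arguments, and file uploads. It keeps the same order of steps (plus to space, split on the first "=", %XX decoding) and the same "\0" convention for repeated keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not a cgi-lib.pl
# routine): decode a query string into a hash of key => value, joining
# the values of a repeated key with "\0".
sub decode_query {
    my ($query) = @_;
    my %in;
    foreach my $pair (split /[&;]/, $query) {
        next if $pair eq '';                      # skip empty fields such as "a=1&&b=2"
        $pair =~ tr/+/ /;                         # plus signs become spaces
        my ($key, $val) = split /=/, $pair, 2;    # split on the first '=' only
        $val = '' unless defined $val;            # a bare key gets an empty value
        foreach ($key, $val) {
            s/%([A-Fa-f0-9]{2})/chr(hex($1))/ge;  # %XX becomes one character
        }
        $in{$key} .= "\0" if defined $in{$key};   # "\0" separates multiple values
        $in{$key} .= $val;
    }
    return %in;
}

my %in = decode_query('name=Jane+Doe&color=red&color=blue&note=a%3Db');
foreach my $key (sort keys %in) {
    # Split the stored value back apart, as SplitParam does.
    print "$key: ", join(', ', split(/\0/, $in{$key})), "\n";
}
```

Run as a script, this prints "color: red, blue", "name: Jane Doe", and "note: a=b", one per line. The last line shows why the split is limited to two fields: an encoded "=" inside a value is decoded only after the key and value have been separated.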
http://www.eskimo.com/cgihelp/scripts/CGI.pm.txt
package CGI;
require 5.001;
# See the bottom of this file for the POD documentation.  Search for the
# string '=head'.
# You can run this file through either pod2man or pod2html to produce pretty
# documentation in manual or html file format (these utilities are part of the
# Perl 5 distribution).
#
# Copyright 1995-1997 Lincoln D. Stein.  All rights reserved.
# It may be used and modified freely, but I do request that this copyright
# notice remain attached to the file.  You may modify this module as you
# wish, but if you redistribute a modified version, please attach a note
# listing the modifications you have made.
# The most recent version and complete docs are available at:
#   http://www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html
#   ftp://ftp-genome.wi.mit.edu/pub/software/WWW/
# Modified by Erik C. Thauvin (erik@skytouch.com)
# Sun Apr 27 02:18:46 PDT 1997
#
# Added FILEPATH_INFO support.
# Set this to 1 to enable copious autoloader debugging messages
$AUTOLOAD_DEBUG=0;
# Set this to 1 to enable NPH scripts
# or:
#    1) use CGI qw(:nph)
#    2) $CGI::nph(1)
#    3) print header(-nph=>1)
$NPH=0;
$CGI::revision = '$Id: CGI.pm,v 2.35 1997/4/20 20:19 lstein Exp $';
$CGI::VERSION='2.35';
# OVERRIDE THE OS HERE IF CGI.pm GUESSES WRONG
$OS = 'UNIX';
# $OS = 'MACINTOSH';
# $OS = 'WINDOWS';
# $OS = 'VMS';
# $OS = 'OS2';
# HARD-CODED LOCATION FOR FILE UPLOAD TEMPORARY FILES.
# UNCOMMENT THIS ONLY IF YOU KNOW WHAT YOU'RE DOING.
$TempFile::TMPDIRECTORY = '/usr/tmp';
# ------------------ START OF THE LIBRARY ------------
# FIGURE OUT THE OS WE'RE RUNNING UNDER
# Some systems support the $^O variable. If not
# available then require() the Config library
unless ($OS) {
unless ($OS = $^O) {
require Config;
$OS = $Config::Config{'osname'};
}
}
if ($OS=~/Win/i) {
$OS = 'WINDOWS';
} elsif ($OS=~/vms/i) {
$OS = 'VMS';
} elsif ($OS=~/Mac/i) {
$OS = 'MACINTOSH';
} elsif ($OS=~/os2/i) {
$OS = 'OS2';
} else {
$OS = 'UNIX';
}
# Some OS logic. Binary mode enabled on DOS, NT and VMS
$needs_binmode = $OS=~/^(WINDOWS|VMS|OS2)/;
# This is the default class for the CGI object to use when all else fails.
$DefaultClass = 'CGI' unless defined $CGI::DefaultClass;
# This is where to look for autoloaded routines.
$AutoloadClass = $DefaultClass unless defined $CGI::AutoloadClass;
# The path separator is a slash, backslash or semicolon, depending
# on the platform.
$SL = {
UNIX=>'/',
OS2=>'\\',
WINDOWS=>'\\',
MACINTOSH=>':',
VMS=>'\\'
}->{$OS};
# Turn on NPH scripts by default when running under IIS server!
$NPH++ if defined($ENV{'SERVER_SOFTWARE'}) && $ENV{'SERVER_SOFTWARE'}=~/IIS/;
# Turn on special checking for Doug MacEachern's modperl
if (defined($ENV{'GATEWAY_INTERFACE'}) && ($MOD_PERL = $ENV{'GATEWAY_INTERFACE'} =~ /^CGI-Perl/))
{
$NPH++;
$| = 1;
$SEQNO = 1;
}
# This is really "\r\n", but the meaning of \n is different
# in MacPerl, so we resort to octal here.
$CRLF = "\015\012";
if ($needs_binmode) {
$CGI::DefaultClass->binmode(main::STDOUT);
$CGI::DefaultClass->binmode(main::STDIN);
$CGI::DefaultClass->binmode(main::STDERR);
}
# Cute feature, but it broke when the overload mechanism changed...
# %OVERLOAD = ('""'=>'as_string');
%EXPORT_TAGS = (
':html2'=>[h1..h6,qw/p br hr ol ul li dl dt dd menu code var strong em
tt i b blockquote pre img a address cite samp dfn html head
base body link nextid title meta kbd start_html end_html
input Select option/],
':html3'=>[qw/div table caption th td TR Tr super sub strike applet PARAM embed
basefont/],
':netscape'=>[qw/blink frameset frame script font fontsize center/],
':form'=>[qw/textfield textarea filefield password_field hidden checkbox
checkbox_group
submit reset defaults radio_group popup_menu button autoEscape

scrolling_list image_button start_form end_form startform endform
start_multipart_form isindex tmpFileName uploadInfo URL_ENCODED
MULTIPART/],
':cgi'=>[qw/param path_info path_translated url self_url script_name cookie dump
raw_cookie request_method query_string accept user_agent remote_host
remote_addr referer server_name server_software server_port server_protocol
virtual_host remote_ident auth_type http
remote_user user_name header redirect import_names put/],
':ssl' => [qw/https/],
':cgi-lib' => [qw/ReadParse PrintHeader HtmlTop HtmlBot SplitParam/],
':html' => [qw/:html2 :html3 :netscape/],
':standard' => [qw/:html2 :form :cgi/],
':all' => [qw/:html2 :html3 :netscape :form :cgi/]
);
# to import symbols into caller
sub import {
my $self = shift;
my ($callpack, $callfile, $callline) = caller;
foreach (@_) {
$NPH++, next if $_ eq ':nph';
foreach (&expand_tags($_)) {
tr/a-zA-Z0-9_//cd; # don't allow weird function names
$EXPORT{$_}++;
}
}
# To allow overriding, search through the packages
# Till we find one in which the correct subroutine is defined.
my @packages = ($self,@{"$self\:\:ISA"});
foreach $sym (keys %EXPORT) {
my $pck;
my $def = ${"$self\:\:AutoloadClass"} || $DefaultClass;
foreach $pck (@packages) {
if (defined(&{"$pck\:\:$sym"})) {
$def = $pck;
last;
}
}
*{"${callpack}::$sym"} = \&{"$def\:\:$sym"};
}
}
sub expand_tags {
my($tag) = @_;
my(@r);
return ($tag) unless $EXPORT_TAGS{$tag};
foreach (@{$EXPORT_TAGS{$tag}}) {
push(@r,&expand_tags($_));
}
return @r;
}
#### Method: new
# The new routine. This will check the current environment
# for an existing query string, and initialize itself, if so.
####
sub new {
my($class,$initializer) = @_;
my $self = {};
bless $self,ref $class || $class || $DefaultClass;
$CGI::DefaultClass->_reset_globals() if $MOD_PERL;
$initializer = to_filehandle($initializer) if $initializer;
$self->init($initializer);
return $self;
}
# We provide a DESTROY method so that the autoloader
# doesn't bother trying to find it.
sub DESTROY { }
#### Method: param
# Returns the value(s) of a named parameter.
# If invoked in a list context, returns the
# entire list. Otherwise returns the first
# member of the list.
# If name is not provided, return a list of all
# the known parameter names available.
# If more than one argument is provided, the
# second and subsequent arguments are used to
# set the value of the parameter.
####
sub param {
my($self,@p) = self_or_default(@_);
return $self->all_parameters unless @p;
my($name,$value,@other);
# For compatibility between old calling style and use_named_parameters() style,
# we have to special case for a single parameter present.
if (@p > 1) {
($name,$value,@other) = $self->rearrange([NAME,[DEFAULT,VALUE,VALUES]],@p);
my(@values);
if (substr($p[0],0,1) eq '-' || $self->use_named_parameters) {
@values = defined($value) ? (ref($value) && ref($value) eq 'ARRAY' ? @{$value} :
$value) : ();
} else {
foreach ($value,@other) {
push(@values,$_) if defined($_);
}
}
# If values is provided, then we set it.
if (@values) {
$self->add_parameter($name);
$self->{$name}=[@values];
}
} else {
$name = $p[0];
}
return () unless defined($name) && $self->{$name};
return wantarray ? @{$self->{$name}} : $self->{$name}->[0];
}
#### Method: delete
# Deletes the named parameter entirely.
####
sub delete {
my($self,$name) = self_or_default(@_);
delete $self->{$name};
delete $self->{'.fieldnames'}->{$name};
@{$self->{'.parameters'}}=grep($_ ne $name,$self->param());
return wantarray ? () : undef;
}
sub self_or_default {
return @_ if defined($_[0]) && !ref($_[0]) && ($_[0] eq 'CGI');
unless (defined($_[0]) &&
ref($_[0]) &&
(ref($_[0]) eq 'CGI' ||
eval "\$_[0]->isaCGI()")) { # optimize for the common case
$CGI::DefaultClass->_reset_globals()
if defined($Q) && $MOD_PERL && $CGI::DefaultClass->_new_request();
$Q = $CGI::DefaultClass->new unless defined($Q);
unshift(@_,$Q);
}
return @_;
}
sub _new_request {
return undef unless (defined(Apache->seqno()) or eval { require Apache });
if (Apache->seqno() != $SEQNO) {
$SEQNO = Apache->seqno();
return 1;
} else {
return undef;
}
}
sub _reset_globals {
undef $Q;
undef @QUERY_PARAM;
}
sub self_or_CGI {
local $^W=0;
# prevent a warning
if (defined($_[0]) &&
(substr(ref($_[0]),0,3) eq 'CGI'
|| eval "\$_[0]->isaCGI()")) {
return @_;
} else {
return ($DefaultClass,@_);
}
}
sub isaCGI {
return 1;
}
#### Method: import_names
# Import all parameters into the given namespace.
# Assumes namespace 'Q' if not specified
####
sub import_names {
my($self,$namespace) = self_or_default(@_);
$namespace = 'Q' unless defined($namespace);
die "Can't import names into 'main'\n"
if $namespace eq 'main';
my($param,@value,$var);
foreach $param ($self->param) {
# protect against silly names
($var = $param)=~tr/a-zA-Z0-9_/_/c;
$var = "${namespace}::$var";
@value = $self->param($param);
@{$var} = @value;
${$var} = $value[0];
}
}
#### Method: use_named_parameters
# Force CGI.pm to use named parameter-style method calls
# rather than positional parameters. The same effect
# will happen automatically if the first parameter
# begins with a -.
sub use_named_parameters {
my($self,$use_named) = self_or_default(@_);
return $self->{'.named'} unless defined ($use_named);
# stupidity to avoid annoying warnings
return $self->{'.named'}=$use_named;
}
########################################
# THESE METHODS ARE MORE OR LESS PRIVATE
# GO TO THE __DATA__ SECTION TO SEE MORE
# PUBLIC METHODS
########################################
# Initialize the query object from the environment.
# If a parameter list is found, this object will be set
# to an associative array in which parameter names are keys
# and the values are stored as lists
# If a keyword list is found, this method creates a bogus
# parameter list with the single parameter 'keywords'.
sub init {
my($self,$initializer) = @_;
my($query_string,@lines);
my($meth) = '';
# if we get called more than once, we want to initialize
# ourselves from the original query (which may be gone
# if it was read from STDIN originally.)
# defined(@array) is a fatal error in newer versions of Perl; test the array itself
if (@QUERY_PARAM && !defined($initializer)) {
foreach (@QUERY_PARAM) {
$self->param('-name'=>$_,'-value'=>$QUERY_PARAM{$_});
}
return;
}
$meth=$ENV{'REQUEST_METHOD'} if defined($ENV{'REQUEST_METHOD'});
# If initializer is defined, then read parameters
# from it.
METHOD: {
if (defined($initializer)) {
if (ref($initializer) && ref($initializer) eq 'HASH') {
foreach (keys %$initializer) {
$self->param('-name'=>$_,'-value'=>$initializer->{$_});
}
last METHOD;
}
$initializer = $$initializer if ref($initializer);
if (defined(fileno($initializer))) {
while (<$initializer>) {
chomp;
last if /^=/;
push(@lines,$_);
}
# massage back into standard format
if ("@lines" =~ /=/) {
$query_string=join("&",@lines);
} else {
$query_string=join("+",@lines);
}
last METHOD;
}
$query_string = $initializer;
last METHOD;
}
# If method is GET or HEAD, fetch the query from
# the environment.
if ($meth=~/^(GET|HEAD)$/) {
$query_string = $ENV{'QUERY_STRING'};
last METHOD;
}
# If the method is POST, fetch the query from standard
# input.
if ($meth eq 'POST') {
if (defined($ENV{'CONTENT_TYPE'})
&&
$ENV{'CONTENT_TYPE'}=~m|^multipart/form-data|) {
my($boundary) = $ENV{'CONTENT_TYPE'}=~/boundary=(\S+)/;
$self->read_multipart($boundary,$ENV{'CONTENT_LENGTH'});
} else {
$self->read_from_client(\*STDIN,\$query_string,$ENV{'CONTENT_LENGTH'},0)
if $ENV{'CONTENT_LENGTH'} > 0;
}
# Some people want to have their cake and eat it too!
# Uncomment this line to have the contents of the query string
# APPENDED to the POST data.
# $query_string .= ($query_string ? '&' : '') . $ENV{'QUERY_STRING'} if $ENV{'QUERY_STRING'};
last METHOD;
}
# If neither is set, assume we're being debugged offline.
# Check the command line and then the standard input for data.
# We use the shellwords package in order to behave the way that
# UN*X programmers expect.
$query_string = &read_from_cmdline;
}
# We now have the query string in hand. We do slightly
# different things for keyword lists and parameter lists.
if ($query_string) {
if ($query_string =~ /=/) {
$self->parse_params($query_string);
} else {
$self->add_parameter('keywords');
$self->{'keywords'} = [$self->parse_keywordlist($query_string)];
}
}
# Special case. Erase everything if there is a field named
# .defaults.
if ($self->param('.defaults')) {
undef %{$self};
}
# Associative array containing our defined fieldnames
$self->{'.fieldnames'} = {};
foreach ($self->param('.cgifields')) {
$self->{'.fieldnames'}->{$_}++;
}
# Clear out our default submission button flag if present
$self->delete('.submit');
$self->delete('.cgifields');
$self->save_request unless $initializer;
}
# FUNCTIONS TO OVERRIDE:
# Turn a string into a filehandle
sub to_filehandle {
my $string = shift;
if ($string && !ref($string)) {
my($package) = caller(1);
my($tmp) = $string=~/[':]/ ? $string : "$package\:\:$string";
return $tmp if defined(fileno($tmp));
}
return $string;
}
# Create a new multipart buffer
sub new_MultipartBuffer {
my($self,$boundary,$length,$filehandle) = @_;
return MultipartBuffer->new($self,$boundary,$length,$filehandle);
}
# Read data from a file handle
sub read_from_client {
my($self, $fh, $buff, $len, $offset) = @_;
local $^W=0;
# prevent a warning
return read($fh, $$buff, $len, $offset);
}
# put a filehandle into binary mode (DOS)
sub binmode {
binmode($_[1]);
}
# send output to the browser
sub put {
    my($self,@p) = self_or_default(@_);
    $self->print(@p);
}
# print to standard output (for overriding in mod_perl)
sub print {
shift;
CORE::print(@_);
}
# unescape URL-encoded data
sub unescape {
my($todecode) = @_;
$todecode =~ tr/+/ /;
# pluses become spaces
$todecode =~ s/%([0-9a-fA-F]{2})/pack("c",hex($1))/ge;
return $todecode;
}
# URL-encode data
sub escape {
my($toencode) = @_;
$toencode=~s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/eg;
return $toencode;
}
sub save_request {
my($self) = @_;
# We're going to play with the package globals now so that if we get called
# again, we initialize ourselves in exactly the same way. This allows
# us to have several of these objects.
@QUERY_PARAM = $self->param; # save list of parameters
foreach (@QUERY_PARAM) {
$QUERY_PARAM{$_}=$self->{$_};
}
}
sub parse_keywordlist {
my($self,$tosplit) = @_;
$tosplit = &unescape($tosplit); # unescape the keywords
$tosplit=~tr/+/ /;
# pluses to spaces
my(@keywords) = split(/\s+/,$tosplit);
return @keywords;
}
sub parse_params {
my($self,$tosplit) = @_;
my(@pairs) = split('&',$tosplit);
my($param,$value);
foreach (@pairs) {
($param,$value) = split('=');
$param = &unescape($param);
$value = &unescape($value);
$self->add_parameter($param);
push (@{$self->{$param}},$value);
}
}
sub add_parameter {
my($self,$param)=@_;
push (@{$self->{'.parameters'}},$param)
unless defined($self->{$param});
}
sub all_parameters {
my $self = shift;
return () unless defined($self) && $self->{'.parameters'};
return () unless @{$self->{'.parameters'}};
return @{$self->{'.parameters'}};
}
#### Method as_string
#
# synonym for "dump"
####
sub as_string {
&dump(@_);
}
sub AUTOLOAD {
print STDERR "CGI::AUTOLOAD for $AUTOLOAD\n" if $CGI::AUTOLOAD_DEBUG;
my($func) = $AUTOLOAD;
my($pack,$func_name) = $func=~/(.+)::([^:]+)$/;
$pack = ${"$pack\:\:AutoloadClass"} || $CGI::DefaultClass
unless defined(${"$pack\:\:AUTOLOADED_ROUTINES"});
my($sub) = \%{"$pack\:\:SUBS"};
unless (%$sub) {
my($auto) = \${"$pack\:\:AUTOLOADED_ROUTINES"};
eval "package $pack; $$auto";
die $@ if $@;
}
my($code) = $sub->{$func_name};
$code = "sub $AUTOLOAD { }" if (!$code and $func_name eq 'DESTROY');
if (!$code) {
if ($EXPORT{':any'} ||
$EXPORT{$func_name} ||
(%EXPORT_OK || grep(++$EXPORT_OK{$_},&expand_tags(':html')))
&& $EXPORT_OK{$func_name}) {
$code = $sub->{'HTML_FUNC'};
$code=~s/func_name/$func_name/mg;
}
}
die "Undefined subroutine $AUTOLOAD\n" unless $code;
eval "package $pack; $code";
if ($@) {
$@ =~ s/ at .*\n//;
die $@;
}
goto &{"$pack\:\:$func_name"};
}
# PRIVATE SUBROUTINE
# Smart rearrangement of parameters to allow named parameter
# calling. We do the rearrangement if:
# 1. The first parameter begins with a -
# 2. The use_named_parameters() method returns true
sub rearrange {
my($self,$order,@param) = @_;
return () unless @param;
return @param unless (defined($param[0]) && substr($param[0],0,1) eq '-')
|| $self->use_named_parameters;
my $i;
for ($i=0;$i<@param;$i+=2) {
$param[$i]=~s/^\-//;
# get rid of initial - if present
$param[$i]=~tr/a-z/A-Z/; # parameters are upper case
}
my(%param) = @param;
my(@return_array);
# convert into associative array
my($key)='';
foreach $key (@$order) {
my($value);
# this is an awful hack to fix spurious warnings when the
# -w switch is set.
if (ref($key) && ref($key) eq 'ARRAY') {
foreach (@$key) {
last if defined($value);
$value = $param{$_};
delete $param{$_};
}
} else {
$value = $param{$key};
delete $param{$key};
}
push(@return_array,$value);
}
push (@return_array,$self->make_attributes(\%param)) if %param;
return (@return_array);
}
###############################################################################
################# THESE FUNCTIONS ARE AUTOLOADED ON DEMAND ####################
###############################################################################
$AUTOLOADED_ROUTINES = '';
# get rid of -w warning
$AUTOLOADED_ROUTINES=<<'END_OF_AUTOLOAD';
%SUBS = (
'URL_ENCODED'=> <<'END_OF_FUNC',
sub URL_ENCODED { 'application/x-www-form-urlencoded'; }
END_OF_FUNC
'MULTIPART' => <<'END_OF_FUNC',
sub MULTIPART { 'multipart/form-data'; }
END_OF_FUNC
'HTML_FUNC' => <<'END_OF_FUNC',
sub func_name {
# handle various cases in which we're called
# most of this bizarre stuff is to avoid -w errors
shift if $_[0] &&
(!ref($_[0]) && $_[0] eq $CGI::DefaultClass) ||
(ref($_[0]) &&
(substr(ref($_[0]),0,3) eq 'CGI' ||
eval "\$_[0]->isaCGI()"));
my($attr) = '';
if (ref($_[0]) && ref($_[0]) eq 'HASH') {
my(@attr) = CGI::make_attributes('',shift);
$attr = " @attr" if @attr;
}
my($tag,$untag) = ("\U<func_name\E$attr>","\U</func_name>\E");
return $tag unless @_;
if (ref($_[0]) eq 'ARRAY') {
my(@r);
foreach (@{$_[0]}) {
push(@r,"$tag$_$untag");
}
return "@r";
} else {
return "$tag@_$untag";
}
}
END_OF_FUNC
#### Method: keywords
# Keywords acts a bit differently. Calling it in a list context
# returns the list of keywords.
# Calling it in a scalar context gives you the size of the list.
####
'keywords' => <<'END_OF_FUNC',
sub keywords {
my($self,@values) = self_or_default(@_);
# If values is provided, then we set it.
$self->{'keywords'}=[@values] if @values;
my(@result) = @{$self->{'keywords'}};
@result;
}
END_OF_FUNC
# These are some tie() interfaces for compatibility
# with Steve Brenner's cgi-lib.pl routines
'ReadParse' => <<'END_OF_FUNC',
sub ReadParse {
local(*in);
if (@_) {
*in = $_[0];
} else {
my $pkg = caller();
*in=*{"${pkg}::in"};
}
tie(%in,CGI);
}
END_OF_FUNC
'PrintHeader' => <<'END_OF_FUNC',
sub PrintHeader {
my($self) = self_or_default(@_);
return $self->header();
}
END_OF_FUNC
'HtmlTop' => <<'END_OF_FUNC',
sub HtmlTop {
    my($self,@p) = self_or_default(@_);
    return $self->start_html(@p);
}
END_OF_FUNC
'HtmlBot' => <<'END_OF_FUNC',
sub HtmlBot {
    my($self,@p) = self_or_default(@_);
    return $self->end_html(@p);
}
END_OF_FUNC
'SplitParam' => <<'END_OF_FUNC',
sub SplitParam {
my ($param) = @_;
my (@params) = split ("\0", $param);

return (wantarray ? @params : $params[0]);
}
END_OF_FUNC
'MethGet' => <<'END_OF_FUNC',
sub MethGet {
return request_method() eq 'GET';
}
END_OF_FUNC
'MethPost' => <<'END_OF_FUNC',
sub MethPost {
return request_method() eq 'POST';
}
END_OF_FUNC
'TIEHASH' => <<'END_OF_FUNC',
sub TIEHASH {
return new CGI;
}
END_OF_FUNC
'STORE' => <<'END_OF_FUNC',
sub STORE {
$_[0]->param($_[1],split("\0",$_[2]));
}
END_OF_FUNC
'FETCH' => <<'END_OF_FUNC',
sub FETCH {
return $_[0] if $_[1] eq 'CGI';
return undef unless defined $_[0]->param($_[1]);
return join("\0",$_[0]->param($_[1]));
}
END_OF_FUNC
'FIRSTKEY' => <<'END_OF_FUNC',
sub FIRSTKEY {
$_[0]->{'.iterator'}=0;
$_[0]->{'.parameters'}->[$_[0]->{'.iterator'}++];
}
END_OF_FUNC
'NEXTKEY' => <<'END_OF_FUNC',
sub NEXTKEY {
$_[0]->{'.parameters'}->[$_[0]->{'.iterator'}++];
}
END_OF_FUNC
'EXISTS' => <<'END_OF_FUNC',
sub EXISTS {
exists $_[0]->{$_[1]};
}
END_OF_FUNC
'DELETE' => <<'END_OF_FUNC',
sub DELETE {
$_[0]->delete($_[1]);
}
END_OF_FUNC
'CLEAR' => <<'END_OF_FUNC',

sub CLEAR {
%{$_[0]}=();
}
####
END_OF_FUNC
####
# Append a new value to an existing query
####
'append' => <<'EOF',
sub append {
my($self,@p) = @_;
my($name,$value) = $self->rearrange([NAME,[VALUE,VALUES]],@p);
my(@values) = defined($value) ? (ref($value) ? @{$value} : $value) : ();
if (@values) {
$self->add_parameter($name);
push(@{$self->{$name}},@values);
}
return $self->param($name);
}
EOF
#### Method: delete_all
# Delete all parameters
####
'delete_all' => <<'EOF',
sub delete_all {
    my($self) = self_or_default(@_);
    undef %{$self};
}
EOF
#### Method: autoescape
# If you want to turn off the autoescaping features,
# call this method with undef as the argument
'autoEscape' => <<'END_OF_FUNC',
sub autoEscape {
my($self,$escape) = self_or_default(@_);
$self->{'dontescape'}=!$escape;
}
END_OF_FUNC
#### Method: version
# Return the current version
####
'version' => <<'END_OF_FUNC',
sub version {
return $VERSION;
}
END_OF_FUNC
'make_attributes' => <<'END_OF_FUNC',
sub make_attributes {
my($self,$attr) = @_;
return () unless $attr && ref($attr) && ref($attr) eq 'HASH';
my(@att);
foreach (keys %{$attr}) {
my($key) = $_;
$key=~s/^\-//;
# get rid of initial - if present
$key=~tr/a-z/A-Z/; # parameters are upper case
push(@att,$attr->{$_} ne '' ? qq/$key="$attr->{$_}"/ : qq/$key/);

}
return @att;
}
END_OF_FUNC
#### Method: dump
# Returns a string in which all the known parameter/value
# pairs are represented as nested lists, mainly for the purposes
# of debugging.
####
'dump' => <<'END_OF_FUNC',
sub dump {
    my($self) = self_or_default(@_);
    my($param,$value,@result);
    return '<UL></UL>' unless $self->param;
    push(@result,"<UL>");
    foreach $param ($self->param) {
        my($name)=$self->escapeHTML($param);
        push(@result,"<LI><STRONG>$param</STRONG>");
        push(@result,"<UL>");
        foreach $value ($self->param($param)) {
            $value = $self->escapeHTML($value);
            push(@result,"<LI>$value");
        }
        push(@result,"</UL>");
    }
    push(@result,"</UL>\n");
    return join("\n",@result);
}
END_OF_FUNC
#### Method: save
# Write values out to a filehandle in such a way that they can
# be reinitialized by the filehandle form of the new() method
####
'save' => <<'END_OF_FUNC',
sub save {
my($self,$filehandle) = self_or_default(@_);
my($param);
my($package) = caller;
# Check that this still works!
# $filehandle = $filehandle=~/[':]/ ? $filehandle : "$package\:\:$filehandle";
$filehandle = to_filehandle($filehandle);
foreach $param ($self->param) {
my($escaped_param) = &escape($param);
my($value);
foreach $value ($self->param($param)) {
print $filehandle "$escaped_param=",escape($value),"\n";
}
}
print $filehandle "=\n"; # end of record
}
END_OF_FUNC
#### Method: header
# Return a Content-Type: style header
#
####
'header' => <<'END_OF_FUNC',

sub header {
my($self,@p) = self_or_default(@_);
my(@header);
my($type,$status,$cookie,$target,$expires,$nph,@other) =
$self->rearrange([TYPE,STATUS,[COOKIE,COOKIES],TARGET,EXPIRES,NPH],@p);
# rearrange() was designed for the HTML portion, so we
# need to fix it up a little.
foreach (@other) {
next unless my($header,$value) = /([^\s=]+)=(.+)/;
substr($header,1,1000)=~tr/A-Z/a-z/;
($value)=$value=~/^"(.*)"$/;
$_ = "$header: $value";
}
$type = $type || 'text/html';
push(@header,'HTTP/1.0 ' . ($status || '200 OK')) if $nph || $NPH;
push(@header,"Status: $status") if $status;
push(@header,"Window-target: $target") if $target;
# push all the cookies -- there may be several
if ($cookie) {
my(@cookie) = ref($cookie) ? @{$cookie} : $cookie;
foreach (@cookie) {
push(@header,"Set-cookie: $_");
}
}
# if the user indicates an expiration time, then we need
# both an Expires and a Date header (so that the browser
# uses OUR clock)
push(@header,"Expires: " . &expires($expires)) if $expires;
push(@header,"Date: " . &expires(0)) if $expires;
push(@header,"Pragma: no-cache") if $self->cache();
push(@header,@other);
push(@header,"Content-type: $type");
my $header = join($CRLF,@header);
return $header . "${CRLF}${CRLF}";
}
END_OF_FUNC
#### Method: cache
# Control whether header() will produce the no-cache
# Pragma directive.
####
'cache' => <<'END_OF_FUNC',
sub cache {
my($self,$new_value) = self_or_default(@_);
$new_value = '' unless $new_value;
if ($new_value ne '') {
$self->{'cache'} = $new_value;
}
return $self->{'cache'};
}
END_OF_FUNC
#### Method: redirect
# Return a Location: style header
#
####
'redirect' => <<'END_OF_FUNC',
sub redirect {
my($self,@p) = self_or_default(@_);
my($url,$target,$cookie,$nph,@other) = $self->rearrange([[URI,URL],TARGET,COOKIE,NPH],@p);
$url = $url || $self->self_url;
my(@o);
foreach (@other) { push(@o,split("=")); }
if($MOD_PERL or exists $self->{'.req'}) {
my $r = $self->{'.req'} || Apache->request;
$r->header_out(Location => $url);
$r->err_header_out(Location => $url);
$r->status(302);
return;
}
push(@o,
'-Status'=>'302 Found',
'-Location'=>$url,
'-URI'=>$url,
'-nph'=>($nph||$NPH));
push(@o,'-Target'=>$target) if $target;
push(@o,'-Cookie'=>$cookie) if $cookie;
return $self->header(@o);
}
END_OF_FUNC
#### Method: start_html
# Canned HTML header
#
# Parameters:
# $title -> (optional) The title for this HTML document (-title)
# $author -> (optional) e-mail address of the author (-author)
# $base -> (optional) if set to true, will enter the BASE address of this document
#          for resolving relative references (-base)
# $xbase -> (optional) alternative base at some remote location (-xbase)
# $target -> (optional) target window to load all links into (-target)
# $script -> (optional) Javascript code (-script)
# $no_script -> (optional) Javascript <noscript> tag (-noscript)
# $meta -> (optional) Meta information tags
# @other -> (optional) any other named parameters you'd like to incorporate into
#           the <BODY> tag.
####
'start_html' => <<'END_OF_FUNC',
sub start_html {
my($self,@p) = &self_or_default(@_);
my($title,$author,$base,$xbase,$script,$noscript,$target,$meta,@other) =
$self->rearrange([TITLE,AUTHOR,BASE,XBASE,SCRIPT,NOSCRIPT,TARGET,META],@p);
# strangely enough, the title needs to be escaped as HTML
# while the author needs to be escaped as a URL
$title = $self->escapeHTML($title || 'Untitled Document');
$author = $self->escapeHTML($author);
my(@result);
push(@result,'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">');
push(@result,"<HTML><HEAD><TITLE>$title</TITLE>");
push(@result,"<LINK REV=MADE HREF=\"mailto:$author\">") if $author;
if ($base || $xbase || $target) {
my $href = $xbase || $self->url();
my $t = $target ? qq/ TARGET="$target"/ : '';

push(@result,qq/<BASE HREF="$href"$t>/);
}
if ($meta && ref($meta) && (ref($meta) eq 'HASH')) {
foreach (keys %$meta) { push(@result,qq(<META NAME="$_" CONTENT="$meta->{$_}">)); }
}
push(@result,<<END) if $script;
<SCRIPT>
$script
</SCRIPT>
END
;
push(@result,<<END) if $noscript;
<NOSCRIPT>
$noscript
</NOSCRIPT>
END
;
my($other) = @other ? " @other" : '';
push(@result,"</HEAD><BODY$other>");
return join("\n",@result);
}
END_OF_FUNC
#### Method: end_html
# End an HTML document.
# Trivial method for completeness. Just returns "</BODY></HTML>"
####
'end_html' => <<'END_OF_FUNC',
sub end_html {
return "</BODY></HTML>";
}
END_OF_FUNC
################################
# METHODS USED IN BUILDING FORMS
################################
#### Method: isindex
# Just prints out the isindex tag.
# Parameters:
#   $action -> optional URL of script to run
# Returns:
#   A string containing a <ISINDEX> tag
####
'isindex' => <<'END_OF_FUNC',
sub isindex {
my($self,@p) = self_or_default(@_);
my($action,@other) = $self->rearrange([ACTION],@p);
$action = qq/ACTION="$action"/ if $action;
my($other) = @other ? " @other" : '';
return "<ISINDEX $action$other>";
}
END_OF_FUNC
#### Method: startform
# Start a form
# Parameters:
#   $method -> optional submission method to use (GET or POST)
#   $action -> optional URL of script to run
#   $enctype -> encoding to use (URL_ENCODED or MULTIPART)
####
'startform' => <<'END_OF_FUNC',
sub startform {
my($self,@p) = self_or_default(@_);
my($method,$action,$enctype,@other) =
$self->rearrange([METHOD,ACTION,ENCTYPE],@p);
$method = $method || 'POST';
$enctype = $enctype || &URL_ENCODED;
$action = $action ? qq/ACTION="$action"/ : $method eq 'GET' ?
'ACTION="'.$self->script_name.'"' : '';
my($other) = @other ? " @other" : '';
$self->{'.parametersToAdd'}={};
return qq/<FORM METHOD="$method" $action ENCTYPE="$enctype"$other>\n/;
}
END_OF_FUNC
#### Method: start_form
# synonym for startform
'start_form' => <<'END_OF_FUNC',
sub start_form {
&startform;
}
END_OF_FUNC
#### Method: start_multipart_form
# synonym for startform, but with the MULTIPART encoding
'start_multipart_form' => <<'END_OF_FUNC',
sub start_multipart_form {
my($self,@p) = self_or_default(@_);
if ($self->use_named_parameters ||
(defined($p[0]) && substr($p[0],0,1) eq '-')) {
my(%p) = @p;
$p{'-enctype'}=&MULTIPART;
return $self->startform(%p);
} else {
my($method,$action,@other) =
$self->rearrange([METHOD,ACTION],@p);
return $self->startform($method,$action,&MULTIPART,@other);
}
}
END_OF_FUNC
#### Method: endform
# End a form
'endform' => <<'END_OF_FUNC',
sub endform {
my($self,@p) = self_or_default(@_);
return ($self->get_fields,"</FORM>");
}
END_OF_FUNC
#### Method: end_form
# synonym for endform
'end_form' => <<'END_OF_FUNC',

sub end_form {
&endform;
}
END_OF_FUNC
#### Method: textfield
# Parameters:
#   $name -> Name of the text field
#   $default -> Optional default value of the field if not
#       already defined.
#   $size -> Optional width of field in characters.
#   $maxlength -> Optional maximum number of characters.
# Returns:
#   A string containing a <INPUT TYPE="text"> field
#
'textfield' => <<'END_OF_FUNC',
sub textfield {
my($self,@p) = self_or_default(@_);
my($name,$default,$size,$maxlength,$override,@other) =
$self->rearrange([NAME,[DEFAULT,VALUE],SIZE,MAXLENGTH,[OVERRIDE,FORCE]],@p);
my $current = $override ? $default :
(defined($self->param($name)) ? $self->param($name) : $default);
$current = defined($current) ? $self->escapeHTML($current) : '';
$name = defined($name) ? $self->escapeHTML($name) : '';
my($s) = defined($size) ? qq/ SIZE=$size/ : '';
my($m) = defined($maxlength) ? qq/ MAXLENGTH=$maxlength/ : '';
my($other) = @other ? " @other" : '';
return qq/<INPUT TYPE="text" NAME="$name" VALUE="$current"$s$m$other>/;
}
END_OF_FUNC
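#### Method: filefield
# Parameters:
#   $name -> Name of the file upload field
#   $size -> Optional width of field in characters.
#   $maxlength -> Optional maximum number of characters.
# Returns:
#   A string containing a <INPUT TYPE="file"> field
#
'filefield' => <<'END_OF_FUNC',
sub filefield {
my($self,@p) = self_or_default(@_);
my($name,$default,$size,$maxlength,$override,@other) =
$self->rearrange([NAME,[DEFAULT,VALUE],SIZE,MAXLENGTH,[OVERRIDE,FORCE]],@p);
my $current = $override ? $default :
(defined($self->param($name)) ? $self->param($name) : $default);
$current = defined($current) ? $self->escapeHTML($current) : '';
$name = defined($name) ? $self->escapeHTML($name) : '';
my($s) = defined($size) ? qq/ SIZE=$size/ : '';
my($m) = defined($maxlength) ? qq/ MAXLENGTH=$maxlength/ : '';
my($other) = ' ' . join(" ",@other);
return qq/<INPUT TYPE="file" NAME="$name" VALUE="$current"$s$m$other>/;
}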
END_OF_FUNC
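#### Method: password_field
# Create a "secret password" entry field
# Parameters:
#   $name -> Name of the field
#   $default -> Optional default value of the field if not
#       already defined.
#   $size -> Optional width of field in characters.
#   $maxlength -> Optional maximum characters that can be entered.
# Returns:
#   A string containing a <INPUT TYPE="password"> field
#
'password_field' => <<'END_OF_FUNC',
sub password_field {
my ($self,@p) = self_or_default(@_);
my($name,$default,$size,$maxlength,$override,@other) =
$self->rearrange([NAME,[DEFAULT,VALUE],SIZE,MAXLENGTH,[OVERRIDE,FORCE]],@p);
my($current) = $override ? $default :
(defined($self->param($name)) ? $self->param($name) : $default);
$name = defined($name) ? $self->escapeHTML($name) : '';
$current = defined($current) ? $self->escapeHTML($current) : '';
my($s) = defined($size) ? qq/ SIZE=$size/ : '';
my($m) = defined($maxlength) ? qq/ MAXLENGTH=$maxlength/ : '';
my($other) = @other ? " @other" : '';
return qq/<INPUT TYPE="password" NAME="$name" VALUE="$current"$s$m$other>/;
}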
END_OF_FUNC
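#### Method: textarea
# Parameters:
#   $name -> Name of the text field
#   $default -> Optional default value of the field if not
#       already defined.
#   $rows -> Optional number of rows in text area
#   $columns -> Optional number of columns in text area
# Returns:
#   A string containing a <TEXTAREA></TEXTAREA> tag
#
'textarea' => <<'END_OF_FUNC',
sub textarea {
my($self,@p) = self_or_default(@_);
my($name,$default,$rows,$cols,$override,@other) =
$self->rearrange([NAME,[DEFAULT,VALUE],ROWS,[COLS,COLUMNS],[OVERRIDE,FORCE]],@p);
my($current)= $override ? $default :
(defined($self->param($name)) ? $self->param($name) : $default);
$name = defined($name) ? $self->escapeHTML($name) : '';
$current = defined($current) ? $self->escapeHTML($current) : '';
my($r) = $rows ? " ROWS=$rows" : '';
my($c) = $cols ? " COLS=$cols" : '';
my($other) = @other ? " @other" : '';
return qq{<TEXTAREA NAME="$name"$r$c$other>$current</TEXTAREA>};
}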
END_OF_FUNC
#### Method: button
# Create a javascript button.
# Parameters:
#   $name -> (optional) Name for the button. (-name)
#   $value -> (optional) Value of the button when selected (and visible name) (-value)
#   $onclick -> (optional) Text of the JavaScript to run when the button is
#       clicked.
# Returns:
#   A string containing a <INPUT TYPE="button"> tag
####
'button' => <<'END_OF_FUNC',
sub button {
my($self,@p) = self_or_default(@_);
my($label,$value,$script,@other) = $self->rearrange([NAME,[VALUE,LABEL],
[ONCLICK,SCRIPT]],@p);
$label=$self->escapeHTML($label);
$value=$self->escapeHTML($value);
$script=$self->escapeHTML($script);
my($name) = '';
$name = qq/ NAME="$label"/ if $label;
$value = $value || $label;
my($val) = '';
$val = qq/ VALUE="$value"/ if $value;
$script = qq/ ONCLICK="$script"/ if $script;
my($other) = @other ? " @other" : '';
return qq/<INPUT TYPE="button"$name$val$script$other>/;
}
END_OF_FUNC
#### Method: submit
# Create a "submit query" button.
# Parameters:
#   $name -> (optional) Name for the button.
#   $value -> (optional) Value of the button when selected (also doubles as label).
#   $label -> (optional) Label printed on the button (also doubles as the value).
# Returns:
#   A string containing a <INPUT TYPE="submit"> tag
####
'submit' => <<'END_OF_FUNC',
sub submit {
my($self,@p) = self_or_default(@_);
my($label,$value,@other) = $self->rearrange([NAME,[VALUE,LABEL]],@p);
$label=$self->escapeHTML($label);
$value=$self->escapeHTML($value);
my($name) = ' NAME=".submit"';
$name = qq/ NAME="$label"/ if $label;
$value = $value || $label;
my($val) = '';
$val = qq/ VALUE="$value"/ if defined($value);
my($other) = @other ? " @other" : '';
return qq/<INPUT TYPE="submit"$name$val$other>/;
}
END_OF_FUNC
#### Method: reset
# Create a "reset" button.
# Parameters:
#   $name -> (optional) Name for the button.
# Returns:
#   A string containing a <INPUT TYPE="reset"> tag
####
'reset' => <<'END_OF_FUNC',
sub reset {
my($self,@p) = self_or_default(@_);
my($label,@other) = $self->rearrange([NAME],@p);
$label=$self->escapeHTML($label);
my($value) = defined($label) ? qq/ VALUE="$label"/ : '';
my($other) = @other ? " @other" : '';
return qq/<INPUT TYPE="reset"$value$other>/;
}
END_OF_FUNC
#### Method: defaults
# Create a "defaults" button.
# Parameters:
#   $name -> (optional) Name for the button.
# Returns:
#   A string containing a <INPUT TYPE="submit" NAME=".defaults"> tag
#
# Note: this button has a special meaning to the initialization script,
# and tells it to ERASE the current query string so that your defaults
# are used again!
####
'defaults' => <<'END_OF_FUNC',
sub defaults {
my($self,@p) = self_or_default(@_);
my($label,@other) = $self->rearrange([[NAME,VALUE]],@p);
$label=$self->escapeHTML($label);
$label = $label || "Defaults";
my($value) = qq/ VALUE="$label"/;
my($other) = @other ? " @other" : '';
return qq/<INPUT TYPE="submit" NAME=".defaults"$value$other>/;
}
END_OF_FUNC
#### Method: checkbox
# Create a checkbox that is not logically linked to any others.
# The field value is "on" when the button is checked.
# Parameters:
#   $name -> Name of the checkbox
#   $checked -> (optional) turned on by default if true
#   $value -> (optional) value of the checkbox, 'on' by default
#   $label -> (optional) a user-readable label printed next to the box.
#       Otherwise the checkbox name is used.
# Returns:
#   A string containing a <INPUT TYPE="checkbox"> field
####
'checkbox' => <<'END_OF_FUNC',
sub checkbox {
my($self,@p) = self_or_default(@_);
my($name,$checked,$value,$label,$override,@other) =
$self->rearrange([NAME,[CHECKED,SELECTED,ON],VALUE,LABEL,[OVERRIDE,FORCE]],@p);
if (!$override && defined($self->param($name))) {
$value = $self->param($name) unless defined $value;
$checked = $self->param($name) eq $value ? ' CHECKED' : '';
} else {
$checked = $checked ? ' CHECKED' : '';
$value = defined $value ? $value : 'on';
}
my($the_label) = defined $label ? $label : $name;
$name = $self->escapeHTML($name);
$value = $self->escapeHTML($value);
$the_label = $self->escapeHTML($the_label);
my($other) = @other ? " @other" : '';
$self->register_parameter($name);
return <<END;
<INPUT TYPE="checkbox" NAME="$name" VALUE="$value"$checked$other>$the_label
END
}
END_OF_FUNC
#### Method: checkbox_group
# Create a list of logically-linked checkboxes.
# Parameters:
#   $name -> Common name for all the check boxes
#   $values -> A pointer to a regular array containing the
#       values for each checkbox in the group.
#   $defaults -> (optional)
#       1. If a pointer to a regular array of checkbox values,
#          then this will be used to decide which
#          checkboxes to turn on by default.
#       2. If a scalar, will be assumed to hold the
#          value of a single checkbox in the group to turn on.
#   $linebreak -> (optional) Set to true to place linebreaks
#       between the buttons.
#   $labels -> (optional)
#       A pointer to an associative array of labels to print next to each checkbox
#       in the form $label{'value'}="Long explanatory label".
#       Otherwise the provided values are used as the labels.
# Returns:
#   An ARRAY containing a series of <INPUT TYPE="checkbox"> fields
####
'checkbox_group' => <<'END_OF_FUNC',
sub checkbox_group {
my($self,@p) = self_or_default(@_);
my($name,$values,$defaults,$linebreak,$labels,$rows,$columns,
$rowheaders,$colheaders,$override,$nolabels,@other) =
$self->rearrange([NAME,[VALUES,VALUE],[DEFAULTS,DEFAULT],
LINEBREAK,LABELS,ROWS,[COLUMNS,COLS],
ROWHEADERS,COLHEADERS,
[OVERRIDE,FORCE],NOLABELS],@p);
my($checked,$break,$result,$label);
my(%checked) = $self->previous_or_default($name,$defaults,$override);
$break = $linebreak ? "<BR>" : '';
$name=$self->escapeHTML($name);
# Create the elements
my(@elements);
my(@values) = $values ? @$values : $self->param($name);
my($other) = @other ? " @other" : '';
foreach (@values) {
$checked = $checked{$_} ? ' CHECKED' : '';
$label = '';
unless (defined($nolabels) && $nolabels) {
$label = $_;
$label = $labels->{$_} if defined($labels) && $labels->{$_};
$label = $self->escapeHTML($label);
}
$_ = $self->escapeHTML($_);
push(@elements,qq/<INPUT TYPE="checkbox" NAME="$name" VALUE="$_"$checked$other>${label} ${break}/);
}
$self->register_parameter($name);
return wantarray ? @elements : join('',@elements) unless $columns;
return _tableize($rows,$columns,$rowheaders,$colheaders,@elements);
}
END_OF_FUNC
# Escape HTML -- used internally
'escapeHTML' => <<'END_OF_FUNC',
sub escapeHTML {
my($self,$toencode) = @_;
return undef unless defined($toencode);
return $toencode if $self->{'dontescape'};
$toencode=~s/&/&amp;/g;
$toencode=~s/\"/&quot;/g;
$toencode=~s/>/&gt;/g;
$toencode=~s/</&lt;/g;
return $toencode;
}
END_OF_FUNC
# Internal procedure - don't use
'_tableize' => <<'END_OF_FUNC',
sub _tableize {
my($rows,$columns,$rowheaders,$colheaders,@elements) = @_;
my($result);
$rows = int(0.99 + @elements/$columns) unless $rows;
# rearrange into a pretty table
$result = "<TABLE>";
my($row,$column);
unshift(@$colheaders,'') if @$colheaders && @$rowheaders;
$result .= "<TR>" if @{$colheaders};
foreach (@{$colheaders}) {
$result .= "<TH>$_</TH>";
}
for ($row=0;$row<$rows;$row++) {
$result .= "<TR>";
$result .= "<TH>$rowheaders->[$row]</TH>" if @$rowheaders;
for ($column=0;$column<$columns;$column++) {
$result .= "<TD>" . $elements[$column*$rows + $row] . "</TD>";
}
$result .= "</TR>";
}
$result .= "</TABLE>";
return $result;
}
END_OF_FUNC
#### Method: radio_group
# Create a list of logically-linked radio buttons.
# Parameters:
#   $name -> Common name for all the buttons.
#   $values -> A pointer to a regular array containing the
#       values for each button in the group.
#   $default -> (optional) Value of the button to turn on by default. Pass '-'
#       to turn _nothing_ on.
#   $linebreak -> (optional) Set to true to place linebreaks
#       between the buttons.
#   $labels -> (optional)
#       A pointer to an associative array of labels to print next to each button
#       in the form $label{'value'}="Long explanatory label".
#       Otherwise the provided values are used as the labels.
# Returns:
#   An ARRAY containing a series of <INPUT TYPE="radio"> fields
####
'radio_group' => <<'END_OF_FUNC',
sub radio_group {
my($self,@p) = self_or_default(@_);
my($name,$values,$default,$linebreak,$labels,
$rows,$columns,$rowheaders,$colheaders,$override,$nolabels,@other) =
$self->rearrange([NAME,[VALUES,VALUE],DEFAULT,LINEBREAK,LABELS,
ROWS,[COLUMNS,COLS],
ROWHEADERS,COLHEADERS,
[OVERRIDE,FORCE],NOLABELS],@p);
my($result,$checked);
if (!$override && defined($self->param($name))) {
$checked = $self->param($name);
} else {
$checked = $default;
}
# If no check array is specified, check the first by default
$checked = $values->[0] unless $checked;
$name=$self->escapeHTML($name);
my(@elements);
my(@values) = $values ? @$values : $self->param($name);
my($other) = @other ? " @other" : '';
foreach (@values) {
my($checkit) = $checked eq $_ ? ' CHECKED' : '';
my($break) = $linebreak ? '<BR>' : '';
my($label)='';
unless (defined($nolabels) && $nolabels) {
$label = $_;
$label = $labels->{$_} if defined($labels) && $labels->{$_};
$label = $self->escapeHTML($label);
}
$_=$self->escapeHTML($_);
push(@elements,qq/<INPUT TYPE="radio" NAME="$name" VALUE="$_"$checkit$other>${label} ${break}/);
}
$self->register_parameter($name);
return wantarray ? @elements : join('',@elements) unless $columns;
return _tableize($rows,$columns,$rowheaders,$colheaders,@elements);
}
END_OF_FUNC
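#### Method: popup_menu
# Create a popup menu.
# Parameters:
#   $name -> Name for the menu
#   $values -> A pointer to a regular array containing the
#       text of each menu item.
#   $default -> (optional) Default item to display
#   $labels -> (optional)
#       A pointer to an associative array of labels to print next to each item
#       in the form $label{'value'}="Long explanatory label".
#       Otherwise the provided values are used as the labels.
# Returns:
#   A string containing the definition of a popup menu.
####
'popup_menu' => <<'END_OF_FUNC',
sub popup_menu {
my($self,@p) = self_or_default(@_);
my($name,$values,$default,$labels,$override,@other) =
$self->rearrange([NAME,[VALUES,VALUE],[DEFAULT,DEFAULTS],LABELS,[OVERRIDE,FORCE]],@p);
my($result,$selected);
if (!$override && defined($self->param($name))) {
$selected = $self->param($name);
} else {
$selected = $default;
}
$name=$self->escapeHTML($name);
my($other) = @other ? " @other" : '';
my(@values) = $values ? @$values : $self->param($name);
$result = qq/<SELECT NAME="$name"$other>\n/;
foreach (@values) {
my($selectit) = defined($selected) ? ($selected eq $_ ? 'SELECTED' : '' ) : '';
my($label) = $_;
$label = $labels->{$_} if defined($labels) && $labels->{$_};
my($value) = $self->escapeHTML($_);
$label = $self->escapeHTML($label);
$result .= "<OPTION $selectit VALUE=\"$value\">$label\n";
}
$result .= "</SELECT>\n";
return $result;
}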
END_OF_FUNC
#### Method: scrolling_list
# Create a scrolling list.
# Parameters:
#   $name -> name for the list
#   $values -> A pointer to a regular array containing the
#       values for each option line in the list.
#   $defaults -> (optional)
#       1. If a pointer to a regular array of options,
#          then this will be used to decide which
#          lines to turn on by default.
#       2. Otherwise holds the value of the single line to turn on.
#   $size -> (optional) Size of the list.
#   $multiple -> (optional) If set, allow multiple selections.
#   $labels -> (optional)
#       A pointer to an associative array of labels to print next to each option
#       in the form $label{'value'}="Long explanatory label".
#       Otherwise the provided values are used as the labels.
# Returns:
#   A string containing the definition of a scrolling list.
####
'scrolling_list' => <<'END_OF_FUNC',
sub scrolling_list {
my($self,@p) = self_or_default(@_);
my($name,$values,$defaults,$size,$multiple,$labels,$override,@other)
= $self->rearrange([NAME,[VALUES,VALUE],[DEFAULTS,DEFAULT],
SIZE,MULTIPLE,LABELS,[OVERRIDE,FORCE]],@p);
my($result);
my(@values) = $values ? @$values : $self->param($name);
$size = $size || scalar(@values);
my(%selected) = $self->previous_or_default($name,$defaults,$override);
my($is_multiple) = $multiple ? ' MULTIPLE' : '';
my($has_size) = $size ? " SIZE=$size" : '';
my($other) = @other ? " @other" : '';
$name=$self->escapeHTML($name);
$result = qq/<SELECT NAME="$name"$has_size$is_multiple$other>\n/;
foreach (@values) {
my($selectit) = $selected{$_} ? 'SELECTED' : '';
my($label) = $_;
$label = $labels->{$_} if defined($labels) && $labels->{$_};
$label = $self->escapeHTML($label);
my($value)=$self->escapeHTML($_);
$result .= "<OPTION $selectit VALUE=\"$value\">$label\n";
}
$result .= "</SELECT>\n";
$self->register_parameter($name);
return $result;
}
END_OF_FUNC
#### Method: hidden
# Parameters:
#   $name -> Name of the hidden field
#   @default -> (optional) Initial values of field (may be an array)
#       or
#   $default->[initial values of field]
# Returns:
#   A string containing a <INPUT TYPE="hidden" NAME="name" VALUE="value">
####
'hidden' => <<'END_OF_FUNC',
sub hidden {
my($self,@p) = self_or_default(@_);
# this is the one place where we departed from our standard
# calling scheme, so we have to special-case (darn)
my(@result,@value);
my($name,$default,$override,@other) =
$self->rearrange([NAME,[DEFAULT,VALUE,VALUES],[OVERRIDE,FORCE]],@p);
my $do_override = 0;
if ( substr($p[0],0,1) eq '-' || $self->use_named_parameters ) {
@value = ref($default) ? @{$default} : $default;
$do_override = $override;
} else {
foreach ($default,$override,@other) {
push(@value,$_) if defined($_);
}
}
# use previous values if override is not set
my @prev = $self->param($name);
@value = @prev if !$do_override && @prev;
foreach (@value) {
$_=$self->escapeHTML($_);
push(@result,qq/<INPUT TYPE="hidden" NAME="$name" VALUE="$_">/);
}
return wantarray ? @result : join('',@result);
}
END_OF_FUNC
#### Method: image_button
# Parameters:
#   $name -> Name of the button
#   $src -> URL of the image source
#   $align -> Alignment style (TOP, BOTTOM or MIDDLE)
# Returns:
#   A string containing a <INPUT TYPE="image" NAME="name" SRC="url" ALIGN="alignment">
####
'image_button' => <<'END_OF_FUNC',
sub image_button {
my($self,@p) = self_or_default(@_);
my($name,$src,$alignment,@other) =
$self->rearrange([NAME,SRC,ALIGN],@p);
my($align) = $alignment ? " ALIGN=\U$alignment" : '';
my($other) = @other ? " @other" : '';
$name=$self->escapeHTML($name);
return qq/<INPUT TYPE="image" NAME="$name" SRC="$src"$align$other>/;
}
END_OF_FUNC
#### Method: self_url
# Returns a URL containing the current script and all its
# param/value pairs arranged as a query. You can use this
# to create a link that, when selected, will reinvoke the
# script with all its state information preserved.
####
'self_url' => <<'END_OF_FUNC',
sub self_url {
my($self) = self_or_default(@_);
my($query_string) = $self->query_string;
my $protocol = $self->protocol();
my $name = "$protocol://" . $self->server_name;
$name .= ":" . $self->server_port
unless $self->server_port == 80;
$name .= $self->script_name;
$name .= $self->path_info if $self->path_info;
return $name unless $query_string;
return "$name?$query_string";
}
END_OF_FUNC
# This is provided as a synonym to self_url() for people unfortunate
# enough to have incorporated it into their programs already!
'state' => <<'END_OF_FUNC',
sub state {
&self_url;
}
END_OF_FUNC
#### Method: url
# Like self_url, but doesn't return the query string part of
# the URL.
####
'url' => <<'END_OF_FUNC',
sub url {
my($self) = self_or_default(@_);
my $protocol = $self->protocol();
my $name = "$protocol://" . $self->server_name;
$name .= ":" . $self->server_port
unless $self->server_port == 80;
$name .= $self->script_name;
return $name;
}
END_OF_FUNC
#### Method: cookie
# Set or read a cookie from the specified name.
# Cookie can then be passed to header().
# Usual rules apply to the stickiness of -value.
# Parameters:
#   -name -> name for this cookie (optional)
#   -value -> value of this cookie (scalar, array or hash)
#   -path -> paths for which this cookie is valid (optional)
#   -domain -> internet domain in which this cookie is valid (optional)
#   -secure -> if true, cookie only passed through secure channel (optional)
#   -expires -> expiry date in format Wdy, DD-Mon-YY HH:MM:SS GMT (optional)
####
'cookie' => <<'END_OF_FUNC',
# temporary, for debugging.
sub cookie {
my($self,@p) = self_or_default(@_);
my($name,$value,$path,$domain,$secure,$expires) =
$self->rearrange([NAME,[VALUE,VALUES],PATH,DOMAIN,SECURE,EXPIRES],@p);
# if no value is supplied, then we retrieve the
# value of the cookie, if any. For efficiency, we cache the parsed
# cookie in our state variables.
unless (defined($value)) {
unless ($self->{'.cookies'}) {
my(@pairs) = split("; ",$self->raw_cookie);
foreach (@pairs) {
my($key,$value) = split("=");
my(@values) = map unescape($_),split('&',$value);

$self->{'.cookies'}->{unescape($key)} = [@values];
}
}
# If no name is supplied, then retrieve the names of all our cookies.
return () unless $self->{'.cookies'};
return wantarray ? @{$self->{'.cookies'}->{$name}} : $self->{'.cookies'}->{$name}->[0]
if defined($name) && $name ne '';
return keys %{$self->{'.cookies'}};
}
my(@values);
# Pull out our parameters.
if (ref($value)) {
if (ref($value) eq 'ARRAY') {
@values = @$value;
} elsif (ref($value) eq 'HASH') {
@values = %$value;
}
} else {
@values = ($value);
}
@values = map escape($_),@values;
# I.E. requires the path to be present.
($path = $ENV{'SCRIPT_NAME'})=~s![^/]+$!! unless $path;
my(@constant_values);
push(@constant_values,"domain=$domain") if $domain;
push(@constant_values,"path=$path") if $path;
push(@constant_values,"expires=".&expires($expires)) if $expires;
push(@constant_values,'secure') if $secure;
my($key) = &escape($name);
my($cookie) = join("=",$key,join("&",@values));
return join("; ",$cookie,@constant_values);
}
END_OF_FUNC
# This internal routine creates an expires string exactly some number of
# hours from the current time in GMT. This is the format
# required by Netscape cookies, and I think it works for the HTTP
# Expires: header as well.
'expires' => <<'END_OF_FUNC',
sub expires {
my($time) = @_;
my(@MON)=qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/;
my(@WDAY) = qw/Sunday Monday Tuesday Wednesday Thursday Friday Saturday/;
my(%mult) = ('s'=>1,
'm'=>60,
'h'=>60*60,
'd'=>60*60*24,
'M'=>60*60*24*30,
'y'=>60*60*24*365);
# format for time can be in any of the forms...
# "now" -- expire immediately
# "+180s" -- in 180 seconds
# "+2m" -- in 2 minutes
# "+12h" -- in 12 hours
# "+1d" -- in 1 day
# "+3M" -- in 3 months
# "+2y" -- in 2 years
# "-3m" -- 3 minutes ago(!)
# If you don't supply one of these forms, we assume you are
# specifying the date yourself
my($offset);
if (!$time || ($time eq 'now')) {
$offset = 0;
} elsif ($time=~/^([+-]?\d+)([mhdMy]?)/) {
$offset = ($mult{$2} || 1)*$1;
} else {
return $time;
}
my($sec,$min,$hour,$mday,$mon,$year,$wday) = gmtime(time+$offset);
$year += 1900 unless $year < 100;
return sprintf("%s, %02d-%s-%02d %02d:%02d:%02d GMT",
$WDAY[$wday],$mday,$MON[$mon],$year,$hour,$min,$sec);
}
END_OF_FUNC
###############################################
# OTHER INFORMATION PROVIDED BY THE ENVIRONMENT
###############################################
#### Method: path_info
# Return the extra virtual path information provided
# after the URL (if any)
####
'path_info' => <<'END_OF_FUNC',
sub path_info {
local($path_info) = $ENV{'FILEPATH_INFO'} || $ENV{'PATH_INFO'};
return $path_info;
}
END_OF_FUNC
#### Method: request_method
# Returns 'POST', 'GET', 'PUT' or 'HEAD'
####
'request_method' => <<'END_OF_FUNC',
sub request_method {
return $ENV{'REQUEST_METHOD'};
}
END_OF_FUNC
#### Method: path_translated
# Return the physical path information provided
# by the URL (if any)
####
'path_translated' => <<'END_OF_FUNC',
sub path_translated {
return $ENV{'PATH_TRANSLATED'};
}
END_OF_FUNC
#### Method: query_string
# Synthesize a query string from our current
# parameters
####
'query_string' => <<'END_OF_FUNC',
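sub query_string {
my($self) = self_or_default(@_);
my($param,$value,@pairs);
foreach $param ($self->param) {
my($eparam) = &escape($param);
foreach $value ($self->param($param)) {
$value = &escape($value);
push(@pairs,"$eparam=$value");
}
}
return join("&",@pairs);
}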
END_OF_FUNC
#### Method: accept
# Without parameters, returns an array of the
# MIME types the browser accepts.
# With a single parameter equal to a MIME
# type, will return undef if the browser won't
# accept it, 1 if the browser accepts it but
# doesn't give a preference, or a floating point
# value between 0.0 and 1.0 if the browser
# declares a quantitative score for it.
# This handles MIME type globs correctly.
####
'accept' => <<'END_OF_FUNC',
sub accept {
my($self,$search) = self_or_CGI(@_);
my(%prefs,$type,$pref,$pat);
my(@accept) = split(',',$self->http('accept'));
foreach (@accept) {
($pref) = /q=(\d\.\d+|\d+)/;
($type) = m#(\S+/[^;]+)#;
next unless $type;
$prefs{$type}=$pref || 1;
}
return keys %prefs unless $search;
# if a search type is provided, we may need to
# perform a pattern matching operation.
# The MIME types use a glob mechanism, which
# is easily translated into a perl pattern match
# First return the preference for directly supported
# types:
return $prefs{$search} if $prefs{$search};
# Didn't get it, so try pattern matching.
foreach (keys %prefs) {
next unless /\*/;
# not a pattern match
($pat = $_) =~ s/([^\w*])/\\$1/g; # escape meta characters
$pat =~ s/\*/.*/g; # turn it into a pattern
return $prefs{$_} if $search=~/$pat/;
}
}
END_OF_FUNC
#### Method: user_agent

# If called with no parameters, returns the user agent.
# If called with one parameter, does a pattern match (case
# insensitive) on the user agent.
####
'user_agent' => <<'END_OF_FUNC',
sub user_agent {
my($self,$match)=self_or_CGI(@_);
return $self->http('user_agent') unless $match;
return $self->http('user_agent') =~ /$match/i;
}
END_OF_FUNC
#### Method: raw_cookie
# Returns the magic cookie for the session.
# To set the magic cookie for new transactions,
# try print $q->header('-Set-cookie'=>'my cookie')
####
'raw_cookie' => <<'END_OF_FUNC',
sub raw_cookie {
my($self) = self_or_CGI(@_);
return $self->http('cookie') || $ENV{'COOKIE'} || '';
}
END_OF_FUNC
#### Method: virtual_host
# Return the name of the virtual_host, which
# is not always the same as the server
######
'virtual_host' => <<'END_OF_FUNC',
sub virtual_host {
return http('host') || server_name();
}
END_OF_FUNC
#### Method: remote_host
# Return the name of the remote host, or its IP
# address if unavailable. If this variable isn't
# defined, it returns "localhost" for debugging
# purposes.
####
'remote_host' => <<'END_OF_FUNC',
sub remote_host {
return $ENV{'REMOTE_HOST'} || $ENV{'REMOTE_ADDR'}
|| 'localhost';
}
END_OF_FUNC
#### Method: remote_addr
# Return the IP addr of the remote host.
####
'remote_addr' => <<'END_OF_FUNC',
sub remote_addr {
return $ENV{'REMOTE_ADDR'} || '127.0.0.1';
}
END_OF_FUNC
#### Method: script_name
# Return the partial URL to this script for
# self-referencing scripts. Also see

# self_url(), which returns a URL with all state information
# preserved.
####
'script_name' => <<'END_OF_FUNC',
sub script_name {
return $ENV{'SCRIPT_NAME'} if $ENV{'SCRIPT_NAME'};
# These are for debugging
return "/$0" unless $0=~/^\//;
return $0;
}
END_OF_FUNC
#### Method: referer
# Return the HTTP_REFERER: useful for generating
# a GO BACK button.
####
'referer' => <<'END_OF_FUNC',
sub referer {
my($self) = self_or_CGI(@_);
return $self->http('referer');
}
END_OF_FUNC
#### Method: server_name
# Return the name of the server
####
'server_name' => <<'END_OF_FUNC',
sub server_name {
return $ENV{'SERVER_NAME'} || 'localhost';
}
END_OF_FUNC
#### Method: server_software
# Return the name of the server software
####
'server_software' => <<'END_OF_FUNC',
sub server_software {
return $ENV{'SERVER_SOFTWARE'} || 'cmdline';
}
END_OF_FUNC
#### Method: server_port
# Return the tcp/ip port the server is running on
####
'server_port' => <<'END_OF_FUNC',
sub server_port {
return $ENV{'SERVER_PORT'} || 80; # for debugging
}
END_OF_FUNC
#### Method: server_protocol
# Return the protocol (usually HTTP/1.0)
####
'server_protocol' => <<'END_OF_FUNC',
sub server_protocol {
return $ENV{'SERVER_PROTOCOL'} || 'HTTP/1.0'; # for debugging
}
END_OF_FUNC
#### Method: http

# Return the value of an HTTP variable, or
# the list of variables if none provided
####
'http' => <<'END_OF_FUNC',
sub http {
my ($self,$parameter) = self_or_CGI(@_);
return $ENV{$parameter} if $parameter=~/^HTTP/;
return $ENV{"HTTP_\U$parameter\E"} if $parameter;
my(@p);
foreach (keys %ENV) {
push(@p,$_) if /^HTTP/;
}
return @p;
}
END_OF_FUNC
#### Method: https
# Return the value of HTTPS
####
'https' => <<'END_OF_FUNC',
sub https {
local($^W)=0;
my ($self,$parameter) = self_or_CGI(@_);
return $ENV{HTTPS} unless $parameter;
return $ENV{$parameter} if $parameter=~/^HTTPS/;
return $ENV{"HTTPS_\U$parameter\E"} if $parameter;
my(@p);
foreach (keys %ENV) {
push(@p,$_) if /^HTTPS/;
}
return @p;
}
END_OF_FUNC
#### Method: protocol
# Return the protocol (http or https currently)
####
'protocol' => <<'END_OF_FUNC',
sub protocol {
local($^W)=0;
my $self = shift;
return 'https' if $self->https() eq 'ON';
return 'https' if $self->server_port == 443;
my $prot = $self->server_protocol;
my($protocol,$version) = split('/',$prot);
return "\L$protocol\E";
}
END_OF_FUNC
#### Method: remote_ident
# Return the identity of the remote user
# (but only if his host is running identd)
####
'remote_ident' => <<'END_OF_FUNC',
sub remote_ident {
return $ENV{'REMOTE_IDENT'};
}
END_OF_FUNC
#### Method: auth_type

# Return the type of user verification/authorization in use, if any.
####
'auth_type' => <<'END_OF_FUNC',
sub auth_type {
return $ENV{'AUTH_TYPE'};
}
END_OF_FUNC
#### Method: remote_user
# Return the authorization name used for user
# verification.
####
'remote_user' => <<'END_OF_FUNC',
sub remote_user {
return $ENV{'REMOTE_USER'};
}
END_OF_FUNC
#### Method: user_name
# Try to return the remote user's name by hook or by
# crook
####
'user_name' => <<'END_OF_FUNC',
sub user_name {
my ($self) = self_or_CGI(@_);
return $self->http('from') || $ENV{'REMOTE_IDENT'} || $ENV{'REMOTE_USER'};
}
END_OF_FUNC
#### Method: nph
# Set or return the NPH global flag
####
'nph' => <<'END_OF_FUNC',
sub nph {
my ($self,$param) = self_or_CGI(@_);
$CGI::nph = $param if defined($param);
return $CGI::nph;
}
END_OF_FUNC
# -------------- really private subroutines -----------------
'previous_or_default' => <<'END_OF_FUNC',
sub previous_or_default {
my($self,$name,$defaults,$override) = @_;
my(%selected);
if (!$override && ($self->{'.fieldnames'}->{$name} ||
defined($self->param($name)) ) ) {
grep($selected{$_}++,$self->param($name));
} elsif (defined($defaults) && ref($defaults) &&
(ref($defaults) eq 'ARRAY')) {
grep($selected{$_}++,@{$defaults});
} else {
$selected{$defaults}++ if defined($defaults);
}
return %selected;
}
END_OF_FUNC
'register_parameter' => <<'END_OF_FUNC',

sub register_parameter {
my($self,$param) = @_;
$self->{'.parametersToAdd'}->{$param}++;
}
END_OF_FUNC
'get_fields' => <<'END_OF_FUNC',
sub get_fields {
my($self) = @_;
return $self->hidden('-name'=>'.cgifields',
'-values'=>[keys %{$self->{'.parametersToAdd'}}],
'-override'=>1);
}
END_OF_FUNC
'read_from_cmdline' => <<'END_OF_FUNC',
sub read_from_cmdline {
require "shellwords.pl";
my($input,@words);
my($query_string);
if (@ARGV) {
$input = join(" ",@ARGV);
} else {
print STDERR "(offline mode: enter name=value pairs on standard input)\n";
chomp(@lines = <>); # remove newlines
$input = join(" ",@lines);
}
# minimal handling of escape characters
$input=~s/\\=/%3D/g;
$input=~s/\\&/%26/g;
@words = &shellwords($input);
if ("@words"=~/=/) {
$query_string = join('&',@words);
} else {
$query_string = join('+',@words);
}
return $query_string;
}
END_OF_FUNC
#####
# subroutine: read_multipart
#
# Read multipart data and store it into our parameters.
# An interesting feature is that if any of the parts is a file, we
# create a temporary file and open up a filehandle on it so that the
# caller can read from it if necessary.
#####
'read_multipart' => <<'END_OF_FUNC',
sub read_multipart {
my($self,$boundary,$length) = @_;
my($buffer) = $self->new_MultipartBuffer($boundary,$length);
return unless $buffer;
my(%header,$body);
while (!$buffer->eof) {
%header = $buffer->readHeader;
die "Malformed multipart POST\n" unless %header;
# In beta1 it was "Content-disposition". In beta2 it's "Content-Disposition"

# Sheesh.
my($key) = $header{'Content-disposition'} ? 'Content-disposition' : 'Content-Disposition';
my($param)= $header{$key}=~/ name="([^\"]*)"/;
# possible bug: our regular expression expects the filename= part to fall
# at the end of the line. Netscape doesn't escape quotation marks in file names!!!
my($filename) = $header{$key}=~/ filename="(.*)"$/;
# add this parameter to our list
$self->add_parameter($param);
# If no filename specified, then just read the data and assign it
# to our parameter list.
unless ($filename) {
my($value) = $buffer->readBody;
push(@{$self->{$param}},$value);
next;
}
# If we get here, then we are dealing with a potentially large
# uploaded form. Save the data to a temporary file, then open
# the file for reading.
my($tmpfile) = new TempFile;
my $tmp = $tmpfile->as_string;
open (OUT,">$tmp") || die "CGI open of $tmp: $!\n";
$CGI::DefaultClass->binmode(OUT) if $CGI::needs_binmode;
chmod 0666,$tmp;
# make sure anyone can delete it.
my $data;
while ($data = $buffer->read) {
print OUT $data;
}
close OUT;
# Now create a new filehandle in the caller's namespace.
# The name of this filehandle just happens to be identical
# to the original filename (NOT the name of the temporary
# file, which is hidden!)
my($filehandle);
if ($filename=~/^[a-zA-Z_]/) {
my($frame,$cp)=(1);
do { $cp = caller($frame++); } until !eval("'$cp'->isaCGI()");
$filehandle = "$cp\:\:$filename";
} else {
$filehandle = "\:\:$filename";
}
open($filehandle,$tmp) || die "CGI open of $tmp: $!\n";
$CGI::DefaultClass->binmode($filehandle) if $CGI::needs_binmode;
push(@{$self->{$param}},$filename);
# Under Unix, it would be safe to let the temporary file
# be deleted immediately. However, I fear that other operating
# systems are not so forgiving. Therefore we save a reference
# to the temporary file in the CGI object so that the file
# isn't unlinked until the CGI object itself goes out of
# scope. This is a bit hacky, but it has the interesting side
# effect that one can access the name of the tmpfile by
# asking for $query->{$query->param('foo')}, where 'foo'
# is the name of the file upload field.
$self->{'.tmpfiles'}->{$filename}= {
name=>$tmpfile,
info=>{%header}
}
}
}
END_OF_FUNC
'tmpFileName' => <<'END_OF_FUNC',
sub tmpFileName {
my($self,$filename) = self_or_default(@_);
return $self->{'.tmpfiles'}->{$filename}->{name}->as_string;
}
END_OF_FUNC
'uploadInfo' => <<'END_OF_FUNC'
sub uploadInfo {
my($self,$filename) = self_or_default(@_);
return $self->{'.tmpfiles'}->{$filename}->{info};
}
END_OF_FUNC
);
END_OF_AUTOLOAD
;
# Globals and stubs for other packages that we use
package MultipartBuffer;
# how many bytes to read at a time. We use
# a 5K buffer by default.
$FILLUNIT = 1024 * 5;
$TIMEOUT = 10*60;
# 10 minute timeout
$SPIN_LOOP_MAX = 1000; # bug fix for some Netscape servers
$CRLF=$CGI::CRLF;
#reuse the autoload function
*MultipartBuffer::AUTOLOAD = \&CGI::AUTOLOAD;
###############################################################################
###############################################################################
$AUTOLOADED_ROUTINES = ''; # prevent -w error
$AUTOLOADED_ROUTINES=<<'END_OF_AUTOLOAD';
%SUBS = (
'new' => <<'END_OF_FUNC',
sub new {
my($package,$interface,$boundary,$length,$filehandle) = @_;
my $IN;
if ($filehandle) {
my($package) = caller;
# force into caller's package if necessary
$IN = $filehandle=~/[':]/ ? $filehandle : "$package\:\:$filehandle";
}
$IN = "main::STDIN" unless $IN;
$CGI::DefaultClass->binmode($IN) if $CGI::needs_binmode;
# If the user types garbage into the file upload field,
# then Netscape passes NOTHING to the server (not good).
# We may hang on this read in that case. So we implement
# a read timeout. If nothing is ready to read

# by then, we return.
# Netscape seems to be a little bit unreliable
# about providing boundary strings.
if ($boundary) {
# Under the MIME spec, the boundary consists of the
# characters "--" PLUS the Boundary string
$boundary = "--$boundary";
# Read the topmost (boundary) line plus the CRLF
my($null) = '';
$length -= $interface->read_from_client($IN,\$null,length($boundary)+2,0);
} else { # otherwise we find it ourselves
my($old);
($old,$/) = ($/,$CRLF); # read a CRLF-delimited line
$boundary = <$IN>;
# BUG: This won't work correctly under mod_perl
$length -= length($boundary);
chomp($boundary);
# remove the CRLF
$/ = $old;
# restore old line separator
}
my $self = {LENGTH=>$length,
BOUNDARY=>$boundary,
IN=>$IN,
INTERFACE=>$interface,
BUFFER=>'',
};
$FILLUNIT = length($boundary)
if length($boundary) > $FILLUNIT;
return bless $self,ref $package || $package;
}
END_OF_FUNC
'readHeader' => <<'END_OF_FUNC',
sub readHeader {
my($self) = @_;
my($end);
my($ok) = 0;
my($bad) = 0;
do {
$self->fillBuffer($FILLUNIT);
$ok++ if ($end = index($self->{BUFFER},"${CRLF}${CRLF}")) >= 0;
$ok++ if $self->{BUFFER} eq '';
$bad++ if !$ok && $self->{LENGTH} <= 0;
$FILLUNIT *= 2 if length($self->{BUFFER}) >= $FILLUNIT;
} until $ok || $bad;
return () if $bad;
my($header) = substr($self->{BUFFER},0,$end+2);
substr($self->{BUFFER},0,$end+4) = '';
my %return;
while ($header=~/^([\w-]+): (.*)$CRLF/mog) {
$return{$1}=$2;
}
return %return;
}
END_OF_FUNC
# This reads and returns the body as a single scalar value.
'readBody' => <<'END_OF_FUNC',

sub readBody {
my($self) = @_;
my($data);
my($returnval)='';
while (defined($data = $self->read)) {
$returnval .= $data;
}
return $returnval;
}
END_OF_FUNC
# This will read $bytes or until the boundary is hit, whichever happens
# first. After the boundary is hit, we return undef. The next read will
# skip over the boundary and begin reading again;
'read' => <<'END_OF_FUNC',
sub read {
my($self,$bytes) = @_;
# default number of bytes to read
$bytes = $bytes || $FILLUNIT;
# Fill up our internal buffer in such a way that the boundary
# is never split between reads.
$self->fillBuffer($bytes);
# Find the boundary in the buffer (it may not be there).
my $start = index($self->{BUFFER},$self->{BOUNDARY});
# protect against malformed multipart POST operations
die "Malformed multipart POST\n" unless ($start >= 0) || ($self->{LENGTH} > 0);
# If the boundary begins the data, then skip past it
# and return undef. The +2 here is a fiendish plot to
# remove the CR/LF pair at the end of the boundary.
if ($start == 0) {
# clear us out completely if we've hit the last boundary.
if (index($self->{BUFFER},"$self->{BOUNDARY}--")==0) {
$self->{BUFFER}='';
$self->{LENGTH}=0;
return undef;
}
# just remove the boundary.
substr($self->{BUFFER},0,length($self->{BOUNDARY})+2)='';
return undef;
}
my $bytesToReturn;
if ($start > 0) {
# read up to the boundary
$bytesToReturn = $start > $bytes ? $bytes : $start;
} else {
# read the requested number of bytes
# leave enough bytes in the buffer to allow us to read
# the boundary. Thanks to Kevin Hendrick for finding
# this one.
$bytesToReturn = $bytes - (length($self->{BOUNDARY})+1);
}
my $returnval=substr($self->{BUFFER},0,$bytesToReturn);
substr($self->{BUFFER},0,$bytesToReturn)='';
# If we hit the boundary, remove the CRLF from the end.
return ($start > 0) ? substr($returnval,0,-2) : $returnval;

}
END_OF_FUNC
# This fills up our internal buffer in such a way that the
# boundary is never split between reads
'fillBuffer' => <<'END_OF_FUNC',
sub fillBuffer {
my($self,$bytes) = @_;
return unless $self->{LENGTH};
my($boundaryLength) = length($self->{BOUNDARY});
my($bufferLength) = length($self->{BUFFER});
my($bytesToRead) = $bytes - $bufferLength + $boundaryLength + 2;
$bytesToRead = $self->{LENGTH} if $self->{LENGTH} < $bytesToRead;
# Try to read some data. We may hang here if the browser is screwed up.
my $bytesRead = $self->{INTERFACE}->read_from_client($self->{IN},
\$self->{BUFFER},
$bytesToRead,
$bufferLength);
# An apparent bug in the Apache server causes the read()
# to return zero bytes repeatedly without blocking if the
# remote user aborts during a file transfer. I don't know how
# they manage this, but the workaround is to abort if we get
# more than SPIN_LOOP_MAX consecutive zero reads.
if ($bytesRead == 0) {
die "CGI.pm: Server closed socket during multipart read (client aborted?).\n"
if ($self->{ZERO_LOOP_COUNTER}++ >= $SPIN_LOOP_MAX);
} else {
$self->{ZERO_LOOP_COUNTER}=0;
}
$self->{LENGTH} -= $bytesRead;
}
END_OF_FUNC
# Return true when we've finished reading
'eof' => <<'END_OF_FUNC'
sub eof {
my($self) = @_;
return 1 if (length($self->{BUFFER}) == 0)
&& ($self->{LENGTH} <= 0);
undef;
}
END_OF_FUNC
);
END_OF_AUTOLOAD
####################################################################################
################################## TEMPORARY FILES #################################
####################################################################################
package TempFile;
$SL = $CGI::SL;
unless ($TMPDIRECTORY) {
@TEMP=("${SL}usr${SL}tmp","${SL}var${SL}tmp","${SL}tmp","${SL}temp","${SL}Temporary Items");
foreach (@TEMP) {
do {$TMPDIRECTORY = $_; last} if -d $_ && -w _;

}
}
$TMPDIRECTORY = "." unless $TMPDIRECTORY;
$SEQUENCE="CGItemp${$}0000";
# cute feature, but overload implementation broke it
# %OVERLOAD = ('""'=>'as_string');
*TempFile::AUTOLOAD = \&CGI::AUTOLOAD;
###############################################################################
###############################################################################
$AUTOLOADED_ROUTINES = ''; # prevent -w error
$AUTOLOADED_ROUTINES=<<'END_OF_AUTOLOAD';
%SUBS = (
'new' => <<'END_OF_FUNC',
sub new {
my($package) = @_;
$SEQUENCE++;
my $directory = "${TMPDIRECTORY}${SL}${SEQUENCE}";
return bless \$directory;
}
END_OF_FUNC
'DESTROY' => <<'END_OF_FUNC',
sub DESTROY {
my($self) = @_;
unlink $$self; # get rid of the file
}
END_OF_FUNC
'as_string' => <<'END_OF_FUNC'

sub as_string {
my($self) = @_;
return $$self;
}
END_OF_FUNC
);
END_OF_AUTOLOAD
package CGI;
# We get a whole bunch of warnings about "possibly uninitialized variables"
# when running with the -w switch. Touch them all once to get rid of the
# warnings. This is ugly and I hate it.
if ($^W) {
$CGI::CGI = '';
$CGI::CGI=<<EOF;
$CGI::VERSION;
$MultipartBuffer::SPIN_LOOP_MAX;
$MultipartBuffer::CRLF;
$MultipartBuffer::TIMEOUT;
$MultipartBuffer::FILLUNIT;
$TempFile::SEQUENCE;
EOF
;
}
$revision;
__END__
=head1 NAME
CGI - Simple Common Gateway Interface Class
=head1 SYNOPSIS
use CGI;
# the rest is too complicated for a synopsis; keep reading
=head1 ABSTRACT
This perl library uses perl5 objects to make it easy to create
Web fill-out forms and parse their contents. This package
defines CGI objects, entities that contain the values of the
current query string and other state variables.
Using a CGI object's methods, you can examine keywords and parameters
passed to your script, and create forms whose initial values
are taken from the current query (thereby preserving state
information).
The current version of CGI.pm is available at
http://www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html
ftp://ftp-genome.wi.mit.edu/pub/software/WWW/
=head1 INSTALLATION:
To install this package, just change to the directory in which this
file is found and type the following:
perl Makefile.PL
make
make install
This will copy CGI.pm to your perl library directory for use by all
perl scripts. You probably must be root to do this. Now you can
load the CGI routines in your Perl scripts with the line:
use CGI;
If you don't have sufficient privileges to install CGI.pm in the Perl
library directory, you can put CGI.pm into some convenient spot, such
as your home directory, or in cgi-bin itself and prefix all Perl
scripts that call it with something along the lines of the following
preamble:
use lib '/home/davis/lib';
use CGI;
If you are using a version of perl earlier than 5.002 (such as NT perl), use
this instead:
BEGIN {
unshift(@INC,'/home/davis/lib');
}
use CGI;
The CGI distribution also comes with a cute module called L<CGI::Carp>.
It redefines the die(), warn(), confess() and croak() error routines
so that they write nicely formatted error messages into the server's
error log (or to the output stream of your choice). This avoids long
hours of groping through the error and access logs, trying to figure
out which CGI script is generating error messages. If you choose,
you can even have fatal error messages echoed to the browser to avoid
the annoying and uninformative "Server Error" message.
=head1 DESCRIPTION
=head2 CREATING A NEW QUERY OBJECT:
$query = new CGI;
This will parse the input (from both POST and GET methods) and store
it into a perl5 object called $query.
=head2 CREATING A NEW QUERY OBJECT FROM AN INPUT FILE
$query = new CGI(INPUTFILE);
If you provide a file handle to the new() method, it
will read parameters from the file (or STDIN, or whatever). The
file can be in any of the forms described below under debugging
(i.e. a series of newline delimited TAG=VALUE pairs will work).
Conveniently, this type of file is created by the save() method
(see below). Multiple records can be saved and restored.
Perl purists will be pleased to know that this syntax accepts
references to file handles, or even references to filehandle globs,
which is the "official" way to pass a filehandle:
$query = new CGI(\*STDIN);
You can also initialize the query object from an associative array
reference:
$query = new CGI( {'dinosaur'=>'barney',
'song'=>'I love you',
'friends'=>[qw/Jessica George Nancy/]}
);
or from a properly formatted, URL-escaped query string:
$query = new CGI('dinosaur=barney&color=purple');
To create an empty query, initialize it from an empty string or hash:
$empty_query = new CGI("");
-or-
$empty_query = new CGI({});
=head2 FETCHING A LIST OF KEYWORDS FROM THE QUERY:
@keywords = $query->keywords
If the script was invoked as the result of an <ISINDEX> search, the
parsed keywords can be obtained as an array using the keywords() method.
=head2 FETCHING THE NAMES OF ALL THE PARAMETERS PASSED TO YOUR SCRIPT:
@names = $query->param
If the script was invoked with a parameter list
(e.g. "name1=value1&name2=value2&name3=value3"), the param()
method will return the parameter names as a list. If the
script was invoked as an <ISINDEX> script, there will be a
single parameter named 'keywords'.
NOTE: As of version 1.5, the array of parameter names returned will
be in the same order as they were submitted by the browser.
Usually this order is the same as the order in which the
parameters are defined in the form (however, this isn't part
of the spec, and so isn't guaranteed).
=head2 FETCHING THE VALUE OR VALUES OF A SINGLE NAMED PARAMETER:
@values = $query->param('foo');
-or-
$value = $query->param('foo');
Pass the param() method a single argument to fetch the value of the
named parameter. If the parameter is multivalued (e.g. from multiple
selections in a scrolling list), you can ask to receive an array. Otherwise
the method will return a single value.
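The list-versus-scalar behaviour described above relies on Perl's context sensitivity. The following is only a sketch of the idea, not CGI.pm's own code: fetch_param and %store are hypothetical names invented for illustration, and the sketch assumes that the "single value" returned in scalar context is the first one.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the query object's storage: each name maps
# to an array reference of values.
my %store = (
    foo => ['an', 'array', 'of', 'values'],
    bar => ['single'],
);

# fetch_param is an illustrative helper, not part of CGI.pm.
sub fetch_param {
    my ($name) = @_;
    my @values = @{ $store{$name} || [] };
    # wantarray is true when the caller asked for a list.
    return wantarray ? @values : $values[0];
}

my @all   = fetch_param('foo');   # list context: every value
my $first = fetch_param('foo');   # scalar context: first value only
print scalar(@all), " values; first is '$first'\n";
```

The practical consequence is that the left-hand side of the assignment decides what you get back, so assign to an array whenever a field may be multivalued.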
=head2 SETTING THE VALUE(S) OF A NAMED PARAMETER:
$query->param('foo','an','array','of','values');
This sets the value for the named parameter 'foo' to an array of
values. This is one way to change the value of a field AFTER
the script has been invoked once before. (Another way is with
the -override parameter accepted by all methods that generate
form elements.)
param() also recognizes a named parameter style of calling described
in more detail later:
$query->param(-name=>'foo',-values=>['an','array','of','values']);
-or-
$query->param(-name=>'foo',-value=>'the value');
=head2 APPENDING ADDITIONAL VALUES TO A NAMED PARAMETER:
$query->append(-name=>'foo',-values=>['yet','more','values']);
This adds a value or list of values to the named parameter. The
values are appended to the end of the parameter if it already exists.
Otherwise the parameter is created. Note that this method only
recognizes the named argument calling syntax.
=head2 IMPORTING ALL PARAMETERS INTO A NAMESPACE:
$query->import_names('R');
This creates a series of variables in the 'R' namespace. For example,
$R::foo, @R::foo. For keyword lists, a variable @R::keywords will appear.
If no namespace is given, this method will assume 'Q'.
WARNING: don't import anything into 'main'; this is a major security
risk!!!!
In older versions, this method was called B<import()>. As of version 2.20,
this name has been removed completely to avoid conflict with the built-in
Perl module B<import> operator.
=head2 DELETING A PARAMETER COMPLETELY:

$query->delete('foo');
This completely clears a parameter. It is sometimes useful for
resetting parameters that you don't want passed down between
script invocations.
=head2 DELETING ALL PARAMETERS:
$query->delete_all();
This clears the CGI object completely. It might be useful to ensure
that all the defaults are taken when you create a fill-out form.
=head2 SAVING THE STATE OF THE FORM TO A FILE:
$query->save(FILEHANDLE)
This will write the current state of the form to the provided
filehandle. You can read it back in by providing a filehandle
to the new() method. Note that the filehandle can be a file, a pipe,
or whatever!
The format of the saved file is:
NAME1=VALUE1
NAME1=VALUE1'
NAME2=VALUE2
NAME3=VALUE3
=
Both name and value are URL escaped. Multi-valued CGI parameters are
represented as repeated names. A session record is delimited by a
single = symbol. You can write out multiple records and read them
back in with several calls to B<new>. You can do this across several
sessions by opening the file in append mode, allowing you to create
primitive guest books, or to keep a history of users' queries. Here's
a short example of creating multiple session records:
use CGI;
open (OUT,">>test.out") || die;
$records = 5;
foreach (0..$records) {
my $q = new CGI;
$q->param(-name=>'counter',-value=>$_);
$q->save(OUT);
}
close OUT;
# reopen for reading
open (IN,"test.out") || die;
while (!eof(IN)) {
my $q = new CGI(IN);
print $q->param('counter'),"\n";
}
The file format used for save/restore is identical to that used by the
Whitehead Genome Center's data exchange format "Boulderio", and can be
manipulated and even databased using Boulderio utilities. See
http://www.genome.wi.mit.edu/genome_software/other/boulder.html
for further details.
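To make the record format concrete, here is a minimal sketch of a reader for it, written without CGI.pm. The helper names url_unescape and parse_records are hypothetical, and the sketch assumes that spaces may be escaped either as '+' or as %20.

```perl
use strict;
use warnings;

# Decode a URL-escaped string: '+' becomes a space, %XX becomes a byte.
sub url_unescape {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# parse_records is a hypothetical helper, not a CGI.pm method.
# It returns a list of hash references, one per record; each hash maps
# a parameter name to an array reference of its values.
sub parse_records {
    my ($text) = @_;
    my (@records, %current);
    foreach my $line (split /\n/, $text) {
        if ($line eq '=') {               # a lone '=' closes the record
            push @records, +{ %current };
            %current = ();
            next;
        }
        my ($name, $value) = split /=/, $line, 2;
        next unless defined $value;       # skip lines that are not NAME=VALUE
        push @{ $current{ url_unescape($name) } }, url_unescape($value);
    }
    return @records;
}

my $saved = "counter=0\nname=King%20Tut\n=\ncounter=1\ncolor=red\ncolor=blue\n=\n";
my @records = parse_records($saved);
print scalar(@records), " records\n";
print join(',', @{ $records[1]{color} }), "\n";
```

Repeated names accumulate into one array, which is how a multi-valued parameter survives the round trip.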

=head2 CREATING A SELF-REFERENCING URL THAT PRESERVES STATE INFORMATION:
$myself = $query->self_url;
print "<A HREF=$myself>I'm talking to myself.</A>";
self_url() will return a URL, that, when selected, will reinvoke
this script with all its state information intact. This is most
useful when you want to jump around within the document using
internal anchors but you don't want to disrupt the current contents
of the form(s). Something like this will do the trick.
$myself = $query->self_url;
print "<A HREF=$myself#table1>See table 1</A>";
print "<A HREF=$myself#table2>See table 2</A>";
print "<A HREF=$myself#yourself>See for yourself</A>";
If you don't want to get the whole query string, call
the method url() to return just the URL for the script:
$myself = $query->url;
print "<A HREF=$myself>No query string in this baby!</A>\n";
You can also retrieve the unprocessed query string with query_string():
$the_string = $query->query_string;
=head2 COMPATIBILITY WITH CGI-LIB.PL
To make it easier to port existing programs that use cgi-lib.pl
the compatibility routine "ReadParse" is provided. Porting is
simple:
OLD VERSION
require "cgi-lib.pl";
&ReadParse;
print "The value of the antique is $in{antique}.\n";
NEW VERSION
use CGI;
CGI::ReadParse;
print "The value of the antique is $in{antique}.\n";
CGI.pm's ReadParse() routine creates a tied variable named %in,
which can be accessed to obtain the query variables. Like
ReadParse, you can also provide your own variable. Infrequently
used features of ReadParse, such as the creation of @in and $in
variables, are not supported.
Once you use ReadParse, you can retrieve the query object itself
this way:
$q = $in{CGI};
print $q->textfield(-name=>'wow',
-value=>'does this really work?');
This allows you to start using the more interesting features
of CGI.pm without rewriting your old scripts from scratch.
=head2 CALLING CGI FUNCTIONS THAT TAKE MULTIPLE ARGUMENTS
In versions of CGI.pm prior to 2.0, it could get difficult to remember
the proper order of arguments in CGI function calls that accepted five
or six different arguments. As of 2.0, there's a better way to pass
arguments to the various CGI functions. In this style, you pass a
series of name=>argument pairs, like this:
$field = $query->radio_group(-name=>'OS',
-values=>['Unix','Windows','Macintosh'],
-default=>'Unix');
The advantages of this style are that you don't have to remember the
exact order of the arguments, and if you leave out a parameter, in
most cases it will default to some reasonable value. If you provide
a parameter that the method doesn't recognize, it will usually do
something useful with it, such as incorporating it into the HTML form
tag. For example if Netscape decides next week to add a new
JUSTIFICATION parameter to the text field tags, you can start using
the feature without waiting for a new version of CGI.pm:
$field = $query->textfield(-name=>'State',
-default=>'gaseous',
-justification=>'RIGHT');
This will result in an HTML tag that looks like this:
<INPUT TYPE="text" NAME="State" VALUE="gaseous"
JUSTIFICATION="RIGHT">
Parameter names are case insensitive: you can use -name, or -Name or
-NAME. You don't have to use the hyphen if you don't want to. After
creating a CGI object, call the B<use_named_parameters()> method with
a nonzero value. This will tell CGI.pm that you intend to use named
parameters exclusively:
$query = new CGI;
$query->use_named_parameters(1);
$field = $query->radio_group('name'=>'OS',
'values'=>['Unix','Windows','Macintosh'],
'default'=>'Unix');
Actually, CGI.pm only looks for a hyphen in the first parameter. So
you can leave it off subsequent parameters if you like. Something to
be wary of is the potential that a string constant like "values" will
collide with a keyword (and in fact it does!) While Perl usually
figures out when you're referring to a function and when you're
referring to a string, you probably should put quotation marks around
all string constants just to play it safe.
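One way to picture the case-insensitive, optional-hyphen handling is a small normalization pass over the argument list. This is a sketch only: normalize_args is a hypothetical helper and is not how CGI.pm itself is written.

```perl
use strict;
use warnings;

# normalize_args is a hypothetical helper, not CGI.pm's own routine.
# It takes a flat list of name => value pairs and returns a hash whose
# keys are lower-cased with any leading hyphen removed.
sub normalize_args {
    my (@pairs) = @_;
    my %args;
    while (@pairs) {
        my $name  = shift @pairs;
        my $value = shift @pairs;
        $name =~ s/^-//;             # drop the optional leading hyphen
        $args{ lc $name } = $value;  # -name, -Name and -NAME all collapse
    }
    return %args;
}

my %args = normalize_args(
    '-Name'    => 'OS',
    'VALUES'   => ['Unix', 'Windows', 'Macintosh'],
    '-default' => 'Unix',
);
print join(',', sort keys %args), "\n";
```

Every name in the call is quoted here, which sidesteps the keyword collision mentioned above.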
=head2 CREATING THE HTTP HEADER:
print $query->header;
-or-
print $query->header('image/gif');
-or-
print $query->header('text/html','204 No response');
-or-
print $query->header(-type=>'image/gif',
-nph=>1,
-status=>'402 Payment required',
-expires=>'+3d',
-cookie=>$cookie,
-Cost=>'$2.00');
header() returns the Content-type: header. You can provide your own
MIME type if you choose, otherwise it defaults to text/html. An
optional second parameter specifies the status code and a human-readable
message. For example, you can specify 204, "No response" to create a
script that tells the browser to do nothing at all. If you want to
add additional fields to the header, just tack them on to the end:
print $query->header('text/html','200 OK','Content-Length: 3002');
The last form in the synopsis above shows the named argument style for
passing arguments to the CGI methods. Recognized parameters are
B<-type>, B<-status>, B<-expires>, B<-cookie> and B<-nph>. Any other
parameters will be stripped of their initial hyphens and turned into
header fields, allowing you to specify any HTTP header you desire.
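As a rough illustration of that last rule, the sketch below turns leftover name/value pairs into header lines. extra_headers is a hypothetical helper; the real header() method does more than this (status line, content type, cookies and so on).

```perl
use strict;
use warnings;

# extra_headers is a hypothetical helper. Given name => value pairs that
# were not recognized, it strips the leading hyphen from each name and
# returns "Name: value" lines in the order supplied.
sub extra_headers {
    my (@pairs) = @_;
    my @lines;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        $name =~ s/^-//;
        push @lines, "$name: $value";
    }
    return @lines;
}

print "$_\n" for extra_headers('-Cost' => '$2.00', '-Pragma' => 'no-cache');
```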
Most browsers will not cache the output from CGI scripts. Every time
the browser reloads the page, the script is invoked anew. You can
change this behavior with the B<-expires> parameter. When you specify
an absolute or relative expiration interval with this parameter, some
browsers and proxy servers will cache the script's output until the
indicated expiration date. The following forms are all valid for the
-expires field:
    +30s                              30 seconds from now
    +10m                              ten minutes from now
    +1h                               one hour from now
    -1d                               yesterday (i.e. "ASAP!")
    now                               immediately
    +3M                               in three months
    +10y                              in ten years time
    Thursday, 25-Apr-96 00:40:33 GMT  at the indicated time & date
(CGI::expires() is the static function call used internally that turns
relative time intervals into HTTP dates. You can call it directly if
you wish.)
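The relative forms can be read as a signed count followed by a unit letter. The sketch below shows one way to turn such a string into an offset in seconds. expires_offset is a hypothetical helper, not CGI::expires() itself, and the lengths chosen for a month (30 days) and a year (365 days) are assumptions of this sketch.

```perl
use strict;
use warnings;

# Seconds per unit. Treating a month as 30 days and a year as 365 days
# is an assumption made for this sketch.
my %SECONDS = (
    s => 1,
    m => 60,
    h => 60 * 60,
    d => 60 * 60 * 24,
    M => 60 * 60 * 24 * 30,
    y => 60 * 60 * 24 * 365,
);

# expires_offset is a hypothetical helper, not CGI::expires() itself.
# It returns the offset in seconds for a relative interval such as
# '+30s' or '-1d', 0 for 'now', and undef for anything else (for
# example an absolute date, which would be passed through unchanged).
sub expires_offset {
    my ($spec) = @_;
    return 0 if $spec eq 'now';
    if ($spec =~ /^([+-]?\d+)([smhdMy])$/) {
        return $1 * $SECONDS{$2};
    }
    return undef;
}

foreach my $spec ('+30s', '+10m', '+1h', '-1d', 'now', '+3M') {
    printf "%-5s => %d seconds\n", $spec, expires_offset($spec);
}
```

Note that the unit letter is case sensitive: 'm' means minutes and 'M' means months.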
The B<-cookie> parameter generates a header that tells the browser to provide
a "magic cookie" during all subsequent transactions with your script.
Netscape cookies have a special format that includes interesting attributes
such as expiration time. Use the cookie() method to create and retrieve
session cookies.
The B<-nph> parameter, if set to a true value, will issue the correct
headers to work with a NPH (no-parse-header) script. This is important
to use with certain servers, such as Microsoft Internet Information Server, which
expect all their scripts to be NPH.
=head2 GENERATING A REDIRECTION INSTRUCTION
print $query->redirect('http://somewhere.else/in/movie/land');
redirects the browser elsewhere. If you use redirection like this,
you should B<not> print out a header as well. As of version 2.0, we
produce both the unofficial Location: header and the official URI:
header. This should satisfy most servers and browsers.
One hint I can offer is that relative links may not work correctly
when you generate a redirection to another document on your site.
This is due to a well-intentioned optimization that some servers use.
The solution to this is to use the full URL (including the http: part)
of the document you are redirecting to.
You can use named parameters:
print $query->redirect(-uri=>'http://somewhere.else/in/movie/land',
-nph=>1);
The B<-nph> parameter, if set to a true value, will issue the correct
headers to work with a NPH (no-parse-header) script. This is important
to use with certain servers, such as Microsoft Internet Information Server, which
expect all their scripts to be NPH.
=head2 CREATING THE HTML HEADER:
print $query->start_html(-title=>'Secrets of the Pyramids',
-author=>'fred@capricorn.org',
-base=>'true',
-target=>'_blank',
-meta=>{'keywords'=>'pharaoh secret mummy',
'copyright'=>'copyright 1996 King Tut'},
-BGCOLOR=>'blue');
-or-
print $query->start_html('Secrets of the Pyramids',
'fred@capricorn.org','true',
'BGCOLOR="blue"');
This will return a canned HTML header and the opening <BODY> tag.
All parameters are optional.
In the named parameter form, recognized
parameters are -title, -author, -base, -xbase and -target (see below for the
explanation). Any additional parameters you provide, such as the
Netscape unofficial BGCOLOR attribute, are added to the <BODY> tag.
The argument B<-xbase> allows you to provide an HREF for the <BASE> tag
different from the current location, as in
-xbase=>"http://home.mcom.com/"
All relative links will be interpreted relative to this tag.
The argument B<-target> allows you to provide a default target frame
for all the links and fill-out forms on the page. See the Netscape
documentation on frames for details of how to manipulate this.
-target=>"answer_window"
All links and fill-out forms on the page will then default to this target frame.
You add arbitrary meta information to the header with the B<-meta>
argument. This argument expects a reference to an associative array
containing name/value pairs of meta information. These will be turned
into a series of header <META> tags that look something like this:
<META NAME="keywords" CONTENT="pharaoh secret mummy">
<META NAME="copyright" CONTENT="copyright 1996 King Tut">
There is no support for the HTTP-EQUIV type of <META> tag. This is
because you can modify the HTTP header directly with the B<header()>
method.
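The mapping from the B<-meta> hash to tags is simple enough to sketch in a few lines. meta_tags is a hypothetical helper, not part of CGI.pm, and for brevity it does no HTML escaping of the values.

```perl
use strict;
use warnings;

# meta_tags is a hypothetical helper, not part of CGI.pm. It takes a
# hash reference of name/value pairs and returns one <META> tag per
# pair. Keys are sorted only to make the output order predictable.
sub meta_tags {
    my ($meta) = @_;
    return map { qq(<META NAME="$_" CONTENT="$meta->{$_}">) } sort keys %$meta;
}

print "$_\n" for meta_tags({
    'keywords'  => 'pharaoh secret mummy',
    'copyright' => 'copyright 1996 King Tut',
});
```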
JAVASCRIPTING: The B<-script>, B<-noScript>, B<-onLoad> and B<-onUnload> parameters
are used to add Netscape JavaScript calls to your pages. B<-script>
should point to a block of text containing JavaScript function
definitions. This block will be placed within a <SCRIPT> block inside
the HTML (not HTTP) header. The block is placed in the header in
order to give your page a fighting chance of having all its JavaScript
functions in place even if the user presses the stop button before the
page has loaded completely. CGI.pm attempts to format the script in
such a way that JavaScript-naive browsers will not choke on the code:
unfortunately there are some browsers, such as Chimera for Unix, that
get confused by it nevertheless.
The B<-onLoad> and B<-onUnload> parameters point to fragments of JavaScript
code to execute when the page is respectively opened and closed by the
browser. Usually these parameters are calls to functions defined in the
B<-script> field:
$query = new CGI;
$JSCRIPT=<<END;
// Ask a silly question
function riddle_me_this() {
var r = prompt("What walks on four legs in the morning, " +
"two legs in the afternoon, " +
"and three legs in the evening?");
response(r);
}
// Get a silly answer
function response(answer) {
if (answer == "man")
alert("Right you are!");
else
alert("Wrong! Guess again.");
}
END
print $query->start_html(-title=>'The Riddle of the Sphinx',
-script=>$JSCRIPT);
Use the B<-noScript> parameter to pass some HTML text that will be displayed on
browsers that do not have JavaScript (or browsers where JavaScript is turned
off).
See
http://home.netscape.com/eng/mozilla/2.0/handbook/javascript/
for more information about JavaScript.
The old-style positional parameters are as follows:
=over 4
=item B<Parameters:>
=item 1.
The title
=item 2.
The author's e-mail address (will create a <LINK REV="MADE"> tag if present)
=item 3.
A 'true' flag if you want to include a <BASE> tag in the header. This
helps resolve relative addresses to absolute ones when the document is moved,
but makes the document hierarchy non-portable. Use with care!
=item 4, 5, 6...
Any other parameters you want to include in the <BODY> tag. This is a good
place to put Netscape extensions, such as colors and wallpaper patterns.
=back
=head2 ENDING THE HTML DOCUMENT:
print $query->end_html
This ends an HTML document by printing the </BODY></HTML> tags.
=head1 CREATING FORMS:
I<General note> The various form-creating methods all return strings
to the caller, containing the tag or tags that will create the requested
form element. You are responsible for actually printing out these strings.
It's set up this way so that you can place formatting tags
around the form elements.
I<Another note> The default values that you specify for the forms are only
used the B<first> time the script is invoked (when there is no query
string). On subsequent invocations of the script (when there is a query
string), the former values are used even if they are blank.
If you want to change the value of a field from its previous value, you have two
choices:
(1) call the param() method to set it.
(2) use the -override (alias -force) parameter (a new feature in version 2.15).
This forces the default value to be used, regardless of the previous value:
print $query->textfield(-name=>'field_name',
-default=>'starting value',
-override=>1,
-size=>50,
-maxlength=>80);
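The precedence between the previous value, the default and -override can be summarized in a few lines of plain Perl. This is only a sketch of the behavior as described in the note above, not CGI.pm's own code; the function name field_value() and its arguments are hypothetical, and it simplifies matters by treating each field on its own rather than looking at the whole query string.

```perl
use strict;
use warnings;

# Hypothetical helper: decide which value a form field should display.
#   $previous - value from the query string (undef if the field was absent)
#   $default  - the default supplied by the script
#   $override - true to force the default, like -override=>1
sub field_value {
    my ($previous, $default, $override) = @_;
    return $default  if $override;            # -override wins outright
    return $previous if defined $previous;    # sticky, even if blank
    return $default;                          # first invocation
}

print field_value(undef,   'starting value', 0), "\n";   # first call
print field_value('typed', 'starting value', 0), "\n";   # later call
print field_value('typed', 'starting value', 1), "\n";   # forced default
```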
I<Yet another note> By default, the text and labels of form elements are
escaped according to HTML rules. This means that you can safely use
"<CLICK ME>" as the label for a button. However, it also interferes with
your ability to incorporate special HTML character sequences, such as &Aacute;,
into your fields. If you wish to turn off automatic escaping, call the
autoEscape() method with a false value immediately after creating the CGI object:
$query = new CGI;
$query->autoEscape(undef);
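If it helps to see what "escaped according to HTML rules" amounts to, the following plain-Perl sketch escapes the usual special characters. The helper name escape_html() is made up for the example, and the exact set of characters that CGI.pm escapes is an assumption here.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
# The ampersand must be handled first so that the entities produced by
# the later substitutions are not escaped a second time.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html('<CLICK ME>'), "\n";   # safe to use as a button label
print escape_html('&Aacute;'), "\n";     # the entity itself gets escaped
```

The second call shows why escaping gets in the way of character entities: the leading ampersand is escaped, so the browser displays the entity name literally instead of the accented character.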
=head2 CREATING AN ISINDEX TAG
print $query->isindex(-action=>$action);
-or-
print $query->isindex($action);
Prints out an <ISINDEX> tag. Not very exciting. The parameter
-action specifies the URL of the script to process the query. The
default is to process the query with the current script.
=head2 STARTING AND ENDING A FORM
print $query->startform(-method=>$method,
-action=>$action,
-encoding=>$encoding);
<... various form stuff ...>
print $query->endform;
-or-
print $query->startform($method,$action,$encoding);
<... various form stuff ...>
print $query->endform;
startform() will return a <FORM> tag with the optional method,
action and form encoding that you specify. The defaults are:
method: POST
action: this script
encoding: application/x-www-form-urlencoded
endform() returns the closing </FORM> tag.
The encoding given to startform() tells the browser how to package the various
fields of the form before sending the form to the server. Two
values are possible:
=over 4
=item B<application/x-www-form-urlencoded>
This is the older type of encoding used by all browsers prior to
Netscape 2.0. It is compatible with many CGI scripts and is
suitable for short fields containing text data. For your
convenience, CGI.pm stores the name of this encoding
type in B<$CGI::URL_ENCODED>.
=item B<multipart/form-data>
This is the newer type of encoding introduced by Netscape 2.0.
It is suitable for forms that contain very large fields or that
are intended for transferring binary data. Most importantly,
it enables the "file upload" feature of Netscape 2.0 forms. For
your convenience, CGI.pm stores the name of this encoding type
in B<$CGI::MULTIPART>.
Forms that use this type of encoding are not easily interpreted
by CGI scripts unless they use CGI.pm or another library designed
to handle them.
=back
For compatibility, the startform() method uses the older form of
encoding by default. If you want to use the newer form of encoding
by default, you can call B<start_multipart_form()> instead of
B<startform()>.
JAVASCRIPTING: The B<-name> and B<-onSubmit> parameters are provided
for use with JavaScript. The -name parameter gives the
form a name so that it can be identified and manipulated by
JavaScript functions. -onSubmit should point to a JavaScript
function that will be executed just before the form is submitted to your
server. You can use this opportunity to check the contents of the form
for consistency and completeness. If you find something wrong, you
can put up an alert box or maybe fix things up yourself. You can
abort the submission by returning false from this function.
Usually the bulk of JavaScript functions are defined in a <SCRIPT>
block in the HTML header and -onSubmit points to one of these function
calls. See start_html() for details.
=head2 CREATING A TEXT FIELD
print $query->textfield(-name=>'field_name',
-default=>'starting value',
-size=>50,
-maxlength=>80);
-or-
print $query->textfield('field_name','starting value',50,80);
textfield() will return a text input field.
=over 4
=item B<Parameters>
=item 1.
The first parameter is the required name for the field (-name).
=item 2.
The optional second parameter is the default starting value for the field
contents (-default).
=item 3.
The optional third parameter is the size of the field in
characters (-size).
=item 4.
The optional fourth parameter is the maximum number of characters the
field will accept (-maxlength).
=back
As with all these methods, the field will be initialized with its
previous contents from earlier invocations of the script.
When the form is processed, the value of the text field can be
retrieved with:
$value = $query->param('foo');
If you want to reset it from its initial value after the script has been
called once, you can do so like this:
$query->param('foo',"I'm taking over this value!");
NEW AS OF VERSION 2.15: If you don't want the field to take on its previous
value, you can force its current value by using the -override (alias -force)
parameter:
print $query->textfield(-name=>'field_name',
-default=>'starting value',
-override=>1,
-size=>50,
-maxlength=>80);
JAVASCRIPTING: You can also provide B<-onChange>, B<-onFocus>, B<-onBlur>
and B<-onSelect> parameters to register JavaScript event handlers.
The onChange handler will be called whenever the user changes the
contents of the text field. You can do text validation if you like.
onFocus and onBlur are called respectively when the insertion point
moves into and out of the text field. onSelect is called when the
user changes the portion of the text that is selected.
=head2 CREATING A BIG TEXT FIELD
print $query->textarea(-name=>'foo',
-rows=>10,
-columns=>50);
-or-
print $query->textarea('foo','starting value',10,50);
textarea() is just like textfield, but it allows you to specify
rows and columns for a multiline text entry box. You can provide
a starting value for the field, which can be long and contain
multiple lines.
JAVASCRIPTING: The B<-onChange>, B<-onFocus>, B<-onBlur>
and B<-onSelect> parameters are recognized. See textfield().
=head2 CREATING A PASSWORD FIELD
print $query->password_field(-name=>'secret',
-value=>'starting value',
-size=>50,
-maxlength=>80);
-or-
print $query->password_field('secret','starting value',50,80);
password_field() is identical to textfield(), except that its contents
will be starred out on the web page.
JAVASCRIPTING: The B<-onChange>, B<-onFocus>, B<-onBlur>
and B<-onSelect> parameters are recognized. See textfield().
=head2 CREATING A FILE UPLOAD FIELD
print $query->filefield(-name=>'uploaded_file',
-size=>50,
-maxlength=>80);
-or-
print $query->filefield('uploaded_file','starting value',50,80);
filefield() will return a file upload field for Netscape 2.0 browsers.
In order to take full advantage of this I<you must use the new
multipart encoding scheme> for the form. You can do this either
by calling B<startform()> with an encoding type of B<$CGI::MULTIPART>,
or by calling the new method B<start_multipart_form()> instead of
vanilla B<startform()>.
=over 4
=item B<Parameters>
=item 1.
The first parameter is the required name for the field (-name).
=item 2.
The optional second parameter is the starting value for the field contents
to be used as the default file name (-default).
The beta2 version of Netscape 2.0 currently doesn't pay any attention
to this field, and so the starting value will always be blank. Worse,
the field loses its "sticky" behavior and forgets its previous
contents. The starting value field is called for in the HTML
specification, however, and possibly later versions of Netscape will
honor it.
=item 3.
The optional third parameter is the size of the field in
characters (-size).
=item 4.
The optional fourth parameter is the maximum number of characters the
field will accept (-maxlength).
=back
When the form is processed, you can retrieve the entered filename
by calling param().
$filename = $query->param('uploaded_file');
In Netscape Gold, the filename that gets returned is the full local filename
on the B<remote user's> machine. If the remote user is on a Unix
machine, the filename will follow Unix conventions:
/path/to/the/file
On MS-DOS/Windows and OS/2 machines, the filename will follow DOS conventions:
C:\PATH\TO\THE\FILE.MSW
On a Macintosh machine, the filename will follow Mac conventions:
HD 40:Desktop Folder:Sort Through:Reminders
The filename returned is also a file handle. You can read the contents
of the file using standard Perl file reading calls:
# Read a text file and print it out
while (<$filename>) {
print;
}
# Copy a binary file to somewhere safe
open (OUTFILE,">>/usr/local/web/users/feedback")
|| die "Can't open feedback file: $!";
binmode(OUTFILE);
while ($bytesread=read($filename,$buffer,1024)) {
print OUTFILE $buffer;
}
close(OUTFILE);
When a file is uploaded the browser usually sends along some
information with it in the form of headers. The information
usually includes the MIME content type. Future browsers may send
other information as well (such as modification date and size). To
retrieve this information, call uploadInfo(). It returns a reference to
an associative array containing all the document headers.
$filename = $query->param('uploaded_file');
$type = $query->uploadInfo($filename)->{'Content-Type'};
unless ($type eq 'text/html') {
die "HTML FILES ONLY!";
}
If you are using a machine that recognizes "text" and "binary" data
modes, be sure to understand when and how to use them (see the Camel book).
Otherwise you may find that binary files are corrupted during file uploads.
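As a rough illustration of the binary-mode advice, the sketch below copies one filehandle to another with binmode() set on both ends. It uses in-memory filehandles in place of the uploaded file and the destination so that it can be run on its own; the helper name copy_binary() and the sample data are invented for the example.

```perl
use strict;
use warnings;

# Copy everything from one filehandle to another in binary mode.
# Returns the number of bytes copied.
sub copy_binary {
    my ($in, $out) = @_;
    binmode($in);    # no newline translation on input
    binmode($out);   # ... or on output
    my ($buffer, $total) = ('', 0);
    while (my $bytesread = read($in, $buffer, 1024)) {
        print {$out} $buffer;
        $total += $bytesread;
    }
    return $total;
}

# Demonstration with in-memory filehandles standing in for the
# uploaded file and the destination file.
my $upload = "GIF89a\r\n\x00\x01\x02";
my $saved  = '';
open(my $in,  '<', \$upload) or die "cannot open input: $!";
open(my $out, '>', \$saved)  or die "cannot open output: $!";
my $count = copy_binary($in, $out);
close($out);
print "copied $count bytes\n";
```

On systems that do not distinguish text from binary files the binmode() calls are harmless, so it is reasonable to leave them in unconditionally.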
JAVASCRIPTING: The B<-onChange>, B<-onFocus>, B<-onBlur>
and B<-onSelect> parameters are recognized. See textfield()
for details.
=head2 CREATING A POPUP MENU
print $query->popup_menu('menu_name',
['eenie','meenie','minie'],
'meenie');
-or-
%labels = ('eenie'=>'your first choice',
'meenie'=>'your second choice',
'minie'=>'your third choice');
print $query->popup_menu('menu_name',
['eenie','meenie','minie'],
'meenie',\%labels);
-or (named parameter style)-
print $query->popup_menu(-name=>'menu_name',
-values=>['eenie','meenie','minie'],
-default=>'meenie',
-labels=>\%labels);
popup_menu() creates a menu.
=over 4
=item 1.
The required first argument is the menu's name (-name).
=item 2.
The required second argument (-values) is an array B<reference>
containing the list of menu items in the menu. You can pass the
method an anonymous array, as shown in the example, or a reference to
a named array, such as "\@foo".
=item 3.
The optional third parameter (-default) is the name of the default
menu choice. If not specified, the first item will be the default.
The values of the previous choice will be maintained across queries.
=item 4.
The optional fourth parameter (-labels) is provided for people who
want to use different values for the user-visible label inside the
popup menu and the value returned to your script. It's a pointer to an
associative array relating menu values to user-visible labels. If you
leave this parameter blank, the menu values will be displayed by
default. (You can also leave a label undefined if you want to).
=back
When the form is processed, the selected value of the popup menu can
be retrieved using:
$popup_menu_value = $query->param('menu_name');
JAVASCRIPTING: popup_menu() recognizes the following event handlers:
B<-onChange>, B<-onFocus>, and B<-onBlur>. See the textfield()
section for details on when these handlers are called.
=head2 CREATING A SCROLLING LIST
print $query->scrolling_list('list_name',
['eenie','meenie','minie','moe'],
['eenie','moe'],5,'true');
-or-
print $query->scrolling_list('list_name',
['eenie','meenie','minie','moe'],
['eenie','moe'],5,'true',
\%labels);
-or-
print $query->scrolling_list(-name=>'list_name',
-values=>['eenie','meenie','minie','moe'],
-default=>['eenie','moe'],
-size=>5,
-multiple=>'true',
-labels=>\%labels);
scrolling_list() creates a scrolling list.
=over 4
=item 1.
The first and second arguments are the list name (-name) and values
(-values). As in the popup menu, the second argument should be an
array reference.
=item 2.
The optional third argument (-default) can be either a reference to a
list containing the values to be selected by default, or can be a
single value to select. If this argument is missing or undefined,
then nothing is selected when the list first appears. In the named
parameter version, you can use the synonym "-defaults" for this
parameter.
=item 3.
The optional fourth argument is the size of the list (-size).
=item 4.
The optional fifth argument can be set to true to allow multiple
simultaneous selections (-multiple). Otherwise only one selection
will be allowed at a time.
=item 5.
The optional sixth argument is a pointer to an associative array
containing long user-visible labels for the list items (-labels).
If not provided, the values will be displayed.
When this form is processed, all selected list items will be returned as
a list under the parameter name 'list_name'. The values of the
selected items can be retrieved with:
@selected = $query->param('list_name');
=back
JAVASCRIPTING: scrolling_list() recognizes the following event handlers:
B<-onChange>, B<-onFocus>, and B<-onBlur>. See textfield() for
the description of when these handlers are called.
=head2 CREATING A GROUP OF RELATED CHECKBOXES
print $query->checkbox_group(-name=>'group_name',
-values=>['eenie','meenie','minie','moe'],
-default=>['eenie','moe'],
-linebreak=>'true',
-labels=>\%labels);
-or-
print $query->checkbox_group('group_name',
['eenie','meenie','minie','moe'],
['eenie','moe'],'true',\%labels);
HTML3-COMPATIBLE BROWSERS ONLY:
print $query->checkbox_group(-name=>'group_name',
-values=>['eenie','meenie','minie','moe'],
-rows=>2,-columns=>2);
checkbox_group() creates a list of checkboxes that are related
by the same name.
=over 4
=item 1.
The first and second arguments are the checkbox name and values,
respectively (-name and -values). As in the popup menu, the second
argument should be an array reference. These values are used for the
user-readable labels printed next to the checkboxes as well as for the
values passed to your script in the query string.

=item 2.
The optional third argument (-default) can be either a reference to a
list containing the values to be checked by default, or can be a
single value to be checked. If this argument is missing or undefined,
then nothing is selected when the list first appears.
=item 3.
The optional fourth argument (-linebreak) can be set to true to place
line breaks between the checkboxes so that they appear as a vertical
list. Otherwise, they will be strung together on a horizontal line.
=item 4.
The optional fifth argument is a pointer to an associative array
relating the checkbox values to the user-visible labels that will
be printed next to them (-labels). If not provided, the values will
be used as the default.
=item 5.
B<HTML3-compatible browsers> (such as Netscape) can take advantage
of the optional
parameters B<-rows>, and B<-columns>. These parameters cause
checkbox_group() to return an HTML3 compatible table containing
the checkbox group formatted with the specified number of rows
and columns. You can provide just the -columns parameter if you
wish; checkbox_group will calculate the correct number of rows
for you.
To include row and column headings in the returned table, you
can use the B<-rowheader> and B<-colheader> parameters. Both
of these accept a pointer to an array of headings to use.
The headings are just decorative. They don't reorganize the
interpretation of the checkboxes -- they're still a single named
unit.
=back
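The row calculation mentioned under -columns is ordinary ceiling arithmetic. Here is a minimal sketch of it in plain Perl; the helper name rows_needed() is hypothetical, and nothing here says how CGI.pm actually orders the checkboxes within the table.

```perl
use strict;
use warnings;

# Hypothetical helper: number of table rows needed to hold $items
# checkboxes when they are laid out $columns to a row.
sub rows_needed {
    my ($items, $columns) = @_;
    die "columns must be positive\n" unless $columns > 0;
    return int(($items + $columns - 1) / $columns);   # integer ceiling
}

print rows_needed(4, 2), "\n";   # four boxes in two columns
print rows_needed(5, 2), "\n";   # a fifth box spills into another row
```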
When the form is processed, all checked boxes will be returned as
a list under the parameter name 'group_name'. The values of the
"on" checkboxes can be retrieved with:
@turned_on = $query->param('group_name');
The value returned by checkbox_group() is actually an array of button
elements. You can capture them and use them within tables, lists,
or in other creative ways:
@h = $query->checkbox_group(-name=>'group_name',-values=>\@values);
&use_in_creative_way(@h);
JAVASCRIPTING: checkbox_group() recognizes the B<-onClick>
parameter. This specifies a JavaScript code fragment or
function call to be executed every time the user clicks on
any of the buttons in the group. You can retrieve the identity
of the particular button clicked on using the "this" variable.
=head2 CREATING A STANDALONE CHECKBOX
print $query->checkbox(-name=>'checkbox_name',
-checked=>'checked',
-value=>'ON',
-label=>'CLICK ME');
-or-
print $query->checkbox('checkbox_name','checked','ON','CLICK ME');
checkbox() is used to create an isolated checkbox that isn't logically
related to any others.
=over 4
=item 1.
The first parameter is the required name for the checkbox (-name). It
will also be used for the user-readable label printed next to the
checkbox.
=item 2.
The optional second parameter (-checked) specifies that the checkbox
is turned on by default. Synonyms are -selected and -on.
=item 3.
The optional third parameter (-value) specifies the value of the
checkbox when it is checked. If not provided, the word "on" is
assumed.
=item 4.
The optional fourth parameter (-label) is the user-readable label to
be attached to the checkbox. If not provided, the checkbox name is
used.
=back
The value of the checkbox can be retrieved using:
$turned_on = $query->param('checkbox_name');
JAVASCRIPTING: checkbox() recognizes the B<-onClick>
parameter. See checkbox_group() for further details.
=head2 CREATING A RADIO BUTTON GROUP
print $query->radio_group(-name=>'group_name',
-values=>['eenie','meenie','minie'],
-default=>'meenie',
-linebreak=>'true',
-labels=>\%labels);
-or-
print $query->radio_group('group_name',['eenie','meenie','minie'],
'meenie','true',\%labels);
HTML3-COMPATIBLE BROWSERS ONLY:
print $query->radio_group(-name=>'group_name',
-values=>['eenie','meenie','minie'],
-rows=>2,-columns=>2);
radio_group() creates a set of logically-related radio buttons
(turning one member of the group on turns the others off).
=over 4
=item 1.
The first argument is the name of the group and is required (-name).
=item 2.
The second argument (-values) is the list of values for the radio
buttons. The values and the labels that appear on the page are
identical. Pass an array I<reference> in the second argument, either
using an anonymous array, as shown, or by referencing a named array as
in "\@foo".
=item 3.
The optional third parameter (-default) is the name of the default
button to turn on. If not specified, the first item will be the
default. You can provide a nonexistent button name, such as "-" to
start up with no buttons selected.
=item 4.
The optional fourth parameter (-linebreak) can be set to 'true' to put
line breaks between the buttons, creating a vertical list.
=item 5.
The optional fifth parameter (-labels) is a pointer to an associative
array relating the radio button values to user-visible labels to be
used in the display. If not provided, the values themselves are
displayed.
=item 6.
B<HTML3-compatible browsers> (such as Netscape) can take advantage
of the optional
parameters B<-rows>, and B<-columns>. These parameters cause
radio_group() to return an HTML3 compatible table containing
the radio group formatted with the specified number of rows
and columns. You can provide just the -columns parameter if you
wish; radio_group will calculate the correct number of rows
for you.
To include row and column headings in the returned table, you
can use the B<-rowheader> and B<-colheader> parameters. Both
of these accept a pointer to an array of headings to use.
The headings are just decorative. They don't reorganize the
interpretation of the radio buttons -- they're still a single named
unit.
=back
When the form is processed, the selected radio button can
be retrieved using:
$which_radio_button = $query->param('group_name');
The value returned by radio_group() is actually an array of button
elements. You can capture them and use them within tables, lists,
or in other creative ways:
@h = $query->radio_group(-name=>'group_name',-values=>\@values);
&use_in_creative_way(@h);
=head2 CREATING A SUBMIT BUTTON
print $query->submit(-name=>'button_name',
-value=>'value');
-or-
print $query->submit('button_name','value');
submit() will create the query submission button. Every form
should have one of these.
=over 4
=item 1.
The first argument (-name) is optional. You can give the button a
name if you have several submission buttons in your form and you want
to distinguish between them. The name will also be used as the
user-visible label. Be aware that a few older browsers don't deal with this correctly and
B<never> send back a value from a button.
=item 2.
The second argument (-value) is also optional. This gives the button
a value that will be passed to your script in the query string.
=back
You can figure out which button was pressed by using different
values for each one:
$which_one = $query->param('button_name');
JAVASCRIPTING: submit() recognizes the B<-onClick>
parameter. See checkbox_group() for further details.
=head2 CREATING A RESET BUTTON
print $query->reset
reset() creates the "reset" button. Note that it restores the
form to its value from the last time the script was called,
NOT necessarily to the defaults.
=head2 CREATING A DEFAULT BUTTON
print $query->defaults('button_label')
defaults() creates a button that, when invoked, will cause the
form to be completely reset to its defaults, wiping out all the
changes the user ever made.
=head2 CREATING A HIDDEN FIELD

print $query->hidden(-name=>'hidden_name',
-default=>['value1','value2'...]);
-or-
print $query->hidden('hidden_name','value1','value2'...);
hidden() produces a text field that can't be seen by the user. It
is useful for passing state variable information from one invocation
of the script to the next.
=over 4
=item 1.
The first argument is required and specifies the name of this
field (-name).
=item 2.
The second argument is also required and specifies its value
(-default). In the named parameter style of calling, you can provide
a single value here or a reference to a whole list.
=back
Fetch the value of a hidden field this way:
$hidden_value = $query->param('hidden_name');
Note, that just like all the other form elements, the value of a
hidden field is "sticky". If you want to replace a hidden field with
some other values after the script has been called once you'll have to
do it manually:
$query->param('hidden_name','new','values','here');
=head2 CREATING A CLICKABLE IMAGE BUTTON
print $query->image_button(-name=>'button_name',
-src=>'/source/URL',
-align=>'MIDDLE');
-or-
print $query->image_button('button_name','/source/URL','MIDDLE');
image_button() produces a clickable image. When it's clicked on, the
position of the click is returned to your script as "button_name.x"
and "button_name.y", where "button_name" is the name you've assigned
to it.
JAVASCRIPTING: image_button() recognizes the B<-onClick>
parameter. See checkbox_group() for further details.
=over 4
=item 1.
The first argument (-name) is required and specifies the name of this
field.
=item 2.
The second argument (-src) is also required and specifies the URL
of the image.
=item 3.
The third option (-align, optional) is an alignment type, and may be
TOP, BOTTOM or MIDDLE.
=back
Fetch the value of the button this way:
$x = $query->param('button_name.x');
$y = $query->param('button_name.y');
=head2 CREATING A JAVASCRIPT ACTION BUTTON
print $query->button(-name=>'button_name',
-value=>'user visible label',
-onClick=>"do_something()");
-or-
print $query->button('button_name',"do_something()");
button() produces a button that is compatible with Netscape 2.0's
JavaScript. When it's pressed the fragment of JavaScript code
pointed to by the B<-onClick> parameter will be executed. On
non-Netscape browsers this form element will probably not even
display.
=head1 NETSCAPE COOKIES
Netscape browsers versions 1.1 and higher support a so-called
"cookie" designed to help maintain state within a browser session.
CGI.pm has several methods that support cookies.
A cookie is a name=value pair much like the named parameters in a CGI
query string. CGI scripts create one or more cookies and send
them to the browser in the HTTP header. The browser maintains a list
of cookies that belong to a particular Web server, and returns them
to the CGI script during subsequent interactions.
In addition to the required name=value pair, each cookie has several
optional attributes:
=over 4
=item 1. an expiration time
This is a time/date string (in a special GMT format) that indicates
when a cookie expires. The cookie will be saved and returned to your
script until this expiration date is reached, even if the user exits
Netscape and restarts it. If an expiration date isn't specified, the cookie
will remain active until the user quits Netscape.
=item 2. a domain
This is a partial or complete domain name for which the cookie is
valid. The browser will return the cookie to any host that matches
the partial domain name. For example, if you specify a domain name
of ".capricorn.com", then Netscape will return the cookie to
Web servers running on any of the machines "www.capricorn.com",
"www2.capricorn.com", "feckless.capricorn.com", etc. Domain names
must contain at least two periods to prevent attempts to match
on top level domains like ".edu". If no domain is specified, then
the browser will only return the cookie to servers on the host the
cookie originated from.
=item 3. a path
If you provide a cookie path attribute, the browser will check it
against your script's URL before returning the cookie. For example,
if you specify the path "/cgi-bin", then the cookie will be returned
to each of the scripts "/cgi-bin/tally.pl", "/cgi-bin/order.pl",
and "/cgi-bin/customer_service/complain.pl", but not to the script
"/cgi-private/site_admin.pl". By default, path is set to "/", which
causes the cookie to be sent to any CGI script on your site.
=item 4. a "secure" flag
If the "secure" attribute is set, the cookie will only be sent to your
script if the CGI request is occurring on a secure channel, such as SSL.
=back
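The domain and path rules above boil down to a tail match and a prefix match. The following plain-Perl sketch shows one way to express them; the helper names domain_matches() and path_matches() are invented for the example, the matching is done by the browser rather than by CGI.pm, and a real browser may apply further checks that are not modeled here.

```perl
use strict;
use warnings;

# Hypothetical helpers sketching the matching rules described above.

# True if $host falls under the cookie's (partial) domain.
sub domain_matches {
    my ($host, $domain) = @_;
    $host   = lc $host;
    $domain = lc $domain;
    return 1 if $host eq $domain;
    # A partial domain such as ".capricorn.com" matches any host
    # whose name ends with it.
    return ($domain =~ /^\./ && $host =~ /\Q$domain\E\z/) ? 1 : 0;
}

# True if the script's URL path begins with the cookie path.
sub path_matches {
    my ($url_path, $cookie_path) = @_;
    return index($url_path, $cookie_path) == 0 ? 1 : 0;
}

print domain_matches('www2.capricorn.com', '.capricorn.com'), "\n";
print path_matches('/cgi-private/site_admin.pl', '/cgi-bin'), "\n";
```

Note that the path test is a plain string prefix comparison, so a cookie path of "/cgi-bin" would also match a URL such as "/cgi-binary/x" in this sketch.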
The interface to Netscape cookies is the B<cookie()> method:
$cookie = $query->cookie(-name=>'sessionID',
-value=>'xyzzy',
-expires=>'+1h',
-path=>'/cgi-bin/database',
-domain=>'.capricorn.org',
-secure=>1);
print $query->header(-cookie=>$cookie);
B<cookie()> creates a new cookie.
Its parameters include:
=over 4
=item B<-name>
The name of the cookie (required). This can be any string at all.
Although Netscape limits its cookie names to non-whitespace
alphanumeric characters, CGI.pm removes this restriction by escaping
and unescaping cookies behind the scenes.
=item B<-value>
The value of the cookie. This can be any scalar value,
array reference, or even associative array reference. For example,
you can store an entire associative array into a cookie this way:
$cookie=$query->cookie(-name=>'family information',
-value=>\%childrens_ages);
=item B<-path>
The optional partial path for which this cookie will be valid, as described
above.
=item B<-domain>
The optional partial domain for which this cookie will be valid, as described
above.
=item B<-expires>
The optional expiration date for this cookie. The format is as described
in the section on the B<header()> method:
"+1h" one hour from now
=item B<-secure>
If set to true, this cookie will only be used within a secure
SSL session.
=back
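To give a feel for what a relative expiration such as "+1h" means, here is a small plain-Perl sketch that converts it into an absolute time. It is not CGI.pm's own parser: the helper name expiry_time() is hypothetical, and the suffixes other than "h" (seconds, minutes and days) are assumptions added for the illustration.

```perl
use strict;
use warnings;

# Seconds per unit for the subset of suffixes handled in this sketch.
my %SECONDS = (s => 1, m => 60, h => 60 * 60, d => 24 * 60 * 60);

# Hypothetical helper: turn a relative time such as "+1h" into an
# absolute epoch time, measured from $now (defaults to the current time).
sub expiry_time {
    my ($spec, $now) = @_;
    $now = time unless defined $now;
    if ($spec =~ /^([+-]?\d+)([smhd])$/) {
        return $now + $1 * $SECONDS{$2};
    }
    die "unrecognised expiry specification: $spec\n";
}

print scalar(gmtime(expiry_time('+1h'))), " GMT\n";   # one hour from now
```

Passing the base time explicitly as the second argument makes the function easy to check with fixed values.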
The cookie created by cookie() must be incorporated into the HTTP
header within the string returned by the header() method:
print $query->header(-cookie=>$my_cookie);
To create multiple cookies, give header() an array reference:
$cookie1 = $query->cookie(-name=>'riddle_name',
-value=>"The Sphynx's Question");
$cookie2 = $query->cookie(-name=>'answers',
-value=>\%answers);
print $query->header(-cookie=>[$cookie1,$cookie2]);
To retrieve a cookie, request it by name by calling the cookie()
method without the B<-value> parameter:
use CGI;
$query = new CGI;
%answers = $query->cookie(-name=>'answers');
# $query->cookie('answers') will work too!
The cookie and CGI namespaces are separate. If you have a parameter
named 'answers' and a cookie named 'answers', the values retrieved by
param() and cookie() are independent of each other. However, it's
simple to turn a CGI parameter into a cookie, and vice-versa:
# turn a CGI parameter into a cookie
$c=$q->cookie(-name=>'answers',-value=>[$q->param('answers')]);
# vice-versa
$q->param(-name=>'answers',-value=>[$q->cookie('answers')]);
See the B<cookie.cgi> example script for some ideas on how to use
cookies effectively.
B<NOTE:> There appear to be some (undocumented) restrictions on
Netscape cookies. In Netscape 2.01, at least, I haven't been able to
set more than three cookies at a time. There may also be limits on
the length of cookies. If you need to store a lot of information,
it's probably better to create a unique session ID, store it in a
cookie, and use the session ID to locate an external file/database
saved on the server's side of the connection.
=head1 WORKING WITH NETSCAPE FRAMES
It's possible for CGI.pm scripts to write into several browser
panels and windows using Netscape's frame mechanism.
There are three techniques for defining new frames programmatically:
=over 4
=item 1. Create a <Frameset> document

After writing out the HTTP header, instead of creating a standard
HTML document using the start_html() call, create a <FRAMESET>
document that defines the frames on the page. Specify your script(s)
(with appropriate parameters) as the SRC for each of the frames.
There is no specific support for creating <FRAMESET> sections
in CGI.pm, but the HTML is very simple to write. See the frame
documentation in Netscape's home pages for details:
http://home.netscape.com/assist/net_sites/frames.html
=item 2. Specify the destination for the document in the HTTP header
You may provide a B<-target> parameter to the header() method:
print $q->header(-target=>'ResultsWindow');
This will tell Netscape to load the output of your script into the
frame named "ResultsWindow". If a frame of that name doesn't
already exist, Netscape will pop up a new window and load your
script's document into that. There are a number of magic names
that you can use for targets. See the frame documents on Netscape's
home pages for details.
=item 3. Specify the destination for the document in the <FORM> tag
You can specify the frame to load in the FORM tag itself. With
CGI.pm it looks like this:
print $q->startform(-target=>'ResultsWindow');
When your script is reinvoked by the form, its output will be loaded
into the frame named "ResultsWindow". If one doesn't already exist
a new window will be created.
=back
The script "frameset.cgi" in the examples directory shows one way to
create pages in which the fill-out form and the response live in
side-by-side frames.
=head1 DEBUGGING
If you are running the script
from the command line or in the perl debugger, you can pass the script
a list of keywords or parameter=value pairs on the command line or
from standard input (you don't have to worry about tricking your
script into reading from environment variables).
You can pass keywords like this:
your_script.pl keyword1 keyword2 keyword3
or this:
your_script.pl keyword1+keyword2+keyword3
or this:
your_script.pl name1=value1 name2=value2
or this:
your_script.pl name1=value1&name2=value2
or even as newline-delimited parameters on standard input.
When debugging, you can use quotes and backslashes to escape
characters in the familiar shell manner, letting you place
spaces and other funny characters in your parameter=value
pairs:
your_script.pl "name1='I am a long value'" "name2=two\ words"
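To make the accepted argument styles concrete, here is a minimal sketch of how a script could sort such command-line input into keywords or name/value pairs. It is an illustration only and not CGI.pm's own parsing code; the function name parse_debug_args is hypothetical.

```perl
use strict;
use warnings;

# Turn debugging arguments into either a keyword list or a hash of
# name => [values], accepting "a b c", "a+b+c", "n1=v1 n2=v2" and
# "n1=v1&n2=v2".
sub parse_debug_args {
    my @args = @_;
    # Split every argument on '&' so both pair styles look the same.
    my @parts = map { split(/&/, $_) } @args;
    if (grep { index($_, '=') >= 0 } @parts) {
        my %params;
        for my $part (@parts) {
            my ($name, $value) = split(/=/, $part, 2);
            $value = '' unless defined $value;
            push @{ $params{$name} }, $value;
        }
        return (params => \%params);
    }
    # No '=' anywhere: treat the input as keywords, splitting on '+'.
    return (keywords => [ map { split(/\+/, $_) } @parts ]);
}

my %result = parse_debug_args('name1=value1&name2=value2');
for my $name (sort keys %{ $result{params} }) {
    print "$name = ", join(' ', @{ $result{params}{$name} }), "\n";
}
```

The shell has already removed quotes and backslashes by the time the arguments reach the script, so the sketch never needs to handle them.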
=head2 DUMPING OUT ALL THE NAME/VALUE PAIRS
The dump() method produces a string consisting of all the query's
name/value pairs formatted nicely as a nested list. This is useful
for debugging purposes:
print $query->dump
Produces something that looks like:
<UL>
<LI>name1
<UL>
<LI>value1
<LI>value2
</UL>
<LI>name2
<UL>
<LI>value1
</UL>
</UL>
You can pass a value of 'true' to dump() in order to get it to
print the results out as plain text, suitable for incorporating
into a <PRE> section.
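The nested-list format is easy to reproduce by hand. The sketch below renders a hash of name => [values] in the same shape as the output shown above; it mirrors the format only and is not the code CGI.pm uses. The function name dump_params is hypothetical.

```perl
use strict;
use warnings;

# Render a hash of name => [values] as a nested <UL> list, one inner
# list per parameter name.
sub dump_params {
    my ($params) = @_;
    my @out = ('<UL>');
    for my $name (sort keys %$params) {
        push @out, "<LI>$name", '<UL>';
        push @out, map("<LI>$_", @{ $params->{$name} });
        push @out, '</UL>';
    }
    push @out, '</UL>';
    return join("\n", @out) . "\n";
}

print dump_params({ name1 => ['value1', 'value2'], name2 => ['value1'] });
```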
As a shortcut, as of version 1.56 you can interpolate the entire CGI
object into a string and it will be replaced with the nice HTML dump
shown above:
$query=new CGI;
print "<H2>Current Values</H2> $query\n";
=head1 FETCHING ENVIRONMENT VARIABLES
Some of the more useful environment variables can be fetched
through this interface. The methods are as follows:
=over 4
=item B<accept()>
Return a list of MIME types that the remote browser
accepts. If you give this method a single argument
corresponding to a MIME type, as in
$query->accept('text/html'), it will return a
floating point value corresponding to the browser's
preference for this type from 0.0 (don't want) to 1.0.
Glob types (e.g. text/*) in the browser's accept list
are handled correctly.
=item B<raw_cookie()>
Returns the HTTP_COOKIE variable, an HTTP extension
implemented by Netscape browsers version 1.1
and higher. Cookies have a special format, and this
method call just returns the raw form (?cookie dough).
See cookie() for ways of setting and retrieving
cooked cookies.
=item B<user_agent()>
Returns the HTTP_USER_AGENT variable. If you give
this method a single argument, it will attempt to
pattern match on it, allowing you to do something
like $query->user_agent(netscape);
=item B<path_info()>
Returns additional path information from the script URL.
E.G. fetching /cgi-bin/your_script/additional/stuff will
result in $query->path_info() returning
"/additional/stuff".
NOTE: The Microsoft Internet Information Server
is broken with respect to additional path information. If
you use the Perl DLL library, the IIS server will attempt to
execute the additional path information as a Perl script.
If you use the ordinary file associations mapping, the
path information will be present in the environment,
but incorrect. The best thing to do is to avoid using additional
path information in CGI scripts destined for use with IIS.
=item B<path_translated()>
As per path_info() but returns the additional
path information translated into a physical path, e.g.
"/usr/local/etc/httpd/htdocs/additional/stuff".
The Microsoft IIS is broken with respect to the translated
path as well.
=item B<remote_host()>
Returns either the remote host name or, if the former is
unavailable, the IP address.
=item B<script_name()>
Return the script name as a partial URL, for self-referring
scripts.
=item B<referer()>
Return the URL of the page the browser was viewing
prior to fetching your script. Not available for all
browsers.
=item B<auth_type ()>
Return the authorization/verification method in use for this
script, if any.
=item B<server_name ()>
Returns the name of the server, usually the machine's host
name.
=item B<virtual_host ()>
When using virtual hosts, returns the name of the host that
the browser attempted to contact.
=item B<server_software ()>
Returns the server software and version number.
=item B<remote_user ()>
Return the authorization/verification name used for user
verification, if this script is protected.
=item B<user_name ()>
Attempt to obtain the remote user's name, using a variety
of different techniques. This only works with older browsers
such as Mosaic. Netscape does not reliably report the user
name!
=item B<request_method()>
Returns the method used to access your script, usually
one of 'POST', 'GET' or 'HEAD'.
=back
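The preference lookup described under accept() can be sketched in core Perl as follows. This is an assumption-laden illustration of how an Accept header might be scored, not CGI.pm's implementation; the function name accept_preference and the sample header are made up for the example.

```perl
use strict;
use warnings;

# Return the browser's preference (0.0 to 1.0) for a MIME type, given a
# raw Accept header. Exact matches win over glob entries such as text/*
# and */*.
sub accept_preference {
    my ($accept_header, $wanted) = @_;
    my %q;
    for my $entry (split /\s*,\s*/, $accept_header) {
        my ($type, @params) = split /\s*;\s*/, $entry;
        next unless defined $type && length $type;
        my $quality = 1.0;                 # default when no q= is given
        for my $param (@params) {
            $quality = $1 if $param =~ /^q=([\d.]+)$/;
        }
        $q{ lc($type) } = $quality;
    }
    $wanted = lc($wanted);
    return $q{$wanted} if exists $q{$wanted};
    my ($major) = split m{/}, $wanted;
    return $q{"$major/*"} if exists $q{"$major/*"};
    return $q{'*/*'} if exists $q{'*/*'};
    return 0.0;
}

my $header = 'text/html, text/*;q=0.5, image/gif;q=0.8, */*;q=0.1';
printf "%.1f\n", accept_preference($header, 'text/plain');
```

In a running CGI script the header would normally come from $ENV{HTTP_ACCEPT}.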
=head1 CREATING HTML ELEMENTS:
In addition to its shortcuts for creating form elements, CGI.pm
defines general HTML shortcut methods as well. HTML shortcuts are
named after a single HTML element and return a fragment of HTML text
that you can then print or manipulate as you like.
This example shows how to use the HTML methods:
$q = new CGI;
print $q->blockquote(
"Many years ago on the island of",
$q->a({href=>"http://crete.org/"},"Crete"),
"there lived a minotaur named",
$q->strong("Fred."),
),
$q->hr;
This results in the following HTML code (extra newlines have been
added for readability):
<blockquote>
Many years ago on the island of
<a HREF="http://crete.org/">Crete</a> there lived
a minotaur named <strong>Fred.</strong>
</blockquote>
<hr>
If you find the syntax for calling the HTML shortcuts awkward, you can
import them into your namespace and dispense with the object syntax
completely (see the next section for more details):
use CGI shortcuts; # IMPORT HTML SHORTCUTS
print blockquote(
"Many years ago on the island of",
a({href=>"http://crete.org/"},"Crete"),
"there lived a minotaur named",
strong("Fred."),
),
hr;
=head2 PROVIDING ARGUMENTS TO HTML SHORTCUTS
The HTML methods will accept zero, one or multiple arguments. If you
provide no arguments, you get a single tag:
print hr;
# gives "<hr>"
If you provide one or more string arguments, they are concatenated
together with spaces and placed between opening and closing tags:
print h1("Chapter","1");
# gives "<h1>Chapter 1</h1>"
If the first argument is an associative array reference, then the keys
and values of the associative array become the HTML tag's attributes:
print a({href=>'fred.html',target=>'_new'},
"Open a new frame");
# gives <a href="fred.html" target="_new">Open a new frame</a>
You are free to use CGI.pm-style dashes in front of the attribute
names if you prefer:
print img {-src=>'fred.gif',-align=>'LEFT'};
# gives <img ALIGN="LEFT" SRC="fred.gif">
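The argument rules above can be captured in one small generic function. The following is a sketch under the assumption that attribute names are upper-cased and emitted in sorted order, as in the img example; the function name tag is hypothetical and this is not the code CGI.pm generates.

```perl
use strict;
use warnings;

# Build an HTML element: no arguments gives a single tag, string
# arguments are joined with spaces between opening and closing tags,
# and a leading hash reference supplies the attributes.
sub tag {
    my ($name, @args) = @_;
    my $attrs = '';
    if (@args && ref($args[0]) eq 'HASH') {
        my $given = shift @args;
        for my $key (sort keys %$given) {
            (my $attr_name = $key) =~ s/^-//;   # allow -src as well as src
            $attrs .= ' ' . uc($attr_name) . '="' . $given->{$key} . '"';
        }
    }
    return "<$name$attrs>" unless @args;        # no content: single tag
    return "<$name$attrs>" . join(' ', @args) . "</$name>";
}

print tag('hr'), "\n";
print tag('h1', 'Chapter', '1'), "\n";
print tag('a', { href => 'fred.html', target => '_new' }, 'Open a new frame'), "\n";
print tag('img', { -src => 'fred.gif', -align => 'LEFT' }), "\n";
```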
=head2 Generating new HTML tags
Since no mere mortal can keep up with Netscape and Microsoft as they
battle it out for control of HTML, the code that generates HTML tags
is general and extensible. You can create new HTML tags freely just
by referring to them on the import line:
use CGI shortcuts,winkin,blinkin,nod;
Now, in addition to the standard CGI shortcuts, you've created HTML
tags named "winkin", "blinkin" and "nod". You can use them like this:
print blinkin {color=>'blue',rate=>'fast'},"Yahoo!";
# <blinkin COLOR="blue" RATE="fast">Yahoo!</blinkin>
=head1 IMPORTING CGI METHOD CALLS INTO YOUR NAME SPACE
As a convenience, you can import most of the CGI method calls directly
into your name space. The syntax for doing this is:
use CGI <list of methods>;
The listed methods will be imported into the current package; you can
call them directly without creating a CGI object first. This example
shows how to import the B<param()> and B<header()>
methods, and then use them directly:
use CGI param,header;
print header('text/plain');
$zipcode = param('zipcode');
You can import groups of methods by referring to a number of special
names:
=over 4
=item B<cgi>
Import all CGI-handling methods, such as B<param()>, B<path_info()>
and the like.
=item B<form>
Import all fill-out form generating methods, such as B<textfield()>.
=item B<html2>
Import all methods that generate HTML 2.0 standard elements.
=item B<html3>
Import all methods that generate HTML 3.0 proposed elements (such as
<table>, <super> and <sub>).
=item B<netscape>
Import all methods that generate Netscape-specific HTML extensions.
=item B<shortcuts>
Import all HTML-generating shortcuts (i.e. 'html2' + 'html3' +
'netscape')...
=item B<standard>
Import "standard" features, 'html2', 'form' and 'cgi'.
=item B<all>
Import all the available methods. For the full list, see the CGI.pm
code, where the variable %TAGS is defined.
=back
Note that in the interests of execution speed CGI.pm does B<not> use
the standard L<Exporter> syntax for specifying load symbols. This may
change in the future.
If you import any of the state-maintaining CGI or form-generating
methods, a default CGI object will be created and initialized
automatically the first time you use any of the methods that require
one to be present. This includes B<param()>, B<textfield()>,
B<submit()> and the like. (If you need direct access to the CGI
object, you can find it in the global variable B<$CGI::Q>). By
importing CGI.pm methods, you can create visually elegant scripts:
use CGI standard,html2;
print
header,
start_html('Simple Script'),
h1('Simple Script'),
start_form,
"What's your name? ",textfield('name'),p,
"What's the combination?",
checkbox_group(-name=>'words',
-defaults=>['eenie','moe']),p,
"What's your favorite color?",
popup_menu(-name=>'color',
-values=>['red','green','blue','chartreuse']),p,
submit,
end_form,
hr,"\n";
if (param) {
print
"Your name is ",em(param('name')),p,
"The keywords are: ",em(join(", ",param('words'))),p,
"Your favorite color is ",em(param('color')),".\n";
}
print end_html;
=head1 USING NPH SCRIPTS
NPH, or "no-parsed-header", scripts bypass the server completely by
sending the complete HTTP header directly to the browser. This has
slight performance benefits, but is of most use for taking advantage
of HTTP extensions that are not directly supported by your server,
such as server push and PICS headers.
Servers use a variety of conventions for designating CGI scripts as
NPH. Many Unix servers look at the beginning of the script's name for
the prefix "nph-". The Macintosh WebSTAR server and Microsoft's
Internet Information Server, in contrast, try to decide whether a
program is an NPH script by examining the first line of script output.
CGI.pm supports NPH scripts with a special NPH mode. When in this
mode, CGI.pm will output the necessary extra header information when
the header() and redirect() methods are
called.
The Microsoft Internet Information Server requires NPH mode. As of version
2.30, CGI.pm will automatically detect when the script is running under IIS
and put itself into this mode. You do not need to do this manually, although
it won't hurt anything if you do.
There are a number of ways to put CGI.pm into NPH mode:
=over 4
=item In the B<use> statement
Simply add ":nph" to the list of symbols to be imported into your script:
use CGI qw(:standard :nph)
=item By calling the B<nph()> method:
Call B<nph()> with a non-zero parameter at any point after using CGI.pm in your program.
CGI->nph(1)
=item By using B<-nph> parameters in the B<header()> and B<redirect()> statements:
print $q->header(-nph=>1);
=back
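To show what "the complete HTTP header" means in practice, here is a minimal core-Perl sketch of the lines an NPH script has to send itself. The function name nph_header, its option names and the fallback server string are assumptions for this example; the header that CGI.pm emits in NPH mode may differ in detail.

```perl
use strict;
use warnings;

# Compose the full response header for an NPH script: the HTTP status
# line comes first, followed by the usual header fields and a blank line.
sub nph_header {
    my (%opt) = @_;
    my $status   = $opt{status} || '200 OK';
    my $type     = $opt{type}   || 'text/html';
    my $software = $ENV{SERVER_SOFTWARE} || 'cmdline';
    return join("\r\n",
        "HTTP/1.0 $status",
        "Server: $software",
        "Content-Type: $type",
    ) . "\r\n\r\n";
}

print nph_header(type => 'text/plain');
print "Hello from an NPH script.\n";
```

A regular (parsed-header) script would print only the Content-Type line and leave the status line to the server.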
=head1 AUTHOR INFORMATION

Copyright 1995,1996, Lincoln D. Stein. All rights reserved. It may
be used and modified freely, but I do request that this copyright
notice remain attached to the file. You may modify this module as you
wish, but if you redistribute a modified version, please attach a note
listing the modifications you have made.
Address bug reports and comments to:
lstein@genome.wi.mit.edu
=head1 CREDITS
Thanks very much to:
=over 4
=item Matt Heffron (heffron@falstaff.css.beckman.com)
=item James Taylor (james.taylor@srs.gov)
=item Scott Anguish <sanguish@digifix.com>
=item Mike Jewell (mlj3u@virginia.edu)
=item Timothy Shimmin (tes@kbs.citri.edu.au)
=item Joergen Haegg (jh@axis.se)
=item Laurent Delfosse (delfosse@csgrad1.cs.wvu.edu)
=item Richard Resnick (applepi1@aol.com)
=item Craig Bishop (csb@barwonwater.vic.gov.au)
=item Tony Curtis (tc@vcpc.univie.ac.at)
=item Tim Bunce (Tim.Bunce@ig.co.uk)
=item Tom Christiansen (tchrist@convex.com)
=item Andreas Koenig (k@franz.ww.TU-Berlin.DE)
=item Tim MacKenzie (Tim.MacKenzie@fulcrum.com.au)
=item Kevin B. Hendricks (kbhend@dogwood.tyler.wm.edu)
=item Stephen Dahmen (joyfire@inxpress.net)
=item Ed Jordan (ed@fidalgo.net)
=item David Alan Pisoni (david@cnation.com)
=item ...and many many more...
for suggestions and bug fixes.
=back
=head1 A COMPLETE EXAMPLE OF A SIMPLE FORM-BASED SCRIPT
#!/usr/local/bin/perl
use CGI;
$query = new CGI;
print $query->header;
print $query->start_html("Example CGI.pm Form");
print "<H1> Example CGI.pm Form</H1>\n";
&print_prompt($query);
&do_work($query);
&print_tail;
print $query->end_html;
sub print_prompt {
my($query) = @_;
print $query->startform;
print "<EM>What's your name?</EM><BR>";
print $query->textfield('name');
print $query->checkbox('Not my real name');
print "<P><EM>Where can you find English Sparrows?</EM><BR>";

print $query->checkbox_group(
-name=>'Sparrow locations',
-values=>['England','France','Spain','Asia','Hoboken'],
-linebreak=>'yes',
-defaults=>['England','Asia']);
print "<P><EM>How far can they fly?</EM><BR>",
$query->radio_group(
-name=>'how far',
-values=>['10 ft','1 mile','10 miles','real far'],
-default=>'1 mile');
print "<P><EM>What's your favorite color?</EM> ";
print $query->popup_menu(-name=>'Color',
-values=>['black','brown','red','yellow'],
-default=>'red');
print $query->hidden('Reference','Monty Python and the Holy Grail');
print "<P><EM>What have you got there?</EM><BR>";
print $query->scrolling_list(
-name=>'possessions',
-values=>['A Coconut','A Grail','An Icon',
'A Sword','A Ticket'],
-size=>5,
-multiple=>'true');
print "<P><EM>Any parting comments?</EM><BR>";
print $query->textarea(-name=>'Comments',
-rows=>10,
-columns=>50);
print "<P>",$query->reset;
print $query->submit('Action','Shout');
print $query->submit('Action','Scream');
print $query->endform;
print "<HR>\n";
}
sub do_work {
my($query) = @_;
my(@values,$key);
print "<H2>Here are the current settings in this form</H2>";

foreach $key ($query->param) {
print "<STRONG>$key</STRONG> -> ";
@values = $query->param($key);
print join(", ",@values),"<BR>\n";
}
}
sub print_tail {
print <<END;
<HR>
<ADDRESS>Lincoln D. Stein</ADDRESS><BR>
<A HREF="/">Home Page</A>
END
}
=head1 BUGS
This module has grown large and monolithic. Furthermore it's doing many
things, such as handling URLs, parsing CGI input, writing HTML, etc., that
are also done in the LWP modules. It should be discarded in favor of
the CGI::* modules, but somehow I continue to work on it.
Note that the code is truly contorted in order to avoid spurious
warnings when programs are run with the B<-w> switch.
=head1 SEE ALSO
L<CGI::Carp>, L<URI::URL>, L<CGI::Request>, L<CGI::MiniSvr>,
L<CGI::Base>, L<CGI::Form>, L<CGI::Apache>, L<CGI::Switch>,
L<CGI::Push>, L<CGI::Fast>
=cut
Eskimo North CGI Counter Editor
This form will allow you to edit, reset or even create the personalized data file used by
Eskimo North's CGI counter. For more information on the counter please refer to its
documentation.
Your Login Name:

Desired or New Counter Value:
Clear Form
(reset the form)
Submit Form
(submit the form)
You must have a valid Eskimo account in order to use this form. If you are an Eskimo
subscriber and keep on getting error messages, your browser is most likely not sending
the correct remote identification information. Don't worry about it; simply use the lynx
browser available on your Eskimo shell account.
Copyright 1997 SkyTouch Communications. All rights reserved.
http://www.eskimo.com/cgihelp/cgi-counter.html [22/07/2003 5:57:09 PM]
http://www.eskimo.com/cgihelp/scripts/date.txt
#!/bin/sh
DATE=/bin/date
echo Content-type: text/plain
echo
if [ -x $DATE ]; then
$DATE
else
echo Cannot find date command on this system.
fi
http://www.eskimo.com/cgihelp/scripts/date.txt [22/07/2003 5:57:11 PM]
http://www.eskimo.com/cgi-bin/date
Tue Jul 22 05:28:00 PDT 2003
http://www.eskimo.com/cgi-bin/date [22/07/2003 5:57:13 PM]
#!/usr/local/bin/perl5
# -*- Mode: Perl -*-
#
# lib5-cgi.pl A perl library of CGI functions.
#
# copyright 1995 David Walker
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# You may also redistribute it and/or modify it under
# the terms of the Artistic License under which Perl is distributed.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# Ideas used in this program came from the cgi-lib.pl perl library
# by Steven Brenner's and
# "Managing Internet Information Services" by Lui, Peek, Jones,
# Buus & Nye. Published by O'Reilly and Associates, Inc.
#
# Sat May 6 11:36:18 GMT-0700 1995 version 1.0.b
# Fri Oct 20 12:30:04 1995 checked syntax for Perl 5, dw
# Modified by Erik C. Thauvin (erik@skytouch.com)
# Sun Apr 27 01:56:36 PDT 1997
#
# Added FILEPATH_INFO support.

package lib_cgi ;

sub print_doc_header {
    local($content_type) = @_ ;
    $* = 1 ; # why do we need this?
    if (defined($content_type)) {
        print "Content-type: $content_type\n\n" ;
    } else {
        print "Content-type: text/html\n\n" ;
    }
}

sub print_page_header {
    local($heading) = @_ ;
    print "\n\n" ;
    print "\n" ;
    print "\n" ;
    print "$heading\n" ;
}

sub print_page_trailer {
    print "\n\n" ;
}

sub error_message {
    local($message) = @_ ;
    &print_page_header("ERROR") ;
    print "\n$message\n" ;
    print "\n" ;
    &print_page_trailer ;
    exit 0 ;
}

sub get_args_post {
    local($buffer) ;
    # $CONTENT_TYPE is only set for POST calls.
    &error_message('This script can only process form results.')
        unless ($ENV{'CONTENT_TYPE'} eq 'application/x-www-form-urlencoded') ;
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}) ;
    # Split the name-value pairs
    @pairs = split(/&/, $buffer) ;
    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair) ;
        $value =~ tr/+/ / ;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg ;
        $main::FORM{$name} = $value ;
    }
}

sub get_args_get {
    local($path_info) = $ENV{'FILEPATH_INFO'} || $ENV{'PATH_INFO'};
    @args = split('/', $path_info) ;
    foreach (@args) {
        tr/+/ / ;
        ($name, $value) = split(/=/, $_) ;
        $main'FORM{$name} = $value ;
    }
}

sub get_args {
    if ($ENV{'REQUEST_METHOD'} eq 'POST') {
        &get_args_post ;
    } elsif ($ENV{'REQUEST_METHOD'} eq 'GET') {
        &get_args_get ;
    } else {
        &error_message("Illegal Request Method $ENV{REQUEST_METHOD}, " .
            "This script must be referenced using METHOD GET or POST.") ;
    }
}

# Look for evil characters
sub check_for_evil_characters {
    # $my_character_set must be enclosed in single (') quotes.
    local($str,$variable_name,$my_character_set) = @_ ;
    if (defined($my_character_set)) {
        $ok_character_set = $my_character_set ;
    } else {
        $ok_character_set = '[a-zA-Z0-9_\-+. \t\/@%]' ;
    }
    if ($str !~ /^$ok_character_set+$/) {
        &evil_characters($variable_name,$ok_character_set) ;
    }
    $str ; # return validated string
}

sub evil_characters {
    local($variable_name,$ok_character_set) = @_ ;
    &error_message("The $variable_name you entered contains illegal " .
        "characters, legal characters are $ok_character_set. " .
        "Please back up and correct, then resubmit.") ;
}

1 ;
http://www.eskimo.com/cgihelp/scripts/lib5-cgi.pl.txt (2 of 2) [22/07/2003 5:57:17 PM]
#!/usr/local/bin/perl
# -*- Mode: Perl -*-
#
# lib-cgi.pl A perl library of CGI functions.
#
# copyright 1995 David Walker
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# You may also redistribute it and/or modify it under
# the terms of the Artistic License under which Perl is distributed.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# Ideas used in this program came from the cgi-lib.pl perl library
# by Steven Brenner's and
# "Managing Internet Information Services" by Lui, Peek, Jones,
# Buus & Nye. Published by O'Reilly and Associates, Inc.
#
# Sat May 6 11:36:18 GMT-0700 1995 version 1.0.b
# Fri Oct 20 12:30:04 1995 checked syntax for Perl 5, dw
# Modified by Erik C. Thauvin (erik@skytouch.com)
# Sun Apr 27 01:56:36 PDT 1997
#
# Added FILEPATH_INFO support.

package lib_cgi ;

sub print_doc_header {
    local($content_type) = @_ ;
    $* = 1 ; # why do we need this?
    if (defined($content_type)) {
        print "Content-type: $content_type\n\n" ;
    } else {
        print "Content-type: text/html\n\n" ;
    }
}

sub print_page_header {
    local($heading) = @_ ;
    print "\n\n" ;
    print "\n" ;
    print "\n" ;
    print "$heading\n" ;
}

sub print_page_trailer {
    print "\n\n" ;
}

sub error_message {
    local($message) = @_ ;
    &print_page_header("ERROR") ;
    print "\n$message\n" ;
    print "\n" ;
    &print_page_trailer ;
    exit 0 ;
}

sub get_args_post {
    local($buffer) ;
    # $CONTENT_TYPE is only set for POST calls.
    &error_message('This script can only process form results.')
        unless ($ENV{'CONTENT_TYPE'} eq 'application/x-www-form-urlencoded') ;
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}) ;
    # Split the name-value pairs
    @pairs = split(/&/, $buffer) ;
    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair) ;
        $value =~ tr/+/ / ;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg ;
        $main'FORM{$name} = $value ;
    }
}

sub get_args_get {
    local($path_info) = $ENV{'FILEPATH_INFO'} || $ENV{'PATH_INFO'};
    @args = split('/', $path_info) ;
    foreach (@args) {
        tr/+/ / ;
        ($name, $value) = split(/=/, $_) ;
        $main'FORM{$name} = $value ;
    }
}

sub get_args {
    if ($ENV{'REQUEST_METHOD'} eq 'POST') {
        &get_args_post ;
    } elsif ($ENV{'REQUEST_METHOD'} eq 'GET') {
        &get_args_get ;
    } else {
        &error_message("Illegal Request Method $ENV{REQUEST_METHOD}, " .
            "This script must be referenced using METHOD GET or POST.") ;
    }
}

# Look for evil characters
sub check_for_evil_characters {
    # $my_character_set must be enclosed in single (') quotes.
    local($str,$variable_name,$my_character_set) = @_ ;
    if (defined($my_character_set)) {
        $ok_character_set = $my_character_set ;
    } else {
        $ok_character_set = '[a-zA-Z0-9_\-+. \t\/@%]' ;
    }
    if ($str !~ /^$ok_character_set+$/) {
        &evil_characters($variable_name,$ok_character_set) ;
    }
    $str ; # return validated string
}

sub evil_characters {
    local($variable_name,$ok_character_set) = @_ ;
    &error_message("The $variable_name you entered contains illegal " .
        "characters, legal characters are $ok_character_set. " .
        "Please back up and correct, then resubmit.") ;
}

1 ;
http://www.eskimo.com/cgihelp/scripts/lib-cgi.pl.txt (2 of 2) [22/07/2003 5:57:19 PM]
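For readers who want to experiment with the decoding steps used by get_args_post outside a web server, the following sketch repeats them with lexical variables under "use strict". The function names decode_form and is_clean and the sample input are made up for this example. Unlike the library, the sketch also decodes the parameter names and returns a hash instead of filling %main::FORM.

```perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded string into a hash:
# split on '&', split each pair on '=', turn '+' into a space, then
# expand %XX escapes.
sub decode_form {
    my ($buffer) = @_;
    my %form;
    for my $pair (split /&/, $buffer) {
        next unless length $pair;
        my ($name, $value) = split(/=/, $pair, 2);
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;
            s/%([a-fA-F0-9]{2})/chr(hex($1))/eg;
        }
        $form{$name} = $value;
    }
    return %form;
}

# Same default whitelist as check_for_evil_characters.
sub is_clean {
    my ($str) = @_;
    return $str =~ /^[a-zA-Z0-9_\-+. \t\/\@%]+$/ ? 1 : 0;
}

my %form = decode_form('name=Jane+Doe&email=jane%40example.com');
print "$_ => $form{$_}\n" for sort keys %form;
print "clean\n" if is_clean($form{email});
```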
#!/usr/local/bin/taintperl -- -*-perl-*-
#
# RANDOM - Generate random URL from a text file 'database.'
# By Jason A. Dour
#
# (c) 1995 Jason A. Dour & the University of Louisville
#
# Free for distribution, copying, editing, and hacking under the GNU public
# license. See file 'gnu.public.license.2.0' for specific information.
# This software comes with no guarantees implicit or implied, and the
# author(s) of this software cannot be held responsible for loss, damage,
# acts of god(s), large amounts of small rodentia, deafness, plague,
# baldness, or nose-bleeds occurring as a direct -- or indirect --
# result of the use of 'random.' This software is to be used for
# randomizin, peace, love, and spreading genuine feelings of well being.
# All other uses are denounced by the author(s).
#
# Love, Peace, Gerbils, & Hair Grease,
# Jason A. Dour
#
# Last modified on 08/14/97 by Erik C. Thauvin
#
# To call random, use something like:
#
# http://www.eskimo.com/cgi-bin/random/~yourlogin/filename
#
# where "filename" is a world readable file located in your public_html
# directory, and containing the listing of URLs.
#-------------------------------------------------------------------------
#
$ENV{'PATH'}="/bin:/usr/bin:/usr/lib";
$ENV{'IFS'} = "";
#
# Email address/name of contact for server problems.
#
$contact = "webmaster\@eskimo.com";
#
# HREF of the mail URL for the server contact.
#
$contact_url = "mailto:webmaster\@eskimo.com";
#
# Get the filename by reading the CGI env. variable.
#
&system_error("Incorrect CGI method! Use GET for proper operation.")
    if ( $ENV{'REQUEST_METHOD'} eq "POST" );
&system_error("No list file provided!")
    if ( !$ENV{'PATH_TRANSLATED'} );
&system_error("List file does not exist!")
    if ( ! -e $ENV{'PATH_TRANSLATED'} );
&system_error("List file cannot be accessed!")
#   if ( ! -r $ENV{'PATH_TRANSLATED'} or ! -R $ENV{'PATH_TRANSLATED'} );
# Changed "or" to "||" -ECT 04/19/95
    if ( ! -r $ENV{'PATH_TRANSLATED'} || ! -R $ENV{'PATH_TRANSLATED'} );
&system_error("List file has nothing in it!")
    if ( -z $ENV{'PATH_TRANSLATED'} );
#
# Open the file and read in all of its contents. So far, this has been
# effective on files up to 1 Meg in size. Anything larger, I am not certain.
#
open(LIST_FILE,$ENV{'PATH_TRANSLATED'}) || &system_error("File cannot be opened!");
read(LIST_FILE,$list_buffer,(-s $ENV{'PATH_TRANSLATED'}));
close(LIST_FILE);
#
# Split the URL list.
#
@urls = split(/\n/,$list_buffer);
#
# Get a random array number; the random seed being based upon time, and
# current and parent process id's.
#
srand( (time/$$)*getppid );
$location = int(rand($#urls+1));
#
# Print out the random location if it is a valid URL.
#
if ( grep (/^http/,$urls[$location]) ||
     grep (/^file/,$urls[$location]) ||
     grep (/^ftp/,$urls[$location]) ||
     grep (/^gopher/,$urls[$location]) ||
     grep (/^news/,$urls[$location]) ||
     grep (/^telnet/,$urls[$location]) ||
     grep (/^tn3270/,$urls[$location]) ||
     grep (/^mailto/,$urls[$location]) ||
     grep (/^nntp/,$urls[$location]) ||
     grep (/^wais/,$urls[$location]) ||
     grep (/^prospero/,$urls[$location]) ) {
    print "Location: $urls[$location]\n\n";
    exit(0);
}
else {
    $error = $location+1;
#   &system_error("Invalid URL: $urls[$location] at line $error of $ENV{'PATH_TRANSLATED'}. Please let the maintainer of this file know this error occurred.");
# Changed to match standard CERN error format -- ECT 04/25/95
    $urlval = $urls[$location];
    $urlval =~ s/\&/\&amp\;/g;
    $urlval =~ s/\</\&lt\;/g;
    $urlval =~ s/\>/\&gt\;/g;
    &system_error("Invalid URL -- $urlval at line $error of $ENV{'PATH_TRANSLATED'}. Please let the maintainer of this file know this error occurred.");
};
#
# Generic system error page. Will print out whatever error text you provide.
#
sub system_error {
    print "Content-type: text/html\n\n";
#   print "";
#   print "ERROR!";
#   print "The error encountered was: $_[0]";
#   print "If you feel that this error is server related, contact ";
#   print "";
#   print "$contact.";
#   print "RANDOM URL SELECTOR - March 8th, 1995";
#   print " 1995 Jason A. Dour & the University of Louisville";
#   print "Send comments, questions, suggestions, or problems to $contact.";
# Changed to standard CERN error format -- ECT 04/19/95
    print "\n";
    print "\n";
    print "\n";
    print "\n";
    print "\n";
    print "Error\n";
    print "$_[0]\n";
    print "\n";
    print "If you feel this error is server related, please contact $contact.\n";
    print "\n";
    print "\n";
    print "RANDOM 1995 Jason A. Dour & the University of Louisville.\n";
    print "\n";
    print "\n";
    exit(0);
}
http://www.eskimo.com/cgihelp/scripts/random.txt (2 of 2) [22/07/2003 5:57:22 PM]
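The selection step of the script can be tried on its own with the sketch below, which folds the chain of scheme checks into a single pattern and takes the URL list as arguments instead of reading PATH_TRANSLATED. The function name pick_url and the example.com addresses are placeholders made up for this example; this is a simplification, not a replacement for the script.

```perl
use strict;
use warnings;

# URL schemes the original script accepts, folded into one pattern.
my $SCHEMES = qr/^(?:http|file|ftp|gopher|news|telnet|tn3270|mailto|nntp|wais|prospero)/;

# Pick one entry at random and return it only if it starts with a known
# scheme; otherwise return undef.
sub pick_url {
    my @urls = @_;
    return undef unless @urls;
    my $candidate = $urls[ int(rand(scalar @urls)) ];
    return $candidate =~ $SCHEMES ? $candidate : undef;
}

my @list = ('http://www.example.com/', 'ftp://ftp.example.com/pub/');
my $url = pick_url(@list);
print "Location: $url\n\n" if defined $url;
```

Perl seeds rand() automatically on first use, so the explicit srand() call of the original is left out here.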
Eskimo North Not Found Message
File Not Found.
The file is deceased. It is no longer. It has passed on. You can not have that file in this life!
The file is dead, lifeless, departed, demised, late, extinct, no more. It has broken on through
to the other side.
I'm sorry we couldn't meet under better circumstances. The loss of a valued web page can be tough.
As long as you are here, let me introduce myself. My name is Robert Dinse, also known as
"Nanook". You can e-mail me at nanook@eskimo.com.
I started Eskimo North as a single line BBS in 1982. Over the years it has gradually evolved into
the full featured Internet service providing access and host services such as web hosting, e-mail,
and Usenet News.
We have not been acquired. We have not acquired or merged with other services. Eskimo North
remains independently owned and operated. The emphasis here is on our service, not on wheeling
and dealing. There are no phone menus here, nor is there any triage at the help desk. Every
customer is important to us.
While you are here, please take advantage of our two-week free trial of our 56k, dual 56k, ISDN,
and remote shell services. I believe that you will not be disappointed.
56k, Dual-56k, and ISDN Dial Access Locations
Canada: Alberta, British Columbia, Ontario, Manitoba, Quebec
United States: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Delaware,
District of Columbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky,
Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana,
Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina,
North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota,
Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, Wyoming
E-mail
Eskimo North Customer Support: support@eskimo.com
Information Bot: info@eskimo.com
Postal Mail
P.O. Box 55816
Shoreline, WA 98155-0816
USA
Telephone Numbers
Telephone: (206) 812-0051
Toll Free: (800) 246-6874
Facsimile: (206) 812-0054
http://www.eskimo.com/notfound.html (2 of 2) [22/07/2003 5:57:29 PM]
Customer Web Sites

This list includes only public web directories created by our users.
Users who wish to be excluded from this list may contact support@eskimo.com for removal.
Thank you.
2design (Debbie Deichert)

2nd-hand (Helary Jean-Christoph)
4janet (Ron Arnold)
4winds (Erv Crosby)
7in1 (7inone I. T)
ables (Robert K. Ables)
acady (Allan Cady)
aclee (Arthur Lee)
acs (ACS - Puget Sound)
acsoftw (AC Software Ltd.)
adahl (Alan Dahl)
adlib (Enata Design)
adrian_d (Adrian Dragulescu)
aeina (AEI North America)
afge421 (Jeffrey Norman)
aiken (Aiken Drum)
airs (Steven Eyrse)
alberrtr (Albert Rogers)
aline (Aline)
alisa (Alisa Wolfe)
allegro (Shawna Walls)
alpinist (David Butler)
amaranth (John Wert)
ambler (Andrew Malakoff)
amcc (Kim Wilson)
amta (Alaska Music Teachers Association)
amz (amz@eskimo.com)
anderson (Patti Anderson)
andyb (Andy Barlak)
angel1 (Barbara Williams)
http://www.eskimo.com/customers/eskimo_users.html (1 of 21) [22/07/2003 5:57:51 PM]
anicho (Allan Nicholson)

anvil (Tom Corbett)
aqua (Aqua-Net Inc.)
aquatera (Barony of Aquaterra)
archer (Curt Mills)
artist (Holly Glenn)
aseow (Alvin Seow)
asp (Andrew Brown)
atvar (Joseph Bales)
aviva (Aviva Brandt)
axeman (axeman)
babytown (Julianne Gorda)
barrett (Barrett Kreiner)
bassman (Gregory Katz)
baze (Baze)
bbentley (Bob Bentley)
bbrace ({ brad brace })
beaker76 (Colin Smith)
bears (John Malek)
beauxj (Jo Hamilton-LM)
becky (Becky Peters)
belo (Nathan Belo)
benz (Ben Lawson)
beth (Elizabeth Kirby)
betsy (Betsy Williams Johnson)
beunlabs (Dale Beuning)
bfarwick (Brent Farwick)
bgudgel (Bob Gudgel)
bhaird (Chris Bhaird)
bhc (Bruce Clifton)
bigfoot (Ethan Jones)
bilb (William Beaty)
billb (William Beaty)
billw (William Werth)
bjzuccar (Bernie Zuccarelli)
bleach (Dave Smith)
bloo (Bloo)
bloodbox (Mark Lewis)
bluemoon (BlueMoon Llamas)
blusky (//\../\\)
bmasters (Brandon Masterson)
bmelin (Bruce Melin)
bobg (Robert Gillingham)
bobhilt (Bob Hiltner)
bobsilva (Bob Silva)

boderek (Derek Adachi)
booknut (Joe Spitz)
boyerbl (Mark Boyer)
boyes (Paul Boyes)
bpentium (Robert York)
bpg (David J. Fritzsche)
bpie (Bradley Pierce)
bradkir (Brad Kirchner)
branic (Brandt Everett)
brawlboy (William Stretch)
bressler (Rick Bressler)
brianw (Brian Weaver)
brucek (Bruce Klein)
brucem (Bruce Miller)
bryny (Robert Eskridge)
bstutes (Robert Stutes)
bubble (Garry Golightly)
burked (Burke Dykes)
bvnstra (Brent Veenstra)
bvrlake (Beaver Lake Community Club)
c (Claire Connelly)
c180tom (Tom Jensen)
c2poet (Catherine Shaw)
cabell (Julie Cabell)
cahewson (Charles Hewson)
cajuncat (Michele Jasman)
calvin (Jimmie R. Farmer)
calz (Cal Herrmann)
camla (Chinese American Museum (LA))
capone (Ilan Capone)
captain (James LaJocies)
carbonic (Mike Fraley)
carcosa (havoc23)
carjula (Helane Blaustein)
carl (Carl Dinse)
carlr (Carl Rubin)
carlstro (Carlstrom Properties)
carol (Carol Brown)
cascadia (Cascadia Brass)
cat (Debbie Kraft)
cdgrass (Charles Grass)
cdickens (Chuck Dickens)
cecomp (****oxo****)
cedixon (Carol E. Dixon)

cekerns (Craig Kerns)
celia (Celia White)
cetacean (Joseph Olson)
cfisher (Corine Fisher)
cfm (Olympic Medical)
chacha (Cele Noble)
chance (Vicki Lindsay Thauvin)
changer (Jason Bay)
chanson (Christopher Hanson)
charm (Charlene Wright)
chela (D. Davis)
chiefdn (Darryl Neuhoff)
chops (Toni Coles)
chrislum (Christopher Lum)
christie (Christie Dinse)
chuckb (Charles Bollinger)
churchd (Daniel Church)
chym (Elizabeth Phillips)
ciscoe (Mary Morris)
cjessup (Charles Jessup)
cjh (Carolyn Hayek)
clangell (Charles Angell)
clee (Craig Lee)
cliffc (Cliff Corcoran)
cliffg (Cliff Garrett)
cmkinc (Chris Kincaid)
coffee (Dragon's Lair Farm)
coffings (Steve Coffing)
colleen (James Wraalstad)
conniekk (Connie Keith-Kerns)
console (Grant Barnett)
cornell1 (Deborah Cornell)
corner (Corner Comics)
corsys (James Corbett)
cplasch (Chris Plaschka)
cpu (Bradley Willson)
craig (Craig Smith)
craigb (Craig Bystrom)
craigs (Scott Craig)
crasher (Shirley Chung)
crc (Craig Campbell)
crick (mish^)
crkeizer (Colin Keizer)
croak (Cris Baylon)

cudworth (Mark Cudworth)
curtisld (Curtis DeGasperi)
cwild (Caroline Wildflower)
cwj2 (Charles W. Johnson)
cwn (Craig Nadler)
daddog (Richard Fritz)
dalt (Ted Estes)
dalus (Dale Beuning)
dan64 (D and S Wedeking)
daned (Dane Doerflinger)
danlh (Dan Hartmann)
davidk (David Kerlick)
davidl (David Lawson)
davidma (David Margrave)
davin (Davin Tarr)
daylight (Daylight Lodge #232)
dberman (Debbie Berman)
dcs (David Steinmetz)
ddf (Dave Funk)
ddlreyes (Dale de los Reyes)
defender (Public Defender Agency)
delisle (Ben de Lisle)
delvest (Maria Micola)
demian (Demian)
dennisl (Dennis Lunsford)
deselle (Joy Morrow)
dex3703 (Derek Dexheimer)
dianee (Diane Edwards)
dick (Richard Arnold)
dickb (Dick Barker)
digitrax (Michael Hesson)
dkane (Donald Kane)
dkeller (Dan Keller)
dknowles (Deirdre Knowles)
dling (Daniel Lingafelter)
dmauldin (Doug Mauldin)
dms (Keith Ramsay)
doer (Deep Ocean Exploration & Research)
donbgo (Don Bergquist)
dorm1 (Steven Reeves)
doug (Douglas Brooks)
douglas (Doug Sanderson)
dove (Jeremy L. St. James)
dp4l (Dave Purvis)

dphinney (Donald Phinney)
dragavan (Ben Ingram)
drewr (Drew Ritland)
drice (Dana Rice)
drisner (David Risner)
drjim (Jim Weisenbach)
dslarsen (David Larsen)
dss (Dolly Smith)
dtaggart (dtaggart)
dvaita (Cyber Maadhva Sangha)
dwalk (D. Walk)
dwebb (David Webb)
dwpatten (Don Patten)
earl (Earl Hall)
earthcng (Mitch Battros)
earther (Steve Earth)
ec (Erwin Christian)
eddie (EDdie)
editz (Roger N. Hoffeditz)
edv (Eric VanHeest)
efialtis (David Mamanakis)
elatlas (Brian J. Matuschak)
electrcn (Zombywulf Consulting)
elgeorge (E. Laurie George)
elladan (Elladan)
ellis (Dave Ellis)
emerritt (Ethan A. Merritt)
emrkp (Kim Pineda)
enablers (Dennis Krueger)
enumclaw (John Anderson)
epauley (Eric Pauley)
eresrch (Mike Rosing)
ericj (Eric T. Jorgensen)
erics (Eric Stenson)
erict (Eric Taneda)
eriemer (Elizabeth Riemer)
erik (Erik Hedberg)
erin (Kelly Ann Corcoran)
ert (Ert Dredge)
escape (James LiGate)
espick (Emily Pickrell)
ethan74 (Ethan Sellers)
ethel (Kevin Curry)
evelyn (Evelyn Avila)

evol (Aaron Waller)
farrah (Bob Martinsen)
fbutts (Fred Butts)
fcornell (Fredric R. Cornell)
feminist (William Affleck-Asch)
fielding (Masako Goto)
fins (Tom Fallat)
firestar (Ryan Wellman)
fjminch (Jean Minch)
fkronin (Family Karate / Ronin Dojo)
flemco (The Fleming Co.)
flemdata (Flem-Data)
flint (Steven Flint)
foont (Christopher DeLaurent)
foxtrot (Rick Hedman)
frankr (Frank Rosin)
fred (Fred McIlroy)
free800 (Philip Davenport)
freeman (Tiina Freeman)
fritzsch (David Fritzsche)
froglips (Jim Campbell)
frost (Mike Frost)
futon (Benedict Lumber Co)
fwinv (Fred Waters)
fyl (Phil Hughes)
fyrfytr (Scott Schroeder)
gadota (Gary Taylor)
gail-jon (Jonathan Schwarz)
gale (Gale Hey)
gallant (Daniel C. Gallant)
gallia (Elizabeth Price)
galt (Greg Alt)
gap (Gary Pickrell)
gatesj (John Gates)
gatse (Greg Tse)
gburlin (Gary Burlingame)
gclaunch (Gary Clauch)
gctaylor (Gary Taylor)
gcto (gcto)
genew (Gene Wolford)
georgeh (George Hoehn)
georgem (George Marshall)
gexner (Glenn Exner)
ggertley (Gary Gertley)

ghawk (Gary Hawkins)
ghines (Gary Hines)
ghost (Eric Ghost)
gigi (Maria Gomes da Silva)
gilet (Greg Gillette)
gjz (Harris/Zomers Interiors)
glennp (Glenn Peterson)
gluhm (Gary Luhm)
gmorris (Greg Morris)
gngoat (James Mattson)
gobob (Bob Anderson)
gofish (Thomas Eggerding)
gofs (the gofs)
gohabs (Roger Williams)
golfer-9 (Larry Staberg)
goody (Robert Goodlett)
gottlieb (Alan Gottlieb)
graham (Stephen Graham)
graphics (Michael Patterson)
greg (Greg Wickenburg)
gregwhit (Greg White)
griffee (Ann Griffee)
grohn (Gary Rohn)
guigaia (Guilherme Gaia)
guilty (Sanjay Rao)
gweiss (Gregory Weiss)
gypsy (Alexandrya Nguyen)
gyre (Matt Borselli)
h_top1 (Nina Mann)
haa (Marc Herrington)
haferman (Jeff Haferman)
hankboy (Henry Gordon)
harris (Harris Upham)
harryb (Harry Barnett)
heer (Nicholas Heer)
heirama (Heirama Fearon)
helary (JC Helary)
herald (Richard Herald)
herbal (Earth Changes TV Inc.)
hhagen (Harold Hagen)
hie (Ted Weiler)
higherel (Deborah Hall)
hinshaw (Kevin Hinshaw)
hman (Neal Hoech)

hmiller (Harry Miller)
holts (William C. Holtsnider)
home (David Sutton)
hope (Dennis Shinn)
horton (Don Horton)
hottub (Robert Archer)
hrab (Irene Hrab)
htak (Henry Takeuchi)
hunter (Ben)
hvi (Health Vision Intl)
hwa (Jonathan Blake)
hwang (Ed Hwang)
hydra (Kathryn Vasquez)
hypatia (Mary Koch)
icebrkr (Andrew Krepela)
iceman (Tuan Huynh)
ihb (Inge Harding-Barlow)
impact2k (Jonathan Fletcher)
inquiry (Kona Coffee Country Studios)
interlak (Dave Bruels)
issues (Issues)
jackd (Jack DeGuiseppi)
jacquem (Jacque Marshall)
jah (John Alden Hall)
jamespr (James G. Prekeges)
jamiem (Jamie Miller)
jani (Jan Drabek)
janyce (Janyce.com)
jasmine (jasmine)
javila01 (Jesus Avila)
jazz (Mike Jamieson)
jbien (Jeff Bienstadt)
jblake (Jonathan Blake)
jbm (James Millard)
jbrandt (Jaclyn Brandt)
jcarlson (John Carlson)
jcauto (JC Auto Restoration)
jclarke (Janet Clarke)
jcrowe (Jonathan Crowe)
jcw (Earl Hulbert)
jefried (Jeff Friederichsen)
jendon (Don Moxley)
jeremyk (Jeremy Kercheval)
jeri-ohs (Geri Flynn)

jerry (Jerry Stillwell)
jessamyn (Jessamyn West)
jesus1 (Jesus Avila)
jet (James Thiele)
jethaler (Jessica Zahn)
jevne (Jevne Academy Inc.)
jhines (Jerry Hines)
jibanes (Jerome Ibanes)
jilek (Bruce Jilek)
jill (Jillian Astra England)
jimb (Jim Beck)
jimg (James Gale)
jimsdetl (Kevin Dietz)
jjs-sbw (Joseph J. Simpson)
jjthomer (Julie Thomer)
jkeizer (Joelle Keizer)
jkm (Melvin & Sam)
jleck (Jeff Leck)
jlee (Jerri Lee)
jlg (Jerry Gray)
jlk (Jane Kakaley)
jlubin (James Lubin)
jmartin (John Martin)
jmccann (James McCann)
jmm (Joseph Mucci)
jmtt (James Takahashi)
jnbek (John Jones III)
joanne (JoAnne Bassett)
joelm (Joel McNamara)
johnbicc (John P. Greer)
johnbo (John D. Bowne)
johnl (John Laughlin)
johnnyb (Jonathan Bartlett)
johnnyc (John Franco)
johnws (John Saeger)
joji (Joseph Olatt)
joki (John King)
jonestr (Thomas Jones)
joyce (Joyce C. Wicks)
jpet (Jeff Petkau)
jpoland (John Poland)
jprice (John Price)
jrp (John Paulson)
jrp2 (John R. Petersen)

jrs-lab (Larry Jorgensen)
jrterry (Jim Terry)
jsage (John Sage)
jseberg (John Seberg)
jsjnson (Joseph Johnson)
jskline (Jeffrey S. Kline)
jsrice (Steve Rice)
jstuart (James Stuart)
jtcarlsn (Jeff Carlsen)
juha (Juha M. Kuosmanen)
jurgens (Peter Jurgens)
jwallace (Judy Wallace)
jwalley (James David Walley)
jweale (John Weale)
jwilk (John Wilkinson)
jwright (Joel Wright)
jwstiles (James Stiles)
jzy (Jim Z. Yu)
kahoolie (Kevin Kiuchi)
kaking (Kristin King)
kalwfolk (JoAnn Mar)
kamotz (Kip Jurus)
kanyu (Wanita Reed Morris)
kapowsin (Geoff Farrington)
kawano (Brian Kawano)
kbrooks (Kathy Brooks)
kcampbe (Kerwin Campbell)
kdeanda (Kier DeAnda)
ken (Ken Paoletti)
ketking (Eric and Karen King)
kgtaylor (G.W. Taylor)
kiberkli (Klaus Berkling)
kilroy69 (http;//www.eskimo.com/~kilroy69/)
kim-hugh (Hugh Webster)
kiyoshi (Alex Glover)
klatskin (Jerome Klatskin)
klevin (Noah Romer)
kmcc (Kathy McCormack)
kmiles (K. Miles)
kmlitwin (Karen Litwin)
kmorgan (Kirk Morgan)
knapper (Carl Martin)
knye (Earth Changes TV Inc.)
konac (Christine Sheppard)

kowalski (Kent Karnofski)
kpn (Kent Nyberg)
kramsay (Keith Ramsay)
krisn (Kris Nelson)
kristinm (Kristin Meijer)
ksh (Scott Hunziker)
ktlange (Kurt Lange)
ktsai (Chien Tsai)
kuperl (Larry Kuper)
kwan (Charley Watson)
kwright (Kevin Wright)
kye (KY Enterprises)
kylek (Eric Bradfield)
kyuss (Eric Erdmann)
l3 (Larry Bateman)
lang (Doug Lang)
largesse (Karen Stimson)
larrygj (Larry Johnson)
lberlind (Len Berlind)
lcsims (Leah Sims)
leeski (Lee Pinski)
leg (Leonard Gaspari)
leiba (www.eskimo.com/~leiba/)
leo (Matthew Wirkkala)
lfs (Andrew Newton)
liberty (Michael Justice)
lindy (Jon Lindemann)
lisa (Lisa Harris)
lisanne (Lisa Oberg)
littlek (Karen L. B. Newton)
lktr (John Nutting)
llt (Laurie Lynne Tucker)
lmacorp (Michael Collins)
logjam (Jamie Garner)
lonewolf (George Honey)
lorelei (Lorelei Stevens)
lostboy (Chris Force)
lsatin (Lisa Satin)
lsstrout (Linda Strout)
lwysa (Steve Webert)
lynnjohn (Lynn Rendler)
m1 (Mel)
maddjak (Jack Davis)
maland66 (Gary Malandro)

mandolia (Mark Mandolia)
manor (Ron Manor)
marklan (Mark Landreville)
marksi (Mark Simonton)
markzimm (Mark Zimmerman)
martinm (Martin Mikelsons)
martys (Martha Schafer)
mascott (Mark Scott)
math (Marc Brittan)
mathison (Jon Mathison)
matth (Matt Horowitz)
maurerj (Jeffrey Maurer)
mcalpin (Duncan McAlpine)
mcbalch (Mike Balch)
mccallum (Malcolm McCallum)
mcclure (John McClure)
mcj (Michael Jacobs)
mdevour (Mike Devour)
megatron (Decepticon Leader)
meik (Margaret Kipp)
mfiorell (Mary Fiorello)
mfromm (Mark Fromm)
mgwa (Nicholas Zymaris)
mhorning (Mark Horning)
mighetto (Frank Mighetto)
mikeg (Mike Garrison)
mikej (Mike James)
milla (Colin Cushing)
millerd (Dave Miller)
millerk (Kathy Miller)
million (Laura Mill Karlin)
mind (Lance Campbell)
ministry (Craige McMillan)
minstrel (Jonathan Garrigues)
mirader (Bob)
miriam (Miriam Peterson)
misry (Ahmed Zayan)
missmay (Sandra Laughlin)
miyaguch (Darryl Miyaguchi)
mkkuhner (Mary K. Kuhner)
mlonge (Michael Longe)
mohundro (Steve Mohundro)
monchy (Ramona Saldamando)
montek (Monte Kottman)

moorhous (Linda Moorhouse)
motion (Joe Giamona)
mpact (Jenny Griffee)
mschark0 (Mark Schark)
mscott (Michael Scott)
msharlow (Mark Sharlow)
mtnman (Erich Koehler)
mtq (Martin Totusek)
muaddib (Paul MuadDib)
muranaka (Lynn Muranaka)
murtaugh (Paul Murtaugh)
mvargas (Michael Vargas)
mvd (Maarten van Dantzich)
mvgb (Jo Masterson)
mvreid (Marge Reid)
mwirkk (Matthew Wirkkala)
myelitis (Transverse Myelitis Association)
n6mxr (Marvin Moye)
nach (Suzanne Barker)
nanook (Nanook)
napc (Jeffrey Money)
neads (Nancy Eads)
netman31 (Mike Manley)
neuharth (John Neuharth)
nevits (Steve Avenell)
newnet (NewNet Moderators)
newowl (James Sevin)
newsense (Dan Senn)
nica (Monica Stewart)
nickb (Nick Beaumont)
nickz (Nicholas Zymaris)
nicolem (Nicole Meyer)
njensen (Nancy Jensen)
noanswer (Lynn Hoskins)
noir (Karl Hill)
nokie (David Nichols)
nordkyn (Jane Riffle)
normb (Norm Berg)
northair (Frank Phinney)
novatech (Steven Swift)
npcomplt (Marc Brittan)
ntsales (Steven D. Swift)
nunatak (Laurie Connell)
nwps (Northwest Performance Software)

nwrscca (Northwest Region SCCA)
oakesn (Norman D. Oakes)
oconner (Matthew D. O'Conner)
ofsound (Elizabeth Phillips)
ogre (Paul Hackett)
olds (Leland Olds)
ole (Ole Johnson)
omaha (Omaha Sternberg)
omega7 (OTE Productions)
onodop (Ted Johnson)
oolon (Jack Fleming)
ouija (John R. Schulz)
p34 (Eric Mrozek)
pablost (Annemarie Perez)
pal (William Bigelis)
pals (Beverly Schultz)
parents (Christian Bantzer)
parry (Allen Parry)
paton (Bill Paton)
patrick (Patrick Hennessey)
paulloeb (Paul Loeb)
pbarber (Putnam Barber)
pbender (Patricia Bender)
pburns (Paige Burns)
pcox (Paul Cox)
pdharold (Paul Harold)
pem (Paul Morford)
penny (Penelope Eckert)
pennyg (Penny Gudgel)
peoples (Barbara Whitt)
peteyboy (Pete Dussin)
pff (Jim Hawthorne)
phill (Phill Lowe)
phoenix5 (Ron Hagerman)
physmith (Phyllis Smith)
pl (Peter Lehrer)
pluvius (Ron Arnold)
pmaker (Brian Pickrell)
pmeagher (Peter Meagher)
pokorney (Kevin Pokorney)
pollybr (Polly Brown)
pparkin (Pat Parkin)
praxis (Chris Fischer)
priester (Mark Priester)

pristine (Priscilla Showalter)
proteus (proteus)
psmt (Puget Sound Musical Theatre)
psohn (Josh Robbins)
pvgrafx (Milton Lawson)
pygmy (Frank Sergeant)
pygmygod (Mark Lewis)
pyrrhus (Daniel Charlson)
q256 (Richie Arendsee)
quester (Charles)
r1kk1 (Eric McClure)
racerx (Randy Shemwell)
ralphj (Ralph Johnson)
ram (Rick Morneau)
randym (Randall Moeller)
randyn (Randy Nevin)
rarnold (Ron Arnold)
raven (Sheryl Clough)
rayburn (Bob Rayburn)
rayeads (Ray Eads)
rcgusa (Mike Sinutko)
rcp (Robert C Peckham)
realtime (John Eisenhauer)
rebelgal (Susan Lewis)
rebrady (Robert Brady)
recall (David Brown)
redfoot (Donald Rothfuss)
redone (Steve Jackson)
refman (Steve Piercy)
renr (Ren Roderick)
retreat (Kathleen Golden)
rexpo (Raj K. Mathur)
rfall (Richard Fall)
rhafner (Richard Hafner)
rho (rho)
rhoskins (Robert Hoskins)
rhughes (Roy E. Hughes)
richb (Rich Branson)
richey (Mark Richey)
rickj (Rick Johnson)
riffraff (Riff Raff)
rip (Bruce Harper)
ripper (Ross Lippert)
rkamp (Richard Kamp)

rkj (Ryan K. Johnson)
rkunz (2bux)
rmisaac (Richard Isaac)
robbieh (Robert Harris)
robertc (Robert Culver)
robertf (Robert M. Fleming Jr.)
robinia (Jean Lepley)
robla (Rob Lanphier)
rocshots (Jo-Marie Triolo)
roger (Roger Zauner)
romad (Robert Grutko)
roman (Bill Roman)
ron (Ron Oliva)
ronpark (Ron Parker)
rope (Dave Owens)
rstarr (Ron Starr)
rtnorman (Roger Norman)
rucker (Jeff Rucker)
ruthied (Ruth Dornfeld)
rwb (Richard Bean)
s1lvrsfr (Richard Beatty)
sal (Sally Laird)
samiam (Sandy Foy)
samoyed (Ronald Bowser)
sams (Ron Manor)
sandeep (Sandeep Mirchandani)
saunders (Ken Saunders)
saw (Stacy Waters)
sbg (sbg)
scarley (Scott Carley)
sciclub (James Burrows)
sclayt (Steve Taylor)
scottlu (Scott R. Ludwig)
scottmcf (Scott McFetridge)
scrufcat (Ron Critchfield)
scs (Steve Summit)
seanote (Gary Gibson)
seaotter (Jacki Bricker)
seldon (Will Mengarini)
sfinch (Rachel Aronowitz)
sfox (SFox)
sgl (Stephen G. Ladd)
sgw (Sarah G. Ward)
shaman (Jeffrey Lilliquist)

shaneh (Shane Haddrell)
shea (Paul Shea)
shelby (Shelby Eaton)
shoshann (Soshanna Green)
shs (Bob Brady)
since (S. Schilling)
situboo (Steven Deal)
sjcarbon (Seth Carbon)
sjw (Steve Walcher)
skidder (Colleen Beckman)
skytouch (SkyTouch Communications)
slack (Loki)
smallnet (Bill Lee)
snoline (Brandy Burton-Tarantino)
solanik (Ray W. Solanik)
sonata (Heidi Weispfenning)
southern (S.C.S.)
spban (S Banerian)
spectrum (Jim Sheridan)
sryan (Sara Ryan)
sschorn (Scott Schorn)
starfury (Michael Zecca)
stargaze (Daniel Myers)
starkat (David Timmons)
stats (Patrick Rooney)
staylor (Scott Taylor)
stephnd (Stephanie Davidson)
stevemc (Steve McCallister)
steveny (Steven Yee)
stevet (Steven Thornton)
stigaard (S. Stigaard)
stjohns (Donald Mackay)
stosh (S A Janet)
strange (Kathleen Richardson)
strohs (Seth Stroh)
stu (Stuart Thompson)
studio (Moana Suchard)
sull (Mark Sullivan)
sunshine (Pratichi Mathur)
support (Eskimo North Support)
surplus (Robert Fleming)
swami (Swift Asset Managemen)
swilson (Scott Wilson)
switters (Stephanie Witters)

swkw (Sharon Wright)
swuu (Saltwater UU Church)
tcrowe (Michael Crow)
tegan (Laura Gjovaag)
tejano (Jim Gibson)
telical (www.eskimo.com/~telical/)
tfreeman (Tim Freeman)
thal (Neil Pratt)
thayer (Lee Thayer)
the-wiz (A. R. Mascott)
thorpe (Steve Thorpe)
tiad (Tia Dewop)
tiktok (Eric Gjovaag)
timetot (Lynnae Dunham)
tinne (Susan Profit)
tintin (Frank Koning)
tlw (Tim Worcester)
toates (Christina Lui)
tomca (Thomas Carey)
topps (topps)
totem (Chris Collier)
tracyr (Tracy Rudzitis)
traphic (John Hansen)
trcurrie (Thomas Currie)
treetaxi (Wayne Lodahl)
triesch (Jon Triesch)
troys (Troy Skeels)
trunorth (True North)
tschaub (Tim Schaub)
tsvlasuk (Tamara Vlasuk)
ttca (Lamar Faulkner)
tupuna (Moana Suchard)
turbobrn (Leonard Hamilton)
turing (Computer Cryptology)
turnip (Robert Falk)
tweiler (Ted Weiler)
two (Tom Orme)
tyroneh (Tyrone Heade)
ultra (Douglas Brooks)
untitled (Raymond Cha)
valh (Valerie Horvath)
valkyrie (Susan Granquist)
van (Vassil Nikolov)
vecna (Tim Morgan)

verne (Verne DeWitt)
victory (D. Victory)
vikram (Vikram Madan)
visions (James A. Tarantino)
vitas (Vitas Povilaitis)
vkapur (Vikas Kapur)
vme (Victor Eskenazi)
wallama (Sandy Stillwell)
warlock (Jim Richardson)
watt (Dan Harrison)
wayers (Wayne Rogers)
wayneld (Wayne Davidson)
waynem (Wayne Moddison)
wbfmc (Pastor Pat Vance)
wboyde (William Boyde)
wcarr (William Carr)
web (Money Recycler)
webguy (Al Wong)
websterj (Bruce Webster)
weidai (Wei Dai)
welsh (Gary Welsh)
wendell (Wendell Watanabe)
wendor (Chuck Gladu)
wex (David Weksler)
wfd (Dave Thornton)
wheelers (Dennis Wheeler)
wheelman (Paul Heeter)
wheelock (Kevin Wheelock)
whitby (Richard Grassy)
whitznd (Barbara Whitt)
wicklund (Thomas Wicklund)
wilder (Dan Wilder)
will (William McDonald)
willie-6 (William Metzger)
willow (Tonya Yoder)
witinc (Rob Witsoe)
wix (Dennis G. Wicks)
worden (Eric Worden)
wormey (Space Case)
writer (Robert Vaughn Young)
wsj (John Wert)
wuster (Willis Wu)
wwscc (Alan Dahl)
wyiwndr (Rich Davis)

xanadu (Stephen Ellis)
xcitor (~~xcitor~~)
xeno (Xeno Campanoli)
xtaline (Stephen Bard)
yft (Steven D. Yee)
yho (Yue-shun E. Ho)
yijing (Jonathon Blake)
youngpup (Michael Carroll)
zaarain (Mike Warning)
zafaran (Patricia A. Swan)
zap (Randell Spreadborogh)
Eskimo North: Customer Businesses & Organizations
If you are an Eskimo North subscriber, and you would like to add your business or organization to our directory,
please send the following information to support@eskimo.com.
Name
Organization or Business Name
URL for the site you want linked
URL for your logo
The (existing or new) category you would like us to list your link under
A short description of your business/organization
NOTE: Your logo MUST be no bigger than 100x100 pixels.
http://www.eskimo.com/businesses_and_organizations/ (1 of 3) [22/07/2003 5:58:18 PM]
Eskimo North: General Information
Have you ever wondered where the name "Eskimo North" came from?
Why does the owner call himself "Nanook?"
Here's your chance to find out the answer to those questions and
to learn more about the origin and evolution of Eskimo.
Learn more about the owner of Eskimo.
What type of hardware is used here?

Follow this link to find out exactly what comprises the system.
http://www.eskimo.com/general/ [22/07/2003 5:58:24 PM]
Eskimo North: Technical Support
Dialup Settings
[Linux] [OS/2] [Mac] [Windows] [Dreamcast]
Server Settings
[Web Hosting] [PostgreSQL] [Email] [Usenet News]
NOTE: This section is still under construction.

Please do NOT email us about broken links in this particular section.
If your operating system is not here yet, please accept our apologies.
Thank you for your patience.
Eskimo North support is available by phone from 7:00am until 11:00pm
Monday through Friday and 11:00am until 9:00pm on Saturdays and Sundays.
You can reach us at (206) 361-1161 or toll free at (800) 246-6874.
You can also reach us by FAX at (206) 368-9048 or by snail-mail at:
Eskimo North
P.O. Box 55816
Seattle, WA 98155
If your browser does not support forms, please send mail directly to:
support@eskimo.com
http://www.eskimo.com/support/ (1 of 2) [22/07/2003 5:58:26 PM]
Eskimo North Technical Support: Linux Setup
Connecting to Eskimo North with Linux
If you have Linux, connecting to the Internet is simple... OK, after you get through
rolling around on the floor, we'll tell you how.
Requirements
Configuration
Dialing
Extra Tweaks
Alternatively, O'Reilly & Associates Inc. has made chapter 6 ("Dial-out PPP Setup")
of their Using & Managing PPP book available on their website, including Linux
instructions:
http://www.oreilly.com/catalog/umppp/chapter/ch06.html
Requirements
First, if you need help configuring your modem, please see
http://www.redhat.com/mirrors/LDP/HOWTO/Modem-HOWTO.html. Once the modem is
working, the rest actually is pretty simple.
You must be sure you have a kernel with PPP support. Typically, kernels will have
PPP configured in as a module. If this is the case, you will need to type modprobe
ppp (as root) to load the module. Note that modprobe takes the module name, not the
ppp.o file name.
You will need copies of (both come with most recent distributions):
pppd -- for kernel 2.0.x or higher, pppd 2.3.4 or higher.

wvdial -- compiled/installed source or installed i386 RPM.
With these utilities, dialup connections can be set up either from a shell (as
root) or with a graphical front-end (in which case the rest of this page can be
skipped) such as:
GNOME-PPP
KPPP
Another method of connecting with Linux uses 'pppd' directly. These instructions
are available as Linux-pppd.html.
Configuration
To edit the configuration files by hand, or to connect without running X, the files
you will be concerned with are:
/etc/resolv.conf
/etc/wvdial.conf
Edit /etc/resolv.conf to look like this:
search eskimo.com
nameserver 204.122.16.8
nameserver 204.122.16.9
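As a quick sanity check on the file above, the following is a minimal Python sketch (not part of the original instructions) that reads resolv.conf-style text and lists the search domains and name servers it finds. The function name parse_resolv_conf and the SAMPLE text are illustrative assumptions, and stripping everything after a "#" is a simplification of how real resolvers treat comments.

```python
# Hypothetical helper: extract the search domains and name servers
# from resolv.conf-style text. SAMPLE mirrors the file shown above.
SAMPLE = """\
search eskimo.com
nameserver 204.122.16.8
nameserver 204.122.16.9
"""

def parse_resolv_conf(text):
    search, nameservers = [], []
    for line in text.splitlines():
        # Simplification: drop anything after '#', then surrounding blanks.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "search":
            search = fields[1:]
        elif fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
    return search, nameservers

search, servers = parse_resolv_conf(SAMPLE)
print("search domains:", search)
print("name servers:", servers)
```

To check the real file, pass the contents of /etc/resolv.conf to parse_resolv_conf instead of SAMPLE.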
To create the /etc/wvdial.conf file with the proper initial settings, you'll need to run
the following command (as root):
/usr/bin/wvdialconf
(Your distribution or installation may have placed the command in another location;
if the above is "not found", try /usr/local/bin or another directory on your path.)
Now edit /etc/wvdial.conf to change the following lines, removing the leading
semicolons and filling in your access number, username, and password in the
appropriate locations:
; Phone = <Target Phone Number>
; Username = <Your Login Name>
; Password = <Your Password>
An example of the completed edit is below; your information will be different. Also,
please add the line "Auto Reconnect = off" to the Defaults section. We'll show you
how to redial automatically later.
[Dialer Defaults]
Baud = 115200
Modem = /dev/ttyS1
Init1 = ATZ
Init2 = ATQ0 V1 E1 S0=0 &C1 &D2 S11=55 +FCLASS=0
Phone = 555-5555
Username = en55555@usb.com
Password = mypassword
Auto Reconnect = off
You can place multiple access numbers or accounts in the same file, in order to
set up alternate numbers, etc. To do this, simply add the following set of lines to
the file, choosing a name for the alternate connection:
[Dialer Alternate]
Phone = 555-1234
Username = en55555@usb.com
Password = mypassword
Since this file has passwords in it, be sure to set the permissions on this file so
that only root can read it:
chown root /etc/wvdial.conf
chmod 600 /etc/wvdial.conf
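To confirm that the chmod took effect, a permission check along the following lines may help. This is a minimal Python sketch under stated assumptions: the helper name is_private is hypothetical, it only tests that group and other users have no access (it does not check that root owns the file), and it is demonstrated on a throwaway temporary file rather than the real /etc/wvdial.conf.

```python
import os
import stat
import tempfile

def is_private(path):
    """Return True if group and other users have no access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group/other permission bit means other users could reach the password.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0

# Demonstrate on a temporary file instead of the real configuration file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o644)      # world-readable, as a careless default might be
loose = is_private(demo_path)
os.chmod(demo_path, 0o600)      # same effect as "chmod 600"
tight = is_private(demo_path)
os.remove(demo_path)

print("mode 644 private?", loose)
print("mode 600 private?", tight)
```

On the real system, calling is_private("/etc/wvdial.conf") as root after the chmod should return True.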
Dialing
wvdial works differently from pppd itself: when run from a shell, it continues to
log information about the connection to the shell window, and it terminates the call
when the user sends an interrupt ("^C").
To run the connection manually, open a shell as root, and type:
/usr/bin/wvdial
If all is working, you will see the modem communication and connection
information on the terminal. Leave this terminal running (perhaps minimized, if
you are running X). To disconnect, come back to this terminal and type "^C"
(Ctrl-C).
To dial using an alternate section, add its name after the command (the name is
case-insensitive):
/usr/bin/wvdial alternate
Extra Tweaks
Automatic Redial
Since auto-redial can play havoc with the lines when you're not actively using the
connection (idle disconnect, redial, idle disconnect, redial, etc.), please place the
"Auto Reconnect = off" line in the "Defaults" section as shown above to disable it,
and use a separate section to enable it only when you're closely monitoring the
connection:
[Dialer Redial]
Auto Reconnect = on
In this way, the following will dial the same "Default" or "Alternate" numbers and
username/password combinations, but will redial if disconnected:
/usr/bin/wvdial redial
/usr/bin/wvdial alternate redial
Remember to "kill" the session ("^C") if leaving the computer idle or unmonitored
for long periods, as this would then tie up your own phone lines as well as ours.
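The way the named sections build on the defaults can be pictured with a short Python sketch. It is only an illustration of the layering this page describes (defaults first, then each section named on the command line, in order), not wvdial's actual implementation. The function name effective_settings and the trimmed SAMPLE file are assumptions, and the password line is left out on purpose.

```python
import configparser

# Trimmed, hypothetical copy of the wvdial.conf sections shown on this page.
SAMPLE = """
[Dialer Defaults]
Phone = 555-5555
Username = en55555@usb.com
Auto Reconnect = off

[Dialer Alternate]
Phone = 555-1234

[Dialer Redial]
Auto Reconnect = on
"""

def effective_settings(text, *names):
    """Start from [Dialer Defaults], then apply each named section in order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Section names are matched case-insensitively, as on the command line.
    by_lower = {name.lower(): name for name in parser.sections()}
    settings = dict(parser[by_lower["dialer defaults"]])
    for name in names:
        settings.update(parser[by_lower["dialer " + name.lower()]])
    return settings

# Roughly what "wvdial alternate redial" would end up using
# (configparser lower-cases the option names).
for key, value in effective_settings(SAMPLE, "alternate", "redial").items():
    print(key, "=", value)
```

With no section names, the function returns the defaults alone, which corresponds to running wvdial without arguments.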
Eskimo North: Linux and Unix Resources Links
Technically and legally speaking, Linux is not Unix. However, we run both side by side, and the
operational differences are not significant. This section contains links for resources applicable
to both.
If you have recommendations for sites to be linked to this section, or if you run into a broken
link, please e-mail nanook@eskimo.com.
The web server providing this document to you is a Sun Ultra-1 with a single 200 MHz UltraSPARC
CPU and 1GB of RAM. It averages around 2-1/2 million hits per day and is fully capable of
saturating three T1s at about 20% CPU occupancy. It is running UltraLinux kernel release 2.2.21.
http://www.apache.org/
Apache HTTP Server Project
Web server we use under UltraLinux
http://www.applix.com/
Applix
Applix makes an office suite for Linux.
http://linux.corel.com/linux8/index.htm
Corel
Corel Word Perfect 8 for Linux
http://www.cygnus.com/
Cygnus Solutions Development Tools
ANSI C and Java Development Tools for Linux
http://www.debian.org/
Debian
Debian Linux Distribution Home Page
http://www.eskimo.com/~dwalker/manpage.html
Eskimo North Man Pages
Manual Pages for Unix and Applications
http://www.qsl.net/f3wm/
F3WM Web Site
Amateur Radio, Linux for Novices, Humour... and more. This site
is mostly in French
http://www.gnu.org/
GNU software
GNU Free Software Foundation
http://howto.linuxberg.com/faq.html
HowTo!
TuCows Linux Resource Center
http://java.sun.com/
Java Home Page
Sun's Home Page for Java Language
http://www.wagoneers.com/UNIX
John's pages on Unix, Linux, and System Administration
John Meister provided many of the URL's on this site. There are a
lot of good resources at this site but John apparently dislikes
writing Index files so you'll have to navigate through directories.
http://linux3d.netpedia.net/
Linux 3D
A page for 3D modeling and rendering enthusiasts using Linux
If you find this information useful, please consider supporting our site by taking advantage of
some of our services.
You can get a shell account here, with web hosting for only $96/year, 56k dial-up access for only
$180/year, ISDN for only $360/year. Apply for a Two-Week Free Trial!
No annoying Geocities/Tripod-style pop-ups plaguing people who visit your web site here. No data
export fees.
Please click on Eskimo North at the top of the page to visit our home page.
http://www.kalug.lug.net/linux-admin-FAQ/Linux-AdminFAQ.html
Linux Administrators FAQ
This is the list of Frequently Asked Questions on the linux-admin
mailing list at vger.rutgers.edu.
http://linuxcentral.com/
Linux Central
Mostly commercial Linux resources
http://metalab.unc.edu/LDP/
Linux Documentation Project
The Linux Documentation Project (LDP) is working on developing
good, reliable documentation for the Linux operating system.
http://www.cl.cam.ac.uk/users/iwj10/linux-faq/index.html
Linux Frequently Asked Questions with Answers
This is the list of Frequently Asked Questions about Linux, the free
Unix for 386/486/586
http://www.linuxgazette.com/
Linux Gazette
Making Linux just a little more fun and sharing ideas and
discoveries.
http://www.linux-howto.com/
Linux How-To
Linux Resource Center
http://www.li.org/
Linux International
Linux International is a non-profit association of groups,
corporations and others that work towards the promotion of and
helping direct the growth of the Linux operating system and the
Linux community.
http://www.linuxjournal.com/
Linux Journal
The Monthly Magazine of the Linux Community.
http://www.linuxbazis.hu/en/
Linux Links Base
Organized Links to Linux Resources on the Web.
http://www.croftj.net/~goob/
Linux Links by Goob
Another collection of links to various Linux resources
http://www.linuxmall.com/
Linux Mall
Commercial Linux related Products and Resources
http://www.linux-me.org/
Linux Middle East
A place where Linux resources can collaborate to create an
infrastructure to help your own Middle East Linux community
while providing a help and service to all other Linux users from all
over the world.
http://www.linuxnow.com/
Linux Now
Linux Reference
http://www.linux.org/
Linux Online
Almost everything you wanted to know about Linux
http://www.internetwk.com/linux
Linux Resource Center
Internet Week's Linux Resource Center - Good to see the
mainstream media is acknowledging that there IS an alternative to
NT.
http://www.ecst.csuchico.edu/~jtmurphy/
Linux Security Homepage
Lots of good Linux security information
http://webwatcher.org/
Linux Web Watcher
Watching 15 Categories, 69 Groups, and 466 Pages. Automatically
updated daily.
http://www.performance-computing.com/Linux-IT/
Linux-IT
News, columns, and exclusive features focusing on Linux and its
role in your enterprise.
http://www.linuxweb.com/
LinuxWeb
Linux Information
http://www.redhat.com
RedHat
RedHat Linux distribution home page. We use RedHat Linux on
our Sun Sparc and PC systems here.
http://www.redhat.com/mirrors/LDP/HOWTO/
RedHat
RedHat Linux HOWTO archives; manuals and tutorials.
http://www.slackware.com/
Slackware
Slackware Linux distribution home page.
http://www.ssc.com/
Specialized Systems Consultants, Inc.
Great pocket references on various Unix related topics, books, and
numerous other publications. They really helped us out early on by
supplying boxes of references free that we were able to give out to
our customers. Lots of good Linux and Unix info.
http://www.samag.com/
Sys Admin
The journal for Unix system administrators
http://www.gimp.org/
The Gimp
GNU Image Manipulation Program - Most of the graphics here
were produced with Gimp.
http://www.cs.uml.edu/~acahalan/linux/22wishlist.html
The Linux 2.1/2.2 Wish List
Tradition requires a great big list of features that might be nice for
Linux to have. Well, here is one collected from the Linux kernel
mailing list.
http://www.ultralinux.org/
UltraLinux HomePage
Linux for Sparc Processors
http://pubweb.bnl.gov/people/hoff/
Unix MIDI Plug-In
MIDI Plug-in for Netscape running under Linux/Unix
http://www.vierra.net/linux.htm
VIERRA.Net
Builds Linux Servers and has links to Linux Resources
http://vancouver-webpages.com/vanlug/
Vancouver Linux Users Group
Linux Users Group located in Vancouver BC
http://linuxresources.com/what.html
What is Linux?
A brief overview of Linux by the Linux Journal
http://www.yggdrasil.com/
Yggdrasil
Support for Free Software
http://egcs.cygnus.com/
egcs project home page
New maintainers of GCC Compiler
Eskimo North Randodex
CAROLINE'S
2002
CALENDAR
January
February
March
April
May
June
July
August
September
October
November
December
EMAIL Caroline
http://www.eskimo.com/customers/random.html [22/07/2003 5:58:52 PM]
Eskimo North: Customers
Click on the rolodex image above to access a random Eskimo North User Page.
NOTE: This link will bring up a frames version of the randodex. This will allow you
to hit one random URL after another - simply by clicking on the "next" link.
However, this will not allow you to readily bookmark a site you get to. (If you have a
recently released browser (e.g. Netscape Communicator) you should be able to
right click inside the main frame to bookmark that frame). To bring up the randodex
in a site without frames, just use the "Random URL" link at the bottom of this page.
We also have a CUSTOMER BUSINESS AND ORGANIZATION DIRECTORY.
[ Search | Directory | Random URL | Finger ]
http://www.eskimo.com/customers/ [22/07/2003 5:59:03 PM]
Linux Friendly 56k V.90 and Dual-56k (112k MLP) Dial Access
Included Services
56K or 112K dial access includes, Shell access, E-mail, 2-mail lists, Usenet News, web hosting with CGI and
Apply for a FREE two week 56k dial access trial.

56k/112k V.90 Dial Access Rates

Dial Access Plan                   Hours             Chans  Speed  Month   Quarter  Half-Year  One Year  Two Year
56k Dial Plan A - StarPOP/FlexPOP  150/month         1      56k    US $22  US $60   US $108    US $180   US $336
56k Dial Plan B - UUNet            150/month, 6/day  1      56k    US $22  US $60   US $108    US $180   US $336
56k Dial Plan C - MegaPOP          Unlimited         1      56k    US $22  US $60   US $108    US $180   US $336
112k Dial Plan D - Telia           Unlimited         2      112k   US $44  US $120  US $216    US $360   US $672
56k Dial Plan E - Telia            Unlimited         1      56k    US $22  US $60   US $108    US $180   US $336

Disk Space Included by Subscription Term: Month 20MB, Quarter 20MB, Half-Year 20MB, One Year 50MB, Two Year 110MB
56k and 112k Dial Access Plan Differences

Dial access plans A, B, C, and E are all single-channel 56k dial access.
Dial access plan D provides dual-channel 112k dial access.
If you are a heavy user, a telecommuter for example, or if you download large files, please choose
plans C, D, or E. Dial plan C allows an 8 hour session, all others have a 4 hour session limit. UUNET
considers more than 150 hours/month or 6 hours/day network abuse. FlexPOP considers more than
150 hours/month dedicated access.
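The usage thresholds quoted above can be written down as a small check. This is only a sketch: the function names are hypothetical, and it encodes nothing beyond the two figures stated in the text (150 hours/month for UUNet and FlexPOP, 6 hours/day for UUNet).

```c
#include <stdbool.h>

/* Thresholds quoted in the text for the UUNet and FlexPOP networks. */
#define MONTHLY_HOUR_LIMIT     150.0
#define UUNET_DAILY_HOUR_LIMIT   6.0

/* Hypothetical helper: true if usage is above what UUNet treats as abuse
 * (more than 150 hours in the month OR more than 6 hours in one day). */
bool exceeds_uunet_limits(double hours_this_month, double hours_today)
{
    return hours_this_month > MONTHLY_HOUR_LIMIT
        || hours_today > UUNET_DAILY_HOUR_LIMIT;
}

/* Hypothetical helper: true if usage is above what FlexPOP treats as
 * dedicated access (more than 150 hours in the month). */
bool exceeds_flexpop_limits(double hours_this_month)
{
    return hours_this_month > MONTHLY_HOUR_LIMIT;
}
```

A user who expects either function to return true for typical usage is the "heavy user" the text steers toward plans C, D, or E.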
If you send e-mail with large attachments, I would recommend plan C which allows connections to our mail
server which will accept up to 10MB attachments. Otherwise you'll need to use Star Internet's mail server which
will not accept large attachments or more than 30 recipients.
http://www.eskimo.com/services/ (1 of 3) [22/07/2003 5:59:09 PM]
Canada Dial Access Numbers

Alberta, British Columbia, Manitoba, Ontario, Quebec

United States Dial Access Numbers

Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii,
Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts,
Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey,
New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania,
Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington,
West Virginia, Wisconsin, Wyoming
Dynamic DNS - Alternative to Static IP

We are presently unable to offer static IP addresses on dial access. Dynamic DNS services which map a
dynamic IP to a static hostname provide an alternative to a static IP address for many functions such as ETRN
polling, Net Meeting, and some VPN applications. You register a hostname with these services and download a
small client which runs when you establish your dialup connection. This contacts their host and then they map a
static hostname to the IP address you were assigned. Most of these services are free of charge. A partial list of
dynamic DNS services: d2g.com, dns2go.com, dnsq.org, dyndns.org, dynip.de, dynodns.net, dynup.net,
hn.org, myip.org, and orgdns.org.
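From the point of view of anyone connecting to you, the scheme above means looking up your hostname each time instead of remembering an address. A minimal sketch of that lookup in C, assuming a POSIX system with `getaddrinfo()`; the helper name `resolve_ipv4` is hypothetical and is not part of any dynamic DNS client:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo() on strict compilers */
#endif

#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: look up the first IPv4 address for `hostname`
 * and write it in dotted form into `out`.
 * Returns 0 on success, -1 on failure.
 */
int resolve_ipv4(const char *hostname, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(hostname, NULL, &hints, &res) != 0 || res == NULL)
        return -1;

    /* With AF_INET in the hints, ai_addr points at a struct sockaddr_in. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen) != NULL)
        rc = 0;

    freeaddrinfo(res);
    return rc;
}
```

Calling `resolve_ipv4("myhost.example.org", buf, sizeof buf)` (a placeholder hostname) after each reconnect would return whatever address the dynamic DNS service currently has on record for that name.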
Setup
You will be called to confirm the information on your free trial account application. A 56k or dual-56k dial access
Username and Password will be e-mailed to you at the address specified on your application after successful
confirmation. The dial access Username and Password are used only for establishing your dial-up connection.
The login and password you supplied will be used for all host related functions such as pop-3 e-mail, NNTP
read/post access, shell access, ftp access to your account, etc.
Linux
Windows 95/98
MacOS
Acceptable Use

Internet hosts.
56K, ISDN Dialup Access, Linux Friendly Internet, Alberta
Alberta 56k, Dual-56k, and ISDN Dialup Access Numbers

Apply for a Free Two-Week Trial
Dial Plan L
City          Number        Protocol
Calgary       403-410-1065  V.90
Edmonton      780-990-1234  V.90
Lethbridge    403-380-5325  V.90
Medicine Hat  403-528-1900  V.90
Red Deer      403-309-1100  V.90

Network          Monthly Hours  User Realm
Uniserve         Unlimited      ...@eskimo.com
Worldcom Canada  150/month      ...@eskimo.com

Dial Plan W
City          Number        Protocol  Channels
Calgary       403-781-5200  V.90      1
Edmonton      780-423-5600  V.90      1
Lethbridge    403-380-5325  V.90      1
Medicine Hat  403-528-1900  V.90      1
Red Deer      403-309-1100  V.90      1

Network          Monthly Hours  User Realm
Worldcom Canada  150/month      ...@usb.com
56K and ISDN access Includes:
Unix shell access with a full C, C++, Fortran,

Java (jdk), Perl (version 4 and 5)
development environment. X-windows
applications including many image
processing facilities such as Gimp, xv, etc.
Your choice of sh, csh, ksh, tcsh, bash, or
zsh. Includes access to at, batch, cron, and
nohup.
Web Hosting including the ability to run your
own CGI programs, SSI (server side
includes) except for exec, and an SSL server
for secure transactions.
Usenet News NNTP read/post as well as online news readers tin, nn, tass, trn, rn, and
readnews. Uqwk is also available for off-line
news reading using SOUP, Zipnews, or Qwk
format.
Two e-mail lists per account using Smartlist
which provides complex processing
capabilities through the use of Procmail
rulesets.
Two IRC bots per account. There is a server
dedicated to running Bots and BNC,
chat.eskimo.com, on which these must be
run.
http://www.eskimo.com/services/alberta.html (2 of 2) [22/07/2003 5:59:33 PM]
Contact Information
Telephone: (206) 812-0051
Toll-Free: (800) 246-6874
Facsimile: (206) 812-0054
Snail Mail: Eskimo North, P.O. Box
55816, Shoreline, WA 98155-0816.
Setup Information
Linux
Mac
OS2
Windows
Dialup Access Search

3 matches found.
Dial Plan L
City     Number        Protocol
Calgary  403-410-1065  V.90
Calgary  403-781-5200  V.90
Dial Plan W
City     Number        Protocol
Calgary  403-781-5200  V.90
Search Again?
States/Provinces
Numbers
Cities
States/Provinces use two-letter abbreviations (WA, BC, et al.);
Numbers and Cities will search for entries that include your query
Start Search
Clear Form
http://www.eskimo.com/cgi-bin/dialsearch?s=AB&c=Calgary (1 of 2) [22/07/2003 5:59:42 PM]

NNTP Read / Post and Shell Usenet News Service
NNTP Read / Post and Shell Usenet News Access
For Individuals
NNTP read / post Usenet access is provided with remote, 56k, dual 56k, ISDN dial
access, DSL, and frame relay accounts. Individuals looking for Usenet read/post
access should choose one of these account types. If you get internet access
elsewhere, consider a remote shell account. With frame relay and business class
DSL, access is provided for an entire domain.
For Organizations
NNTP Read/Post Access (Per Domain)
Month  Quarter  Half-Year  One Year  Two Year
$70    $180     $330       $600      $1080
We provide NNTP read/post access to other sites, ISPs, businesses, BBSs, and
other organizations for $600/year, $180/quarter, or $70/month per domain. If
you're looking for a way to provide Usenet News services to your customers
without the expense of maintaining your own news server, and you want a
solution that doesn't impose a per-subscriber or per-megabyte charge, consider
using our news service.
Server Hardware
Our news server is a quad-CPU (4 CPU) Sun 4/670MP equipped with 384MB
RAM, Ross-RTK625 90 MHz Sparc processors, and a Performance Technologies
SBS450 ultra-wide disk controller for the news spool drives. We presently have 4
Seagate 47GB ultra-wide SCSI drives striped for the news spool. After formatting
and file system overhead, this provides a 180 gigabyte spool.
Server Software
http://www.eskimo.com/services/nntp.html (1 of 3) [22/07/2003 5:59:48 PM]
We are using INN version 2.2.2 news server software running under Linux version
2.2.14.
Planned Upgrades
We are planning to upgrade the CPU to an UltraSparc system in the near future
(the hardware has been purchased and is here, and we are in the process of setting it up).
Newsgroups and Retention
We carry a full news feed with more than 64,000 newsgroups. There is a two-day
retention on alt.binaries and a two-week retention on other groups, with the exception
of local groups and some groups used for FAQs, which are retained for one
month, and test groups and control groups, which are retained for a fraction of a
day.
Newsfeeds
We have a satellite and terrestrial feed from Cidera (aka
Skycache) and we also have a newsfeed from Sprint. I can't
say enough good things about the folks from Cidera. The
installation was painless and it has worked flawlessly. They
are the most pleasant and technically competent folks I've encountered in all my
professional dealings. Approximately 85% of the news articles arrive via Cidera
and the bulk of the remaining 15% arrive from Sprint.
News Related Shell Services
News Readers
Nn
Nn-Tk
Pine
Readnews
Rn
Tass
Tin
Trn
Vnews
Xnews
News Posters
Postnews
Pnews
Offline Reading Formats
QWK (uqwk)
SOUP (uqwk)
ZipNews (uqwk)
Binary Extraction
Aub
There is a large collection of other utilities that can be used for news manipulation
and processing. All shell accounts include crontab access, so you can do things
like schedule automatic binary extraction and FTP those files to your system.
There are utilities for gatewaying news to mail, image conversion and
manipulation, text processing, etc.
Try it Free!
Apply for a Free Two-Week Trial of our remote shell account or dial or ISDN
access service. This will allow you to try out the news server and our other host
services without any cost or obligation. Just click on "Free Two-Week Trial" and fill
out the trial application today.
1B and 2B ISDN Access, We are Linux Friendly! USA, Canada
Linux Friendly 1B and 2B ISDN Access

Included Services
1B or 2B ISDN access includes, Shell access, E-mail, 2-mail lists, Usenet News, web hosting with CGI and
Apply for a FREE two week 1B or 2B ISDN access trial.

1B or 2B ISDN Access Rates

ISDN Plan               Chans  Speed      Month   Quarter  Half-Year  One Year  Two Year
Dial Plan A - StarPOP   1      56k/64k    US $22  US $60   US $108    US $180   US $336
Dial Plan B - UUNet     1      56k/64k    US $22  US $60   US $108    US $180   US $336
Dial Plan C - MegaPOP   1      56k/64k    US $22  US $60   US $108    US $180   US $336
Dial Plan D - Telia     2      112k/128k  US $44  US $120  US $216    US $360   US $672
Dial Plan E - Telia     1      56k/64k    US $22  US $60   US $108    US $180   US $336
Dial Plan F - StarPOP   2      112k/128k  US $44  US $120  US $216    US $360   US $672

Disk Space Included by Subscription Term: Month 20MB, Quarter 20MB, Half-Year 20MB, One Year 50MB, Two Year 110MB
ISDN Plan Differences

ISDN access plans A, B, C, and E are all single-channel 56k dial access or 1B 64k ISDN access plans. Dial plans
D and F are dual-channel 112k dial access or 2B 128k ISDN access plans. I recommend using the least
expensive plan that provides access numbers in the areas you need and provides the number of channels,
single or dual, that you need.
Canada Access Numbers

Alberta, British Columbia, Manitoba, Ontario, Quebec

United States Access Numbers

Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii,
Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts,
Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey,
New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania,
Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington,
West Virginia, Wisconsin, Wyoming
http://www.eskimo.com/services/isdn.html (1 of 3) [22/07/2003 5:59:55 PM]
Dynamic DNS - Alternative to Static IP

We are presently unable to offer static IP addresses. Dynamic DNS services which map a dynamic IP to a
static hostname provide an alternative to a static IP address for many functions such as ETRN polling, Net
Meeting, and some VPN applications. You register a hostname with these services and download a small client
which runs when you establish your dialup connection. This contacts their host and then they map a static
hostname to the IP address you were assigned. Most of these services are free of charge. A partial list of
dynamic DNS services: d2g.com, dns2go.com, dnsq.org, dyndns.org, dynip.de, dynodns.net, dynup.net,
hn.org, myip.org, and orgdns.org.
Setup
You will be called to confirm the information on your free trial account application. An ISDN Username and
Password will be e-mailed to you at the address specified on your application after successful confirmation.
The ISDN Username and Password are used only for establishing your dial-up connection. The login and
password you supplied will be used for all host related functions such as pop-3 e-mail, NNTP read/post access,
shell access, ftp access to your account, etc.
Linux
Windows 95/98
MacOS
Acceptable Use

Internet hosts.

3 matches found.
Dial Plan L
City      Number        Protocol
Edmonton  780-990-1234  V.90
Edmonton  780-423-5600  V.90
Dial Plan W
City      Number        Protocol
Edmonton  780-423-5600  V.90
Search Again?
States/Provinces
Numbers
Cities
Start Search
Clear Form
http://www.eskimo.com/cgi-bin/dialsearch?s=AB&c=Edmonton (1 of 2) [22/07/2003 5:59:58 PM]

56K, ISDN Dialup Access, Linux, Internet, British Columbia
British Columbia 56k, Dual-56k, ISDN Dialup Access Numbers

Dial Plan L
City                Number        Protocol
100 Mile House      250-395-2740  V.90
Abbotsford          604-504-0577  V.90
Campbell River      250-286-6315  V.90
Castlegar           250-365-4957  V.90
Chetwynd            250-788-2835  V.90
Chilliwack          604-795-2970  V.90
Courtenay           250-334-6220  V.90
Courtney            250-338-5726  V.90
Cranbrook           250-417-0276  V.90
Creston             250-402-6803  V.90
D'Arcy              604-894-5936  V.90
Dawson Creek        250-782-6018  V.90
Duncan              250-715-1841  V.90
Fernie              250-529-7533  V.90
Fort St. John       250-785-8613  V.90
Grand Forks         250-442-2752  V.90
Hope                604-869-3508  V.90
Kamloops            250-374-9595  V.90
Kelowna             250-762-8623  V.90
Kitimat             250-632-6497  V.90
Lillooet            250-256-0379  V.90
Mackenzie           250-997-5762  V.90
Merritt             250-378-3454  V.90
Nanaimo             250-753-8497  V.90
Nelson              250-352-2061  V.90
Parksville          250-468-5759  V.90
Penticton           250-492-8539  V.90
Port Alberni        250-724-6045  V.90
Prince George       250-562-2160  V.90
Prince Rupert       250-627-5334  V.90
Princeton           250-295-0312  V.90
Quesnel             250-992-5936  V.90
Salt Spring Island  250-537-8795  V.90
Sechelt             604-740-8249  V.90
Smithers            250-847-8013  V.90
Squamish            604-892-5793  V.90
Terrace             250-635-4624  V.90
Trail               250-368-6721  V.90
Vancouver           604-688-1746  V.90
Vanderhoof          250-567-4831  V.90
Vernon              250-542-9562  V.90
Victoria            250-361-4138  V.90
Whistler            604-932-4094  V.90
Williams Lake       250-392-4099  V.90
http://www.eskimo.com/services/britishcolumbia.html (1 of 5) [22/07/2003 6:00:06 PM]
Dial Plan W
City           Number        Protocol
Abbotsford     604-557-5780  V.90
Chilliwack     604-702-3080  V.90
Courtenay      250-334-6220  V.90
Kamloops       250-571-5680  V.90
Kelowna        250-717-2380  V.90
Nanaimo        250-741-9470  V.90
Prince George  250-614-7980  V.90
Vancouver      604-609-3300  V.90
Vernon         250-503-3330  V.90
Victoria       250-361-9222  V.90


1 match found.
Dial Plan L
City            Number        Protocol
100 Mile House  250-395-2740  V.90
Dial Plan W
City            Number        Protocol
Search Again?
States/Provinces
Numbers
Cities
Start Search
Clear Form
http://www.eskimo.com/cgi-bin/dialsearch?s=BC&c=100_Mile_House (1 of 2) [22/07/2003 6:00:09 PM]

Eskimo North Technical Support: Macintosh Setup
Connecting to Eskimo North with Macintosh
Internet Setup Assistant

(used in iMac and G3 systems (9.x))
MacTCP and ConfigPPP
(used in older pre-7.5.3 systems)
FreePPP FAQ
Rockstar Software
MIT Info-Mac HyperArchive
Massachusetts Institute of Technology
UMich Mac Archive
University of Michigan
Helpful Tools
Fetch FTP Client
Dartmouth College
BetterTelnet
Sassy Software
Telnet3
Kevin Grant
http://www.eskimo.com/support/MacOS.html (1 of 2) [22/07/2003 6:00:11 PM]
Eskimo North Technical Support: Mac Internet Setup Assistant
Connecting to Eskimo North using Internet Setup Assistant
The following are comprehensive instructions on using the iMac and G3 "Internet
Setup Assistant" to connect to Eskimo North.
The included screenshots walk through the setup for 56k dialup connections. For
connections on the slower-speed lines (Western WA only), a dial-in script is
needed, and we'll be happy to walk you through the recording and use of the
script(s), either by email at support@eskimo.com or by phone on the support lines, 206-812-0051 or 800-246-6874.
Step 1: Starting Internet Setup Assistant

Find the Internet Setup Assistant in the "Assistants" folder of the hard drive, and
double click the icon to start it. Click two "Yes" buttons and one right-arrow button
to get beyond the startup screens, which look like the following:
http://www.eskimo.com/support/MacOS/ISA.html (1 of 15) [22/07/2003 6:00:32 PM]
Step 2: Connection Type and Modem Settings

Enter a name for the connection (such as "Eskimo North") and
select the modem to use in the next two windows. Click the right-arrow button to
continue on each:
Step 3: Phone Number, Username, Password, and Script

Enter the phone number, username, and password for the account. 56k and ISDN
accounts have assigned login/password pairs; use that pair here. Phone
numbers for access can be found on our home page, www.eskimo.com. Click
the right-arrow to continue.
56k and ISDN dialup accounts need no script for the connection. Click 'No' here,
and click the right-arrow to continue.
Step 4: IP Address and Name Servers

IP addresses are assigned dynamically (they change each time you connect);
even if you subscribed for a static-IP (same each time), you can choose 'No' for
whether you have an IP, since the server will assign you the correct IP each time.
Click the right-arrow to continue.
Our Domain Name Servers (DNS) are 204.122.16.8 and 204.122.16.9. You can hit
enter in the large box to get a second line. Leave "Domain Name" blank, and click
the right-arrow to continue.
Step 5: Email Settings

Your email address is based on the login name you chose when you signed up for
your account; it would be that login with @eskimo.com at the end. For example, if
you chose "login", your email address would be "login@eskimo.com". The
password would be the password you chose while signing up as well. Click the
right-arrow to continue.
Our POP3 server for downloading mail is called pop3.eskimo.com; here, you'll
need to place your login name and an @ symbol before that, as shown below. The
SMTP server for sending mail is mail.eskimo.com -- you may need to use
smtp.eskimo.com if this doesn't work on your connections. Click the right-arrow
to continue.
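The two strings described in this step follow a fixed pattern, which a short C sketch can make concrete. The function names are hypothetical; the formats (login plus "@eskimo.com" for the email address, login plus "@" plus the POP3 server name for the POP account) are taken from the text above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: build the email address "login@eskimo.com".
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int make_email_address(const char *login, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s@eskimo.com", login);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/* Hypothetical helper: build the POP account entry, which is the login
 * name and an @ symbol placed before the POP3 server name.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int make_pop3_account(const char *login, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s@pop3.eskimo.com", login);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

For the login name "login" used in the example above, the first function yields "login@eskimo.com" and the second yields "login@pop3.eskimo.com".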
Step 6: Usenet News Settings and Proxies

Our NNTP host for Usenet news reading and posting is eskinews.eskimo.com
(also with an alias of news.eskimo.com). Click the right-arrow to continue.
We do not run any proxies on our dialups. Click 'No' here, and the right-arrow to
continue. One window to go...
Step 7: Finishing Up and Connecting

If you're all set to connect, you can click the "Connect when finished" box (or you
can leave it off if you want to connect later), and click the button labelled "Go
Ahead" to save the changes. If connecting with the slower-speed lines (Western
WA only), be sure to give us a call for the setup of the script you'll need as well
(found under Remote Access in the Control Panel).
That's it. You're now ready to connect. If you're not connecting during this setup,
or need to connect in the future, go to Remote Access in
the Control Panel.
Happy surfing!
Eskimo North Technical Support: MacTCP and ConfigPPP
Connecting to Eskimo North using MacTCP and ConfigPPP
The following are comprehensive instructions on using MacTCP (or TCP/IP),
found in the control panel, together with ConfigPPP. This combination is commonly
used on Mac systems before 7.5.3.
Step 1: MacTCP (TCP/IP) Settings

Click on the Apple symbol, then on "Control Panels". Then open MacTCP:
Click on the PPP icon. Then click on More. Our Name Server addresses are
204.122.16.8 and 204.122.16.9; the domain for each can be set to eskimo.com.
Other settings will be set by the server once you connect. Click OK to continue on.
http://www.eskimo.com/support/MacOS/ConfigPPP.html (1 of 5) [22/07/2003 6:01:25 PM]
Step 2: MacPPP (ConfigPPP) Settings

Click on the Apple symbol, select Control Panels, and click MacPPP (named PPP
or ConfigPPP on some systems):
Click the "New" button, and type "Eskimo North" as the connection name ("POP"),
and click OK. Then choose "Config". For "Port Speed", use a speed just larger
than your modem speed (28.8k should be 38400 or 57600; 56k should be
115200). "Flow Control" should be "CTS & RTS (DTR)". Access lines can be found
at www.eskimo.com.
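The "just larger than your modem speed" rule can be sketched as a lookup over the usual serial port speeds. The 4/3 headroom factor below is an assumption, chosen only so that the result agrees with the two cases given in the text (28.8k gives 38400, 56k gives 115200); the function name and the list of rates are likewise illustrative.

```c
#include <stddef.h>

/* Common serial port speeds in bits per second, in ascending order. */
static const long port_speeds[] = { 9600, 19200, 38400, 57600, 115200, 230400 };

/*
 * Hypothetical helper: return the smallest listed port speed that is at
 * least 4/3 of the modem speed (assumed headroom, see the note above).
 * Returns 0 if no listed speed is large enough.
 */
long pick_port_speed(long modem_bps)
{
    size_t i;

    for (i = 0; i < sizeof port_speeds / sizeof port_speeds[0]; i++) {
        /* Integer form of: port_speed >= modem_bps * 4 / 3 */
        if (port_speeds[i] * 3 >= modem_bps * 4)
            return port_speeds[i];
    }
    return 0;
}
```

With this rule, `pick_port_speed(28800)` returns 38400 and `pick_port_speed(56000)` returns 115200, matching the settings recommended above.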
On the slower-speed v.34 access lines (Western WA only), a connection script is
needed; click the "Connect Script" button and use the appropriate script below.
56k and ISDN connections do not need a dialup script.
Emulated PPP:
NOT NEEDED FOR 56k or ISDN
Real PPP (set by support upon request):

NOT NEEDED FOR 56k or ISDN
Click the "Authentication" button. Use one of the following name and password
pairs:
Your 56k/ISDN login and password for accessing the 56k/ISDN lines,
Your email login and Non-Dedicated PPP password for those with the PPP
setup on the v.34 lines (Western WA only), or
Your email login and email password for those using Emulated SLIP/PPP
on the v.34 lines (Western WA only) or through a remote shell.
Click OK, then Done. You will now be back to the ConfigPPP window, where you
can click the "Open" button to connect.
Happy surfing!
Eskimo North Technical Support: OS/2 Setup
Connecting to Eskimo North with OS/2
Step 1: Login Info

Bring up the OS/2 Dialer. Typically, the path to the dialer is:
A. Open "IBM Information Superhighway"...
B. Open "IBM Internet Connection for OS/2"...
C. Open "Internet Utilities"...
D. Open "Dial Other Internet Providers"...
If you already have an Eskimo Dial-Up entry, click on "Modify Entry" and select the Eskimo
Entry. If you do not yet have an Eskimo Dial-Up entry, select "Add Entry"...
http://www.eskimo.com/support/OS2.html (1 of 7) [22/07/2003 6:01:57 PM]
Step 2: Connect Info

Enter "Eskimo North" in the "Name" field. You can type whatever you'd like into the
"Description" field. This will help you to remember which entry is which. For instance, if you
have an entry that dials into the Seattle number - and an entry that dials into the Bellevue
number, this will help you to distinguish one from the other.
Enter your Login ID (in all lowercase) and your password (which is case sensitive) in the
appropriate fields.
Enter your local Eskimo Dial-Up number.
The login sequence is very important. What you enter here depends on the type of account you
have.
If you have an Emulated SLIP/PPP Connection, you will need to enter the following login
sequence:
4
ogin:
[LOGINID]
word:
[PASSWORD]
ommand
i.s\s-P
eady
If you have a Non-Dedicated SLIP/PPP Connection, you will need to enter the following login
sequence:
p
name:
[LOGINID]
word:
[PASSWORD]
PPP
If you have purchased a 56k Dial-Up account, leave the login sequence field blank.
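The two login sequences above can be read as short chat scripts. The sketch below is only an illustration: it assumes the lines alternate between text to send and text to wait for, starting with a send line, which this page does not state explicitly. The helper names (expand_login_sequence, prompt_matches) and the sample login and password are hypothetical. It also shows why the expected prompts omit their first letter: "ogin:" matches both "Login:" and "login:".

```python
# Login sequences copied from the setup instructions above.
EMULATED = ["4", "ogin:", "[LOGINID]", "word:", "[PASSWORD]",
            "ommand", r"i.s\s-P", "eady"]
NON_DEDICATED = ["p", "name:", "[LOGINID]", "word:", "[PASSWORD]", "PPP"]


def expand_login_sequence(lines, login_id, password):
    """Label each line as 'send' or 'expect' and fill in the placeholders.

    ASSUMPTION: lines alternate send/expect, starting with a send line.
    """
    steps = []
    for index, line in enumerate(lines):
        text = line.replace("[LOGINID]", login_id)
        text = text.replace("[PASSWORD]", password)
        action = "send" if index % 2 == 0 else "expect"
        steps.append((action, text))
    return steps


def prompt_matches(expected, received):
    """An expected string matches if it appears anywhere in the prompt."""
    return expected in received


if __name__ == "__main__":
    # "alice" and "s3cret" are made-up credentials for the example.
    for action, text in expand_login_sequence(NON_DEDICATED, "alice", "s3cret"):
        print(f"{action:>6}: {text}")
```

Dropping the first letter of a prompt is a common chat-script habit, since it sidesteps differences in capitalization between servers.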
Under "Connection Type," select "PPP"...
The "Inactivity Timeout Option" doesn't really matter - because we have our own 12-minute idle
timeout here. Our modems will log you out after 12 minutes of inactivity.
Step 3: Connect Info

Enter "204.122.16.31" in the "Domain Nameserver" field.
If you have an Emulated SLIP/PPP Connection, enter "eskimo.com" in the "Your Domain
Name" field.
If you have purchased a 56k Dial-Up account, enter "yourlogin.ndip.eskimo.net" in the "Your
Domain Name" field.
Step 4: Server Info
Your "News Server" is "eskinews.eskimo.com"...

Your "WWW Server" is "www.eskimo.com"...
In the "Mail Server Information" section, enter the following information:
Mail Gateway: mail.eskimo.com
POP Mail Server: mail.eskimo.com
Reply Domain: eskimo.com
Reply [Mail] ID: your login name
POP Login ID: your login name
POP Password: your password
NOTE: Your login is the same as your Eskimo login name. Your password is the same as your
Eskimo password. If you have purchased a 56k Dial-Up account, use your other
(shell/emulated) password here.
Step 5: Modem Info

Check your modem settings!
If you need to find your modem INIT string, look in your modem manual or Ask Mr. Modem!
Step 6: Try your connection!

That's it!
When you get connected, the words "Connector completed..." will display in the "Status"
window.
At that point, you can minimize the window and open any of your Internet Utilities (Netscape,
Telnet, etc.).
[Download OS2 Helpfiles] [OS/2 Links]
[Home]
Eskimo North: SLiP/PPP Service
Emulated SLIP/PPP
Emulated SLIP/PPP is included with all shell accounts. Real SLIP/PPP is available
with all dial access accounts.
Emulated SLIP/PPP does not provide your host with its own IP address and name on
the net. Consequently some services which require incoming connections to the host
cannot be supported, especially if they involve connections to random ports.
Additionally the performance of emulated SLIP/PPP will vary with the load on the
server that is providing the emulation. With emulated SLIP or PPP all connections
appear to originate from our hosts.
Non-Dedicated SLIP/PPP
With V.34 dial-up service we offer non-dedicated SLIP/PPP. Non-dedicated simply
means that the service is not dedicated to any one modem; you connect through a
shared modem pool. We also offer dedicated PPP service in Western Washington that
provides a line and modem dedicated to your use and the ability to route multiple IP
addresses and maintain a full-time dialup connection.
Restrictions on Non-Dedicated SLIP/PPP Usage

Even though non-dedicated SLIP/PPP is technically capable of allowing you to provide
services to the Internet, it is not intended for that purpose and is priced accordingly. It
is intended to be a cost-effective alternative to emulated SLIP/PPP for outgoing client
usage only. Your machine name will be a sub-domain of eskimo.net, specifically
login.ndip.eskimo.net. This type of account provides a single IP address. If you wish to
provide any kind of service to the Internet directly from your machine you need
dedicated SLIP/PPP or frame relay service.
Non-Dedicated SLIP/PPP Setup Rates

We have eliminated the setup fee for non-dedicated SLIP/PPP. However, requests for
setup that are accompanied by payments will be processed at the same time the
payment is processed. Requests not accompanied by payments will be processed as
time permits, which may be days or even weeks if things are busy.
This service is bundled into your V.34 dial access account. 56k, dual channel bonded
56k, and ISDN accounts connect via PPP only. For 56k and faster services, dynamic
IP addresses are used. That means that each time you connect, an IP address is
assigned for that session only. Static IP addresses are available for 56k and faster
services for an additional $5/month. Only one IP address can be routed and it is
restricted to your "home" POP.
http://www.eskimo.com/services/SLIPPPP.html (2 of 2) [22/07/2003 6:02:10 PM]
Eskimo North: Dedicated Dial-In Service
Dedicated SLIP/PPP
Dedicated SLIP and PPP allows full-time Internet connectivity.
Differences between dedicated and non-dedicated SLIP/PPP
Non-dedicated SLIP/PPP does not allow you to maintain full-time connectivity to the Internet and
has a 12-minute idle timeout. With non-dedicated SLIP/PPP you are using the same modem
pool as other dial-in customers.
With dedicated SLIP/PPP you can remain connected to the Internet 24 hours per day; there is
no idle timeout and no monthly connect time limits. You have a modem and telephone line
dedicated for your use that is not shared with other customers.
With non-dedicated SLIP/PPP you are limited to one IP address and thus one host and an
assigned hostname that is a subdomain of login.ndip.eskimo.net.
With dedicated SLIP/PPP you may have multiple hosts (you need to tell us how many you will
have) and you may have a domain name of your choice BUT if you choose to use a domain
name outside of our domain you are responsible for the InterNIC's domain maintenance fees.
Please see domain registration information for details.
The non-dedicated SLIP/PPP is intended to be a cost effective alternative to emulated SLIP and
PPP as provided by tia and slirp. It is intended to allow a user to access services on the net but
is not intended to allow the user to offer net services to others and is priced accordingly.
Dedicated SLIP/PPP is intended for full-time connectivity, and can be used to make services
available to the net. It may be used by multiple hosts connected and multiple users
simultaneously. In short, it is a pipe with a given amount of bandwidth and what you do with it is
your business within the limits of the law.
Rates
There is a one-time setup fee of $100. If you wish a domain name that is not a sub-domain of
eskimo.net, then there is an additional $70 fee charged by the NIC for domain registration. See
domain registration information for details.
Dedicated SLIP/PPP
Month        $100
Quarter      $270
Half-Year    $540
One Year     $1020
Two Year     $1920
There is also a fee of $100/month for this service; payment for the first month must be included.
Dedicated PPP is also available at a quarterly rate of $270/quarter and an annual rate of
$1020/year, $1920 for 2 years.
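To compare the subscription terms, it can help to convert each rate to its per-month equivalent. The sketch below uses only the prices and the $100 setup fee quoted on this page. The function names are illustrative, and the idea that a new customer's first payment is the setup fee plus one term is an assumption rather than something the page spells out.

```python
SETUP_FEE = 100  # one-time setup fee quoted above, in dollars

# term name -> (months covered, price in dollars), from the rate list above
DEDICATED_RATES = {
    "Month": (1, 100),
    "Quarter": (3, 270),
    "Half-Year": (6, 540),
    "One Year": (12, 1020),
    "Two Year": (24, 1920),
}


def monthly_equivalent(term):
    """Price of a term divided by the number of months it covers."""
    months, price = DEDICATED_RATES[term]
    return price / months


def first_payment(term):
    """ASSUMPTION: the first payment is the setup fee plus one full term."""
    _, price = DEDICATED_RATES[term]
    return SETUP_FEE + price


if __name__ == "__main__":
    for term in DEDICATED_RATES:
        print(f"{term:<10} ${monthly_equivalent(term):6.2f}/month, "
              f"first payment ${first_payment(term)}")
```

The longer terms work out cheaper per month, which matches the page's later remark that lower rates are given for long term accounts.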
With this account you also get an interactive login here. If you have an existing login that you
would like to be included, please include your existing login. Otherwise please include a
preferred login and password. Note that logins must be 3-8 characters in length, start with an
alphabetic character and contain only alphabetic and numeric characters. Logins must be typed
in all lower case.
Passwords may be 5 characters or longer, and may contain any printable character. The richer
the character set the better. That is, a mix of lower and UPPER case letters, numbers, and
punctuation is recommended. Your password should not be the same as your login, nor should
it be derived from your login, name, phone number, or anything else commonly known about
you.
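The login and password rules above are mechanical enough to check in a few lines. The following is a rough sketch, not Eskimo North's actual validation: it assumes "alphabetic" means the ASCII letters a-z, and the "derived from your login" test is a simple containment heuristic chosen for illustration. The function names are hypothetical.

```python
import re

# 3-8 characters, starts with a lowercase letter, then letters or digits.
# ASSUMPTION: "alphabetic" is taken to mean ASCII a-z only.
LOGIN_PATTERN = re.compile(r"[a-z][a-z0-9]{2,7}")


def is_valid_login(login):
    """True if the login follows the rules stated in the text above."""
    return LOGIN_PATTERN.fullmatch(login) is not None


def password_problems(password, login):
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    if len(password) < 5:
        problems.append("shorter than 5 characters")
    if not password.isprintable():
        problems.append("contains a non-printable character")
    if password == login:
        problems.append("same as the login")
    elif login and login.lower() in password.lower():
        # Heuristic only: catches passwords such as "nanook1".
        problems.append("contains the login")
    return problems


if __name__ == "__main__":
    print(is_valid_login("nanook"))                 # a well-formed login
    print(password_problems("nanook1", "nanook"))   # derived from the login
```

A check like this cannot judge whether a password is derived from a name or phone number; that part of the advice still depends on the user.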
Dedicated SLIP and PPP is available in the Seattle telephone area only.
http://www.eskimo.com/services/dedicated.html (2 of 2) [22/07/2003 6:02:13 PM]
Frame Relay High Speed Net Connectivity Western Washington
Western Washington Frame Relay Connectivity

The Speed You Need! Available in speeds ranging from 56kb/s up to T1 1.544 Mbit/s.
Widely available! Unlike DSL, there is no distance from the central office limitation.
Secure! Your data traverses just one router between the frame relay cloud and the internet
backbone. It cannot be intercepted by other customers or by us.
Performance! Frame relay is a multiplexed transport medium, but with a much lower
concentration ratio than cable modems and DSL and with a committed information rate.
Reliability! Frame relay access circuits are considered special service circuits by the telco
and generally have a four hour repair commitment. With only one router between backbone and
frame relay cloud here there is very little hardware to break down.
Full Time Connectivity! Frame Relay is connected all the time. No time, transfer, or session
limits.
Low Latency! Typically 15-30ms with a 56k or 64k circuit and 7-10ms with a T1 or fractional
T1 circuit.
Static IP and Multiple IP addresses! Static IP addresses and multiple IP addresses are
provided. This means you don't need to use NAT.

Frame Relay Connection Rates

Total Bandwidth    CIR+BURST    Monthly
56K or 64K         32K+32K      $75
128K               64K+64K      $150
256K               128K+128K    $250
512K               256K+256K    $500
1024K              512K+512K    $750
1536K FULL T1      768K+768K    $1000

These rates do not include telco circuit charges.

Hardware Requirements
You will need a 56k or T1 CSU/DSU (depending upon circuit speed) and you will need a frame
relay router or MonoFRAD. You will also need networking equipment to connect your computers
to the router, such as 10-base-T cards, a hub if there are multiple machines, and 10-base-T
cable.
A CSU/DSU terminates the access circuit from the telco and interfaces to your router or
MonoFRAD. The CSU/DSU will usually have a V.35 connector for connecting to the router or
MonoFRAD. Many routers and MonoFRADs will use a different type of connector and require an
adapter cable.
A frame relay router is a router which includes frame relay protocol. A MonoFRAD is
essentially a frame relay router on a PC card intended to connect a single PC via frame
relay. In my experience a separate router is highly preferable from a performance and
reliability standpoint.

Ordering Frame Relay

Establishing frame relay service is a two-part operation. First you must order a frame relay
access circuit from Qwest. Next, you must order the frame relay connectivity from us. The
frame relay access circuit will give you a physical connection into Qwest's frame relay cloud
with a permanent virtual circuit (PVC) to our frame relay access circuit. We will then create
a connection to the Internet backbone to allow you to send and receive traffic from the
Internet.

Ordering Your Frame Relay Access Circuit
You will need to order a frame relay access circuit from Qwest. There are two types of access
circuits available, 56k/64k or fractional/full T1 circuits. A 56k or 64k circuit cannot be
upgraded to a higher speed without the installation of a new circuit. A fractional T1 circuit
usually can be upgraded to a higher speed up to a full T1.
Order the circuit for the full speed desired and specify that you need a PVC from your new
access circuit to Eskimo North's frame relay access circuit, number "74HCGS19013ACSO". The
CIR should be half the full circuit speed. Qwest will assign a DLCI for the PVC for both
ends. Your end will usually be DLCI 16. We will also need to know the DLCI they assign for
our end.
Contact information for Qwest:
Carolyn Blalock
Phone: 206-224-5562
Email: cblaloc@uswest.com
Be sure to tell her this is to connect to Eskimo North.

Ordering Frame Relay Service from Eskimo North
Once you have the access link ordered, please send a first month's payment to us to arrive
approximately on the date on which the link is to be turned up. Include with this the DLCI
which Qwest assigned for this end (your end will usually be DLCI 16) and your IP address
space requirements (approximate number of hosts). Please do not request more IP space than
you actually need. More space can be assigned later if need be. It is very difficult for us
to get additional IP space if our existing space is not fully utilized.

Additional Services
We bundle some additional services with your frame relay subscription which make it possible
for you to get much more out of your bandwidth.
Each frame relay subscription includes Usenet News NNTP read/post access for one domain, a
Unix shell account, web hosting space under that account, an e-mail account, and up to two
mail lists.
For example, if you have a website on your machine you can place large graphics on ours and
link to them so that when someone loads your page, they won't have to consume your link
bandwidth loading those graphics. This allows you to use a relatively low bandwidth link and
operate your own web server over which you have full control, but still provide good download
speeds when a customer loads your web page.
We can provide additional host services such as secondary name service, full hosting of your
domain, multiple e-mail addresses, additional disk space, etc., for a reasonable fee.

Eskimo North Contact Info

Our Mailing Address
Eskimo North
Post Office Box 55816
Telephone
Telephone: 206-812-0051
Toll Free: 800-246-6874
Facsimile: 206-812-0054
Or by E-mail
Email: support@eskimo.com
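As a worked example of the frame relay rate table on this page: each tier splits its total bandwidth evenly between CIR and burst, in line with the advice that the CIR should be half the full circuit speed. The sketch below encodes the published monthly rates and checks that split. The function names are illustrative, and the "56K or 64K" tier is keyed as 64 because the table lists it as 32K+32K; that keying is an assumption made for the example.

```python
# total bandwidth in Kbit/s -> monthly rate in dollars (telco charges excluded)
FRAME_RELAY_RATES = {
    64: 75,      # listed as "56K or 64K", 32K+32K
    128: 150,
    256: 250,
    512: 500,
    1024: 750,
    1536: 1000,  # full T1
}


def cir_and_burst(total_kbps):
    """Split total bandwidth into CIR and burst, with CIR at half the total."""
    cir = total_kbps // 2
    return cir, total_kbps - cir


def monthly_cost_per_kbps(total_kbps):
    """Monthly dollars per Kbit/s of total bandwidth for a listed tier."""
    return FRAME_RELAY_RATES[total_kbps] / total_kbps


if __name__ == "__main__":
    for total, rate in FRAME_RELAY_RATES.items():
        cir, burst = cir_and_burst(total)
        print(f"{total:>5}K = {cir}K CIR + {burst}K burst, ${rate}/month, "
              f"${monthly_cost_per_kbps(total):.2f} per Kbit/s")
```

The per-Kbit/s figure falls as the tiers get larger, so the bigger circuits are cheaper per unit of bandwidth before telco circuit charges are added.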
Eskimo North: UUCP Service
UUCP with MX Forwarding

We provide UUCP with MX forwarding for $240/year + $50 setup for v.34 dialup.
56k is available for $264/year. Please note that the only way to do UUCP over the
56k modems is to use UUCP over TCP/IP on top of a PPP connection. Usenet
News feeds are available for an additional fee.
We will provide a sub-domain of eskimo.net for UUCP sites. If you wish to obtain
an external domain name, you are responsible for domain registration. Please see
the virtual domain page for details regarding registering your domains.
One shell account is included in this package. Please include a desired login and
password for this account. Logins must be 3-8 characters in length and start with
an alphabetic character and can contain only alphabetic and numeric characters
(no punctuation).
http://www.eskimo.com/services/uucp.html [22/07/2003 6:02:20 PM]
Eskimo North: Web Design Services
Coming soon!
Eskimo North will soon be providing Web Design Services and possibly some new
design/hosting packages. Keep an eye on this page for updates!
http://www.eskimo.com/services/design.html [22/07/2003 6:02:25 PM]
Game Hosting Accounts Network Games MUDs, MOO...
Game Hosting
Net Game Hosting
Month        $25
Quarter      $72
Half-Year    $138
One Year     $264
Two Year     $504
Five Year    $960
A game account hosts a Unix network game such as a MOO, MUCK, MUSH, or MUD or a
telnettable BBS.
Each game account includes one remote shell account with 1GB of disk quota, plus web hosting
with CGI so you can create a web page or interface for your game. GNU C and C++, Fortran, and
JDK Java are available. Non-terminating game processes are permitted on game accounts, and by
request your game will be scheduled for automatic startup after a system reboot.
The environment is a Sun SS-10 with 512MB of RAM, and quad Ross Hypersparc RTK-625
CPUs with a FDDI backbone connection running SunOS 4.1.4.
Currently Hosted Games
Tangled Web Mush (tangled.floop.com port 9876)

http://www.floop.com/
Free Two-Week Trial Account

If you would like to try out the shell environment here, see if your game will compile OK, check
out the tools, please apply for a free two-week trial of our remote shell or dial access accounts
by clicking on "free two-week trial" and fill out the application today!
Acceptable Use
The following activities are expressly prohibited and will result in account termination with no
refund.

Attempting to bypass system security of our hosts or using our hosts as a point of attack
on other Internet hosts.

Any illegal or unethical activity such as exchanging child pornography, exchanging
copyrighted software without the copyright holder's permission, offering controlled
substances for sale without a prescription, Internet gambling, etc.
Excepting dial plan L 24x7 service, defeating idle timeouts to hold a dialup connection
open when it is not actively being used.
Excepting dial plan L 24x7 service, dial access is not intended for running servers. We
bundle hosting services free with your dial-up account for those needs. Client-server
applications like Napster, WinMX, and Net Meeting are permissible so long as you only
leave them up while you are actively using them.
Since a free trial period is offered and lower rates are given for long term accounts, we do not
provide refunds. This is true even if your account is terminated for abuse (activities listed
above).
http://www.eskimo.com/services/games.html (2 of 2) [22/07/2003 6:02:28 PM]
Eskimo North: Payment Information
Please send payment with your login, real name, telephone number, address, and
type of service desired, to:
Eskimo North
P.O. Box 55816
Shoreline, WA. 98155-0816
If you are sending payment for a new account, please include both a login and a
password.
There will be a $25 charge for any returned checks. We reserve the right to refer bad
checks to a collection agency.
Your login:
    May contain only the following:
        Alphabetic characters [a-z]
        Numeric characters [0-9]
        Dash [-]
    Must NOT START with a dash.
    Must not exceed eight characters.
    Is your e-mail address.
Your password:
    Should be different from your login.
    Should be at least >five< characters long.
    May contain any printable character.
Please make checks and money orders payable to Eskimo North.
We will accept cash but checks or money orders are preferable for mailed payments.
Payments can also be made at Eskimo North user meetings, which are held on the
third Sunday of every month at Ballard Godfathers starting at 2:30pm.
No fractional portions of a subscription period can be accepted; please pay an amount
which is a whole multiple of an advertised subscription period.
Your subscriptions make the continued operation and improvement of Eskimo North
possible. Thank you.
http://www.eskimo.com/services/payment.html (2 of 2) [22/07/2003 6:02:33 PM]

3 matches found.
Dial Plan L
City          Number                Monthly User Hours    Realm
Abbotsford    604-504-0577 V.90                           Canada
Abbotsford    604-557-5780 V.90                           Canada
Dial Plan W
City          Number                Monthly User Hours    Realm
Abbotsford    604-557-5780 V.90                           Canada
Contact Information
Telephone: (206) 812-0051
Toll-Free: (800) 246-6874
Facsimile: (206) 812-0054
55816, Shoreline, WA 98155-0816.
Setup Information
Linux
Mac
OS2
Windows
http://www.eskimo.com/cgi-bin/dialsearch?s=BC&c=Abbotsford (2 of 2) [22/07/2003 6:02:49 PM]
56K, ISDN Dialup Access, Linux Friendly Internet, Manitoba
Manitoba 56k, Dual-56k, and ISDN Dialup Access Numbers

Dial Plan L
City          Number                Monthly User Hours    Realm
Winnipeg      204-956-1440 V.90                           Canada
Dial Plan W
City          Number                Monthly User Hours    Realm
http://www.eskimo.com/services/manitoba.html (2 of 2) [22/07/2003 6:03:06 PM]

2 matches found.
Dial Plan L
City          Number                Monthly User Hours    Realm
Winnipeg      204-956-1440 V.90                           Canada
Dial Plan W
City          Number                Monthly User Hours    Realm
Winnipeg      204-956-1440 V.90                           Canada
http://www.eskimo.com/cgi-bin/dialsearch?s=MB&c=Winnipeg (2 of 2) [22/07/2003 6:03:10 PM]
Eskimo North Technical Support: Windows Setup
Connecting to Eskimo North with Windows
Windows 3.1 using Trumpet
Windows 95/98
Windows NT
Windows XP
[Home]
http://www.eskimo.com/support/Windows.html [22/07/2003 6:03:12 PM]
http://www.eskimo.com/support/Win31.html
Eskimo Main Page

Linux
OS/2
MacOS
Win95
WinNT
Win 3.1
FAQ
http://www.eskimo.com/support/Win31.html [22/07/2003 6:03:33 PM]
Eskimo North: Links
http://www.eskimo.com/links/anime.html
Anime and Manga Related Web Sites
http://www.eskimo.com/links/cg.html
Computer Graphics
http://www.cprextreme.com/
Extreme Gaming for Extreme Gamers
http://www.igamer.net/
When It's Time To Get Back @ Reality
http://www.libraryspot.com/
Convenient gateway to more than 2,500 libraries around the world
http://www.eskimo.com/links/linux.html
Linux and Unix Resources
http://www.newnet.net/
Eskimo North founded the NewNet IRC network in 1996.
http://www.eskimo.com/links/science.html
Science - Mainstream & Fringe
http://www.eskimo.com/links/ (2 of 2) [22/07/2003 6:03:56 PM]
Eskimo North: Anime and Manga Links
http://community.animearchive.org/
Animanga Community
A meeting place for anime and manga fans on the Internet
http://www.animecastle.com/
Anime Castle
Video, Music, Books, Wall Scrolls, Cels
http://anime-genesis.com/
Anime Genesis
Not a bad image gallery, but banner ads everywhere
http://www.anipike.com/
Anime Web Turnpike
Guide to anime and manga on the Internet
ANIME
The character pictured
above is Lum from the
series Urusei Yatsura
(Those Obnoxious Aliens).
If you have
recommendations for sites
to be linked to this section
or if you run into a broken
link, please e-mail
nanook@eskimo.com.
http://www.csclub.uwaterloo.ca/u/mlvanbie/anime-list/
Anime and Manga Resource List
A long list of anime and manga web resources
http://members.aol.com/tprice1995/anime.html
AnimeLink
"Industrial Strength Anime Website"
http://freethought.site.yahoo.net/freethought/
AnimeNation
Anime related merchandise
http://www.animeigo.com/
Animeigo
You can buy Urusei Yatsura and other great anime here.
http://www.appliedfantasy.net/index_nav.html
Applied Fantasy
Comparative reviews of anime; another anime addict's page worth a
look.
http://www.grassroots1776.com/Skuld.html
Digestible Dojinshi of SKULD!
Ah My Goddess Info and Fanfics

If you find this information useful, please consider
supporting our site by
taking advantage of some
of our services.
http://home.interlink.or.jp/~lum/
LUM the bodyguards maximum cadre society Okawa Branch
Urusei Yatsura Home Page Another site devoted to the awesome
Anime Series Urusei Yatsura
http://www.maison-otaku.net/
Maison Otaku
A creative outlet for Maison otaku members
You can get a shell account

here, with web hosting for
only $72/year, 56k dial-up
access for only $168/year,
ISDN for only $264/year.
http://www.geocities.com/Tokyo/Harbor/4672/
Michael's Urusei Yatsura Page
More Urusei Yatsura Information
Apply for a Two-Week

Free Trial!
http://www.eskimo.com/~nekocon/
Neko-Con R Main Access Terminal
Nekocon anime convention site
No annoying
Geocities/Tripod style pop-ups plaguing people who
visit your web site here.
You retain the intellectual
rights to your creations. No
data export fees.
Please click on Eskimo
North at the top of the page
to visit our home page.
http://www.tcp.com/doi/seiyuu/nogami-yukana.html
Nogami Yukana
She had a voice role in Moldiver, Vampire Hunter, and a voice and
live acting role in Hyper Dolls (which was a fun combination of
anime and live performance).
http://otakuworld.com/
Otaku World
Anime and Manga Theme Guide, Reviews, Import Games, Desktop
Themes, Toys, and Links to other Anime Sites.
http://www.a-kon.com/
Project: A-Kon
An anime convention site
http://www.stormloader.com/two/
Pseudom Studio
This isn't Anime since it's made in America, but it is an attempt to
create an Anime style series.
http://www.geocities.com/Tokyo/Flats/3596/movie.htm
Quicktime Anime Clips
They come with that annoying Geocities pop-up

http://www.rumicworld.com/
Rumic World
A fan site dedicated to the works of Rumiko Takahashi
http://ftp.sunet.se/pub/comics/anime-manga/
SUNET's Index of /pub/comics/anime-manga
Swedish University Network - Index of Anime and Manga
http://www.furinkan.com/tomobiki/uy/
TOMOBIKI-CHO
The Urusei Yatsura Web Site
http://www.geocities.com/Tokyo/Towers/1986/
Takaholics Anonymous
Takahashi fan art and fanfic mostly Urusei Yatsura characters
http://www.abcb.com/
The Anime Cafe
Anime Reviews
http://www.animepitstop.com/
The Anime Pitstop
An Anime Search Engine
http://www.tapanime.com/
The Anime Powerhouse
A graphic intensive anime and manga resource with many links to
other resources
http://the.animearchive.org/real_index.html
Ultimate Animanga Archive
Anime and manga images
http://www.yi.com/home/DrageTim/uy/
Urusei Yatsura
Web site devoted to Urusei Yatsura
Eskimo North: Computer Graphics Links
Computer
Graphics
This area is pretty broad and covers
basically any graphics that have been
generated using a computer.
Many of the sites here are drawings
by Japanese artists that resemble
Anime and Manga characters. In
some cases they are Anime or Manga
characters. In part that stems from
the fact that the way I found many of
these sites was by following links
from Anime sites.
It seems to be fairly common for
many of the creators of the graphics
web sites to celebrate various hit-count
milestones with special graphics.
http://www.na.rim.or.jp/~autumn/
Autumn's Web Page **
Mostly cutesy characters
If you have recommendations for sites to be linked to this section or if you run into a
broken link, please e-mail nanook@eskimo.com.

* A blue asterisk means mixed Japanese and English text. Set your document encoding to
"Japanese (Auto-Detect)".
* A red asterisk means the site is known to contain some material that may be inappropriate
for children. The lack of a red asterisk means the status with respect to adult content is
unknown.
* A green asterisk signifies that this site only, not links from this site, was viewed and
found devoid of graphical depictions of sexual activity, tasteless nudity, or violence.
This is intended as an aid in avoiding inappropriate material only. We have no control over
the content of these sites and that content may change from the time the site was initially
reviewed. No guarantee is expressed or implied.
http://www2e.biglobe.ne.jp/~earth/
Earth CG Gallery **
Colorful sexy characters

http://plaza12.mbn.or.jp/~pompom/index.htm
POMoom **
Mostly tame but there are some outline sketches with
sexual content so can't recommend this site for kids
http://rinn.art.cg/
Rinn's Page on the Web **
Not geared to kids, but the raciest things there are drawings
of a girl in a bikini.
http://www.webtech.co.jp/take/
Take's Home Page **
Sexy mystical characters. One semi-nude. Very detailed
drawings. Some of the graphics are in PNG format which I
can't display. Some broken links.
If you find this information useful,

please consider supporting our site
by taking advantage of some of our
services.
You can get a shell account here,
with web hosting for only $72/year,
56k dial-up access for only
$168/year, ISDN for only $264/year.
Apply for a Two-Week Free Trial!
http://www.taremeparadise.com/
Tareme Paradise **
Tareme (droopy-eyed) characters
No annoying Geocities/Tripod-style
pop-ups plaguing people who visit
your web site here. No data export
fees.
Please click on Eskimo North at the
top of the page to visit our home
page.
http://www.alchemylab.com/
Alchemy Lab
Alchemy Information
http://www.livelinks.com./sumeria/phys/grav3.html
All about Gravitational Waves
This person theorizes that gravitational waves are responsible
for noise in electronic devices. I am skeptical since the fact that
cooling a device reduces noise supports the more conventional
theory of a thermal origin to electronic noise.
Science
Mainstream
& Fringe
Links to various science related
sites.
If you have recommendations
for sites to be linked to this
section or if you run into a
broken link, please e-mail
nanook@eskimo.com.

http://www.eskimo.com/links/astro.html
Astronomy & Astrophysics
Links to various sites dealing with Astronomy and Astrophysics
http://www.blacklightpower.com/tinsight.html
Blacklight Power Inc.
An alternative explanation for the "cold fusion" phenomena that
explains the energy production as the result of hydrogen being
catalytically dropped to a lower energy level than its natural
ground state.
http://vishnu.nirvana.phys.psu.edu/
Center for Gravitational Physics and Geometry
Pointers to various resources maintained by Penn State's
Department of Physics. Mostly stuff you can order for a fee.
http://www.fas.org/
Federation of American Scientists
The Federation of American Scientists is engaged in analysis
and advocacy on science, technology and public policy for
global security.
http://www.howstuffworks.com/
How Stuff Works
Ever wondered how stuff works? Here is a site that tells you!
http://www.geocities.com/CapeCanaveral/Lab/1287/
Jims Free Energy Page
Alternative and over-unity energy resources
http://www.trufax.org/
Leading Edge International Research Group
I think there is more entertainment value at this site than
information value but it is an interesting site.
http://www.accessnv.com/nids/
National Institute for Discovery Science
The National Institute for Discovery Science (NIDS) is a
privately funded research organization. It focuses on scientific
exploration that emphasizes emerging, novel, and sometimes
unconventional observations and theories.
http://www.weburbia.demon.co.uk/pg/theories.htm
New and Alternative Theories of Physics
The purpose of this page is to provide a list of links to various
new or alternative theories of physics which appear on the web.
http://www.princeton.edu/~pear/
Princeton Engineering Anomalies Research
The Princeton Engineering Anomalies Research (PEAR)
program was established at Princeton University in 1979 by
Robert G. Jahn, then Dean of the School of Engineering and
Applied Science, to pursue rigorous scientific study of the
interaction of human consciousness with sensitive physical
devices, systems, and processes common to contemporary
engineering practice.
http://www.df.uba.ar/~hprel/
Quantum Theories and Gravitation Group (QTGG)
Interesting material in the "local reprints" section. Most of
the documents are in PostScript format.
http://www.amasci.com/
Science Hobbyist
This is one of the best science related sites on the net, both for
mainstream and fringe science. Lots of good practical nuts and
bolts information for the hobbyist, science for kids, deep
mainstream science and fringe science all very neatly
categorized, with pointers to many other resources including
mailing lists.
http://www.sciam.com/
Scientific American
The magazine is on the web with some extended information
not present in the hardcopy.
http://www.sfgate.com/offbeat/machine.html
Strange Machines
Information about various strange devices and machines that
nobody ever seems to be able to find enough information on to
fully reproduce in the present. I'd love to see a working Searl
Disk or Hendershot Device, but I haven't yet. It's still fun to
read about them.
http://www.pppl.gov/TFTR/
TFTR Project Home Page
Home page of the now out-of-service Tokamak Fusion Test
Reactor. Methods of removing the nuclear "ash", research on
materials that would withstand the constant neutron
bombardment experienced in a commercial reactor, and
increased knowledge of plasma physics all resulted from the
operation of this reactor.
http://www.eskimo.com/links/ufo.html
UFO
UFOs and related links such as crop circles, alien abductions,
UFO-government conspiracy theories, Area 51, the Roswell
crash, and other related phenomena
http://unisci.com/
UniSci
Daily University Science News
Astronomy &
Astrophysics
Links to astronomy and
astrophysics sites and related
topics.
If you have recommendations for
sites to be linked to this section
or if you run into a broken link,
please e-mail
nanook@eskimo.com.

http://astrosurf.com/neptune/astropix/index.html
Catching the Light
This is a web site of deep-sky astronomical
photographs, tips and techniques for
astrophotography, and digital enhancement in
Photoshop
http://cnn.com/TECH/9706/pathfinder/index.html
CNN Exploring Mars
CNN report about the 1997 Pathfinder mission to
Mars
http://www.eso.org/outreach/info-events/halebopp/comet-hale-bopp.html
Comet Hale-Bopp
ESO Homepage for the unusual Comet 1995 O1
(Hale-Bopp)
http://mpfwww.jpl.nasa.gov/default.html
Mars Pathfinder
JPL's web site covering the Mars Pathfinder
Mission
http://quest.arc.nasa.gov/mars/
Mars Team Online
NASA site dealing with Mars Missions and Data

http://www.solar.ifa.hawaii.edu/MWLT/mwlt.html
Mees White Light Telescope
Observations of the Sun in white light

Contact Information
Telephone: (206) 812-0051
Toll-Free: (800) 246-6874
Facsimile: (206) 812-0054
55816, Shoreline, WA 98155-0816.
Setup Information
Linux
Mac
OS2
Windows






























C Programming

Handout: A Short Introduction to Programming

This page by Steve Summit // Copyright 1996-9

The C Programming Language, or K&R as it is affectionately known, is widely praised by experienced

This page by Steve Summit // Copyright 1995, 1996

Preface to the First Edition

Chapter 1. A Tutorial Introduction

section 1.1: Getting Started

section 1.2: Variables and Arithmetic Expressions

are all treated exactly the same way by the compiler.

section 1.3: The For Statement

section 1.4: Symbolic Constants

section 1.5: Character Input and Output

section 1.5.1: File Copying

section 1.5.2: Character Counting

section 1.5.3: Line Counting

section 1.5.4: Word Counting

section 1.6: Arrays

section 1.7: Functions

section 1.8: Arguments -- Call by Value
